Main Page


This is a wiki for information regarding the sphinx speech recognition system, with a focus on sphinx3.



Getting Started Tutorials



Record your voice on VoxForge

Here: you read 10 sentences into the microphone, and the recordings are sent to a database. This lets us build good voice models. The lack of good freely available models is currently the main thing that stops Sphinx from being a useful continuous dictation recogniser.

Do it today! Set it to your homepage! Tell your friends! :)



Building

Sphinx3 v. 0.6.3

  • download
  • extract
  • ./autogen.sh
  • ./configure
  • make install

Sphinx3 v. 0.7 trunk

Create the base directory

cd /usr/src
mkdir sphinx

sphinxbase package

cd /usr/src/sphinx
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinxbase
cd sphinxbase
./autogen.sh --prefix=/usr
make
make check
make install

Sphinx3 decoder

cd /usr/src/sphinx
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx3
cd sphinx3
./autogen.sh --prefix=/usr
make
make check
make install

Troubleshooting

If you get the following error during "make check" for sphinx3:

Can't open perl script "/test/compare_table.pl": No such file or directory

you can copy the script from sphinxbase with these commands:

# mkdir /test
# cp ../sphinxbase/test/compare_table.pl /test/


PocketSphinx

Sphinx Base

See instructions above

Pocket Sphinx

cd /usr/src/sphinx
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx
cd pocketsphinx
./autogen.sh --prefix=/usr
make
make check
make install

Also see http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/PocketSphinxMigration


Documentation

Architecture

The Hieroglyphs - a book about Sphinx from The Grand Janitor. Emphasis on sphinx3.

API

Sphinx3 Doxygen - latest trunk version (3.7)

Decoding Instructions

Training Instructions



Software

Decoding

Training



Speech Data

Acoustic Models

  • Several models, including hub4, are available for download here
  • Sphinx4 ships with a few Wall Street Journal acoustic models, including an 8 kHz version and a 16 kHz version. See the /bld/models/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz subdirectory of the sphinx4 checkout.

Language Models

Finite State Grammars

There are no download links, because each application needs to develop its own.

For example, in an IVR the grammar changes dynamically, depending on the possible answers to the question currently being asked of the user.

The format of the grammar depends on the version of sphinx.

NOTE: Either a finite state grammar or a language model is needed, but not both.
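
As a rough sketch of what such a grammar can look like, the shell function below writes a two-state yes/no grammar of the kind an IVR question might use. It assumes the sphinx2 FSG text format referred to in the FAQ. The function name write_yesno_fsg, the grammar name yesno and the words YES and NO are illustrative assumptions, not part of Sphinx, and the words would also have to appear in your dictionary.

```shell
# write_yesno_fsg PATH: write a tiny yes/no grammar (hypothetical example).
# Assumption: sphinx2 FSG text format, with transition lines of the form
#   TRANSITION <from-state> <to-state> <probability> [<word>]
write_yesno_fsg() {
    cat > "$1" <<'EOF'
FSG_BEGIN yesno
NUM_STATES 2
START_STATE 0
FINAL_STATE 1
TRANSITION 0 1 0.5 YES
TRANSITION 0 1 0.5 NO
FSG_END
EOF
}
```

Calling write_yesno_fsg yesno.fsg would create the file in the current directory. It could then be given to the decoder through the -fsg argument mentioned in the FAQ.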

Examples

Dictionaries

Audio Databases

These are needed for building acoustic models.

CMU Audio Databases



FAQ

  • Using sphinx3, how do I specify a finite state grammar instead of a language model?
    • Answer: Check the -fsg|-fsgctlfile arguments; "fsg" stands for finite state grammar. Here is the sphinx2 fsg grammar format, which also works for sphinx3.
  • Do I need sphinxbase for sphinx 3.6.3?
    • Answer: No. If you are using 3.7, then you will need sphinxbase.
  • Is sphinx3 multi-thread safe?
    • Answer: No. There can only be one Sphinx3 thread per process. The next version will most likely be re-entrant, but this requires some changes to the API.
  • What is the set of allowable phonemes in dictionaries?
    • Answer: They must match the set of phonemes in the acoustic model. You can determine this set by looking at the 'mdef' file in the acoustic model directory.
  • What is a senone?
    • Answer: see this page and search for "senone" (cite: "Senones can be viewed as independent model components, or model building blocks...") More technically, a senone is a Gaussian Mixture Model which represents an atomic acoustic unit (a sub-phone, or an individual speech sound). It corresponds to the output density function of a single state in a Hidden Markov Model.
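
For the phoneme question above, a check along these lines may help. This is only a sketch under two assumptions: the 'mdef' file is in its plain-text form, where a context-independent phone is a row whose left-context, right-context and position fields are all "-", and the dictionary has one "WORD PHONE PHONE ..." entry per line. The function names are hypothetical helpers, not Sphinx tools.

```shell
# list_mdef_phones MDEF: print the context-independent phones in a text mdef.
# Assumption: CI phone rows have "-" in fields 2, 3 and 4; "#" starts a comment.
list_mdef_phones() {
    awk '!/^#/ && NF >= 4 && $2 == "-" && $3 == "-" && $4 == "-" { print $1 }' "$1" |
        LC_ALL=C sort -u
}

# list_dict_phones DICT: print every phone used in a pronunciation dictionary.
# Assumption: field 1 is the word, the remaining fields are its phones.
list_dict_phones() {
    awk '{ for (i = 2; i <= NF; i++) print $i }' "$1" | LC_ALL=C sort -u
}

# check_dict_phones MDEF DICT: print dictionary phones that the mdef lacks.
# Prints nothing when every dictionary phone is known to the acoustic model.
check_dict_phones() {
    mdef_tmp=$(mktemp)
    dict_tmp=$(mktemp)
    list_mdef_phones "$1" > "$mdef_tmp"
    list_dict_phones "$2" > "$dict_tmp"
    # comm -13 keeps only the lines unique to the second (dictionary) list.
    LC_ALL=C comm -13 "$mdef_tmp" "$dict_tmp"
    rm -f "$mdef_tmp" "$dict_tmp"
}
```

A call such as check_dict_phones model/mdef my.dict (both paths are placeholders) lists the phones that would need to be remapped or removed from the dictionary.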



Glossary of Terms

  • Senone
    • An equivalence class that models a subphonetic event, usually one state in an HMM for a phoneme. Different phone models can share the same senone if they are acoustically similar. More Info



External Links

Sphinx HomePage

Official Sphinx Wiki

How Speech Recognition Works - Overview

Sphinx Powerpoint Presentations

Arthur Chan's homepage

Forums / Mailing Lists (At SourceForge)

Sphinx3 Guide

VoxForge - Database of Training Data

IRC Channel

#cmusphinx on irc.freenode.net

(maintained by sunfish7@gmail.com)
