Main Page
This is a wiki for information regarding the sphinx speech recognition system, with a focus on sphinx3.
Getting Started Tutorials
- Sphinx3 Speech Recognition Quick Start - end-to-end tutorial for transcribing spoken audio in audio files into text. Start here.
- Getting Started with SimpleRec - end-to-end tutorial for recognizing live speech from the microphone. Based on "Sphinx-3 live cross-platform speech recognition example", a software example by Keith Vertanen.
- Sphinx3 Python Quick Start - Use sphinx3 from Python
Record your voice on VoxForge
Here: read 10 sentences into the microphone, and the recordings are sent to a database. This way we can construct good voice models. Currently the lack of good available models is the main thing that stops Sphinx from being a useful continuous dictation recogniser.
Do it today! Set it to your homepage! Tell your friends! :)
Building
Sphinx3 v 0.6.3
- download
- extract
- ./autogen.sh
- ./configure
- make install
Sphinx3 v. 0.7 trunk
create base dir
cd /usr/src
mkdir sphinx
sphinxbase package
cd /usr/src/sphinx
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinxbase
cd sphinxbase
./autogen.sh --prefix=/usr
make
make check
make install
Sphinx3 decoder
cd /usr/src/sphinx
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx3
cd sphinx3
./autogen.sh --prefix=/usr
make
make check
make install
Troubleshooting
If you get the following error while running "make check" for sphinx3:
Can't open perl script "/test/compare_table.pl": No such file or directory
you can copy the script from sphinxbase with this command:
# mkdir /test
# cp ../sphinxbase/test/compare_table.pl /test/
PocketSphinx
Sphinx Base
See instructions above
Pocket Sphinx
cd /usr/src/sphinx
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx
cd pocketsphinx
./autogen.sh --prefix=/usr
make
make check
make install
Also see http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/PocketSphinxMigration
Documentation
Architecture
The Hieroglyphs - a book about Sphinx from The Grand Janitor, with an emphasis on sphinx3.
API
Sphinx3 Doxygen - latest trunk version (3.7)
Decoding Instructions
- Sphinx3 Speech Recognition Quick Start - start here to get up and running ASAP
- Running Sphinx3 Decoder - detailed guide, explains command line options
Training Instructions
- Training Acoustic Models (Robust Group Tutorial)
- CMU Sphinx Wall Street Journal (WSJ) Training Recipe
Software
Decoding
Training
Speech Data
Acoustic Models
- hub4 ships with sphinx3 and is installed to /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd. Follow this link for more information about hub4 and to download it. A paper on hub4 is also available.
- Several models, including hub4, are available for download here
- Sphinx4 ships with a few Wall Street Journal acoustic models, including an 8 kHz version and a 16 kHz version. See the /bld/models/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz subdirectory of the sphinx4 checkout.
Language Models
Finite State Grammars
There are no download links, because each application needs to develop its own.
For example, in an IVR (interactive voice response) system, the grammar changes dynamically based on the possible answers to the question currently being asked of the user.
The format of the grammar depends on the version of sphinx.
NOTE: Either a finite state grammar or a language model is needed, but not both.
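As a rough illustration of a grammar that changes with each IVR question, the sketch below generates a small grammar as text. It assumes the Sphinx2-style text format with the keywords FSG_BEGIN, NUM_STATES, START_STATE, FINAL_STATE, TRANSITION and FSG_END; check the keyword spellings against the format description for your sphinx version. The function names `render_fsg` and `single_answer_grammar` are hypothetical helpers, not part of any sphinx API.

```python
def render_fsg(name, num_states, start_state, final_state, transitions):
    """Render a grammar as Sphinx2-style FSG text (assumed keyword set).

    transitions: iterable of (from_state, to_state, probability, word);
    word may be None for a transition that emits no word.
    """
    lines = [
        f"FSG_BEGIN {name}",
        f"NUM_STATES {num_states}",
        f"START_STATE {start_state}",
        f"FINAL_STATE {final_state}",
    ]
    for src, dst, prob, word in transitions:
        fields = ["TRANSITION", str(src), str(dst), f"{prob:.4f}"]
        if word is not None:
            fields.append(word)
        lines.append(" ".join(fields))
    lines.append("FSG_END")
    return "\n".join(lines) + "\n"


def single_answer_grammar(answers):
    """Build a two-state grammar that accepts exactly one of the answers."""
    prob = 1.0 / len(answers)  # uniform probability over the answers
    transitions = [(0, 1, prob, word) for word in answers]
    return render_fsg("answers", 2, 0, 1, transitions)


# Hypothetical IVR step: the current question expects YES or NO.
fsg_text = single_answer_grammar(["YES", "NO"])
print(fsg_text)
```

An application could call `single_answer_grammar` with a different answer list for each question and write the result to the file it passes to the decoder. Every word used in the grammar also has to appear in the pronunciation dictionary.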
Dictionaries
- English Gigaword - includes both a language model and dictionary
Audio Databases
These are needed for building acoustic models.
FAQ
- Using sphinx3, how do I specify a finite state grammar instead of a language model?
- Answer: Check the -fsg|-fsgctlfile arguments ("fsg" stands for finite state grammar). Here is the sphinx2 fsg grammar format, which also works for sphinx3.
- Do I need sphinxbase for sphinx 3.6.3?
- Answer: NO. If you are using 3.7, then you will need sphinxbase.
- Is sphinx3 thread-safe?
- Answer: No. There can only be one Sphinx3 thread per process. The next version will most likely be re-entrant, but this requires some changes to the API.
- What is the set of allowable phonemes in dictionaries?
- Answer: The phonemes used in the dictionary must match the set of phonemes in the acoustic models. You can determine this set by looking at the 'mdef' file in the acoustic model directory.
- What is a senone?
- Answer: see this page and search for "senone" (cite: "Senones can be viewed as independent model components, or model building blocks...") More technically, a senone is a Gaussian Mixture Model which represents an atomic acoustic unit (a sub-phone, or an individual speech sound). It corresponds to the output density function of a single state in a Hidden Markov Model.
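For the phoneme question above, a quick consistency check between a dictionary and an acoustic model might look like the sketch below. It rests on two assumptions that should be verified against your own files: that the mdef file is in text form with context-independent phones listed as rows whose left-context, right-context and position columns are all "-", and that each dictionary line is a word followed by its phones. The sample data is made up for illustration, and the function names are hypothetical.

```python
def ci_phones_from_mdef(mdef_text):
    """Collect context-independent phone names from text-format mdef content.

    Assumption: a context-independent phone row has "-" in columns 2-4
    (left context, right context, word position).
    """
    phones = set()
    for line in mdef_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(fields) >= 4 and fields[1:4] == ["-", "-", "-"]:
            phones.add(fields[0])
    return phones


def unknown_phones(dict_text, phones):
    """Map each dictionary entry to the phones it uses that are not in `phones`."""
    problems = {}
    for line in dict_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, pron = fields[0], fields[1:]
        missing = [p for p in pron if p not in phones]
        if missing:
            problems[word] = missing
    return problems


# Made-up excerpts, for illustration only.
SAMPLE_MDEF = """\
# base lft rt p attrib tmat state ids
   AH   -   - - n/a    0    0    1    2 N
   HH   -   - - n/a    1    3    4    5 N
   OW   -   - - n/a    2    6    7    8 N
    L   -   - - n/a    3    9   10   11 N
  SIL   -   - - filler 4   12   13   14 N
"""

SAMPLE_DICT = """\
HELLO HH AH L OW
HELLO(2) HH EH L OW
"""

phones = ci_phones_from_mdef(SAMPLE_MDEF)
problems = unknown_phones(SAMPLE_DICT, phones)
print(problems)
```

Any entry reported by `unknown_phones` uses a phone the acoustic model does not define, so it would need to be re-transcribed with phones from the model's set or removed from the dictionary.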
Glossary of Terms
- Senone
- An equivalence class which models a subphonetic event, usually one state in an HMM for a phoneme. (Different phone models can share the same senone if they exhibit acoustic similarity.) More Info
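To make the idea of a senone as a Gaussian Mixture Model concrete, the sketch below scores one feature vector against one mixture. It is a minimal illustration that assumes diagonal covariances and uses toy parameters; it is not how the sphinx3 decoder stores or evaluates its models, and the function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def senone_log_likelihood(x, weights, means, variances):
    """Log of sum_k w_k * N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)  # subtract the largest term for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy senone: two mixture components over 2-dimensional features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [2.0, 2.0]]

frame = [0.5, -0.2]
score = senone_log_likelihood(frame, weights, means, variances)
print(score)
```

During decoding, a score of this kind is computed for each frame of audio features and each active HMM state. Because different phone models can share a senone, one evaluation can serve several states.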
External Links
How Speech Recognition Works - Overview
Sphinx Powerpoint Presentations
Forums / Mailing Lists (At SourceForge)
VoxForge - Database of Training Data
IRC Channel
#cmusphinx on irc.freenode.net
(maintained by sunfish7@gmail.com)
