Hello World Decoder QuickStart Guide
With this tutorial you will turn audio files of spoken words into text!
Make a Base: /usr/src/sphinx/
su -
cd /usr/src
mkdir sphinx
chmod a=rwx sphinx
SPHINX_ROOT=/usr/src/sphinx
echo $SPHINX_ROOT
Now you can use $SPHINX_ROOT instead of /usr/src/sphinx/. The variable only lasts for the current shell session, so in a script or a new terminal you'll need to set it again with SPHINX_ROOT=/usr/src/sphinx.
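If you'd rather not retype the assignment in every new shell, you can add an export line to your shell startup file. Below is a minimal sketch; persist_sphinx_root is a hypothetical helper (not part of Sphinx) that appends the line only once, and it assumes bash with ~/.bashrc as the startup file. Run it as the user who will do the work, not in the root shell from su.

```shell
# Hypothetical helper: append "export SPHINX_ROOT=..." to a startup file
# (e.g. ~/.bashrc), but only if that exact line is not already present.
persist_sphinx_root() {
    local rcfile="$1"
    local line='export SPHINX_ROOT=/usr/src/sphinx'
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        echo "$line" >> "$rcfile"
    fi
}
```

Usage: `persist_sphinx_root ~/.bashrc`, then `source ~/.bashrc`. Using export (rather than a plain assignment) also makes the variable visible to scripts you start from that shell.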
From now on, make sure you put everything into $SPHINX_ROOT (unless you really like hunting through config files changing folders).
You will need certain tools installed for this guide to work. On Ubuntu, this command installs them:
sudo apt-get install build-essential autoconf libtool automake libasound2-dev python-dev subversion sox libsox-fmt-all
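To check afterwards that the command-line tools are really on your PATH, a small sketch like the one below can help. missing_tools is a hypothetical helper, and the command names in the usage line (gcc, make, autoconf, libtoolize, automake, svn, sox) are assumptions about which binaries those packages provide; header-only packages such as libasound2-dev and python-dev cannot be checked this way.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on the PATH; returns non-zero if anything is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `missing_tools gcc make autoconf libtoolize automake svn sox || echo "install the missing tools first"`.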
Download/build sphinx3
Get sphinx3 version 0.7 (and sphinxbase), as described on the Main Page in the section "Sphinx3 v. 0.7 trunk".
Version 0.7 works for all tutorials.
You should now have:
$SPHINX_ROOT/sphinx3
$SPHINX_ROOT/sphinxbase
If you run into problems, use version 0.6.3 instead.
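A quick way to confirm the layout listed above is sketched below. check_sphinx_tree is a hypothetical helper; it only tests that the two directories exist, not that the build succeeded.

```shell
# Hypothetical helper: confirm the source trees from this step exist under
# the given root directory (defaults to $SPHINX_ROOT).
check_sphinx_tree() {
    local root="${1:-$SPHINX_ROOT}"
    local dir ok=0
    for dir in sphinxbase sphinx3; do
        if [ -d "$root/$dir" ]; then
            echo "found:   $root/$dir"
        else
            echo "MISSING: $root/$dir"
            ok=1
        fi
    done
    return "$ok"
}
```

Usage: `check_sphinx_tree` (or `check_sphinx_tree /usr/src/sphinx` if the variable is not set).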
Record and convert audio
Note: to enable capture from your microphone, you may need to use alsamixer or amixer. This worked for me:
$ amixer set "Capture" cap
$ amixer set "Capture" 0
Use Audacity to record three WAV files, each containing one of the following spoken words:
- yellow
- please
- hello
Name the files yellowaudio.wav, pleaseaudio.wav and helloaudio.wav.
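Now convert each of them to raw (headerless) 16 kHz, mono, 16-bit signed format. Give the output the same base name as the .wav file (yellowaudio.raw, not yellowaudio16k.raw), because the decoder looks for <name>.raw for each name in the control file:
$ sox yellowaudio.wav -r 16000 -c 1 -b 16 -e signed-integer yellowaudio.raw
Older SoX versions spell the last two options -s -w.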
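Rather than typing the sox command three times, you can loop over the base names. This is a minimal sketch: convert_all is a hypothetical helper, and it assumes a SoX version that understands -b and -e (older versions need -s -w instead).

```shell
# Convert <name>.wav in the current directory into headerless 16 kHz, mono,
# 16-bit signed <name>.raw for every base name given as an argument.
convert_all() {
    local name
    for name in "$@"; do
        sox "$name.wav" -r 16000 -c 1 -b 16 -e signed-integer "$name.raw" || return 1
    done
}
```

Usage: `convert_all yellowaudio pleaseaudio helloaudio`. The loop stops at the first file SoX fails on, so a typo in a name is reported instead of silently skipped.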
Create control file
Create a file named ctlfile with the following three lines (one base name per line, without the .raw extension):
helloaudio
pleaseaudio
yellowaudio
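If you have more recordings, the control file can be generated from whatever .raw files are in the current directory. A sketch, with make_ctlfile as a hypothetical helper name:

```shell
# Write one base name per line (the .raw extension stripped) for every
# .raw file in the current directory; the output file defaults to ctlfile.
make_ctlfile() {
    local out="${1:-ctlfile}"
    local f
    : > "$out"                      # start with an empty file
    for f in *.raw; do
        [ -e "$f" ] || continue     # no .raw files: the glob stays literal
        echo "${f%.raw}" >> "$out"
    done
}
```

Usage: run `make_ctlfile` in the directory holding the .raw files; the names come out in the shell's glob (sorted) order.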
Download language model/dictionary
$ wget http://www.inference.phy.cam.ac.uk/kv227/lm_giga/lm_giga_5k_nvp_3gram.zip
$ unzip lm_giga_5k_nvp_3gram.zip
The language model is lm_giga_5k_nvp_3gram.arpa, a text file in ARPA format that lists the (log10) probability of each word and of word pairs and triples.
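An ARPA file starts with a \data\ header declaring how many 1-grams, 2-grams and 3-grams follow (lines such as "ngram 1=..."). The sketch below prints those counts, which is a quick way to peek at the model without opening a large file; arpa_counts is a hypothetical helper name.

```shell
# Print the n-gram counts declared in the \data\ header of an ARPA-format
# language model, one "N-grams: count" line per "ngram N=count" line.
arpa_counts() {
    sed -n 's/^ngram \([0-9][0-9]*\)=\([0-9][0-9]*\)[[:space:]]*$/\1-grams: \2/p' "$1"
}
```

Usage: `arpa_counts $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa`.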
Create binary dump file
Sphinx expects the language model in the form of a "binary dump file" (.DMP).
- Download and install the lm3g2dmp utility:
$ cd /usr/src/sphinx
$ svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/share/lm3g2dmp
$ cd lm3g2dmp
$ make
- Convert to .DMP format which Sphinx will recognise
$ cd $SPHINX_ROOT/lm3g2dmp
$ ./lm3g2dmp $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa $SPHINX_ROOT/lm_giga_5k_nvp_3gram
When it finishes, there will be a new file, lm_giga_5k_nvp_3gram.arpa.DMP, in $SPHINX_ROOT/lm_giga_5k_nvp_3gram.
Create config file
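Create a file named cfgfile with the following contents. Write out the real path (for example /usr/src/sphinx) wherever $SPHINX_ROOT appears: the decoder reads this file as-is and does not expand shell variables.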
-samprate 16000
-hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd
-dict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic
-fdict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.filler
-lm $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa.DMP
NOTE: the hub4_cd_continuous_8gau_1s_c_d_dd directory was installed along with the sphinx3 decoder in an earlier step.
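One way to avoid typing the absolute paths by hand is to let the shell write cfgfile with a here-document, which expands $SPHINX_ROOT before the text reaches the file. A minimal sketch; write_cfgfile is a hypothetical helper, and it assumes the decoder accepts one argument pair per line, as in the cfgfile above.

```shell
# Write ./cfgfile with the Sphinx root already expanded into absolute paths.
# The root defaults to $SPHINX_ROOT; the unquoted EOF delimiter lets the
# shell substitute the variables inside the here-document.
write_cfgfile() {
    local root="${1:-$SPHINX_ROOT}"
    [ -n "$root" ] || { echo "set SPHINX_ROOT or pass the root directory" >&2; return 1; }
    local lmdir="$root/lm_giga_5k_nvp_3gram"
    cat > cfgfile <<EOF
-samprate 16000
-hmm $root/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd
-dict $lmdir/lm_giga_5k_nvp.sphinx.dic
-fdict $lmdir/lm_giga_5k_nvp.sphinx.filler
-lm $lmdir/lm_giga_5k_nvp_3gram.arpa.DMP
EOF
}
```

Usage: run `write_cfgfile` in the directory you will decode from, then check the result with `cat cfgfile`.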
Run
Run the following command
$ sphinx3_livepretend ctlfile . cfgfile
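NOTE: the "." tells it to look for the audio files in the current directory, so make sure the .raw files are there. ctlfile and cfgfile are also given as relative paths, so run the command from the directory that contains them.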
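Before starting the decoder, you can check that every name in ctlfile has a matching .raw file in the audio directory. A sketch; check_ctl_audio is a hypothetical helper, and it uses only the first field of each control-file line.

```shell
# Report every control-file entry that has no <name>.raw in the audio
# directory (default "."); returns non-zero if anything is missing.
check_ctl_audio() {
    local ctl="$1" dir="${2:-.}" name rest missing=0
    while read -r name rest || [ -n "$name" ]; do
        [ -z "$name" ] && continue          # skip blank lines
        if [ ! -f "$dir/$name.raw" ]; then
            echo "no audio for '$name': expected $dir/$name.raw"
            missing=1
        fi
    done < "$ctl"
    return "$missing"
}
```

Usage: `check_ctl_audio ctlfile . && sphinx3_livepretend ctlfile . cfgfile`.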
Output
There will be a lot of output, but the important lines will look something like this:
FWDVIT: (helloaudio)
FWDVIT: leave (pleaseaudio)
FWDVIT: yellow (yellowaudio)
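To pull just the results out of the decoder output, the FWDVIT lines can be filtered and reshaped. This sketch assumes the line layout shown above (tag, recognized words, then the utterance name in parentheses as the last field); summarize_fwdvit is a hypothetical helper.

```shell
# Read decoder output on stdin and print "utterance: hypothesis" for every
# FWDVIT line, e.g. "FWDVIT: yellow (yellowaudio)" -> "yellowaudio: yellow".
summarize_fwdvit() {
    awk '$1 == "FWDVIT:" {
        id = $NF
        gsub(/[()]/, "", id)              # "(yellowaudio)" -> "yellowaudio"
        hyp = ""
        for (i = 2; i < NF; i++)          # words between the tag and the id
            hyp = hyp (hyp == "" ? "" : " ") $i
        if (hyp == "") hyp = "<nothing recognized>"
        print id ": " hyp
    }'
}
```

Usage: `sphinx3_livepretend ctlfile . cfgfile 2>&1 | summarize_fwdvit`.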
- For the file helloaudio.raw, the engine did not recognize anything. This is to be expected: if you open lm_giga_5k_nvp.sphinx.dic, you won't find "hello" in it!
- Misrecognized "please" as "leave"; your mileage may vary.
- Correctly recognized yellow!
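The "hello" result above comes down to the word missing from the dictionary. Rather than opening the .dic file by hand, you can look words up with grep. A sketch; in_dict is a hypothetical helper, and it assumes each dictionary line starts with the word (optionally followed by an alternate-pronunciation marker like "(2)") and then whitespace and phones.

```shell
# Print the dictionary entries for each word given, or report that the word
# is out of vocabulary. Matching is case-insensitive and also finds
# alternate pronunciations written as "word(2)".
in_dict() {
    local dict="$1"; shift
    local word status=0
    for word in "$@"; do
        if ! grep -iE "^${word}(\([0-9]+\))?[[:space:]]" "$dict"; then
            echo "$word: not in dictionary"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `in_dict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic hello please yellow`.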
A script to do everything
#!/bin/bash
# Script by Sam SunFish7@Gmail.com
SPHINX_ROOT=/usr/src/sphinx
# The script assumes we're in $SPHINX_ROOT/tut/ (tut could be anything).
# Put your source .wav files in ./wav/wavin/*.wav and this script will decode them.
# You must have:
#   ../sphinxbase/
#   ../sphinx3/
#   ../lm3g2dmp/
#   ../lm_giga_5k_nvp_3gram/

echo "----------------Processing source .wav files:----------------"
cd ./wav/ || exit 1
ls ./wavin/ | tee wavfiles.txt    # send the list to the screen too
ls ./wavin/ | sed 's/\.wav$//' > wavfiles_noext.txt
mkdir -p raw
# Convert to raw 16 kHz, mono, 16-bit signed samples in ./wav/raw/,
# which is where the decoder looks for them below
while read -r thiswav; do
    sox "./wavin/$thiswav.wav" -r 16000 -c 1 -b 16 -e signed-integer "./raw/$thiswav.raw"
done < wavfiles_noext.txt
cd ..

echo "------------------Making dump file:-------------------"
# creates ./dmp/lm_giga_5k_nvp_3gram.arpa.DMP
mkdir -p dmp
../lm3g2dmp/lm3g2dmp \
    ../lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa \
    ./dmp/

echo "-------------------Decoding:------------------"
SRATE="-samprate 16000"
ALGO="-hmm ../sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd"
DICT="-dict ../lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic"
FILLER="-fdict ../lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.filler"
DMP="-lm ./dmp/lm_giga_5k_nvp_3gram.arpa.DMP"
echo "$SRATE $ALGO $DICT $FILLER $DMP" > _CFG
../sphinx3/src/programs/sphinx3_livepretend \
    ./wav/wavfiles_noext.txt \
    ./wav/raw/ \
    _CFG \
    2>&1 | tee dump.txt

echo "-----------------Recognised words were:--------------"
grep FWDVIT dump.txt
exit 0
