Hello World Decoder QuickStart Guide

In this tutorial you will transcribe audio files containing spoken words into text!

Make a Base: /usr/src/sphinx/

su -
cd /usr/src
mkdir sphinx
chmod a=rwx sphinx
SPHINX_ROOT=/usr/src/sphinx
echo $SPHINX_ROOT

Now you can use $SPHINX_ROOT instead of /usr/src/sphinx/. The variable only exists in the current shell session, so a script has to set SPHINX_ROOT=/usr/src/sphinx itself.

From now on, make sure you put everything into $SPHINX_ROOT (unless you really like hunting through config files changing folders).
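A plain assignment is not passed on to programs started from the shell. A minimal sketch of how a script might set the variable, assuming you want /usr/src/sphinx as the default but still allow an override from the caller:

```shell
# Keep the guide's default location, but let the caller override it.
: "${SPHINX_ROOT:=/usr/src/sphinx}"

# export makes the variable visible to child processes as well.
export SPHINX_ROOT
echo "Using SPHINX_ROOT=$SPHINX_ROOT"
```

Putting these lines at the top of a script means the rest of it can rely on $SPHINX_ROOT being set.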

You will need certain tools installed for this guide to work. On Ubuntu, this command takes care of that:

sudo apt-get install build-essential autoconf libtool automake libasound2-dev python-dev subversion sox libsox-fmt-all bison
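To confirm the install worked, you can check that the main tools are on your PATH. This is only a sketch: missing_tools is a hypothetical helper name, and the tool list is my guess at what the packages above provide, so adjust it to your system.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this guide calls directly or needs for building (assumed list).
missing_tools gcc make autoconf automake svn sox bison
```

No output means everything in the list was found.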

Download/build sphinx3

Get sphinx3 version 0.7 (and sphinxbase), as described on the Main Page under "Sphinx3 v. 0.7 trunk".

Version 0.7 works for all tutorials.

You should now have:

$SPHINX_ROOT/sphinx3
$SPHINX_ROOT/sphinxbase

If you run into problems, use version 0.6.3 instead.

Record and convert audio

Note: to set your mic to capture, you may need to use alsamixer/amixer. This worked for me:

$ amixer set "Capture" cap
$ amixer set "Capture" 0

Use Audacity to record three wav files, each containing one of the following spoken words:

  • yellow
  • please
  • hello

Name the files like yellowaudio.wav, pleaseaudio.wav, etc.

Now convert them to the required format (raw, 16 kHz, 1 channel, 16-bit signed) with:

$ sox yellowaudio.wav -r 16000 -c 1 -s -w yellowaudio.raw
$ sox pleaseaudio.wav -r 16000 -c 1 -s -w pleaseaudio.raw
$ sox helloaudio.wav -r 16000 -c 1 -s -w helloaudio.raw
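Some newer SoX releases no longer accept the -w flag. If yours rejects it, the long options -e signed-integer -b 16 ask for the same 16-bit signed output. A sketch that converts every .wav in a directory this way (wav_to_raw is a hypothetical helper name, not part of SoX):

```shell
# Convert every .wav in a directory to headerless 16 kHz, mono,
# 16-bit signed PCM, writing NAME.raw next to NAME.wav.
wav_to_raw() {
    dir=${1:-.}
    for wav in "$dir"/*.wav; do
        [ -e "$wav" ] || continue   # no .wav files: the glob stayed literal
        sox "$wav" -r 16000 -c 1 -e signed-integer -b 16 "${wav%.wav}.raw"
    done
}
```

Calling wav_to_raw . in the folder with your recordings should produce the three .raw files used below.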

Create control file

Create a file named ctlfile with the following three lines (the audio file names without the .raw extension):

helloaudio
pleaseaudio
yellowaudio
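With more than a handful of recordings you may prefer to generate the control file. A minimal sketch, assuming every .raw file in the directory should be decoded (make_ctlfile is a hypothetical helper name):

```shell
# Write one utterance ID per line: each .raw file name minus its extension.
make_ctlfile() {
    dir=${1:-.}
    for raw in "$dir"/*.raw; do
        [ -e "$raw" ] || continue        # no .raw files: nothing to list
        name=${raw##*/}                  # strip the directory part
        printf '%s\n' "${name%.raw}"     # strip the .raw extension
    done
}
```

Use it as make_ctlfile . > ctlfile from the folder holding the .raw files.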

Download language model/dictionary

$ wget http://www.inference.phy.cam.ac.uk/kv227/lm_giga/lm_giga_5k_nvp_3gram.zip
$ unzip lm_giga_5k_nvp_3gram.zip

The language model is lm_giga_5k_nvp_3gram.arpa, a text file listing the probability of each word, word pair and word triple.
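For orientation, an ARPA file normally begins with a \data\ header stating how many 1-grams, 2-grams and 3-grams it holds. A small sketch to print those counts (arpa_counts is a hypothetical helper name, not a Sphinx tool):

```shell
# Print the "ngram N=COUNT" lines from the header of an ARPA file.
arpa_counts() {
    grep -E '^ngram [0-9]+=[0-9]+' "$1"
}
```

For example: arpa_counts $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa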

Create binary dump file

Sphinx expects the language model in the form of a "Binary Dump File".

  • Download and install the lm3g2dmp utility:
$ cd /usr/src/sphinx
$ svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/share/lm3g2dmp
$ cd lm3g2dmp
$ make
  • Convert to .DMP format which Sphinx will recognise
$ cd $SPHINX_ROOT/lm3g2dmp
$ ./lm3g2dmp $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa $SPHINX_ROOT/lm_giga_5k_nvp_3gram

After it finishes, there will be a new file in the $SPHINX_ROOT/lm_giga_5k_nvp_3gram directory: lm_giga_5k_nvp_3gram.arpa.DMP

Create config file

Create a file named cfgfile with the following contents:

-samprate 16000
-hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd
-dict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic
-fdict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.filler
-lm $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa.DMP

NOTE: the hub4_cd_continuous_8gau_1s_c_d_dd directory was installed along with the sphinx3 decoder in the earlier step.
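Shell variables are normally expanded only by the shell, so it is an open question whether the decoder will resolve a literal $SPHINX_ROOT inside cfgfile. One way to sidestep that, sketched here under that assumption, is to let the shell write the file with the full paths already filled in (LM_DIR is a hypothetical helper variable):

```shell
: "${SPHINX_ROOT:=/usr/src/sphinx}"
LM_DIR=$SPHINX_ROOT/lm_giga_5k_nvp_3gram

# The unquoted EOF lets the shell expand the variables while writing the file.
cat > cfgfile <<EOF
-samprate 16000
-hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd
-dict $LM_DIR/lm_giga_5k_nvp.sphinx.dic
-fdict $LM_DIR/lm_giga_5k_nvp.sphinx.filler
-lm $LM_DIR/lm_giga_5k_nvp_3gram.arpa.DMP
EOF
```

The resulting cfgfile has the same five options as above, just with absolute paths.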

Run

Run the following command

$ sphinx3_livepretend ctlfile . cfgfile

NOTE: the "." tells it to look for audio files in current directory, so make sure the .raw audio files are there
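The decoder is very chatty, so it can help to save the full log and then show only the result lines, which start with FWDVIT. A sketch, where decode_and_show and decode.log are hypothetical names of my own choosing:

```shell
# Run the decoder, keep the full log in decode.log,
# then print only the recognition results.
decode_and_show() {
    sphinx3_livepretend "$1" "$2" "$3" > decode.log 2>&1
    grep '^FWDVIT' decode.log
}
```

Call it with the same three arguments as the plain command: decode_and_show ctlfile . cfgfile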

Output

There will be lots of output, but the important bits will look something like this

FWDVIT: (helloaudio)             
FWDVIT: leave (pleaseaudio)     
FWDVIT: yellow (yellowaudio)     
  • For helloaudio.raw the engine did not recognize anything. This is actually to be expected: if you open lm_giga_5k_nvp.sphinx.dic, you won't find "hello"!
  • "please" was misrecognized as "leave" (your mileage may vary).
  • "yellow" was recognized correctly!
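Before recording, you can check whether a word is in the dictionary at all. This sketch assumes the usual Sphinx dictionary layout of one word per line followed by its phones, with alternate pronunciations written as WORD(2); in_dict is a hypothetical helper name:

```shell
# Succeed if WORD has an entry in a Sphinx-style dictionary file.
# The match ignores case and also accepts entries such as "WORD(2)".
in_dict() {
    grep -Eiq "^$1(\([0-9]+\))?[[:space:]]" "$2"
}
```

For example: in_dict hello $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic || echo "hello is not in the dictionary"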

A script file to do everything

#!/bin/bash

# Script by Sam SunFish7@Gmail.com

SPHINX_ROOT=/usr/src/sphinx

# script assumes we're in $SPHINX_ROOT/tut/   (tut could be anything)

# Put your source .wav files in ./wav/wavin/*.wav and this script will decode them
# You must have: 
#       ../sphinxbase/
#       ../sphinx3/
#       ../lm3g2dmp/
#       ../lm_giga_5k_nvp_3gram/

echo "----------------Processing source .wav files:----------------"
cd ./wav/
ls ./wavin/ 2>&1 | tee wavfiles.txt         #send this to screen too
ls ./wavin/ | sed 's/\.wav$//g' > wavfiles_noext.txt

mkdir -p raw

for thiswav in $( < wavfiles_noext.txt )
do
	# write into ./raw/, which is where the decoder looks for audio below
	sox "./wavin/$thiswav.wav"   -r 16000   -c 1  -s -w "./raw/$thiswav.raw"
done

cd ..


echo "------------------Making dump file:-------------------"
# creates ./dmp/lm_giga_5k_nvp_3gram.arpa.DMP
mkdir -p dmp
../lm3g2dmp/lm3g2dmp \
../lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa \
./dmp/


echo "-------------------Decoding:------------------"
SRATE="-samprate 16000"
ALGO="-hmm   ../sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd"
DICT="-dict  ../lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic"
FILLER="-fdict ../lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.filler"
DMP="-lm    ./dmp/lm_giga_5k_nvp_3gram.arpa.DMP"

echo "$SRATE $ALGO $DICT $FILLER $DMP" > _CFG

../sphinx3/src/programs/sphinx3_livepretend  \
	./wav/wavfiles_noext.txt \
	./wav/raw/ \
	_CFG \
2>&1 | tee dump.txt

echo "-----------------Recognised words were:--------------"
grep FWDVIT dump.txt

exit 0

Part 2: Using a custom language model

Sphinx3 Speech Recognition Quick Start - Part 2