Explanation of Parameters to sphinx3 livepretend


  • Acoustic model: Get phonemes from raw audio
-hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd

The model holds the statistical characteristics (HMMs, hidden Markov models) of each phoneme: ay, sh, ah, ...

Use SphinxTrain to generate it from raw audio plus a transcript; see "here's how" below.


  • Lexical model: Phonetic dict - Get words from phonemes
-dict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic
-fdict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.filler

Typical line of blah.dic: aviation ey v iy ey sh ah n

The filler dictionary is for non-speech sounds such as coughing, umming etc.
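The dictionary line above can be read as a word followed by its phonemes. A minimal sketch of that reading, assuming whitespace-separated fields; `parse_dict_line` is a hypothetical helper for illustration, not part of Sphinx:

```python
def parse_dict_line(line):
    """Split one phonetic-dictionary line into (word, [phonemes]).

    Assumes the first whitespace-separated field is the word and the
    remaining fields are its phonemes, as in the sample line above.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty dictionary line")
    return fields[0], fields[1:]


word, phones = parse_dict_line("aviation ey v iy ey sh ah n")
print(word, phones)
```

The same function would apply to filler entries, as long as they follow the same word-then-phonemes layout.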


  • Language Model: probability of each word and of word pairs/triples
-lm $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa.DMP

It comes from a blah.arpa text file.
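As a rough illustration of what an ARPA-format text file holds, the sketch below parses a tiny made-up sample: a `\data\` header with n-gram counts, then one section per n-gram order whose lines carry a base-10 log probability, the words, and an optional backoff weight. The sample values and the `parse_arpa` helper are invented for illustration and are not taken from lm_giga_5k_nvp_3gram.arpa.

```python
# Made-up miniature ARPA file (values are illustrative only).
SAMPLE_ARPA = r"""
\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.7 <s> -0.3
-0.5 hello -0.2
-0.9 </s>

\2-grams:
-0.3 <s> hello
-0.4 hello </s>

\end\
"""


def parse_arpa(text):
    """Return (counts, ngrams) from ARPA-format text.

    counts maps n-gram order -> declared count.
    ngrams maps order -> {word tuple: (log10 prob, backoff or None)}.
    """
    counts = {}
    ngrams = {}
    order = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:"
            order = int(line[1:line.index("-")])
            ngrams[order] = {}
        elif order is not None:
            fields = line.split()
            logprob = float(fields[0])
            words = tuple(fields[1:1 + order])
            extra = fields[1 + order:]
            backoff = float(extra[0]) if extra else None
            ngrams[order][words] = (logprob, backoff)
    return counts, ngrams


counts, ngrams = parse_arpa(SAMPLE_ARPA)
print(counts)
print(ngrams[2])
```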

The .arpa file has to be converted to .DMP before Sphinx can use it:

$ lm3g2dmp $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa $SPHINX_ROOT/lm_giga_5k_nvp_3gram


More details at http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#sec_decoverview
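To keep the four flags in one place, the sketch below assembles them from a single root directory. The flag names and relative paths are the ones listed above; the `decoder_args` helper and the example root `/opt/sphinx` are assumptions for illustration, and how the resulting arguments are handed to the decoder is not covered here.

```python
import posixpath


def decoder_args(sphinx_root):
    """Build the acoustic, lexical and language model flags as a flat list."""
    lm_dir = posixpath.join(sphinx_root, "lm_giga_5k_nvp_3gram")
    return [
        # Acoustic model: phonemes from raw audio
        "-hmm", posixpath.join(sphinx_root, "sphinx3", "model", "hmm",
                               "hub4_cd_continuous_8gau_1s_c_d_dd"),
        # Lexical model: words from phonemes, plus filler sounds
        "-dict", posixpath.join(lm_dir, "lm_giga_5k_nvp.sphinx.dic"),
        "-fdict", posixpath.join(lm_dir, "lm_giga_5k_nvp.sphinx.filler"),
        # Language model: word, pair and triple probabilities
        "-lm", posixpath.join(lm_dir, "lm_giga_5k_nvp_3gram.arpa.DMP"),
    ]


# "/opt/sphinx" stands in for whatever $SPHINX_ROOT is on your machine.
args = decoder_args("/opt/sphinx")
for flag, value in zip(args[0::2], args[1::2]):
    print(flag, value)
```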


here's how

Can someone help on this section? I'm trying to pull it out of http://www.speech.cs.cmu.edu/sphinx/tutorial.html


Under development.

Have to feed these into SphinxTrain:

  1. The acoustic signals for everything you want to train - .WAV files
  2. The corresponding transcript file
  3. The dictionaries (language and filler)

and SphinxTrain will churn out a Model-Index file.
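Since the transcript and the dictionary have to agree, a quick sanity check before training is to list transcript words that have no dictionary entry. The sketch below assumes transcript lines of the form `<s> words </s> (utterance_id)`; that layout, the helper names and the sample data are assumptions for illustration, so adjust them to your own files.

```python
def transcript_words(line):
    """Return the spoken words of one transcript line.

    Assumes an optional trailing "(utterance_id)" token and optional
    <s> / </s> sentence markers, all of which are dropped.
    """
    tokens = line.split()
    if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
        tokens = tokens[:-1]
    return [t for t in tokens if t not in ("<s>", "</s>")]


def missing_words(transcript_lines, vocabulary):
    """Return the sorted transcript words that are absent from vocabulary."""
    missing = set()
    for line in transcript_lines:
        for word in transcript_words(line):
            if word not in vocabulary:
                missing.add(word)
    return sorted(missing)


# Hypothetical sample data: one utterance, a one-word dictionary.
lines = ["<s> show aviation </s> (utt_001)"]
print(missing_words(lines, {"aviation"}))
```

Any word reported here would need a dictionary entry (or a corrected transcript) before the training scripts are run.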

(RawAudio Tarball of CMU students speaking random words & numbers into mic)

  • Convert the audio into a set of 13-dimensional feature vectors (MFCCs)
perl scripts_pl/make_feats.pl -ctl etc/rm1_train.fileids
  • Then run the training stages on them:
perl scripts_pl/RunAll.pl
  • This populates
./model_parameters/rm1.cd_cont_1000_8/(your HMMs)

(i.e. the parameters of the final acoustic models (HMMs): 8 Gaussians per state, 3 states per HMM, context-dependent (CD) with 1000 tied states)
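The directory name rm1.cd_cont_1000_8 appears to encode those settings. The sketch below spells out that reading; the field order (context, density type, tied states, Gaussians per state) is inferred from this one example and is an assumption, and `describe_model_name` is a hypothetical helper rather than anything SphinxTrain provides.

```python
def describe_model_name(name):
    """Decode a name like 'rm1.cd_cont_1000_8' into its apparent parts.

    Assumed layout: <corpus>.<context>_<density>_<tied states>_<gaussians>
    """
    corpus, spec = name.split(".", 1)
    context, density, tied_states, gaussians = spec.split("_")
    return {
        "corpus": corpus,
        "context_dependent": context == "cd",
        "density": density,
        "tied_states": int(tied_states),
        "gaussians_per_state": int(gaussians),
    }


print(describe_model_name("rm1.cd_cont_1000_8"))
```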

./model_architecture/rm1.1000.mdef 

This is the Model-Index file for these models. The system uses it to associate the appropriate set of HMM parameters with the HMM for each sound unit you are modeling.

DECODING
