Explanation of Parameters to sphinx3 livepretend
- Acoustic model: Get phonemes from raw audio
-hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd
The model contains statistical characteristics (HMMs, hidden Markov models) for each phoneme: ay, sh, ah, ...
Use SphinxTrain to generate it from {RawAudio + transcript}; see "here's how" further down.
- Lexical model: Phonetic dict - Get words from phonemes
-dict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.dic -fdict $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp.sphinx.filler
Typical line of blah.dic: aviation ey v iy ey sh ah n
The filler dictionary is for sounds like coughing, umming, etc.
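To make the dictionary layout concrete, here is a minimal sketch that looks up one word in a toy dictionary. The temporary file, the second entry and the `lookup_pron` helper are all made up for illustration; a real .dic file is simply much longer.

```shell
#!/bin/sh
# Toy dictionary in the "word phone phone ..." layout shown above.
# The second entry is invented for illustration.
dict=$(mktemp)
cat > "$dict" <<'EOF'
aviation ey v iy ey sh ah n
shy sh ay
EOF

# Print the phoneme sequence for one word (hypothetical helper).
lookup_pron() {
    awk -v w="$1" '$1 == w { $1 = ""; sub(/^ +/, ""); print; exit }' "$2"
}

lookup_pron aviation "$dict"   # prints: ey v iy ey sh ah n
```

A word that is missing from the dictionary prints nothing, which is roughly the situation the decoder is in when it meets an out-of-vocabulary word.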
- Language Model: probability of each word and of each word pair/triple
-lm $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa.DMP
Comes from a blah.arpa text file (plain-text ARPA format).
Have to convert .arpa to .DMP so Sphinx can use it.
$ lm3g2dmp $SPHINX_ROOT/lm_giga_5k_nvp_3gram/lm_giga_5k_nvp_3gram.arpa $SPHINX_ROOT/lm_giga_5k_nvp_3gram
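For orientation, a rough sketch of what a very small ARPA text file looks like, plus a helper that reads the n-gram counts from its header. Every word and number in the toy file is invented, and `count_ngrams` is a hypothetical helper, not a Sphinx tool.

```shell
#!/bin/sh
# Toy ARPA-format language model (all words and numbers invented).
# Each n-gram line: log10 probability, the word(s), optional backoff weight.
arpa=$(mktemp)
cat > "$arpa" <<'EOF'
\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.4771 <s> -0.3010
-0.4771 aviation -0.3010
-0.4771 </s>

\2-grams:
-0.3010 <s> aviation
-0.3010 aviation </s>

\end\
EOF

# Read the declared count of n-grams of order $1 from the header of file $2.
count_ngrams() {
    awk -v n="$1" -F= '$1 == ("ngram " n) { print $2 }' "$2"
}

count_ngrams 1 "$arpa"
count_ngrams 2 "$arpa"
```

The real lm_giga_5k_nvp_3gram.arpa has the same sections, with a \3-grams: block as well.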
More details at http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#sec_decoverview
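Putting the four flags together: a sketch that writes them into one arguments file. It assumes SPHINX_ROOT is set (the default below is only a placeholder) and that your build's livepretend takes a control file, a raw-audio directory and an arguments file, in that order. Check the usage message of your own binary before relying on that.

```shell
#!/bin/sh
# Assumption: SPHINX_ROOT points at your Sphinx install; the default is a placeholder.
SPHINX_ROOT=${SPHINX_ROOT:-/path/to/sphinx}
LM_DIR=$SPHINX_ROOT/lm_giga_5k_nvp_3gram

# Collect the four model flags from this page into one arguments file.
args=$(mktemp)
cat > "$args" <<EOF
-hmm $SPHINX_ROOT/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd
-dict $LM_DIR/lm_giga_5k_nvp.sphinx.dic
-fdict $LM_DIR/lm_giga_5k_nvp.sphinx.filler
-lm $LM_DIR/lm_giga_5k_nvp_3gram.arpa.DMP
EOF

cat "$args"

# Assumed invocation (binary name and argument order are not confirmed here):
#   sphinx3_livepretend <control file> <raw audio dir> "$args"
```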
here's how
Can someone help on this section? I'm trying to pull it out of http://www.speech.cs.cmu.edu/sphinx/tutorial.html
(This section is still under development.)
Have to feed these into SphinxTrain:
- The acoustic signals for everything you want to train - .WAV files
- The corresponding transcript file
- dictionary (language & filler)
and SphinxTrain will churn out a Model-Index file.
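Before training it is worth checking that the audio list and the transcript line up. A minimal sketch, assuming one transcript line per listed audio file; the file names, utterances and helper functions below are invented for illustration, and the transcript layout is an assumption about what SphinxTrain expects.

```shell
#!/bin/sh
# Toy control file and transcript; every name and utterance here is invented.
work=$(mktemp -d)
cat > "$work/train.fileids" <<'EOF'
speaker1/utt001
speaker1/utt002
EOF
cat > "$work/train.transcription" <<'EOF'
<s> ONE TWO THREE </s> (utt001)
<s> FOUR FIVE </s> (utt002)
EOF

# Count the lines in a file.
count_lines() {
    awk 'END { print NR }' "$1"
}

# Succeeds only if both files have the same number of lines
# (hypothetical helper, not part of SphinxTrain).
same_line_count() {
    [ "$(count_lines "$1")" -eq "$(count_lines "$2")" ]
}

if same_line_count "$work/train.fileids" "$work/train.transcription"; then
    echo "ok: one transcript line per audio file"
else
    echo "mismatch: check fileids against the transcript"
fi
```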
- Download RM1
(RawAudio Tarball of CMU students speaking random words & numbers into mic)
- Convert it into a set of 13-dimensional feature vectors (MFCCs)
perl scripts_pl/make_feats.pl -ctl etc/rm1_train.fileids
- Now run the training steps on them:
perl scripts_pl/RunAll.pl
- This populates
./model_parameters/rm1.cd_cont_1000_8/(your HMMs)
(i.e. the parameters of the final 8-Gaussian/state, 3-state, CD-tied acoustic models (HMMs) with 1000 tied states)
./model_architecture/rm1.1000.mdef
Model-Index file for these models (used by the system to associate the appropriate set of HMM parameters with the HMM for each sound unit you are modeling).
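A quick way to confirm that training left both outputs behind is sketched below. `check_outputs` is a hypothetical helper; the two paths are the ones named above, taken relative to the directory you ran RunAll.pl from.

```shell
#!/bin/sh
# Report whether the two outputs listed above exist under a training directory.
# check_outputs is a hypothetical helper, not part of SphinxTrain.
check_outputs() {
    root=$1
    status=0
    for path in model_parameters/rm1.cd_cont_1000_8 model_architecture/rm1.1000.mdef; do
        if [ -e "$root/$path" ]; then
            echo "found   $path"
        else
            echo "missing $path"
            status=1
        fi
    done
    return $status
}

# Run against the current directory (assumed to be the SphinxTrain task directory).
check_outputs . || echo "training outputs incomplete"
```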
DECODING
