Acoustic Model Discussions

issues when using my own acoustic model in SPHINX4
User: meaita
Date: 8/23/2009 2:00 am
Views: 5208
Rating: 5

Hello, 
I want to recognize Mandarin with SPHINX4. I built my own acoustic model (digits spoken in Mandarin) and then modified the HelloDigits demo to use that model to recognize some digits, but the result is a little disappointing. For example, when the input is "4 2 3 6 5 8 9 3 2 4 6", the output is always "8 9 3 2 4 6...". That is to say, the result always misses the first several digits. Why?
My training database contains 130 sentences in all, each sentence includes 10 digits, and the whole data set is about 20 minutes of audio. I am not sure whether my training database is reasonable. Any suggestions?
 
Thanks very much for any answer! 
Meaita

--- (Edited on 8/23/2009 2:00 am [GMT-0500] by meaita) ---

Re: issues when using my own acoustic model in SPHINX4
User: kmaclean
Date: 8/26/2009 7:23 pm
Views: 2514
Rating: 7

Hi meaita,

I am not sure if this applies to Sphinx, but the Julius speech recognition engine uses the first few seconds at startup to estimate an average (the cepstral mean) of the speech to be recognized...

From a previous post (One word grammar, always recognized?):

[...] Julian takes the cepstral mean of the last 5 seconds of speech as the initial cepstral mean at the beginning of each input.  So Julian looks at the previous 5 seconds of speech to get an average (cepstral mean) in order to recognize speech.  That is why in Julian's default configuration it never recognizes what you say for the first few utterances, as it tries to figure out this average.

You can get around this by using "-cmnsave filename" to record a representative average for your environment, and then using "-cmnload filename" and "-cmnnoupdate" to apply the CMN you saved instead of recalculating it on the fly. Theoretically your confidence scores should start looking reasonable, and you should be able to determine whether a word is in your grammar or not.
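
To make the effect concrete, here is a minimal Python sketch of cepstral mean normalization. It is not the Julius or Sphinx implementation. The function names (batch_cmn, live_cmn) and the prior_weight parameter are hypothetical, and the frames are toy one-dimensional "cepstra". It only illustrates why a running mean that starts cold distorts the first frames, and why starting from a saved mean with updates turned off (the idea behind -cmnload and -cmnnoupdate) avoids that.

```python
from typing import List, Optional, Sequence

Frame = List[float]


def batch_cmn(frames: Sequence[Frame]) -> List[Frame]:
    """Subtract the mean of the whole utterance from every frame."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def live_cmn(
    frames: Sequence[Frame],
    initial_mean: Optional[Sequence[float]] = None,
    update: bool = True,
    prior_weight: float = 100.0,  # hypothetical: pseudo-frame count behind a loaded mean
) -> List[Frame]:
    """Subtract a running mean that is only known from frames seen so far.

    With no initial_mean the estimate starts at zero, so the first frames
    are barely normalized. With a loaded mean and update=False the same
    fixed mean is applied to every frame.
    """
    dim = len(frames[0])
    if initial_mean is None:
        mean = [0.0] * dim
        count = 0.0
    else:
        mean = list(initial_mean)
        count = prior_weight
    total = [m * count for m in mean]

    out: List[Frame] = []
    for f in frames:
        # Normalize with the mean known *before* this frame arrives.
        out.append([f[d] - mean[d] for d in range(dim)])
        if update:
            count += 1.0
            for d in range(dim):
                total[d] += f[d]
                mean[d] = total[d] / count
    return out


if __name__ == "__main__":
    # Toy data: a constant channel offset of 5.0 plus a small signal.
    frames = [[4.0], [5.0], [6.0], [4.0], [5.0], [6.0]]
    print("batch:", batch_cmn(frames))
    print("live, cold start:", live_cmn(frames))
    print("live, saved mean:", live_cmn(frames, initial_mean=[5.0], update=False))
```

In the cold-start case the first frame keeps almost all of its channel offset, which is the kind of mismatch that could make a recognizer drop the beginning of an utterance. Whether this is what is happening in the SPHINX4 setup above is only a guess.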

Is there a CMN parameter in Sphinx?

Ken

--- (Edited on 8/26/2009 8:23 pm [GMT-0400] by kmaclean) ---
