Speech Recognition Engines

Sphinx3 problem
User: jcwang
Date: 2/4/2009 10:49 pm

Hi,

I am new to Sphinx. I have downloaded the latest release of sphinx3 (0.8) and the nightly build of SphinxTrain, together with the an4 data.

When I run perl scripts_pl/decode/slave.pl, I run into the following problem:

SYSTEM_ERROR: "lm_3g_dmp.c", line 1270: fopen(/scratch/SphinxTutorial/an4/etc/an4.lm.DMP,rb) failed
; No such file or directory

In /scratch/SphinxTutorial/an4/etc/, the only *.DMP file is an4.ug.lm.DMP, and I cannot find an4.lm.DMP anywhere. Does anyone know how an4.lm.DMP should be generated?

Also, I am wondering what's the difference between an4_train.fileids and an4_test.fileids (and similarly an4_train.transcript and an4_test.transcript)? If I were to create my own an4_train.transcript, how would an4_test.transcript be generated?

Any help would be most appreciated. Please let me know if there is any additional information I should provide to clarify my questions above.

Thank you very much for your help!

regards,
Jimmy

--- (Edited on 2/4/2009 10:49 pm [GMT-0600] by jcwang) ---

Re: Sphinx3 problem
User: nsh
Date: 2/5/2009 12:55 am

> and I cannot find an4.lm.DMP anywhere.

It seems you didn't call setup_tutorial.pl from sphinx3 correctly. Anyhow, you can use an4.ug.lm.DMP if you want: just rename it or change the name in etc/sphinx_decode.cfg.

> Also, I am wondering what's the difference between an4_train.fileids and an4_test.fileids (and similarly an4_train.transcript and an4_test.transcript)?

Test files are used for testing, train files for training. They shouldn't overlap, because otherwise you cannot detect overtraining, where the model is fitted too closely to the training data. Testing is usually done on independent data. The test set is usually about 1/10 the size of the training set.
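
A split like this can be scripted. The following is only a sketch, not part of the tutorial scripts: the helper name split_every_nth and the all.fileids / all.transcript file names are hypothetical, and it assumes both files list the utterances in the same order, one per line. Holding out every 11th line makes the test set 1/10 the size of the training set.

```shell
#!/bin/sh
# Split a line-oriented file so that every Nth line is held out for testing.
# Usage: split_every_nth INPUT TRAIN_OUT TEST_OUT N
# (hypothetical helper, not part of sphinx3 or SphinxTrain)
split_every_nth() {
    input=$1; train_out=$2; test_out=$3; n=$4
    # Start from empty outputs so a short input still yields both files.
    : > "$train_out"
    : > "$test_out"
    awk -v n="$n" -v train_out="$train_out" -v test_out="$test_out" '
        NR % n == 0 { print >> test_out; next }   # every Nth line: test set
                    { print >> train_out }        # everything else: training set
    ' "$input"
}

# Example (file names are assumptions): apply the same split to both files
# so that fileids and transcripts stay aligned line by line.
# split_every_nth all.fileids    an4_train.fileids    an4_test.fileids    11
# split_every_nth all.transcript an4_train.transcript an4_test.transcript 11
```

Because the split depends only on the line number, running it on the fileids file and on the transcript file keeps the two sets aligned and guarantees that no utterance ends up in both.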

> If I were to create my own an4_train.transcript, how would an4_test.transcript be generated?

The same way as an4_train.transcript. Just listen to the contents of the test files and write down the transcription.

--- (Edited on 2/5/2009 12:55 am [GMT-0600] by nsh) ---

Re: Sphinx3 problem
User: jcwang
Date: 2/5/2009 2:19 am

Hi Nsh,

Thank you very much for your prompt reply. I really appreciate it. I re-ran setup_tutorial.pl, and the problem indeed went away :). Thanks!

I have another question. Assume that I have pocketsphinx running on a server. If I produce my own acoustic model and dictionary files by going through SphinxTrain as described in

http://www.speech.cs.cmu.edu/sphinx/tutorial.html

would you know which files I should copy over from an4/ to pocketsphinx/? I have searched around, but couldn't find such info. If you know of any link describing this, please let me know as well. BTW, my intention is to replace the default acoustic model/dictionary that comes with PocketSphinx with my own set (with a very limited number of words/phrases).

Again, thank you very much for your help! I really appreciate it.

regards,
Jimmy

--- (Edited on 2/5/2009 2:19 am [GMT-0600] by jcwang) ---

Re: Sphinx3 problem
User: nsh
Date: 2/5/2009 7:54 pm

> would you know which files I should copy over from an4/ to pocketsphinx/?

You shouldn't copy them. As for the files: the acoustic model is inside model_parameters, in a folder named like db_name.cd_cont_3000 or db_name.mllt_cd_cont_3000. It contains mdef, feat.params, variances, noisedict and the rest, everything needed for the pocketsphinx -hmm parameter.
The language model (-lm) and the dictionary (-dict) are in etc, of course. That's all you need to run pocketsphinx.
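
As a rough illustration of that layout, the sketch below checks a model folder before it is handed to -hmm. The helper name check_hmm_dir is hypothetical, and the list of files is simply the set of names that appear in this thread; it is an assumption that a given model needs exactly these. The db_name paths in the usage comment are placeholders as well.

```shell
#!/bin/sh
# Report which expected acoustic-model files are missing from a directory.
# The file list comes from names mentioned in this thread; treat it as an
# assumption, not as the definitive contents of every model.
check_hmm_dir() {
    dir=$1
    missing=0
    for f in mdef feat.params means variances mixture_weights transition_matrices noisedict; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (placeholder paths): point -hmm at the model folder in place,
# and take -lm and -dict from etc.
# HMM=model_parameters/db_name.cd_cont_3000
# check_hmm_dir "$HMM" && pocketsphinx_continuous \
#     -hmm "$HMM" -lm etc/db_name.lm.DMP -dict etc/db_name.dic
```

The function returns non-zero when anything is missing, so it can guard the decoder call as shown in the comment.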

> my intention is to replace the default acoustic model/dictionary that comes with PocketSphinx

If you are going to recognize a few English words, I'd consider using the existing models. It's not trivial to train a good model yourself. Just replace the grammar and the dictionary.


--- (Edited on 2/5/2009 7:54 pm [GMT-0600] by nsh) ---

Re: Sphinx3 problem
User: jcwang
Date: 2/5/2009 8:26 pm

Hi Nsh,

Again, thank you so much for your help!

I tried to decode using pocketsphinx_continuous as follows:

S2CONTINUOUS=/usr/local/bin/pocketsphinx_continuous
HMM=//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000/
LMFILE=/source/sphinxInfo/trainInfo/etc/7677.lm.DMP
DICT=/source/sphinxInfo/trainInfo/etc/7677.dic
echo "<executing $S2CONTINUOUS, please wait>"
$S2CONTINUOUS \
        -fwdflat no -bestpath no \
        -lm ${LMFILE} \
        -dict ${DICT} \
        -hmm ${HMM} \
        -samprate 8000 \
        -nfft 256 $@

However, I am getting the following error:

INFO: s2_semi_mgau.c(1120): Reading S3 mixture gaussian file '//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means'
FATAL_ERROR: "s2_semi_mgau.c", line 1150: //source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means: #codebooks (360) != 1

I searched around, but I really have no clue what this means. Would you know what might be wrong? Sorry, I know that this is a very vague question. If there is any additional information I can provide to clarify the question, please let me know.

BTW, I am trying to get the system to decode a bunch of names, both American and foreign. I did try the default acoustic model with my own dictionary file, but the results were not very good for foreign names. Hence I thought that if I did my own recording and training, the results would hopefully be better. Any suggestions? (The list of phrases would be strictly names, around 50-60, nothing else.)

Again, thanks for all the help!

regards,
Jimmy

--- (Edited on 2/5/2009 8:26 pm [GMT-0600] by jcwang) ---

Re: Sphinx3 problem
User: jcwang
Date: 2/6/2009 12:22 am

Hi,

Just want to add that I generated an adapted acoustic model and the result is pretty good. Previously, I used the default acoustic model with my own dictionary file, and the accuracy rate (in recognizing the 10 names I entered) is about 50%. With the adapted model, it's about 80% accurate!

The remaining two names that PocketSphinx couldn't decode correctly are both foreign and harder to pronounce.

If possible, I'd still like to try out my own acoustic model. If anyone knows what went wrong (please see my posting above), or has any suggestions on how to develop a reliable custom acoustic model for names, please let me know. Thank you very much for your help!

regards,
Jimmy

--- (Edited on 2/6/2009 12:22 am [GMT-0600] by jcwang) ---

Re: Sphinx3 problem
User: nsh
Date: 2/6/2009 3:17 am

> FATAL_ERROR: "s2_semi_mgau.c", line 1150: //source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means: #codebooks (360) != 1

You forgot -feat 1s_c_d_dd. You don't need -nfft; it's taken automatically from feat.params.

> Just want to add that I generated an adapted acoustic model and the result is pretty good. Previously, I used the default acoustic model with my own dictionary file, and the accuracy rate (in recognizing the 10 names I entered) is about 50%. With the adapted model, it's about 80% accurate! 

Well, for a few names it should be about 98% accurate, not 80%. Did you change the feature parameters, like -upperf 3500 -lowerf 200 -nfilt 31? You need to change these for 8 kHz audio. To make sure you are doing everything correctly, describe what you did more precisely and describe the results: how big is your test set, how big is its vocabulary, and what is the WER (word error rate)?
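
For reference, WER is the number of word substitutions, deletions and insertions divided by the number of words in the reference transcripts. The sketch below computes it with a word-level edit distance in awk. The helper name word_errors is made up for this illustration, and the reference/hypothesis pairs are invented examples, not results measured in this thread.

```shell
#!/bin/sh
# Print "<errors> <reference words>" for one utterance, where errors is the
# word-level edit distance (substitutions + deletions + insertions).
word_errors() {
    awk -v ref="$1" -v hyp="$2" 'BEGIN {
        n = split(ref, r, " ")
        m = split(hyp, h, " ")
        for (i = 0; i <= n; i++) d[i, 0] = i      # delete every reference word
        for (j = 0; j <= m; j++) d[0, j] = j      # insert every hypothesis word
        for (i = 1; i <= n; i++) {
            for (j = 1; j <= m; j++) {
                same = ((r[i] "") == (h[j] ""))   # compare as strings
                best = d[i-1, j-1] + (same ? 0 : 1)              # match or substitution
                if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1   # deletion
                if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1   # insertion
                d[i, j] = best
            }
        }
        printf "%d %d\n", d[n, m], n
    }'
}

# WER over a test set = total errors / total reference words.
# Illustrative pairs only, not measured results:
{
    word_errors "JIMMY WANG" "JIMMY WANG"
    word_errors "BIRAJA DEVULAPALLI" "BIRAJA DEGLURKAR"
} | awk '{ e += $1; w += $2 } END { printf "WER = %.1f%%\n", 100 * e / w }'
# prints: WER = 25.0%
```

Summing errors and reference words over the whole test set before dividing, as the final awk line does, weights long and short utterances correctly.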

> The remaining two names that PocketSphinx couldn't decode correctly are both foreign and harder to pronounce.

You sometimes need to correct the dictionary as well.

--- (Edited on 2/6/2009 3:17 am [GMT-0600] by nsh) ---

Re: Sphinx3 problem
User: jcwang
Date: 2/6/2009 4:43 pm

Hi Nsh,

Thank you very much for your help!

For the adapted model, the following is what I did. If I did anything incorrectly, or if any improvement is needed, please do not hesitate to let me know:

1. Prepared the required files and recordings:

myTest1.dic
myTest1.listoffiles
myTest1.transcription
myTest1.txt
myTest1_0001.raw to myTest9_0001.raw

The dictionary file contained the following:

AGGARWAL        AH G AA R W AH L
BIRAJA  B AY R AE JH AH
CHIOU   CH AY UW
DEEPALI D IY EH P AH L IY
DEGLURKAR       D IH G L AH R K AH R
DEVULAPALLI     D IH V Y UW L AH P AE L IY
HENRY   HH EH N R IY
JIMMY   JH IH M IY
JOON    JH UW N
KEVIN   K EH V IH N
LEE     L IY
RUCHI   R AH CH IY
SKALAHAS        S K AE L AH HH AH Z
SOUMYA  S AW M AY AH
WANG    W AE NG
YE      Y IY
YE(1)   Y EH
YEH     Y EH
YINGQING        Y IH N G K AH NG

The transcript file contains:
<s> JIMMY WANG </s>
<s> SOUMYA SKALAHAS </s>
<s> RUCHI AGGARWAL </s>
<s> KEVIN CHIOU </s>
<s> DEEPALI DEGLURKAR </s>
<s> BIRAJA DEVULAPALLI </s>
<s> JOON LEE </s>
<s> YINGQING YE </s>
<s> HENRY YEH </s>
2. Then I basically followed the steps described in:

http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/AcousticModelAdaptation

to generate the adapted model as follows:
in myAdapt/:
a) cp -a /usr/local/share/pocketsphinx/model/hmm/wsj1 .
b) /source/PocketSphinx/sphinxbase0.4.1/src/sphinx_fe/sphinx_fe `cat /source/PocketSphinx/myAdapt/wsj1/feat.params` -samprate 16000 -c /source/PocketSphinx/myAdapt/myTest1.listoffiles -di . -do . -ei raw -eo mfc -raw yes
c) pocketsphinx_mdef_convert -text /source/PocketSphinx/myAdapt/wsj1/mdef /source/PocketSphinx/myAdapt/wsj1/mdef.txt
d) /source/PocketSphinx/SphinxTrain/bin.i686-pc-linux-gnu/bw -hmmdir /source/PocketSphinx/myAdapt/wsj1/ -moddeffn /source/PocketSphinx/myAdapt/wsj1/mdef.txt -ts2cbfn .semi. -feat s2_4x -cmn current -agc none -dictfn /source/PocketSphinx/myAdapt/myTest1.dic -ctlfn /source/PocketSphinx/myAdapt/myTest1.listoffiles -lsnfn /source/PocketSphinx/myAdapt/myTest1.transcription -accumdir .
e) cp -a wsj1/ wsj1adapt
f) /source/PocketSphinx/SphinxTrain/bin.i686-pc-linux-gnu/map_adapt -meanfn wsj1/means -varfn wsj1/variances -mixwfn wsj1/mixture_weights -tmatfn wsj1/transition_matrices -accumdir . -mapmeanfn wsj1adapt/means -mapvarfn wsj1adapt/variances -mapmixwfn wsj1adapt/mixture_weights -maptmatfn wsj1adapt/transition_matrices
g) /source/PocketSphinx/SphinxTrain/bin.i686-pc-linux-gnu/mk_s2sendump -pocketsphinx yes -moddeffn wsj1adapt/mdef.txt -mixwfn wsj1adapt/mixture_weights -sendumpfn wsj1adapt/sendump
h) Run pocketsphinx_continuous using the adapted model as follows:
S2CONTINUOUS=/usr/local/bin/pocketsphinx_continuous
HMM=/source/PocketSphinx/myAdapt/wsj1adapt
LMFILE=/source/sphinxInfo/trainInfo/etc/7677.lm.DMP
DICT=/source/sphinxInfo/trainInfo/etc/7677.dic
echo "<executing $S2CONTINUOUS, please wait>"
$S2CONTINUOUS \
        -fwdflat no -bestpath no \
        -lm ${LMFILE} \
        -dict ${DICT} \
        -hmm ${HMM} \
        -samprate 8000 \
        -nfft 256 $@

where 7677.lm is as follows:

Language model created by QuickLM on Tue Feb  3 19:22:00 EST 2009
Copyright (c) 1996-2000
Carnegie Mellon University and Alexander I. Rudnicky

This model based on a corpus of 9 sentences and 20 words
The (fixed) discount mass is 0.5

\data\
ngram 1=20
ngram 2=27
ngram 3=18

\1-grams:
-0.9031 </s> -0.3010
-0.9031 <s> -0.2430
-1.8573 AGGARWAL -0.2430
-1.8573 BIRAJA -0.2950
-1.8573 CHIOU -0.2430
-1.8573 DEEPALI -0.2950
-1.8573 DEGLURKAR -0.2430
-1.8573 DEVULAPALLI -0.2430
-1.8573 HENRY -0.2950
-1.8573 JIMMY -0.2950
-1.8573 JOON -0.2950
-1.8573 KEVIN -0.2950
-1.8573 LEE -0.2430
-1.8573 RUCHI -0.2950
-1.8573 SKALAHAS -0.2430
-1.8573 SOUMYA -0.2950
-1.8573 WANG -0.2430
-1.8573 YE -0.2430
-1.8573 YEH -0.2430
-1.8573 YINGQING -0.2950

\2-grams:
-1.2553 <s> BIRAJA 0.0000
-1.2553 <s> DEEPALI 0.0000
-1.2553 <s> HENRY 0.0000
-1.2553 <s> JIMMY 0.0000
-1.2553 <s> JOON 0.0000
-1.2553 <s> KEVIN 0.0000
-1.2553 <s> RUCHI 0.0000
-1.2553 <s> SOUMYA 0.0000
-1.2553 <s> YINGQING 0.0000
-0.3010 AGGARWAL </s> -0.3010
-0.3010 BIRAJA DEVULAPALLI 0.0000
-0.3010 CHIOU </s> -0.3010
-0.3010 DEEPALI DEGLURKAR 0.0000
-0.3010 DEGLURKAR </s> -0.3010
-0.3010 DEVULAPALLI </s> -0.3010
-0.3010 HENRY YEH 0.0000
-0.3010 JIMMY WANG 0.0000
-0.3010 JOON LEE 0.0000
-0.3010 KEVIN CHIOU 0.0000
-0.3010 LEE </s> -0.3010
-0.3010 RUCHI AGGARWAL 0.0000
-0.3010 SKALAHAS </s> -0.3010
-0.3010 SOUMYA SKALAHAS 0.0000
-0.3010 WANG </s> -0.3010
-0.3010 YE </s> -0.3010
-0.3010 YEH </s> -0.3010
-0.3010 YINGQING YE 0.0000

\3-grams:
-0.3010 <s> BIRAJA DEVULAPALLI
-0.3010 <s> DEEPALI DEGLURKAR
-0.3010 <s> HENRY YEH
-0.3010 <s> JIMMY WANG
-0.3010 <s> JOON LEE
-0.3010 <s> KEVIN CHIOU
-0.3010 <s> RUCHI AGGARWAL
-0.3010 <s> SOUMYA SKALAHAS
-0.3010 <s> YINGQING YE
-0.3010 BIRAJA DEVULAPALLI </s>
-0.3010 DEEPALI DEGLURKAR </s>
-0.3010 HENRY YEH </s>
-0.3010 JIMMY WANG </s>
-0.3010 JOON LEE </s>
-0.3010 KEVIN CHIOU </s>
-0.3010 RUCHI AGGARWAL </s>
-0.3010 SOUMYA SKALAHAS </s>
-0.3010 YINGQING YE </s>

\end\

and 7677.dic is:
AGGARWAL        AH G AA R W AH L
BIRAJA  B AY R AE JH AH
CHIOU   CH AY UW
DEEPALI D IY EH P AH L IY
DEGLURKAR       D IH G L AH R K AH R
DEVULAPALLI     D IH V Y UW L AH P AE L IY
HENRY   HH EH N R IY
JIMMY   JH IH M IY
JOON    JH UW N
KEVIN   K EH V IH N
LEE     L IY
RUCHI   R AH CH IY
SKALAHAS        S K AE L AH HH AH Z
SOUMYA  S AW M AY AH
WANG    W AE NG
YE      Y IY
YE(1)   Y EH
YEH     Y EH
YINGQING        Y IH N G K AH NG
I tested by speaking the names and checking what was printed on the console. So far, all the names have been identified correctly except for two: BIRAJA DEVULAPALLI and DEEPALI DEGLURKAR.
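
One cheap sanity check on files like the ones listed above is to confirm that every word in the transcript has a dictionary entry. The sketch below is an illustration only: the helper name missing_words is hypothetical, and it assumes the transcript format shown in this post, with tokens in parentheses (utterance ids, if present) skipped. It does not judge whether a pronunciation is right, only whether one exists.

```shell
#!/bin/sh
# List transcript words that have no entry in the dictionary.
# Usage: missing_words DICT TRANSCRIPT
# (hypothetical helper; assumes a non-empty dictionary file)
missing_words() {
    awk '
        NR == FNR {                       # first file: the dictionary
            word = $1
            sub(/\([0-9]+\)$/, "", word)  # YE(1) is an alternate entry for YE
            known[word] = 1
            next
        }
        {                                 # second file: the transcript
            for (i = 1; i <= NF; i++) {
                if ($i == "<s>" || $i == "</s>") continue   # sentence markers
                if ($i ~ /^\(/) continue                    # utterance ids
                if (!($i in known) && !seen[$i]++) print $i
            }
        }
    ' "$1" "$2"
}

# Example with the file names used in this post:
# missing_words myTest1.dic myTest1.transcription
```

An empty result means every transcript word is covered; any word printed would need a dictionary entry before training or adaptation.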
BTW, for my own acoustic model (not the adapted one), I set up the test script as follows:

S2CONTINUOUS=/usr/local/bin/pocketsphinx_continuous
HMM=//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000/
LMFILE=/source/sphinxInfo/trainInfo/etc/7677.lm.DMP
DICT=/source/sphinxInfo/trainInfo/etc/7677.dic
$S2CONTINUOUS \
        -fwdflat no -bestpath no \
        -lm ${LMFILE} \
        -dict ${DICT} \
        -hmm ${HMM} \
        -samprate 8000 \
        -feat 1s_c_d_dd

Unfortunately, I am still getting the same error:

INFO: s2_semi_mgau.c(1120): Reading S3 mixture gaussian file '//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means'
FATAL_ERROR: "s2_semi_mgau.c", line 1150: //source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means: #codebooks (360) != 1

It would be great if I could use my own acoustic model, as its size is almost 1/10 that of the adapted model.

Sorry for the long post, and again, thank you very much for all the info and help!

regards,
Jimmy

--- (Edited on 2/6/2009 4:43 pm [GMT-0600] by jcwang) ---

Re: Sphinx3 problem
User: nsh
Date: 2/6/2009 5:09 pm

I suppose it will be easier for both of us if you just pack all your files into an archive and upload them somewhere.

From a quick look: fwdflat and bestpath greatly improve accuracy; there is no need to disable them.


--- (Edited on 2/6/2009 5:09 pm [GMT-0600] by nsh) ---

Re: Sphinx3 problem
User: jcwang
Date: 2/6/2009 10:03 pm

Hi Nsh,

Thanks for your response. I am having difficulty finding a place where I can upload the file. The tar.gz file is rather large. Please let me know if you know of any place I can upload it to. In the meantime, I'll continue to search around.

thanks,
Jimmy

--- (Edited on 2/6/2009 10:03 pm [GMT-0600] by jcwang) ---
