VoxForge
Hi!
We (simon) have basically been using the VoxForge scripts (ported to C++) to create the speech model from the user's input files.
Yesterday, we got a suggestion to switch the model type to MFCC_0_D_A (which means that the model uses 39 features instead of just 25). According to someone at the SPSC Graz, this should especially improve the model when the training data was recorded with more than one microphone.
He also suggested using HHEd's MU command to add more Gaussian mixture components to the final model. I implemented both suggestions in simon, and the improvement in recognition rate was drastic (in my tests).
Maybe you could try to change the model creation procedure for the voxforge model and see if this improves recognition rates there as well?
Steps to take if you want to try it out:
Change your model type to MFCC_0_D_A, adjust your prototype to use 39 features and add a few new steps after hmm15:
Use HHEd like this:
HHEd -A -D -T 1 -H hmm15/macros -H hmm15/hmmdefs -M hmm16 gmm1.hed tiedlist
Where gmm1.hed contains:
MU 4 {*.state[2-4].mix}
Re-estimate hmm16 twice with HERest, then repeat the HHEd and re-estimation steps (in principle, for as long as you see recognition rates improve).
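The steps above can be sketched as a loop. This is only a dry-run sketch under stated assumptions: it prints the HHEd and HERest commands instead of running them. The HERest options follow the VoxForge-style invocation quoted later in this thread; the function name print_mixup_rounds and the convention that round N reads gmmN.hed are assumptions for illustration. Since MU sets a target number of mixture components, each later .hed file needs a higher number than the previous one for the round to have any effect.

```shell
#!/bin/sh
# Dry-run sketch: print the HTK commands for successive mix-up rounds.
# Assumptions (not from the original post): the function name, and that
# round N takes its edit script from gmmN.hed.

# Usage: print_mixup_rounds START_DIR_NUMBER ROUNDS
# Each round = one HHEd mix-up pass followed by two HERest re-estimations.
print_mixup_rounds() {
    src=$1          # number of the directory holding the current model, e.g. 15
    rounds=$2       # how many mix-up rounds to print
    round=1
    while [ "$round" -le "$rounds" ]; do
        dst=$((src + 1))
        # Mix-up step: apply gmm<round>.hed to the current model.
        echo "HHEd -A -D -T 1 -H hmm$src/macros -H hmm$src/hmmdefs -M hmm$dst gmm$round.hed tiedlist"
        # Two re-estimation passes on the enlarged model.
        pass=1
        while [ "$pass" -le 2 ]; do
            src=$dst
            dst=$((src + 1))
            echo "HERest -A -D -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm$src/macros -H hmm$src/hmmdefs -M hmm$dst tiedlist"
            pass=$((pass + 1))
        done
        src=$dst
        round=$((round + 1))
    done
}
```

Calling print_mixup_rounds 15 3 prints nine commands (three rounds starting from hmm15), with the last re-estimation writing to hmm24. Check the printed commands before running them.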
You can find the simon implementation here:
http://speech2text.git.sourceforge.net/git/gitweb.cgi?p=speech2text/speech2text;a=blob;f=simonlib/speechmodelcompilation/modelcompilationmanager.cpp;h=0afd877a9a862fe62cccd601df4526efeec653c8;hb=55a0ee83ae010547de134da38cbdc713eba85cae
http://speech2text.git.sourceforge.net/git/gitweb.cgi?p=speech2text/speech2text;a=tree;f=simond/scripts;hb=55a0ee83ae010547de134da38cbdc713eba85cae
Greetings,
Peter
--- (Edited on 3/6/2010 8:05 am [GMT-0600] by bedahr) ---
You know about HLDA and HMMIRest, don't you?
--- (Edited on 3/6/2010 21:04 [GMT+0300] by nsh) ---
No idea whatsoever :)
As I have already said on several occasions, I have no formal training in signal processing. Could you elaborate?
Greetings,
Peter
--- (Edited on 3/6/2010 1:18 pm [GMT-0600] by bedahr) ---
You can get some accuracy improvement with them as well. The HTK Book describes those methods in detail.
--- (Edited on 3/8/2010 23:30 [GMT+0300] by nsh) ---
Hi Peter,
>we got a suggestion to switch the model type to MFCC_0_D_A
Thanks. I'm updating the backend acoustic model scripts and I'll see what I can do to test this out.
Ken
--- (Edited on 3/8/2010 11:15 pm [GMT-0500] by kmaclean) ---
Hi Ken!
Thanks for testing this! You can use sam to test the models if you want to: http://simon-listens.blogspot.com/2009/08/sam.html (simon git build needed)
Hi nsh!
Thanks for the input, I will have a look...
Greetings,
Peter
--- (Edited on 3/9/2010 1:46 am [GMT-0600] by bedahr) ---
Can you elaborate on how to change the prototype to 39 features? Do these changes markedly increase recognition rates?
Thanks
Chris
--- (Edited on 4/17/2010 10:28 am [GMT-0500] by Visitor) ---
Hi!
Use this prototype:
http://pastebin.com/wHRzzhuX
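For reference, here is a rough sketch of what a 39-feature prototype looks like in HTK's HMM definition format. MFCC_0_D_A gives 13 static coefficients (12 cepstra plus C0) along with their deltas and accelerations, so 3 x 13 = 39. This is not a copy of the linked file: the function names print_proto and repeat_value are made up for illustration, the 5-state topology is assumed from the state[2-4] range used in gmm1.hed, and the transition values are only illustrative. The config would also need TARGETKIND = MFCC_0_D_A to match.

```shell
#!/bin/sh
# Sketch: print an HTK prototype HMM for a given feature vector size.
# Assumptions: the function names, a 5-state left-to-right topology with
# emitting states 2-4, and illustrative transition probabilities.

# Print COUNT copies of VALUE on one indented line, separated by spaces.
repeat_value() {
    value=$1
    count=$2
    line=""
    i=0
    while [ "$i" -lt "$count" ]; do
        line="$line $value"
        i=$((i + 1))
    done
    echo "   $line"
}

# Usage: print_proto VECSIZE PARMKIND
print_proto() {
    vecsize=$1
    kind=$2
    echo "~o <VecSize> $vecsize <$kind>"
    echo "~h \"proto\""
    echo "<BeginHMM>"
    echo "  <NumStates> 5"
    # One zero-mean, unit-variance Gaussian per emitting state.
    for state in 2 3 4; do
        echo "  <State> $state"
        echo "    <Mean> $vecsize"
        repeat_value 0.0 "$vecsize"
        echo "    <Variance> $vecsize"
        repeat_value 1.0 "$vecsize"
    done
    echo "  <TransP> 5"
    echo "   0.0 1.0 0.0 0.0 0.0"
    echo "   0.0 0.6 0.4 0.0 0.0"
    echo "   0.0 0.0 0.6 0.4 0.0"
    echo "   0.0 0.0 0.0 0.7 0.3"
    echo "   0.0 0.0 0.0 0.0 0.0"
    echo "<EndHMM>"
}
```

Something like print_proto 39 MFCC_0_D_A > proto would write the file; compare it against the pastebin version before training with it.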
Greetings,
Peter
--- (Edited on 4/17/2010 1:41 pm [GMT-0500] by bedahr) ---
Thanks. I was able to complete a test run with it. Will try it out tomorrow.
Chris
--- (Edited on 4/17/2010 10:02 pm [GMT-0500] by cmisip) ---
I noticed that on one of the links in your original message (the simon implementation) there is a gmm2.hed and a gmm3.hed.
In addition to changing proto and the config file and adding gmm1.hed, I ran the following:
HHEd -A -D -T 1 -H hmm15/macros -H hmm15/hmmdefs -M hmm16 gmm1.hed tiedlist
HERest -A -D -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm16/macros -H hmm16/hmmdefs -M hmm17 tiedlist
HERest -A -D -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm17/macros -H hmm17/hmmdefs -M hmm18 tiedlist
If the above is correct, should I do it again for gmm2.hed and gmm3.hed?
Thanks
Chris
--- (Edited on 4/17/2010 10:21 pm [GMT-0500] by cmisip) ---