Acoustic Model Discussions

recognition using nightly builds
User: sagarvenkata
Date: 2/19/2010 5:30 am
Views: 5863
Rating: 1

hi all,

I am using the nightly build acoustic model provided on the site, and I created the language model from the master prompts as follows:

HLStats -o -s 'SENT-START' 'SENT-END' -b master_bg ../HTK_AcousticModel/wlist ../HTK_AcousticModel/words.mlf

HBuild -b -s 'SENT-START' 'SENT-END' -n master_bg ../HTK_AcousticModel/wlist master_wdnet

I tested them using the master test prompts and the recognition is very poor.

Am I doing something wrong here?

thanks in advance.

Regards,

Sagar.

 

--- (Edited on 2/19/2010 5:30 am [GMT-0600] by sagarvenkata) ---

Re: recognition using nightly builds
User: kmaclean
Date: 2/21/2010 8:41 pm
Views: 106
Rating: 2

>created the language model using the master prompts

If you are using Julius 3.x, you need a forward 2-gram and a reverse word 3-gram.

Julius-4 can do recognition with a forward N-gram or a backward N-gram.

--- (Edited on 2/21/2010 9:41 pm [GMT-0500] by kmaclean) ---

Re: recognition using nightly builds
User: sagarvenkata
Date: 2/21/2010 10:56 pm
Views: 109
Rating: 2

hi,

thanks for the reply.

I am now using a back-off 2-gram model built from the master prompts with LBuild, and HDecode for decoding. The acoustic model is a cross-phone tied mixup state model.

For decoding I am using an 's' value of 5.0 and a 'p' value of 0.0.

I am getting a word accuracy of 70% on the test prompts. How can I improve it?

I need an accuracy of at least 90%.

 

Thanks&Regards,

Sagar

--- (Edited on 2/21/2010 10:56 pm [GMT-0600] by sagarvenkata) ---

Re: recognition using nightly builds
User: kmaclean
Date: 2/26/2010 1:14 pm
Views: 146
Rating: 1

>i am getting word accuracy of 70% on test prompts. how can i improve it?

From the HTK Book:

3.4.1 Step 11 - Recognising the Test Data
[...]
The options -p and -s set the word insertion penalty and the grammar scale factor, respectively. The word insertion penalty is a fixed value added to each token when it transits from the end of one word to the start of the next. The grammar scale factor is the amount by which the language model probability is scaled before being added to each token as it transits from the end of one word to the start of the next. These parameters can have a significant effect on recognition performance and hence, some tuning on development test data is well worthwhile.
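To make the quoted description concrete, here is a rough sketch of how the two options enter the score of a hypothesis. The symbols are my own shorthand, not the HTK Book's notation: W = w_1 ... w_N is the word sequence, O is the acoustic observation sequence, and a bigram language model is assumed, as used earlier in this thread.

```latex
\log \hat{P}(W, O) \;=\; \log P(O \mid W) \;+\; \sum_{i=1}^{N} \Bigl[\, s \cdot \log P(w_i \mid w_{i-1}) \;+\; p \,\Bigr]
```

Read this way, a larger s gives the language model more weight relative to the acoustic score, s = 0 removes the language model term entirely, and a more negative p makes every extra word more costly, which tends to reduce insertions.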

I am not very familiar with HDecode, but in order to test the VoxForge acoustic model with HVite, I just used trial and error, and came up with: -p 0.0 -s 5.0.
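The trial and error can be made a bit more systematic with a small sweep over -p and -s. The following is only a sketch under assumptions: the file names (macros, hmmdefs, test.scp, dict, tiedlist, master_wdnet, recout_*.mlf) are placeholders modelled on the HTK Book tutorial, not the actual VoxForge layout, and the helper names are made up for illustration. It only builds the HVite command lines and parses the WORD line of HResults output; running the tools is left to you.

```python
import itertools
import re
import shlex

# Matches the WORD summary line printed by HResults, for example:
# "WORD: %Corr=99.77, Acc=99.65 [H=853, D=1, S=1, I=1, N=855]"
WORD_LINE = re.compile(r"WORD: %Corr=(-?[\d.]+), Acc=(-?[\d.]+)")


def hvite_command(p, s, out_mlf):
    """Build an HVite argument list for one (p, s) setting.

    All file names here are placeholders (assumptions), not real paths.
    """
    return [
        "HVite", "-H", "macros", "-H", "hmmdefs",
        "-S", "test.scp", "-l", "*", "-i", out_mlf,
        "-w", "master_wdnet", "-p", str(p), "-s", str(s),
        "dict", "tiedlist",
    ]


def parse_word_accuracy(hresults_output):
    """Return the word accuracy (Acc) from HResults text output."""
    match = WORD_LINE.search(hresults_output)
    if match is None:
        raise ValueError("no WORD line found in HResults output")
    return float(match.group(2))


def sweep_commands(p_values, s_values):
    """Yield (p, s, command) for every combination of the given values."""
    for p, s in itertools.product(p_values, s_values):
        out_mlf = f"recout_p{p}_s{s}.mlf"  # hypothetical output name
        yield p, s, hvite_command(p, s, out_mlf)


if __name__ == "__main__":
    for p, s, cmd in sweep_commands([-10.0, 0.0, 10.0], [0.0, 5.0, 10.0]):
        print(shlex.join(cmd))
```

Each command could be run with subprocess.run, followed by HResults on the resulting MLF, with the output passed to parse_word_accuracy. As the HTK Book passage says, the tuning should be done on development data rather than on the final test prompts, otherwise the reported accuracy will be optimistic.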

--- (Edited on 2/26/2010 2:14 pm [GMT-0500] by kmaclean) ---

Re: recognition using nightly builds
User: Visitor
Date: 2/27/2010 11:11 am
Views: 108
Rating: 1

hi,

I have tried different values of s and p. s=0, p=12 seems to give the best result, with an increase from 66% to 70%, but nothing more.

So to increase the recognition, what else can I do?

 

Thanks & Regards,

Sagar

--- (Edited on 2/27/2010 11:11 am [GMT-0600] by Visitor) ---

Re: recognition using nightly builds
User: kmaclean
Date: 3/8/2010 11:37 pm
Views: 2584
Rating: 2

>So to increase the recognition what else can i do?

Use the release build rather than the nightly build (which still needs some cleanup).

If you still want to use the nightly build, review the forced alignment logs (the HVite output in Step 8) and remove any audio that has a high number of "No tokens survived to final node of network at beam" entries. If this improves recognition, please let us know what you removed.
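Going through the alignment log by hand is tedious, so a short script can do the counting. This is a minimal sketch that assumes the HVite trace prints an "Aligning File: <name>" line before the messages for each file; that prefix is an assumption about the log layout, so check it against your own log and adjust FILE_PREFIX if it differs. The function names are made up for illustration.

```python
from collections import Counter

# Message text quoted in this thread.
MARKER = "No tokens survived to final node of network at beam"
# Assumed prefix of the trace line that names the file being aligned.
FILE_PREFIX = "Aligning File:"


def count_failures(log_lines):
    """Count MARKER occurrences per audio file named in the log."""
    counts = Counter()
    current = None
    for line in log_lines:
        line = line.strip()
        if line.startswith(FILE_PREFIX):
            current = line[len(FILE_PREFIX):].strip()
        elif MARKER in line and current is not None:
            counts[current] += 1
    return counts


def files_to_review(log_lines, min_failures=1):
    """List files with at least min_failures entries, worst first."""
    ranked = count_failures(log_lines).most_common()
    return [name for name, n in ranked if n >= min_failures]


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py hvite_align.log  (log name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name in files_to_review(handle):
            print(name)
```

The threshold min_failures is a judgment call; starting with the files that fail at every beam width and retraining without them is one reasonable way to test whether they are hurting the model.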

thanks,

Ken

--- (Edited on 3/9/2010 12:37 am [GMT-0500] by kmaclean) ---
