VoxForge
Hi everyone,
I have a strange problem using HInit for training acoustic models used for an isolated word recognition task.
Some background: I've recorded many wav snippets (~700), each containing 3 to 6 spoken words (about 50 different words in total, with short pauses between the words). Each snippet starts and ends with a short silence, and all of the snippets are labeled by hand.
So now I am trying to start training with HInit, but HInit reports that the number of observation sequences seen for each word is very low (mostly 2 or 3, which is too few; there are many more than that in the data). My guess is that HInit only sees the first observation directly after "sil" and ignores the rest, but I really don't know where the problem is.
Here is a label file as an example:
0 106669124 sil
106669124 177054378 television
178145622 246348387 domain
253441475 311823041 walk
326009217 396394470 stop
431859908 511520737 monday
539347465 610823963 play
610823963 710400000 sil
And an example of the HInit command that I execute:
HInit -A -D -T 1 -C config -S trainlist.txt -M hmm_models/hmm1/ -H hmm_models/hmm0/macros -H hmm_models/hmm0/hmm_walk -l gehen -L lab/walk
>> ERROR [+2121] HInit: Too Few Observation Sequences [2]
I have used HInit a couple of times before, but only with one word per snippet.
And to head off one suggestion: no, I do not want to switch to a phoneme recognition task in this case.
It would be great if someone had an idea where the problem is! Thanks a lot in advance.
--- (Edited on 9/24/2010 10:24 am [GMT-0500] by Visitor) ---
>I've recorded many wav snippets (~700), which each contain 3 to 6 spoken words
Have you tried searching through the HTK email archives?
--- (Edited on 9/24/2010 8:45 pm [GMT-0400] by kmaclean) ---
Hi,
Thanks for your reply, but I found the problem. The times in my HTK labels had one digit too many: I had multiplied the seconds by 10^8 instead of 10^7. It's always such silly, simple mistakes that drive one to despair... ;-)
Cheers
--- (Edited on 9/28/2010 8:51 am [GMT-0500] by Visitor) ---
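For readers who hit the same issue: HTK label times are integers in units of 100 ns, so a time in seconds has to be multiplied by 10^7, as described in the post above. A minimal sketch of that conversion follows; the function names and the example times are made up for illustration and are not taken from the poster's scripts.

```python
# HTK label files store start/end times as integers in units of 100 ns,
# i.e. 10^7 units per second.
HTK_UNITS_PER_SECOND = 10_000_000


def seconds_to_htk(seconds: float) -> int:
    """Convert a time in seconds to HTK's 100 ns units (hypothetical helper)."""
    return int(round(seconds * HTK_UNITS_PER_SECOND))


def label_line(start_s: float, end_s: float, name: str) -> str:
    """Format one label line as 'start end name' (hypothetical helper)."""
    return f"{seconds_to_htk(start_s)} {seconds_to_htk(end_s)} {name}"


# Made-up segment: 1.5 s of silence at the start of a recording.
print(label_line(0.0, 1.5, "sil"))
```

With a factor of 10^8 instead, every segment would appear ten times later and ten times longer than it really is, so most labels would point past the end of the audio.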
Hi morino... I am a new HTK user. Could you please tell me how you created the label file for the speech? The example label file you posted is exactly what I need: spoken words with timing information. I would really appreciate it if you could point me in the right direction.
Thank you..
--- (Edited on 4/5/2011 9:54 am [GMT-0500] by cvani) ---
You can create the label file (.lab) with "HSLab".
--- (Edited on 9/28/2011 2:52 am [GMT-0500] by Visitor) ---
HInit has an option -m (the minimum number of training examples), which is 3 by default.
If you set it to 1, it should work.
Hope this helps.
--- (Edited on 11/21/2011 10:50 am [GMT-0600] by Visitor) ---
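Before lowering -m, it may help to check how many examples of each word the label files actually contain. The sketch below counts label names across a directory of label files; it assumes lines of the form "start end name" and a ".lab" extension, and the function names and directory argument are hypothetical. Note that it counts what the label files claim, which is not necessarily what HInit manages to load (for example when the label times do not match the audio).

```python
from collections import Counter
from pathlib import Path


def count_labels_in_text(text: str) -> Counter:
    """Count label names in the contents of one 'start end name' label file."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            counts[fields[2]] += 1
    return counts


def count_labels_in_dir(lab_dir: str) -> Counter:
    """Sum the label counts over all *.lab files in lab_dir (assumed layout)."""
    total = Counter()
    for lab_file in sorted(Path(lab_dir).glob("*.lab")):
        total += count_labels_in_text(lab_file.read_text())
    return total


example = "0 10000000 sil\n10000000 20000000 walk\n25000000 30000000 walk\n"
print(count_labels_in_text(example))
```

If a word shows far more labelled examples here than HInit reports, the problem is more likely in the label times or file matching than in the amount of data.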
Hi there,
I have the same problem
ERROR [+2121] HInit: Too Few Observation Sequences [2]
in gesture recognition, and I cannot solve it.
I have seen the answer about creating labels with HSLab, but I do not understand how to apply it.
Thank you
--- (Edited on 6/6/2012 4:36 am [GMT-0500] by ) ---
Would you please explain the procedure that solved your problem? I am really frustrated by this problem. Please help.
--- (Edited on 2/22/2013 5:01 am [GMT-0600] by Visitor) ---
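One way to check for the time-unit mistake reported earlier in this thread is to compare the last label end time with the length of the recording. This is only a sketch under assumptions: it treats label times as 100 ns units, assumes the audio is a plain PCM WAV file that Python's wave module can read, and uses hypothetical function names and an arbitrary tolerance.

```python
import wave

HTK_UNITS_PER_SECOND = 10_000_000  # label times are in units of 100 ns


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def last_label_end_seconds(label_text: str) -> float:
    """Largest end time in a 'start end name' label file, in seconds."""
    ends = [int(f[1]) for f in (ln.split() for ln in label_text.splitlines())
            if len(f) >= 3]
    if not ends:
        raise ValueError("no label lines found")
    return max(ends) / HTK_UNITS_PER_SECOND


def labels_fit_audio(label_text: str, audio_seconds: float,
                     tolerance: float = 0.1) -> bool:
    """True if the labels end no later than the audio (plus a small tolerance)."""
    return last_label_end_seconds(label_text) <= audio_seconds + tolerance


# Usage sketch (file names are placeholders):
# text = open("snippet.lab").read()
# print(labels_fit_audio(text, wav_duration_seconds("snippet.wav")))
```

If the labels claim to end roughly ten times later than the recording does, the times were probably scaled by the wrong power of ten.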
Hi,
I have the same problem. Can you help me, please?
This is the error:
Error[+2121] HInit: too few observation sequences [1]
Fatal Error - Terminating program HInit.
I have 879 .wav files.
--- (Edited on 6/6/2017 5:20 pm [GMT-0500] by ) ---