VoxForge
Here is a new experiment: in the results I reported in the previous posts I used nsh's language model, which has a very low perplexity (6.1988, measured by HTK's LPlex). Such a perplexity is unusual for LVCSR, so I have trained another language model. I still used the testing data for training, but I used a cutoff value of one for both bigrams and trigrams. This means that only n-grams with more than one occurrence in the data are included in the model.
The resulting perplexity is 96.8261, which leads to lower recognition accuracy, but the quality of the acoustic model should play a larger role in the accuracy difference. The results support this:
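To illustrate what a cutoff of one means in practice, here is a minimal sketch (not the LM toolkit actually used, just a toy counter): n-grams that occur only once in the training data are discarded, and only those seen more than once make it into the model.

```python
from collections import Counter

def ngrams_with_cutoff(tokens, n, cutoff=1):
    """Count n-grams and keep only those seen more than `cutoff` times."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g: c for g, c in counts.items() if c > cutoff}

corpus = "the cat sat on the mat the cat ran".split()
bigrams = ngrams_with_cutoff(corpus, 2, cutoff=1)
# only ("the", "cat") occurs twice, so it is the only bigram kept
print(bigrams)
```

Pruning singletons like this shrinks the model and pushes more probability mass onto the back-off estimates, which is why the perplexity rises so sharply.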
Speaker independent:
SENT: %Correct=35.35 [H=35, S=64, N=99]
WORD: %Corr=75.60, Acc=68.81 [H=635, D=7, S=198, I=57, N=840]
Adapted:
SENT: %Correct=37.37 [H=37, S=62, N=99]
WORD: %Corr=82.14, Acc=77.14 [H=690, D=7, S=143, I=42, N=840]
Off topic question: how did you manage to set the nonproportional font for the results in your first post? Putting <pre> text </pre> does not work for me and I do not see any buttons for that.
--- (Edited on 22.04.2009 11:39 [GMT+0200] by tpavelka) ---
One more: I have used only 100 sentences (instead of the 6000 used in the previous experiments) for adaptation. The results:
SENT: %Correct=39.39 [H=39, S=60, N=99]
WORD: %Corr=82.14, Acc=76.67 [H=690, D=9, S=141, I=46, N=840]
--- (Edited on 22.04.2009 12:49 [GMT+0200] by tpavelka) ---
OK, I'll also try the new language model. I don't use the -h option in HDecode; what does it mean? I searched the HTK Book but couldn't find a sufficient description of it.
--- (Edited on 4/22/2009 6:10 am [GMT-0500] by Visitor) ---
The -h option is described in the reference section for HERest, HTK Book page 253. If I understand it correctly, you can train XForms for different users based on filenames. E.g., in my case the mask is
*\\%%%%*.mfc
so from
D:\work\develop\speech\LASER\MFCC_0_D_25_10\VoxForge\ralfherzog-20070803-cc-01.mfc
it will choose the name "ralf" for the XForm. I guess you can train several XForms in one pass and, in the case of HDecode, choose which XForm to apply. In my experiment I had only one user, but HERest crashed if I did not set the -h option.
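For readers unfamiliar with HTK masks, here is a rough emulation of how that particular mask picks out the speaker name (this is a simplified sketch, not HTK's actual matcher): the `*` parts match anything, and each `%` captures one character of the name, so the four characters after the last backslash become the XForm name.

```python
import re

def speaker_from_mask(path):
    """Rough emulation of the HTK mask *\\%%%%*.mfc:
    '*' matches any run of characters, each '%' captures one
    character of the speaker name, so the four characters after
    the last backslash become the XForm name."""
    m = re.match(r".*\\(.{4}).*\.mfc$", path)
    return m.group(1) if m else None

path = r"D:\work\develop\speech\LASER\MFCC_0_D_25_10\VoxForge\ralfherzog-20070803-cc-01.mfc"
print(speaker_from_mask(path))  # -> ralf
```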
I was not able to find any documentation for HDecode, so anything I say about it is guesswork, sometimes supported by experiments (e.g. the -m switch).
Yes, and thanks for the formatting tip; I did not notice it because preformatted text looks different in the editing window (in Opera at least) than in the final version.
--- (Edited on 22.04.2009 14:31 [GMT+0200] by tpavelka) ---
--- (Edited on 22.04.2009 14:35 [GMT+0200] by tpavelka) ---
Hi tpavelka,
>Ok, here is my whole "training recipe" including models (zip, 75MB). As
>these things usually are, it is kind of a mess, so if you have any questions,
>just post them here.
If you want, I can put this package in the VoxForge Subversion repository, and put a link to it in the VoxForge Adaptation Tutorial (which obviously needs updating...).
To do so, I would need your permission to add a GPL (or compatible) license to the package (in your name) - please let me know if this is OK.
thanks,
Ken
--- (Edited on 4/22/2009 12:17 pm [GMT-0400] by kmaclean) ---
The package that I have uploaded for Rauf is just a snapshot of my directory of VoxForge experiments. It is a big mess, and I do not think it is a good idea to publish it, because anyone who tries to actually use it will spend a lot of time just trying to understand what does what. There are experiments that call functions that have changed since I added new features, and there is not a single line of comments...
I think there are two possible solutions:
1) What I have is just another HTK training recipe; basically it is the same thing that you have on the VoxForge server, or what is described in the tutorial section of HTK. The only advantage of my approach is that it is written entirely in Perl. If you think it is useful to have yet another HTK training recipe, I can clean it up, comment it, add some sample data to show that it actually works, and then we can GPL it and publish it.
2) We can add new tutorials. I can help with that, but I need someone to review them because I make a lot of mistakes (or maybe we could write them like a wiki? I do not know how publishing of articles works on VoxForge). I think the following should be added to the VoxForge tutorials:
--- (Edited on 22.04.2009 21:06 [GMT+0200] by tpavelka) ---
We can also add a tutorial about training cross-word triphones for use with HDecode, i.e. a continuous speech task.
I have also created a Windows batch file and several tools that completely automate the training process; all you need to do is copy the speech wave files and execute the batch file. If you want, I can also comment it, test it on several cases, and then upload it.
--- (Edited on 4/23/2009 12:23 am [GMT-0500] by Visitor) ---
These are my configuration parameters for coding the wave files of the training, adaptation and test data. All of them are 16 kHz, 16 bit, mono:
SOURCEFORMAT = WAV
TARGETKIND = MFCC_0_D_A
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
TARGETRATE = 100000.0
NUMCEPS = 12
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
ENORMALISE = F
Is it OK?
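One way to sanity-check the numbers: HTK expresses times in units of 100 ns, so this config corresponds to the common 25 ms analysis window with a 10 ms frame shift, and TARGETKIND = MFCC_0_D_A with NUMCEPS = 12 gives the standard 39-dimensional feature vector. A quick sketch of that arithmetic:

```python
# HTK time parameters are in units of 100 ns.
HTK_TIME_UNIT_S = 100e-9

window_size = 250000.0 * HTK_TIME_UNIT_S   # WINDOWSIZE -> 0.025 s (25 ms)
target_rate = 100000.0 * HTK_TIME_UNIT_S   # TARGETRATE -> 0.010 s (10 ms)

# MFCC_0_D_A: 12 cepstra + C0 = 13 static coefficients,
# plus delta and acceleration streams of the same size.
num_static = 12 + 1
vector_size = num_static * 3               # -> 39 coefficients per frame

print(window_size, target_rate, vector_size)
```

So the values look like the usual HTK defaults for 16 kHz speech; the only thing to double-check is that ENORMALISE = F is what you want (it is required for live input, but for coding whole files either setting works as long as training and decoding match).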
--- (Edited on 4/23/2009 1:33 am [GMT-0500] by Visitor) ---