Acoustic Model Discussions

Recognition rate sharply declined after adaption
User: bedahr
Date: 2/2/2010 9:59 am
Views: 9238
Rating: 5

Hi!


I am trying to teach simon to adapt a speaker-independent speech model and ran into a strange problem. When I adapt the speaker-independent speech model from Voxforge with samples of my own voice, the recognition rate gets worse - a LOT worse.


In numbers:

Unadapted Voxforge model: 73,1745 %

Model created only from my own samples: 77,5397 %

Adapted Voxforge model: 58,1122 %
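
To put the three rates side by side, here is a minimal Python sketch that parses the decimal-comma values quoted above and prints each one as a difference in percentage points from the unadapted base model. The helper name parse_rate and the dictionary labels are illustrative choices, not part of simon or sam.

```python
def parse_rate(text):
    """Parse a rate such as '73,1745 %' (decimal comma) into a float."""
    return float(text.replace("%", "").strip().replace(",", "."))


# Rates exactly as quoted in the post.
rates = {
    "unadapted Voxforge model": parse_rate("73,1745 %"),
    "model from own samples only": parse_rate("77,5397 %"),
    "adapted Voxforge model": parse_rate("58,1122 %"),
}

baseline = rates["unadapted Voxforge model"]
for name, rate in rates.items():
    # Difference in percentage points relative to the unadapted base model.
    print(f"{name}: {rate:.4f} % ({rate - baseline:+.4f} points)")
```

The adapted model ends up roughly 15 points below the base model it was derived from, while the model trained only on my own samples is about 4 points above it.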

 

It doesn't look like an obvious mistake because the words that aren't recognized correctly are not completely out of place (House instead of Nose, Jeans instead of Chin, etc.).

I uploaded the sam build log of the adaption process here:

http://pastebin.com/m4253a63c

The commands:

"/usr/local/bin/HHEd" -H "/home/bedahr/.kde4/share/apps/simon/model/basemacros" -H "/home/bedahr/.kde4/share/apps/simon/model/basehmmdefs" -M "/tmp/kde-bedahr/sam/internalsamuser/compile//classes/" "/tmp/kde-bedahr/sam/internalsamuser/compile//regtree.hed" "/home/bedahr/.kde4/share/apps/simon/model/basetiedlist"


"/usr/local/bin/HERest" -C "/usr/share/apps/simon/scripts/config" -C "/usr/share/apps/simon/scripts/config.global" -I "/tmp/kde-bedahr/sam/internalsamuser/compile//adaptPhones.mlf" -S "/tmp/kde-bedahr/sam/internalsamuser/compile//train.scp" -H "/tmp/kde-bedahr/sam/internalsamuser/compile/classes/basemacros" -u a -J "/tmp/kde-bedahr/sam/internalsamuser/compile//classes" -K "/tmp/kde-bedahr/sam/internalsamuser/compile//xforms" mllr1 -H "/tmp/kde-bedahr/sam/internalsamuser/compile//classes/basehmmdefs" "/home/bedahr/.kde4/share/apps/simon/model/basetiedlist"

"/usr/local/bin/HERest" -a -C "/usr/share/apps/simon/scripts/config" -C "/usr/share/apps/simon/scripts/config.rc" -I "/tmp/kde-bedahr/sam/internalsamuser/compile//adaptPhones.mlf" -S "/tmp/kde-bedahr/sam/internalsamuser/compile//train.scp" -H "/tmp/kde-bedahr/sam/internalsamuser/compile/classes/basemacros" -u a -J "/tmp/kde-bedahr/sam/internalsamuser/compile//xforms" mllr1 -J "/tmp/kde-bedahr/sam/internalsamuser/compile//classes" -K "/tmp/kde-bedahr/sam/internalsamuser/compile//xforms" mllr2 -H "/tmp/kde-bedahr/sam/internalsamuser/compile//classes/basehmmdefs" "/home/bedahr/.kde4/share/apps/simon/model/basetiedlist"

config.rc: http://pastebin.com/m1b4a9daa

I am using HADAPT:SAVESPKRMODELS and load the resulting models with Julius for the recognition.


It probably is something trivial but I just can't find it. Any ideas?

 

Greetings,

Peter

--- (Edited on 2/2/2010 9:59 am [GMT-0600] by bedahr) ---

Re: Recognition rate sharply declined after adaption
User: Visitor
Date: 2/3/2010 3:06 am
Views: 76
Rating: 7

30 regression classes is probably too many for 250 sentences, since that's a small amount of data.

As for the accuracy changes, the problem could be anywhere. You need to compare the accuracy on the adaptation data first: what is the recognition accuracy on the adaptation data with the Voxforge model, and what is it with the adapted model?

 

--- (Edited on 2/3/2010 3:06 am [GMT-0600] by Visitor) ---

Re: Recognition rate sharply declined after adaption
User: bedahr
Date: 2/3/2010 4:37 am
Views: 102
Rating: 6

Hi!


Should I decrease the number of regression classes? What would be a reasonable number? What are the advantages / disadvantages of higher / lower numbers?

 

If I understand the second paragraph correctly, I think I already stated that information in my initial post.

Recognition rate of the adaption data when using the unadapted Voxforge model: 73,1745 %

Recognition rate of the adaption data when using only the adaption data itself to create the model: 77,5397 %

Recognition rate of the adaption data when using the adapted voxforge model: 58,1122 %

 

Please let me know if you need any further information.


Greetings,

Peter

 

P.S.: If you need the complete model to diagnose the issue, I could upload it somewhere...

--- (Edited on 2/3/2010 4:38 am [GMT-0600] by bedahr) ---

Re: Recognition rate sharply declined after adaption
User: nsh
Date: 2/3/2010 6:36 am
Views: 494
Rating: 7

Yes, it's way better to just upload all the required files somewhere.

--- (Edited on 2/3/2010 15:36 [GMT+0300] by nsh) ---

Re: Recognition rate sharply declined after adaption
User: bedahr
Date: 2/3/2010 2:14 pm
Views: 397
Rating: 7

Ok, you can find all the needed files in this archive (LZMA-compressed tar archive, 24 MB).

<EDIT: there seemed to be something wrong with the link; look at the end of the post>

I hope the files are self-explanatory. I forgot the Julius vocabulary file (and only realized it after uploading the whole archive), but the compiled version (dfa, dict) is included, as is the HTK lexicon.


I recreated the lexicon / vocabulary because I thought that maybe the Voxforge dictionary had changed since I used it to create the target dictionary for this system (and that the base model was trained with that different phoneme set). Sadly, this reduced recognition rates even further (to something in the 60s), so don't be alarmed if your score is lower than the one I posted earlier (the adapted model is still the one with the lowest score, though).

 

Greetings,

Peter

 

//EDIT:

http://www.megaupload.com/?d=GB3QGRCX

http://www.file-upload.cc/www/?a=d&i=qXkU6DH7im

--- (Edited on 2/3/2010 2:31 pm [GMT-0600] by bedahr) ---

Re: Recognition rate sharply declined after adaption
User: nsh
Date: 2/3/2010 8:37 pm
Views: 135
Rating: 6

With some magic it started working; I'm not sure what I did.


My files are here:

http://www.mediafire.com/download.php?bzll2znnmfd

 

====================== HTK Results Analysis =======================
  Date: Thu Feb  4 05:29:06 2010
  Ref : wordstest.mlf
  Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=20.77 [H=27, S=103, N=130]
WORD: %Corr=41.54, Acc=-42.31 [H=54, D=0, S=76, I=109, N=130]
===================================================================
====================== HTK Results Analysis =======================
  Date: Thu Feb  4 05:29:26 2010
  Ref : wordstest.mlf
  Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=27.69 [H=36, S=94, N=130]
WORD: %Corr=47.69, Acc=-26.92 [H=62, D=0, S=68, I=97, N=130]
===================================================================
====================== HTK Results Analysis =======================
  Date: Thu Feb  4 05:29:46 2010
  Ref : wordstest.mlf
  Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=49.23 [H=64, S=66, N=130]
WORD: %Corr=64.62, Acc=19.23 [H=84, D=0, S=46, I=59, N=130]
===================================================================
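
For reading these blocks: as far as I know, HResults derives the WORD figures from the counts in brackets as %Corr = H/N and Acc = (H - I)/N, where H is hits, D deletions, S substitutions, I insertions and N the number of reference words. A small Python sketch that recomputes the three WORD lines above (the function name htk_word_scores is just an illustrative choice):

```python
def htk_word_scores(H, D, S, I, N):
    """Return (%Corr, Acc) from HResults-style word counts."""
    # Every reference word is either a hit, a deletion or a substitution.
    assert H + D + S == N, "counts do not add up to N"
    corr = 100.0 * H / N
    # Insertions are subtracted, so Acc goes negative once I exceeds H.
    acc = 100.0 * (H - I) / N
    return corr, acc


# The three WORD lines from the results above, in order.
runs = [
    dict(H=54, D=0, S=76, I=109, N=130),
    dict(H=62, D=0, S=68, I=97, N=130),
    dict(H=84, D=0, S=46, I=59, N=130),
]

for counts in runs:
    corr, acc = htk_word_scores(**counts)
    print(f"%Corr={corr:.2f}, Acc={acc:.2f}")
```

This also shows why the accuracy in the first two runs is negative: there are more inserted words than correctly recognized ones.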

--- (Edited on 2/4/2010 05:37 [GMT+0300] by nsh) ---

Re: Recognition rate sharply declined after adaption
User: bedahr
Date: 2/4/2010 4:18 am
Views: 148
Rating: 7

Thanks, that's great!

However, this model is just a test model to implement / test / improve the model adaption process in simon. I'd really need to know what did the trick.

Could you maybe post (the relevant part of) your bash history or something somewhere?

Greetings,

Peter

--- (Edited on 2/4/2010 4:18 am [GMT-0600] by bedahr) ---


Ok, I tested the simon-generated models with your test system and got even better results:

Static model:

====================== HTK Results Analysis =======================
  Date: Thu Feb  4 13:52:48 2010
  Ref : wordstest.mlf
  Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=20.77 [H=27, S=103, N=130]
WORD: %Corr=41.54, Acc=-42.31 [H=54, D=0, S=76, I=109, N=130]
===================================================================

Adapted model:

====================== HTK Results Analysis =======================
  Date: Thu Feb  4 13:52:53 2010
  Ref : wordstest.mlf
  Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=53.08 [H=69, S=61, N=130]
WORD: %Corr=67.69, Acc=26.15 [H=88, D=0, S=42, I=54, N=130]
===================================================================

Dynamic model (only my own recordings):

====================== HTK Results Analysis =======================
  Date: Thu Feb  4 13:52:58 2010
  Ref : wordstest.mlf
  Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=62.31 [H=81, S=49, N=130]
WORD: %Corr=76.15, Acc=42.31 [H=99, D=0, S=31, I=44, N=130]
===================================================================

 

Which is strange, because when testing the models with Julius I get much better results.

The sam test system is a bit different from the HTK system as it determines the recognition confidence score of the result and uses this to calculate overall recognition rate. If the sample contains "Boy" and the results are "Coy: 100%", "Boy: 50%" then sam will count the 50% recognition rate (but mark it as not recognized). Maybe this is why the adapted model doesn't score as well?
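
The scoring rule described above can be sketched roughly like this in Python. This is only an illustration of the idea, not sam's actual code: the function name score_sample and the (word, confidence) data layout are assumptions made for the example.

```python
def score_sample(reference, hypotheses):
    """Score one test sample.

    hypotheses: list of (word, confidence) pairs with confidence in [0, 1].
    Returns (credited_confidence, recognized).
    """
    if not hypotheses:
        return 0.0, False
    # Credit the confidence of the correct word if it was proposed at all.
    credited = max(
        (conf for word, conf in hypotheses if word == reference),
        default=0.0,
    )
    # The sample only counts as recognized if the top result is the reference.
    best_word, _ = max(hypotheses, key=lambda pair: pair[1])
    return credited, best_word == reference


# The "Boy" example from above: 50 % is credited, but it is not recognized.
credited, recognized = score_sample("Boy", [("Coy", 1.0), ("Boy", 0.5)])
print(credited, recognized)
```

Under such a scheme a model that ranks the right word second with a decent confidence still contributes to the overall rate, which HResults would simply count as a substitution.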

Still, with Julius I get more sentence errors with the adapted model than with the unadapted base model. Maybe there is something wrong with my Julius configuration?

Greetings,

Peter

--- (Edited on 2/4/2010 7:06 am [GMT-0600] by bedahr) ---

Re: Recognition rate sharply declined after adaption
User: nsh
Date: 2/5/2010 6:54 am
Views: 89
Rating: 7

Yes, it's some issue with the cooperation between Julius and HTK. If you dump the model with the global transform, the increase will be significant. With many regression classes it decreases. It might be a missing normalization of the model built from the transforms.

--- (Edited on 2/5/2010 15:54 [GMT+0300] by nsh) ---

Re: Recognition rate sharply declined after adaption
User: bedahr
Date: 2/5/2010 12:27 pm
Views: 111
Rating: 6

Hello nsh!


First of all, thank you very much for your help here; it is very much appreciated.

Could you elaborate a bit on your last post? What increase will be significant? What will decrease with many regression classes?


Is this to be considered a bug? If yes, of which project (HTK or Julius)? Is there a workaround available, or is this a "defect by design"?

Any further information would be really helpful.


Greetings,

Peter

--- (Edited on 2/5/2010 12:27 pm [GMT-0600] by bedahr) ---

 

Some tests of different Julius options (all with the same, unadapted basemodel):

My default options (pretty much default): 62,9113 %

With -rawe, -enormal and -escale 0.1: 62,9112 %

With -cvn: 17,9956 %

With -sscalc: 61,138 %

Disabling my non-standard word insertion penalties [subsequent models have this modification enabled]: 64,0642 %

Enabling beam width 1000 (-b) [subsequent models have this modification enabled]: 64,7489 %


The following options made no difference: -looktrellis, -lattice, -m, -s, -sb, different gprune methods, different iwcd1 methods. (-spsegment crashed Julius.)

 

Using this jconf and the adapted model I got a recognition rate of 53,0168 %. Using -htkconf it increased to 54,2807 %.

Using -cvn (lol): 1,62873 %

-rawe, -escale and -enormal have no effect.

 

For reference, the user-dependent model: 69,9924 %.

 

Greetings,

Peter

--- (Edited on 2/5/2010 1:19 pm [GMT-0600] by bedahr) ---

Re: Recognition rate sharply declined after adaption
User: nsh
Date: 2/5/2010 2:36 pm
Views: 115
Rating: 7

> Could you elaborate a bit on your last post? What increase will be significant? What will decrease with many regression classes?

My current results:


Unadapted

------------------------ Overall Results --------------------------
SENT: %Correct=36.92 [H=48, S=82, N=130]
WORD: %Corr=36.92, Acc=36.92 [H=48, D=0, S=82, I=0, N=130]

MLLR 1 dumped after global transform

SENT: %Correct=70.77 [H=92, S=38, N=130]
WORD: %Corr=70.77, Acc=70.77 [H=92, D=0, S=38, I=0, N=130]

5-class adaptation

SENT: %Correct=15.38 [H=20, S=110, N=130]
WORD: %Corr=15.38, Acc=15.38 [H=20, D=0, S=110, I=0, N=130]


1-class adaptation is the same as global. 2-class adaptation gives about 67%. So I think it's some normalization issue.


My files including logs are here:

http://www.mediafire.com/download.php?kmyzjzoyhnz

--- (Edited on 2/5/2010 23:36 [GMT+0300] by nsh) ---
