Acoustic Model Discussions

Single word recognition, improving accuracy for short words
User: heshiming
Date: 8/26/2013 8:23 am

Dear Community,

I'm building a speech recognition module for use with dictionary software. It needs to recognize single-word or short-phrase input.

The dictionary contains 50k words. I'm using Julius as the decoder, with a fixed grammar: just one word per sentence, chosen from the 50k words. I followed the VoxForge tutorials to train the model with the HTK tools (up to hmm15). The training corpora were gathered from an online dictionary and from VoxForge.

The result is acceptable except for one problem: Julius does a fantastic job, but only on long words. For words like CATASTROPHIC, CHARACTERISTIC, and REVOLUTIONARY, I get less than 10% WER (this is a rough figure; I suspect the real number is even lower).

For shorter words, however, a recording of MISTAKE typically yields BISCUIT or MYSTIQUE as the first choice. A recording of CATALOG typically yields KERNEL or PARADOR. Sometimes the correct word appears as the 5th or 10th result, but with an extremely low score. WER for such words can be well over 80%.

For even shorter words, such as CAT and DOG, WER is close to 100%. Typical output for CAT includes TACT, PACT, TAPPED, and TAXED. Typical output for DOG includes BOG, GOD, and BOGGED. The outputs sound very similar to the correct word, but are not quite the same.
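For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of how top-1 error rate could be tallied by word length. It assumes the decoder's first choice has already been collected into (reference, hypothesis) pairs; the function name, the bucket boundaries (3 and 7 letters), and the sample pairs are hypothetical illustrations built from the words mentioned above, not measured results.

```python
from collections import defaultdict


def error_rate_by_length(pairs, edges=(3, 7)):
    """Return the top-1 error rate per word-length bucket.

    pairs: iterable of (reference_word, top_hypothesis) tuples.
    edges: assumed upper bounds, in letters, of the "short" and
           "medium" buckets; anything longer counts as "long".
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ref, hyp in pairs:
        if len(ref) <= edges[0]:
            bucket = "short"
        elif len(ref) <= edges[1]:
            bucket = "medium"
        else:
            bucket = "long"
        totals[bucket] += 1
        # With one word per utterance, WER reduces to "was the top choice wrong?"
        if ref != hyp:
            errors[bucket] += 1
    return {bucket: errors[bucket] / totals[bucket] for bucket in totals}


# Illustrative pairs only, not real decoder output.
sample = [
    ("CAT", "TACT"),
    ("DOG", "BOG"),
    ("MISTAKE", "BISCUIT"),
    ("CATALOG", "CATALOG"),
    ("CATASTROPHIC", "CATASTROPHIC"),
]
rates = error_rate_by_length(sample)
```

Bucketing by letter count is only a convenience here; bucketing by phone count from the pronunciation dictionary would probably track the effect more closely.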

Can anyone provide an educated guess as to why accuracy falls so sharply as words get shorter?

I can see that shorter words have fewer triphones, but is there still room for improvement? Does this mean that I don't have enough training data for shorter words? Could anyone share some insights on how to improve accuracy for shorter words?
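To illustrate the triphone point, here is a rough sketch of word-internal triphone expansion in the HTK-style `left-phone+right` notation. The helper name and the phone transcription of CAT are assumptions for illustration; the actual phone set depends on the pronunciation dictionary in use.

```python
def word_internal_triphones(phones):
    """Expand a phone sequence into word-internal triphone labels.

    Uses the l-p+r naming convention; phones at the word edges
    keep only the context that exists inside the word.
    """
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


# Assumed transcription of CAT: k ae t
cat_units = word_internal_triphones(["k", "ae", "t"])
# cat_units == ["k+ae", "k-ae+t", "ae-t"]
```

A three-phone word gives the decoder only three context-dependent units to score, so a single poorly matched unit can be enough to push a near-homophone ahead of the correct word.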

Thank you.

--- (Edited on 8/26/2013 8:23 am [GMT-0500] by heshiming) ---
