VoxForge
Hi,
I followed the walk-through on the VoxForge site, so now I know the procedure for building a speech recognition application with a grammar.
But what I want to do is just single-word recognition: for example, I say "WHAT", and I get a result telling me whether the audio is my keyword "WHAT" or not.
I trained the monophones and tied-state triphones, extracted the parameters corresponding to my keyword "WHAT" from the hmmdef file, and built a new single-word hmmdef file.
What should I do next? I think I could get a likelihood score for an input audio file and then set a threshold to classify the result. Can I use HVite to calculate the likelihood without a grammar? If not, what grammar do I need?
Any suggestions will be appreciated! Thanks!
--- (Edited on 8/23/2014 10:08 am [GMT-0500] by taa199) ---
You can learn the theory of keyword spotting from this thesis:
http://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf
In short, for keyword spotting you need not just a word model but also a garbage model. You need to train a garbage model in addition to the keyword model (it can be a single-state monophone model trained on some amount of speech).
Then you can write a grammar in which garbage and keyword are alternatives. By balancing the weights of keyword and garbage in the grammar you can emulate a detection threshold.
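To see why the weights behave like a threshold, here is a minimal Python sketch. It assumes you already have a total log-likelihood for the keyword path and for the garbage path over the same audio (the numbers in the usage note are made up), and the function names and weight parameters are hypothetical, not part of HTK. The decoder picks whichever path has the higher log-likelihood plus log weight, which is the same as comparing the log-likelihood ratio against log(w_garbage / w_keyword).

```python
import math


def detect_keyword(ll_keyword, ll_garbage, w_keyword=0.5, w_garbage=0.5):
    """Return True if the weighted keyword path beats the weighted garbage path.

    ll_keyword, ll_garbage: total log-likelihoods of the same audio under
    the keyword model and the garbage model (assumed inputs).
    w_keyword, w_garbage: illustrative grammar weights for the two branches.
    """
    score_keyword = ll_keyword + math.log(w_keyword)
    score_garbage = ll_garbage + math.log(w_garbage)
    return score_keyword > score_garbage


def equivalent_threshold(w_keyword, w_garbage):
    """Log-likelihood-ratio threshold implied by the two grammar weights.

    The keyword is detected when ll_keyword - ll_garbage exceeds this value.
    """
    return math.log(w_garbage) - math.log(w_keyword)
```

With equal weights the threshold is 0, so detect_keyword(-1200.0, -1210.0) accepts the keyword. Lowering w_keyword (for example to 1e-6) raises the threshold, so the same scores are then rejected as garbage. In practice you would tune the weights on held-out recordings that contain both keyword and non-keyword speech.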
If you want to try an existing implementation of keyword spotting, you can check the latest pocketsphinx sources:
http://github.com/cmusphinx/pocketsphinx
If you build and install it, keyword spotting is easy:
pocketsphinx_continuous -infile file.wav -keyword "ok google" -kws_threshold 1e-20
It will print the keywords it detects and filter out the garbage.
--- (Edited on 8/23/2014 20:12 [GMT+0400] by nsh) ---