VoxForge
Hi,
I'm trying to build a command-and-control app with Sphinx4 that should recognize numbers and yes/no (for many speakers, not only for me). Before I started the project, I was convinced that this task would not be that hard. At present, I achieve an accuracy of about 70% (for my voice only). I think that is very poor for a vocabulary of 12 words.
Could someone explain to me what I can realistically expect? OK, the wiki mentions that one cannot expect great accuracy, but does that apply to my small use case, too?
Where exactly is the bottleneck?
Maybe someone with more experience could give me some hints?
I already posted a more specific question on Stack Overflow [1], and the suggestions were helpful, but overall not what I was hoping for.
What is the point of 200 hours of speech when the model fails on such simple cases? Again, maybe I have the wrong expectations, because I haven't worked in the field of speech recognition before.
I would be very grateful for some enlightening words :)
Best,
Sebastian
[1] http://stackoverflow.com/questions/43450934/cmusphinx-german-command-control-app-bad-accuracy?noredirect=1#comment73958974_43450934