VoxForge
Hi guys
I have two questions:
1) I'm creating an acoustic model to recognize a phone number, a name, and a surname.
I expected this to be very simple, but so far I can't recognize anything.
I get WER = 10%, but the results are bad, and my doubt is this:
For this kind of application, is it better to build the acoustic model from wav files that each contain a single number or a single name, or is it correct to put all the numbers into one .wav file and all the names into another?
I followed the second method.
Do you have any suggestions?
2) If I solve the first problem, is it better to use a grammar for this kind of application, or is an n-gram model the best way?
Thank you
> For this kind of application, is it better to build the acoustic model from wav files that each contain a single number or a single name, or is it correct to put all the numbers into one .wav file and all the names into another?
> I get WER = 10%, but the results are bad, and my doubt is this
You are welcome to share the data so you can get more detailed help with this problem.
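As a side note on the WER figure quoted above: WER is conventionally the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. The following is only a minimal sketch of that definition, not the scoring script used by the training tools; the function name `word_error_rate` and the example transcripts are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical ten-digit phone number with the last digit misrecognized.
    reference = "three four seven one two zero nine eight five six"
    hypothesis = "three four seven one two zero nine eight five five"
    print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

In this made-up example a single wrong digit in a ten-digit number already gives WER = 10%, yet the phone number as a whole is wrong. That may be one reason a 10% WER can still feel like a bad result for this kind of application.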
> If I solve the first problem, is it better to use a grammar for this kind of application, or is an n-gram model the best way?
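To illustrate what the grammar option in this question could look like, here is a small sketch that generates a JSGF grammar for "digits, then a name, then a surname". It is not taken from the poster's model: the rule names, the grammar name `contact`, and the example names and surnames are all hypothetical, and every word used would also have to exist in the pronunciation dictionary.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def build_jsgf(names, surnames, grammar_name="contact"):
    """Return the text of a JSGF grammar: one or more digits, a name, a surname."""

    def alternatives(words):
        # JSGF separates alternative words with "|".
        return " | ".join(words)

    lines = [
        "#JSGF V1.0;",
        f"grammar {grammar_name};",
        # "+" means "one or more" of the preceding rule reference.
        "public <contact> = <digit>+ <name> <surname>;",
        f"<digit> = {alternatives(DIGITS)};",
        f"<name> = {alternatives(names)};",
        f"<surname> = {alternatives(surnames)};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical vocabulary, only for illustration.
    print(build_jsgf(["mario", "luca"], ["rossi", "bianchi"]))
```

A grammar like this only accepts word sequences that match the rule exactly, whereas an n-gram model also allows sequences it has not seen, so the two can behave quite differently on the same audio.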
My model is here:
https://www.dropbox.com/sh/5hhiaas2oehial9/AABPUycYKAe8CTcjJqFAgEeUa
But I think the problem comes from the wav files, because I have all the numbers in one file and all the letters in another.
Sorry, but I also have another question:
I have another model containing 15 sentences. With an n-gram model it works very well: I can say a few words from one sentence and a few from another, and the recognition is good. The same model used with a grammar does not recognize anything. I can't understand why, and I would especially like to know whether there is a way to solve it, because it seems a little strange to me.
Well, you need more data, but I don't think that's the core issue. Using all letters is probably OK too.
How exactly are you using the model? How did you get no results? Do you use pocketsphinx_continuous or what?
I'm using it in sphinx4.
I tested this model by changing the configuration in HelloWorldNgram and in the grammar-based HelloWorld, and also by running the Transcriber on one of the .wav files used for training, modified to run both in n-gram mode and with a grammar.
With the n-gram the results are always wrong.
With the grammar there is no result, or rather only whitespace (not null).
I think the problem is the model, but I hope there is a method or a setting that can give me better results.
What do you think?
> I'm using it in sphinx4.
Thank you, now I'm trying the latest version.
But why does sphinx recognize me well when I say only one word with the n-gram, while the same thing does not work with the grammar? Could this depend on the decoder settings and on the search manager? If I try to change these settings, is it possible to use WordPruningSearchManager with a grammar?
> But why does sphinx recognize me well when I say only one word with the n-gram, while the same thing does not work with the grammar?
No idea. You are asking questions without providing the data (audio and code) needed to reproduce your problem. It's hard to help in that case, since I can't access files on your computer.
> Could this depend on the decoder settings and on the search manager?
Yes, it is possible.
> If I try to change these settings, is it possible to use WordPruningSearchManager with a grammar?
It is not a trivial task; it requires an understanding of sphinx4 internals. It's better to use the latest version and leave the search managers as they are; accuracy should be good with that.
Here is my good model, which contains only words, no numbers or letters... you can test what I described earlier about the grammar.
I don't know why, but the new version recognizes only one word, and that word is not spoken in the wav file.
Probably, sorry, surely I'm making a mistake :)
https://www.dropbox.com/sh/llx61cjwnkbw60y/AABhkTiVoab1M1GgFtEYghbEa