VoxForge
Hi!
I'm a member of a college group at a Brazilian university that does research on speech recognition. We have a database with phonetic models ready to be processed, and we have good knowledge of the HTK toolkit. But we're having difficulty with speech recognition using a speaker-independent acoustic model.
So I would like some recommendations of good material on this topic, because we haven't had success in our attempts. We believe that a good reference could help our research and make it easier to understand the database we have available.
Thanks.
Eduardo Ruas
Hi Eduardo,
>So I would like some recommendations of good material on this
>topic,
I am not sure what you are looking for...
Ken
Hi Ken!
Our research so far has been based on the VoxForge acoustic model creation tutorial; however, that tutorial is about creating a database. I already have a database of monophones, but for some reason I do not know, I cannot use it.
Google did not serve our need very well: we could only find methods for creating a database, not material on using an existing one. So I wanted some tips on a specific method for using a database that already exists.
In short, we need recommendations of books and papers on methods for using a speaker-independent AM, if possible applied directly to HTK. Then perhaps we could understand the theory and the algorithms, so we can use the available data without having to create another database. If you have information on how to use trained data to create a dictionary, that is also welcome!
Thanks for your attention.
Eduardo
>Our research so far has been based on the VoxForge AM Create
>Tutorial, however this tutorial is to create a database.
I am not sure what you mean here... the VoxForge acoustic model creation tutorial is a tutorial for creating _acoustic models_ (which are not usually referred to as a 'database') in HTK format, so they can be used with the Julius speech recognition engine.
I think you are looking for the VoxForge tutorial on how to _adapt_ an acoustic model.
>I already have a database of monophones, but for some reason I do not
>know, I cannot use it.
The acoustic model creation process can be tricky because a problem in an earlier step may only show up as a problem in a later step, making it very difficult to figure out the cause... sometimes it is easier to restart the whole process.
One thing to note is that the triphone creation steps in the VoxForge tutorial are for English only (see my post in the thread).
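To see where a run went wrong, it often helps to re-drive the pipeline step by step from a clean directory. Here is a minimal sketch of the start of that process; the directory layout and file names are assumptions for illustration, not taken from the tutorial, and the HTK invocations are shown only as comments (config, proto, and the MLF come from earlier tutorial steps):

```shell
# Hypothetical layout: coded feature files under mfcc/ (made-up names).
mkdir -p mfcc
touch mfcc/sample_0001.mfc mfcc/sample_0002.mfc

# HTK tools read their inputs from a .scp "script" file, one path per line.
# A missing or stale entry here is a common cause of later-step failures.
find mfcc -name '*.mfc' | sort > train.scp
cat train.scp

# The flat-start plus re-estimation sequence then looks roughly like this
# (not executed here):
#   HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto
#   HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 \
#          -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
```

Regenerating train.scp from scratch like this, rather than editing it by hand, makes it easier to restart cleanly when a later step fails.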
>books and papers on methods for using a speaker-independent AM, if
>possible applied directly to HTK.
Look up the HTK Book; Googling "HTK acoustic models" brings up many references, including this paper: On Developing Acoustic Models Using HTK.
However, I don't think trying to understand the theory at this point will help - it involves some very heavy mathematics/statistics... best to get comfortable with the HTK tools first.
Ken
Hi Ken
The link that you gave for creating a speaker-independent acoustic model
http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/download
says on the tutorial's first page that the method is for creating a speaker-*dependent* acoustic model. Scripts are also available to create the HMM models, and the end of the tutorial notes that they are suitable only for the specific speaker.
http://www.voxforge.org/home/dev/acousticmodels/windows/create/htkjulius/how-to/script
"Note: the hmmdefs file shown here was trained with my voice. It will not work with your voice. Your file will have the same structure as this one, but the statistics will be different."
Thanks
>Scripts are also available to create the HMM models, and the end of the
>tutorial notes that they are suitable only for the specific speaker.
The process described in the VoxForge tutorials is for creating a speaker-dependent acoustic model. If you want to create a speaker-independent acoustic model, you need to train an acoustic model with *lots* of speech from many *different* speakers.
>So the difference is only with different speakers, and the procedure is the same!
That is the approach I have been using on the VoxForge site.
I am sure there are subtle tweaks that can be done to improve performance. You might want to look at Keith Vertanen's HTK Wall Street Journal Training Recipe for another approach.
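In practical terms, "same procedure, more speakers" mostly comes down to the training list: instead of one speaker's files, the .scp that HERest reads pools everyone's data. A minimal sketch, with made-up speaker directory names:

```shell
# Hypothetical per-speaker directories (names are assumptions for illustration):
mkdir -p corpus/speaker01 corpus/speaker02
touch corpus/speaker01/a.mfc corpus/speaker02/b.mfc corpus/speaker02/c.mfc

# Speaker-dependent training would list only one speaker's files.
# For a speaker-independent model, pool every speaker into one list and run
# the same re-estimation steps (HERest etc.) over it:
find corpus -name '*.mfc' | sort > train_all.scp
wc -l < train_all.scp
```

The transcriptions in the training MLF must of course cover all of the pooled recordings as well.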
Thank you. Can the method specified for adapting the speaker-independent acoustic model to our voices also be kept speaker independent? The acoustic model prepared by VoxForge is huge and I would like to use it. I need my speech recognition system to recognize proper nouns (including ones not in the dictionary), which should also be present as words in the acoustic model, right? So I need to create a few audio files and add them to the acoustic model available here, and again this needs to be speaker independent. Is there any way out? Thanks
>So I need to create a few audio files and add them to the acoustic model
>available here, and again this needs to be speaker independent.
Try it out - if it does not work well, add more audio from different speakers to the acoustic model... and repeat.
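One point worth noting for the proper-noun case: the acoustic model models *phones*, not words, so a new proper noun first needs a pronunciation entry in the dictionary, spelled with phones the model already has. A minimal sketch; the word and its pronunciation below are hand-made assumptions, not taken from any lexicon:

```shell
# HTK dictionary entries are "WORD  phone phone ... sp", kept sorted.
# The pronunciation here is an illustrative guess, not from a real lexicon:
printf 'EDUARDO  eh d w aa r d ow sp\n' >> dict.local
LC_ALL=C sort -o dict.local dict.local
cat dict.local
```

Only if a phone needed by the new word is poorly trained (or missing) do you need new audio; in that case the "add more speakers and repeat" advice above applies to the recordings covering that word.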