VoxForge
Dear folks, does anybody know the current state of Italian acoustic models for Sphinx? Does one exist? If so, where can it be downloaded?
Thanks for any response.
Paolo.
Hi Paolo,
VoxForge does not yet have an Italian acoustic model. You can check this for any language on the downloads page by choosing the language you're interested in, in this case Italian. At this point in time we have a bit more than three hours of speech, so it's not really worthwhile to make an acoustic model. Please try to persuade some more people to donate speech, and donate some speech yourself; then we can think about creating an (initial) acoustic model for Italian.
Perhaps with an additional three or four hours of speech it would become useful. Of course it would not be useful if we got four hours of speech from only one person, because that would lead to an overtrained model. However, four hours from four people would already be quite acceptable.
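To make the point about speaker balance concrete, here is a minimal Python sketch. It assumes you already know how many seconds each speaker contributed; the speaker names, the numbers and the 50% threshold are all made up for illustration and are not a VoxForge rule.

```python
def speaker_shares(seconds_by_speaker):
    """Return (total_hours, {speaker: fraction of total audio})."""
    total = sum(seconds_by_speaker.values())
    if total == 0:
        return 0.0, {}
    shares = {spk: sec / total for spk, sec in seconds_by_speaker.items()}
    return total / 3600.0, shares


def dominated_by_one_speaker(seconds_by_speaker, max_share=0.5):
    """True if any single speaker supplies more than max_share of the audio.

    max_share is an illustrative threshold, not an official limit.
    """
    _, shares = speaker_shares(seconds_by_speaker)
    return any(share > max_share for share in shares.values())


# Hypothetical data: four hours from four people vs. four hours from one.
balanced = {"spk1": 3600, "spk2": 3600, "spk3": 3600, "spk4": 3600}
single = {"spk1": 4 * 3600}

print(speaker_shares(balanced)[0], dominated_by_one_speaker(balanced))
print(speaker_shares(single)[0], dominated_by_one_speaker(single))
```

Both sets contain the same four hours of audio, but only the second is flagged as depending on a single voice.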
Robin
Thanks for your reply, Robin.
Browsing the download area, I have seen quite a bit of audio material with associated text. The problem I have is that the material is fragmented. As far as you know, is there a single global package, containing both the audio and the associated text, that can be downloaded? Or do I have to collect all the pieces and assemble the final training material myself (which is quite a tedious task)?
Before we invest in providing material to improve the acoustic model, we have to evaluate, as far as possible, the currently available technology (that is, Sphinx) in order to understand the quality of the engine itself in terms of accuracy and performance. If the performance we obtain is close to our expectations, we will certainly invest in this task.
Any answer would really be appreciated.
Thanks.
Ken manages the website and the corpora, so he is better positioned to answer your question. You can check out this thread: http://www.voxforge.org/home/forums/message-boards/general-discussion/preferred-way-to-download-all-16khz-english-audio-files
It is very recent, so I'm sure it is still more or less the same. I hope that helps, otherwise we will need to ask Ken.
> is there a single global package which contains both the
>audio and the associated text to be downloaded?
No single package - it would be too big... and since updates are continuous, it would not make sense to re-download everything from a single package for a small number of additions...
Just use wget and download from either:
Hi guys,
I'm Paolo Russo, an Italian engineer interested in building, or helping to build, an Italian acoustic model.
I found the VoxForge project just a few days ago, and today I'm submitting my speech.
In the next few days I'll recruit some people (some software developer friends and their brothers and friends) to add their speech to the database.
I hope to reach the minimum of 6-8 hours of speech in no more than a few weeks. Italian folks, if we all recruit someone, we can fill up the database very quickly :)
Then we can talk about how to make a good acoustic model; a friend of mine and I are studying pocketsphinx in order to make good Italian recognition software, and my best hope is to find someone else interested in doing that.
Let's start speaking & recording! ;)
> I hope to reach the minimum of 6-8 hours of speech in no more than a few weeks. Italian folks, if we all recruit someone, we can fill up the database very quickly :)
That's great news. Once there is data, training a model is not an issue; it's really quick.
Well, one question: is 1 prompt (10 lines of speech) something like 1 minute of speaking? 30 seconds? How many prompts can I submit as a single user before stopping (so as not to make an overtrained model)? 60?
An hour of speech
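As a rough worked example of what an hour per speaker means in submissions, here is a small Python sketch. The 30-second and 1-minute durations are only the guesses from the question above, not measured values, and the function name is made up for illustration.

```python
import math


def submissions_needed(target_seconds, seconds_per_submission):
    """Number of submissions needed to reach target_seconds of audio, rounded up."""
    if seconds_per_submission <= 0:
        raise ValueError("seconds_per_submission must be positive")
    return math.ceil(target_seconds / seconds_per_submission)


ONE_HOUR = 3600  # per-speaker amount suggested in the reply above

# Guessed durations for one 10-line submission (assumptions, not measurements).
for guess in (30, 60):
    print(guess, "s per submission ->", submissions_needed(ONE_HOUR, guess), "submissions")
```

So if a 10-line submission really takes about a minute, 60 submissions come to roughly one hour; at 30 seconds each it would take about twice as many.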
Hello Paolo, I'm Carlo :-D
Can I speak Italian? :-)
I don't know how it works for Sphinx, but I know a bit about Julius. I'm working for the engineering faculty at Unisannio on building a speech recognition system whose core is Julius.
From what I have been able to try so far, Julius needs acoustic models built with the voice of the user who will then use the engine. With a different voice, performance drops sharply! In my thesis work I managed to build an acoustic model that let Julius correctly recognize 92% of the words (640). However, if you don't want to write software to correct the recognized output, the only options you have are:
1- Training done by repeating each word at least 10 times (but this will only work well for the voice that made the recordings);
2- External training plus training done by the actual user, but here too you hit accuracy limits: even when using the functions for updating the acoustic model, the actual user's training is decisive for good recognition.
In short, you get a good acoustic model (with what VoxForge makes available) only if you have many recordings of the voice to be recognized.