VoxForge
I think all the important pages are translated, so let's leave things as they are.
Writing Spanish is easy for me (unlike writing English :-D), so I began translating pages and tried to translate all of them :-D
If you need more Spanish translation work, tell me and I will do it.
Thanks for your work in VoxForge!!!
Hi ubanov,
Thanks for your help!
>If you need more Spanish translation work, tell me and I will do it.
The Spanish About page is probably the last main page that needs to be translated (since the "About" page is on the top menu). I just set it up so that you can edit it.
When you get a chance, it would be great if you could translate that page.
thanks again,
Ken
I have begun translating this page. It is harder than the others! Maybe I will finish it tomorrow.
If you need more help with the Spanish part of this project, tell me. Maybe I can help with the Grammar or Language Model...
Hi ubanov,
>I have begun translating this page.
thanks!
>If you need more help with the Spanish part of this project, tell me.
OK
>Maybe I can help with the Grammar or Language Model...
Grammars are used for command and control applications, and are specific to the domain of application... i.e. you don't really create a generic Grammar, you create your grammar for your particular application.
Language Models are used with Dictation applications (at least in the case of HTK/Julius...). We need lots more speech audio before we start looking at Language Models.
If you are looking for something specific for Spanish, a good first step would be the creation of a Spanish pronunciation dictionary.
Here are some resources we've come across:
Ken
I have sent an email to Juan Nolazco... If he replies I will tell you. He has the Sphinx Spanish project.
On the other hand, I have read documents explaining the process of creating voices for Festival.
The first one is based on 40-50 phonemes (let's say that if I have MAR, here it will be M A R). To create the voices for Festival, 400-500 diphones are used (let's say that if I have MAR, here it will be MAR).
Which is the more correct approach for voice recognition: using only a few phonemes, or using a lot of diphones?
The programs that translate text to diphones are in several .scm files in the directory /usr/share/festival/voices/spanish/Junta... Is there any way to use these files for the pronunciation dictionary? Does anyone know if the .scm files can be executed in any way? (They are text and seem to be scripts.)
The bad thing is that while I'm reading, I'm not translating the About page, sorry :-D
Hi Ubanov,
> I have sent an email to Juan Nolazco... If he replies I will tell you.
Thanks!
>The first one is based on 40-50 phonemes (let's say that if I have MAR, here it will be M A R). To create
>the voices for Festival, 400-500 diphones are used (let's say that if I have MAR, here it will be MAR).
Speech recognition uses phonemes (and creates triphones automatically as part of the training process)
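As a rough illustration of what "creating triphones" means, here is a minimal Python sketch that expands a phoneme sequence into context-dependent labels. The left-phone+right notation follows the HTK convention for word-internal triphones; the function name to_triphones is made up for this example, and in practice the training tools generate these labels for you.

```python
def to_triphones(phones):
    """Expand a phone list into word-internal triphone labels (left-phone+right)."""
    labels = []
    for i, phone in enumerate(phones):
        # Add the left context unless this is the first phone of the word.
        left = phones[i - 1] + "-" if i > 0 else ""
        # Add the right context unless this is the last phone of the word.
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


# The word MAR, written as the three phonemes M A R.
print(to_triphones(["m", "a", "r"]))
```

So you only need to supply the plain phoneme sequence for each word; the context-dependent units are derived from it.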
>Which is the more correct approach for voice recognition: using only a few phonemes, or using a lot of diphones?
Phonemes for speech recognition. Text-to-Speech requires more information in order to generate the proper intonation for a word. The Hidden Markov Models (the statistical models that represent the distinct sounds of a word) used in Speech Recognition don't need to be as detailed.
>The programs that translate text to diphones are in several .scm files in the directory /usr/share/festival/voices
>/spanish/Junta... Is there any way to use these files for the pronunciation dictionary? Does anyone
>know if the .scm files can be executed in any way? (They are text and seem to be scripts.)
Festival uses the Scheme scripting language internally (a derivative of Lisp). These .scm files are likely Scheme scripts, and are likely programmatic versions of the letter-to-sound rules for Spanish Festival. But you don't necessarily need to look at them; you can use the rules they contain from within Festival. Step 2 of the Automated Audio Segmentation Using Forced Alignment page has a description of how to use Festival for generating pronunciations for words:
To add a missing word (as displayed in your HDMan log - dlog) to the VoxForge Lexicon, you need to look at the pronunciation of similar words in the dictionary, and create a new pronunciation entry for your word based on these similar words.
For example, if you want to add the word "winward", you would look up words that are similar, such as:
WINWOOD [WINWOOD] w ih n w uh d
In this case, this gives us the pronunciation for the "win" in the word "winward". Next, we look for words that contain "ward" in the dictionary, such as:
WOODWARD [WOODWARD] w uh d w er d
WARD [WARD] w ow r d
Notice that although the words "woodward" and "ward" contain the same sequence of letters (ward), they are pronounced differently - they have different phoneme sequences. Next you need to make a judgment call based on your knowledge of your English dialect (you might also want to listen to the actual audio passage that contains the word, but this could take too much time for each and every word you are unsure of... ). For me, the way I pronounce the word part "ward" in "winward" is closer to the sounds I make in "woodward" than in the word "ward". Therefore, the final pronunciation dictionary entry I would use would look like this:
WINWARD [WINWARD] w ih n w er d
You then need to add this word to your version of the VoxForge Lexicon in *Alphabetical* sequence. You need to repeat these steps for all the "missing words" in your eText. It's a little tedious when you perform this process for the first time, but as you get familiar with the words and phonemes, it goes much quicker.
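The manual steps above can be sketched in a few lines of Python. This is only an illustration: the helper names (phones_of, make_entry, add_entry) are hypothetical, the lexicon is assumed to be an already sorted list of "WORD [WORD] phonemes" lines, and plain string ordering is assumed to be close enough to the alphabetical order the lexicon needs.

```python
import bisect


def phones_of(entry):
    """Return the phoneme list of a 'WORD [WORD] p1 p2 ...' lexicon line."""
    return entry.split()[2:]


def make_entry(word, phones):
    """Format a lexicon line in the same layout as the examples above."""
    upper = word.upper()
    return f"{upper} [{upper}] {' '.join(phones)}"


def add_entry(lexicon, word, phones):
    """Insert the new entry into the sorted lexicon list, keeping it sorted."""
    entry = make_entry(word, phones)
    bisect.insort(lexicon, entry)
    return entry


lexicon = [
    "WARD [WARD] w ow r d",
    "WINWOOD [WINWOOD] w ih n w uh d",
    "WOODWARD [WOODWARD] w uh d w er d",
]

# Judgment call from the text: "win" as in WINWOOD, "ward" as in WOODWARD.
win = phones_of(lexicon[1])[:3]
ward = phones_of(lexicon[2])[-3:]
print(add_entry(lexicon, "winward", win + ward))
```

The choice of which similar word to borrow from still has to be made by a person; the code only handles the formatting and the sorted insert.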
Start Festival
$ festival
From the Festival command line, there are a series of "lex" commands
that can help you determine the phonemes contained in a word that is
not included in the VoxForge dictionary, and as an added bonus, you
can actually listen to how Festival pronounces the word to get a better
feel for the phonemes.
First, find out which lexicons (i.e. pronunciation dictionaries and
rules) are included in your distribution of Festival using the "lex.list" command as follows:
festival> (lex.list)
("english_poslex" "cmu")
Since VoxForge is based on the cmu dictionary, we can use Festival
to determine the phonemes of an unknown word, using Festival's
dictionary and pronunciation rules (see here for Festival's phone list).
Festival (rel 1.95) usually uses the "cmu" lexicon by default.
To make sure that you are using this dictionary, use the following
command:
festival> (lex.select "cmu")
Next, to determine the pronunciation of a word use the "lex.lookup" command as follows:
festival> (lex.lookup "internet")
("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))
Festival will list the phonemes included in the word, grouped into syllables, and also includes a number after each syllable (these indicate "lexical stress"). Ignore the parentheses and numbers, and you have Festival's view of the phonemes that make up the word you entered. Therefore, for the word "Internet", Festival says its phonemes are: "ih n t er n eh t".
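If you have many words to look up, stripping the parentheses and stress numbers by hand gets tedious. Here is a minimal Python sketch that does it for one line of lex.lookup output. It assumes the output has exactly the three-part shape shown above (word, part of speech, list of syllables); the function names parse_sexp and flat_phones are invented for this example and are not part of Festival.

```python
import re


def parse_sexp(text):
    """Parse one parenthesised expression into nested Python lists of strings."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        return tok.strip('"')

    return read()


def flat_phones(lookup_output):
    """Drop syllable grouping and stress marks, keeping only the phonemes."""
    _word, _pos, syllables = parse_sexp(lookup_output)
    return [p for phones, _stress in syllables for p in phones]


line = '("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))'
print(" ".join(flat_phones(line)))
```

For the "internet" example this prints the same phoneme string as given above.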
So you should be able to do something similar for Spanish.
>The bad thing is that while I'm reading, I'm not translating the About page, sorry :-D
Don't worry about it, this other stuff is much more interesting... :)
Ken
With all the information you have given, it's very easy to make a good program that converts words to phonemes (I mean using Festival and the JAndalucia voices).
I have been reading about the JAndalucia voices, and they have done very good work on the text2phonemes program (I have tried it too).
Look at the attached program (festival2phone.c); with this you have the first thing you asked me for :-)
Another thing: the Spanish phrases on the Java page for recording voices are very strange (maybe they are only strange for people from Spain).
The translation of the About page is now finished.
Regards.
Hi Ubanov
>Look at the attached program (festival2phone.c); with this you have the
>first thing you asked me for :-)
Cool, thanks!
>the Spanish phrases on the Java page for recording voices are
>very strange (maybe they are only strange for people from Spain).
Mauricio created these (see this post:trinning text tralated - dated 7/28/2008) from a Spanish book ("Los Argonautas") that he found on the Gutenberg project - so the text might use archaic words or "turns of phrase".
If you would like to update the Spanish prompts, that would be great... Just make sure that the text you take them from (if you don't create your own) is compatible with a Free (GPLv3) license - like the public domain stuff on Gutenberg. I can incorporate them in the next release of the SpeechSubmission app.
>The translation of the About page is already finished.
Awesome, thanks!
Ken
Hi again,
I have updated the program that I made yesterday (it didn't work well with long words). Then I found a list of Spanish words (about 80000 words) and translated them to their phonemes. As the last step, I calculated the number of times each phoneme appears.
Here are the files.
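For anyone who wants to repeat the counting step, it can be sketched in Python as follows. This is not the attached program, just an illustration: the two sample words and their phoneme symbols are made up, and the dictionary is assumed to be a mapping from each word to its list of phonemes.

```python
from collections import Counter


def count_phonemes(pronunciations):
    """Count how many times each phoneme appears across all words."""
    counts = Counter()
    for phones in pronunciations.values():
        counts.update(phones)
    return counts


# Hypothetical sample entries; the real list has about 80000 words.
sample = {
    "mar": ["m", "a", "r"],
    "casa": ["k", "a", "s", "a"],
}

counts = count_phonemes(sample)
for phone, n in counts.most_common():
    print(phone, n)
```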
I'm going to try to build some phrases in Spanish that contain all the phonemes... If I get something useful I will send you the result.
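One possible way to choose such phrases, sketched in Python, is a greedy selection: keep picking the candidate phrase that covers the most phonemes still missing. This is only an assumption about how it could be done, not a description of any existing VoxForge tool; the function name, the sample phrases and their phoneme symbols are all hypothetical.

```python
def pick_covering_phrases(phrase_phones, wanted):
    """Greedily choose phrases until every wanted phoneme is covered.

    Returns the chosen phrases and any phonemes that no phrase contains.
    """
    remaining = set(wanted)
    chosen = []
    while remaining:
        # Pick the phrase that adds the most phonemes we still need.
        best = max(
            phrase_phones,
            key=lambda p: len(remaining & set(phrase_phones[p])),
            default=None,
        )
        if best is None:
            break
        gained = remaining & set(phrase_phones[best])
        if not gained:
            break  # no candidate covers anything that is still missing
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Hypothetical candidate phrases with made-up phoneme symbols.
candidates = {
    "el mar": ["e", "l", "m", "a", "r"],
    "la casa": ["l", "a", "k", "a", "s", "a"],
    "mar": ["m", "a", "r"],
}
wanted = {"e", "l", "m", "a", "r", "k", "s"}

chosen, missing = pick_covering_phrases(candidates, wanted)
print(chosen, missing)
```

Greedy selection does not guarantee the smallest possible set of phrases, but it is simple and usually gives a short list.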
Thanks