VoxForge
Hi there,
I've checked in a first version of a hand-corrected (mostly by ralfherzog, thanks!) pronunciation lexicon that conforms to the Pronunciation Lexicon Specification. I'll try to train some G2P models on this data and check whether this improves on our current espeak dictionary.
Cheers, Timo
Hi Timo,
Well done, thanks!
Ken
Hi Ralf,
I think the problem is with the way Trac is currently configured ... if you download the file in plain text (at the very bottom of the page), everything seems to display properly.
Ken
Hello Ken,
I just downloaded the file in "Plain Text" like you suggested. It didn't work out (with Notepad++, and with OpenOffice.org; both under Windows XP). Some special IPA characters are displayed correctly, others are not. But Firefox displays the "Plain Text" version (as well as the Original Format) 100% correctly.
So it is possible to display the pronunciation lexicon correctly with Firefox, but not with Notepad++. What is the reason for this different behavior? Maybe the encoding is correct, and I need a different text editor. But which text editor should I use?
Off-topic: A similar problem occurs when I download the German Prompts.tgz. There seems to be a problem with the encoding. When I started to submit prompts to VoxForge, I didn't care about UTF-8. Instead, I submitted in ANSI (which probably means Windows-1252). Is it possible to fix this problem?
Greetings, Ralf
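For illustration, here is a minimal Python sketch of what most likely happens when an editor guesses the wrong encoding. The IPA string and the German word are made-up samples, not taken from the lexicon or the prompts, and the assumption that the editor falls back to Windows-1252 is only a guess based on the "ANSI" remark above.

```python
# Hypothetical IPA transcription, used only for illustration.
ipa = "ʃtʁaːsə"
utf8_bytes = ipa.encode("utf-8")

# Correct codec: the text round-trips unchanged.
assert utf8_bytes.decode("utf-8") == ipa

# Wrong codec: each two-byte UTF-8 character is read as two separate
# Windows-1252 characters, and bytes that Windows-1252 leaves undefined
# are replaced with U+FFFD.
garbled = utf8_bytes.decode("cp1252", errors="replace")
print(repr(garbled))

# The reverse mistake: text saved as Windows-1252 is usually not valid UTF-8.
legacy_bytes = "Straße".encode("cp1252")
try:
    legacy_bytes.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False
print(legacy_is_valid_utf8)  # False
```

Plain ASCII letters survive either way, which may be why only part of a file looks broken.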
Trac's default character encoding is utf-8. I removed a setting in the trac.ini file (the Trac config file) of the German repository (and all other languages...) that overrode the default and set it to ISO-8859-1 (I am not sure what I was thinking when I set it that way... :) )
German should now display correctly. If there are any more problems, please let me know,
Ken
Hi Ralf,
>But Firefox displays the "Plain Text" version (as well as the Original Format) 100% correctly.
I think this is because the text file is XML and the first line tells Firefox which encoding to use:
<?xml version="1.0" encoding="UTF-8"?>
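As a rough sketch of that idea in Python: a reader can look at the XML declaration before decoding the rest of the file. The helper name `declared_xml_encoding` is hypothetical, and the sketch assumes an ASCII-compatible file without a byte-order mark; a real XML parser handles more cases than this.

```python
import re

def declared_xml_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in an XML declaration, or `default`."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()

# Sample input using the declaration quoted above; <lexicon/> is a placeholder.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<lexicon/>'
print(declared_xml_encoding(sample))  # utf-8
```

A plain text editor has no obligation to read this line, so it may fall back to a system default encoding instead.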
>So it is possible to display the pronunciation lexicon correctly with Firefox, but not with Notepad++. What is the reason for this different behavior?
I am not sure about the reason for the different behavior with Notepad++; it might work better if you download the original format version of the pronunciation lexicon. You should not need to change text editors. There might be an "encoding" or "charset" setting that needs to be changed in Notepad++.
>A similar problem occurs when I download the German Prompts.tgz.
That is something I noticed a while ago too... The prompt files in the individual submissions are correct; the problem was that when the prompt files were added to the master_prompt files, the script was not using the correct encoding (I was not paying much attention to encoding back then either... :) ). I fixed this problem a few months ago, but the prompts that were added to the master prompt files prior to the fix still need to be corrected. It's on the todo list.
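One possible way to do that correction, sketched in Python under some loud assumptions: that the affected lines are in Windows-1252, that lines added after the fix are already valid UTF-8, and that the two kinds can be mixed in one file. The function names and the sample prompt lines are hypothetical; this is not the actual VoxForge script.

```python
def to_utf8(raw: bytes, legacy: str = "cp1252") -> bytes:
    """Return `raw` as UTF-8, assuming non-UTF-8 input is in `legacy`."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Raises if a byte is undefined in the legacy code page.
        return raw.decode(legacy).encode("utf-8")

def repair_lines(raw: bytes, legacy: str = "cp1252") -> bytes:
    """Repair a file line by line, since old and new lines may be mixed."""
    return b"\n".join(to_utf8(line, legacy) for line in raw.split(b"\n"))

# Made-up prompt lines: one in the legacy encoding, one already in UTF-8.
mixed = "a001 Straße\n".encode("cp1252") + "a002 Grüße\n".encode("utf-8")
repaired = repair_lines(mixed)
print(repaired.decode("utf-8"))
```

Checking each line for valid UTF-8 first avoids double-encoding lines that are already correct. In rare cases a legacy line can happen to be valid UTF-8 by accident, so the result would still be worth a manual look.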
Hi Ralf,
>we should try to exclusively use just UTF-8, and nothing else.
I agree, encoding issues give me a migraine... :)