VoxForge
This might be useful for the creation of a German pronunciation dictionary:
TXT2PHO - a TTS front end for the German inventories of the MBROLA project.
However, the software has some restrictive licensing provisions:
Permission is granted to use this software for non-commercial, non-military purposes, with and only with the lexicon and prosody files made available by the author from the HADIFIX for MBROLA project ...
Not sure if that would apply to pronunciations generated with the toolkit.
Ken
I don't think we can use it.
Using TXT2PHO in order to create a dictionary is close to reading the dictionary it uses (BOMP) directly. And both the dictionary and TXT2PHO itself clearly state they are non-military, which the GPL -- unfortunately -- is not.
Anyway, if we could use it, then we could just as well use BOMP directly.
I've had a first look at Sequitur G2P (which is a trainable g2p-tool) and it's likely that I will be allowed to use another trainable g2p-tool (without name, published in [1]). Thus, I will be able to compare the two and see which performs better.
So, we need some data to bootstrap these trainable systems. I just checked in some tools that extract pronunciations from the German Wiktionary.
The resulting data has to be post-processed before we can use it for bootstrapping. To prioritize that work, we could use the word frequency information from the Wortschatz project, for which a Perl module (EDIT: newer version with fixed frequency extraction) is available.
I hope to be able to set up a web tool that helps to post-process the Wiktionary output. Would anyone volunteer to actually use that web tool and help in creating the dictionary? Ralf, would you be willing (and able) to help?
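A minimal sketch of that prioritization, assuming the frequency information has already been exported into a plain word-to-count mapping. The function name `prioritize` and the counts below are made up for illustration; they are not real Wortschatz figures, and the Perl module's own interface is not shown here.

```python
def prioritize(words, frequency):
    """Return words sorted by descending frequency; unknown words go last."""
    return sorted(words, key=lambda w: (-frequency.get(w, 0), w))


# Made-up counts for illustration only, not real Wortschatz data.
frequency = {"sein": 5000, "Haus": 800}
extracted = ["Quarkkeulchen", "Haus", "sein"]

if __name__ == "__main__":
    for word in prioritize(extracted, frequency):
        print(word)
```

Words without any frequency information are treated as count 0, so they end up at the back of the queue rather than being dropped.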
Cheers!
Timo
[1]: Vera Demberg, Helmut Schmid and Gregor Möhler (2007). Phonological Constraints and Morphological Preprocessing for Grapheme-to-Phoneme Conversion. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), Prague, Czech Republic, June 2007.
Hi Timo,
Good work!
thanks,
Ken
Hi Ralf,
Sorry for not getting back to you earlier.
I've set up a dictionary tool on http://www.ling.uni-potsdam.de/~timo/projekte/voxforge.html . The main task is to paste the entries in the first row on the right (Aussprachen) to the corresponding field on the left.
Now, if it was just that, it would be too easy and too boring...
Often, there are far more variants of the word on the left than there are transcriptions. In these cases it would be nice if you could add the missing transcriptions (often it is just a matter of appending -? or -? or whatever).
Sometimes the list on the left contains ridiculous word forms. Just leave the corresponding field empty (or press "Wort entfernen", but the result will be the same). It may also happen that you are asked for the same word more than once (there are different entries for "bin", "ist" and "sind" in the Wiktionary, and each entry will ask about all the different sein-forms). If you are sure you've entered a transcription already, then just ignore it the second time.
Sometimes there are actually more transcribed word forms than words on the left. (Or they are different.) Then you can add a word form on the left with "Wort hinzufügen". Note: Often there are different transcriptions for the same word form (?v?ltn?, ?v?lt?n). Usually you would want to pick the form that would be used most in colloquial speech (here: v?ltn?).
Also, there may just be erroneous transcriptions (quite often), where people simply guessed how IPA works. It's important that we catch most of these errors. So you might actually want to start out with the Wiktionary Transcription Guideline, which shows how the transcription *should* be.
To enter IPA symbols into the text fields directly, just type the keys listed on the right (for ? type N) and they will automagically be transformed to IPA. (This works in Firefox. I don't have Windows, so I can't check Internet Explorer.)
Please enter your e-mail address or another kind of ID into the first text field. This way we can later compare who's the hardest-working transcriber!
Cheers, Timo
clickable link: http://www.ling.uni-potsdam.de/~timo/projekte/voxforge.html
UPDATE: It's important that you transcribe how something would be spoken in colloquial standard German. By the way, what region of Germany are you from? ;-)
Hi nsh,
>You select a phoneset, build an LTS system that will generate variants and
>then use forced-alignment against the recording to check are pronuncations
>valid or not.
Forgive my ignorance, but by "LTS" do you mean "Letter to Sound"? If so, do you mean that for each letter in the target language's alphabet you create a table of the different sounds that letter might have, and from that you generate a dictionary with multiple alternate pronunciations of each word? And then you take a transcribed speech recording and let the speech recognizer pick the correct phonemes for each word (using forced alignment), based on what it recognizes in the recording?
For example, if someone wants to create a dictionary for a new language, do you first start with a set of speech transcriptions for the target language (i.e. speech audio files with a transcription of the actual words spoken in a text file)?
Then you create the letter-to-sound rules. For example, the word "house" in the VoxForge dictionary is pronounced as follows:
HOUSE [HOUSE] hh aw s
If I were using your approach, first I would create a phone list like this (CMU's phone list in this case):
Phoneme  Example  Translation
-------  -------  -----------
AA       odd      AA D
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
B        be       B IY
CH       cheese   CH IY Z
D        dee      D IY
DH       thee     DH IY
EH       Ed       EH D
ER       hurt     HH ER T
EY       ate      EY T
F        fee      F IY
G        green    G R IY N
HH       he       HH IY
IH       it       IH T
IY       eat      IY T
JH       gee      JH IY
K        key      K IY
L        lee      L IY
M        me       M IY
N        knee     N IY
NG       ping     P IH NG
OW       oat      OW T
OY       toy      T OY
P        pee      P IY
R        read     R IY D
S        sea      S IY
SH       she      SH IY
T        tea      T IY
TH       theta    TH EY T AH
UH       hood     HH UH D
UW       two      T UW
V        vee      V IY
W        we       W IY
Y        yield    Y IY L D
Z        zee      Z IY
ZH       seizure  S IY ZH ER
I would then create a set of letter-to-phone rules as follows (phonemes converted to lower case for easier reading):
H hh
O ow, oy, uw
U uh
S z
E iy
Then create rules for letter combinations to sounds (only for such letter combinations that have a unique sound in the target language):
HO hh aa, hh uh, hh ow
OU aw
US ax s
SE s
Then generate all the possible pronunciations for the word "house":
HOUSE hh ow uh z iy
HOUSE hh oy uh z iy
HOUSE hh uw uh z iy
HOUSE hh aa uh z iy
HOUSE hh uh uh z iy
...
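As a rough sketch, the generation step could look like the following. It assumes the toy rules above are stored in a plain mapping and that only single letters and two-letter combinations carry rules; the names `RULES`, `segmentations` and `candidates` are made up for illustration and do not come from any existing toolkit.

```python
from itertools import product

# Toy rules copied from the lists above; a real rule set would be far larger.
RULES = {
    "H": ["hh"],
    "O": ["ow", "oy", "uw"],
    "U": ["uh"],
    "S": ["z"],
    "E": ["iy"],
    "HO": ["hh aa", "hh uh", "hh ow"],
    "OU": ["aw"],
    "US": ["ax s"],
    "SE": ["s"],
}


def segmentations(word, rules):
    """Yield every split of `word` into 1- or 2-letter units that have a rule."""
    if not word:
        yield []
        return
    for size in (1, 2):
        unit = word[:size]
        if len(unit) == size and unit in rules:
            for rest in segmentations(word[size:], rules):
                yield [unit] + rest


def candidates(word, rules=RULES):
    """Return the sorted, de-duplicated candidate pronunciations of `word`."""
    found = set()
    for units in segmentations(word.upper(), rules):
        # Pick one sound per unit, in every possible combination.
        for choice in product(*(rules[unit] for unit in units)):
            found.add(" ".join(choice))
    return sorted(found)


if __name__ == "__main__":
    for pronunciation in candidates("house"):
        print("HOUSE", pronunciation)
```

With these toy rules the split H + OU + SE yields "hh aw s", so the correct pronunciation is among the candidates that forced alignment then has to choose from.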
And then use the forced alignment feature of a speech recognition engine (like Sphinx, HTK, ...) to look at the text of a particular recording (in this case of the single word "house") and see which phonemes it identifies as the most likely ones used in the recording (HTK format in this example):
0 9400000 sil -5373.277832 SENT-END
9400000 10400000 hh -750.756897 HOUSE
10400000 11300000 aw -659.823364
11300000 12900000 s -962.888245
12900000 13300000 sil -238.437622 SENT-END
This can then be fed into a script to create the final correct pronunciation for the word "house":
HOUSE [HOUSE] hh aw s
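Such a script might look roughly like this. It is only a sketch: it assumes each alignment line has the columns start, end, phone, score and, on the first phone of a word, the word itself, and it assumes the silence models are called "sil" and "sp". The function name `dictionary_entries` is made up for illustration.

```python
# Alignment output copied from the example above.
ALIGNMENT = """\
0 9400000 sil -5373.277832 SENT-END
9400000 10400000 hh -750.756897 HOUSE
10400000 11300000 aw -659.823364
11300000 12900000 s -962.888245
12900000 13300000 sil -238.437622 SENT-END
"""

# Assumed names of the silence models; adjust to the acoustic model in use.
SILENCE = {"sil", "sp"}


def dictionary_entries(alignment_text):
    """Turn phone-level alignment lines into 'WORD [WORD] phones' entries."""
    entries = []
    current = None
    for line in alignment_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        phone = fields[2]
        if len(fields) >= 5:
            # A fifth column marks the first phone of a new word.
            current = None
            if phone not in SILENCE:
                current = (fields[4], [phone])
                entries.append(current)
        elif current is not None and phone not in SILENCE:
            current[1].append(phone)
    return ["%s [%s] %s" % (word, word, " ".join(phones))
            for word, phones in entries]


if __name__ == "__main__":
    for entry in dictionary_entries(ALIGNMENT):
        print(entry)
```

Words whose first phone is silence (here the sentence markers) are dropped, so only real words end up in the dictionary.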
Ken