VoxForge
All,
Over the past few weeks I wrote some command-line Python scripts which allow me to build the German model (and probably others) and also help me quickly create audio recordings with transcripts as well as dictionary entries.
The tools I have created are not yet ready for prime time (I do plan to publish them soon), but I have accumulated quite a bit of material during my experiments which I would like to contribute.
The audio submissions (at the moment roughly 30 mins of recordings) are completely packaged and ready to be uploaded - I just need to know how and where I could upload them. I found
http://voxforge.org/home/submit/audiobooks/ftp-submissions
but this seems to be for audiobook submissions only, which mine clearly aren't. Also, I am not sure if submissions here would ever find their way into the official German model. Is there a better way to upload those?
Also, as mentioned in the subject, I have created a rather large phonetic dictionary. It covers all ~10000 words needed for the currently existing PROMPTs. The phonetic strings should be of fairly high quality, as all of them have either been checked manually (by listening to the MaryTTS speech synthesizer pronounce them and checking the entries against Wiktionary) or been adapted from hopefully high-quality sources (most of those come from MaryTTS's internal dictionary).
Is there an official way to contribute to the lexicon?
Thanks,
guenter
Hello Guenter,
Welcome to VoxForge. I'm not part of the VoxForge team myself, but I am doing something very similar to you. I spent the last month cleaning and rebuilding the German VoxForge model (if you use the current prompts, beware: some are faulty).
I'm very interested in your audio data AND your phonetic dictionary. I have also created two phonetic dictionaries. The bigger one is an automatic conversion of Ralf's German lexicon, so it would be a blessing to have another phonetic dictionary to cross-check against. The second one is just a small subset of the first one to work on.
I will send you some login data for an FTP server. I would be grateful if you could share that material with me.
binh
>Is there an official way to contribute to the lexicon?
For the audio submissions, the FTP site is the best option - let me know when you have completed the upload so I can move them to the proper place.
For your pronunciation dictionary, you can add it as an attachment to a reply to this post (I changed the settings to allow for file uploads...), and I will add it to Subversion and have it replicate to the VoxForge repository.
thanks!
Ken
About the prompts: I have also noticed that quality varies a lot in these submissions and that the transcripts are sometimes wrong.
As I mentioned in our email exchange (but will repeat here for the forum), the tools I am working on right now store metadata (phonetics, transcripts, ...) in an SQL database; what I am uploading right now is just exports of portions of that database.
To handle prompts/transcripts properly, I think we definitely have to review all the submissions manually and check the transcripts, fixing them if necessary. Also, I am planning to store a quality rating for each submission in my database; maybe a simple 0-5 star rating will do, so we can build different models for different purposes. I am not sure about this (as I am still pretty new to the field of speech recognition), but I could imagine it would help to build the model in stages, or at least to process the submissions in a certain order: start with high-quality submissions and later add lower-quality/noisier submissions to the model.
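A minimal sketch of how such a staged ordering could look, assuming each submission carries the 0-5 rating proposed above. The `Submission` class, the `training_stages` helper, the `min_rating` cut-off and the submission names are all hypothetical and only illustrate the idea; this is not how any existing tool works.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    name: str    # hypothetical submission identifier
    rating: int  # 0 (unusable) to 5 (clean), as proposed in the post


def training_stages(submissions, min_rating=1):
    """Group submission names into stages, best-rated stage first."""
    stages = {}
    for sub in submissions:
        if sub.rating < min_rating:
            continue  # leave out material rated below the cut-off
        stages.setdefault(sub.rating, []).append(sub.name)
    # Highest rating first, so early stages only see the cleanest audio
    return [sorted(stages[r]) for r in sorted(stages, reverse=True)]


subs = [
    Submission("anna-01", 5),
    Submission("ben-02", 2),
    Submission("cara-03", 5),
    Submission("dirk-04", 0),
]
print(training_stages(subs))  # [['anna-01', 'cara-03'], ['ben-02']]
```

Whether training in stages actually improves the resulting model is an open question here; the sketch only shows that the ordering itself is cheap to derive once ratings exist.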
Another thought: shouldn't we store phoneme transcriptions (i.e. using IPA symbols) for each audio file? After all, there are words which have more than one "correct" pronunciation in German, often depending on the region the speaker comes from or on whether the speaker chooses to pronounce words like "Personal" in English or in German ("Personal Computer" vs "Personal Buero"). So I guess the proper way to get really clean training material is to review each submission, give it a quality rating and compose an IPA transcript.
Thanks for the quick reply - good to see VoxForge is alive :)
I have uploaded my audio data to voxforge1.org FTP and attached my dictionary to this post.
Moin Moin Friends ! :)
I am new to speech recognition and trying my best to create a new German acoustic model with a limited vocabulary. I am using Sphinx4. I want to apply this to an automated system which would recognize a limited vocabulary to carry out some machine operations. I used Sphinx4 and it worked successfully, but its efficiency is low and I doubt it can be trusted in secured applications. Hence, creating a German acoustic model seems to be the only solution left to me.
Can you guys share some of your experience and data files which would help make this possible? Your further guidance in the future would also be very helpful.
email: [email protected]
Kind regards,
Nayak
Hello all,
nice to see that there are other German "speech recognition devs" active right now.
I am still pretty new to the field of speech recognition and am also interested in a better German acoustic model and dictionary.
So, can I help you (Guenter, Binh)?
I am working with Sphinx4 and I am still using the acoustic model from "http://www.voxforge.org/de/Downloads", which is about 3 years old...
Has anyone built a newer one?
Thanks!
Friendly Regards,
Andreas
Hi,
thanks for the offer. There is quite a lot to do.
The most important parts that spring to mind are:
Check basic VoxForge material:
The raw VoxForge material needs to be checked. There are quite a few mistakes, and the VoxForge team is too small to check everything. Plus, they don't speak German. So we need to open each recording and check whether the spoken words match the transcription. I have already started with some of them, so maybe we should coordinate our attempts here.
New acoustic material:
The German acoustic material amounts to about 30 hours of speech. It's not a bad start, but it is not even close to the English material. So we need to cut up audiobooks or record other WAV files. Guenter and I can supply some sources. This can easily be done with Audacity.
Language model:
This is only necessary if you use a language model rather than a grammar. For that we need huge amounts of German sentences, so-called text corpora. Although they are readily available for English (for example the Gigaword corpus), this isn't the case for German. I'm working on a web crawler right now, but it has a long way to go.
I suppose I could give you my model. I removed some of the faulty audio data and rebuilt the acoustic model using SphinxTrain, but it has some drawbacks.
1. You have to use my dictionary. I used a slightly different phoneme set in training.
2. It is trained on a large vocabulary. If you only want something along the lines of "recognise a limited number of commands", it might be better to build your own.
If you are interested in any of that, just drop me a line.
EDIT: I forgot: the dictionary needs to be checked as well.
Binh
First of all: sorry for my late reply
Just a few comments from my side in addition to what Binh already said:
Building the German audio model right now is pretty difficult because of the mistakes and the varying quality of the submissions. We definitely have to listen to all the material, rate it, and sort out the mistakes.
I am running a small database which stores all the metadata, and I am working on a set of command-line / ncurses Python scripts to handle it. Right now my main focus is rating the submissions and producing correct transcripts for them. I am still not sure if we can really get away with just fixing the prompts or if we should store IPA transcriptions alongside them as well. After all, there are words that have more than one correct pronunciation in German, e.g. is "1984" pronounced "eintausendneunhundertvierundachtzig" or "neunzehnhundertvierundachtzig"?
So, what I would like to try next is to write a small tool which
- loads one submission
- splits the prompt into words
- loads all available dict entries for those words
- generates a preliminary IPA transcript from those
- highlights words missing in the dict and words that have more than one pronunciation in the dict
- allows me to edit the prompt
- allows me to add missing entries to my dict
- allows me to choose among different pronunciations
- allows me to play back the entry
- allows me to play back the IPA transcript using MARY TTS
- allows me to rate the submission
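The non-interactive core of the steps listed above (split the prompt, look up dict entries, draft a transcript, flag missing and ambiguous words) could be sketched roughly like this. The function name `draft_transcript`, the lexicon layout (word mapped to a list of pronunciation strings) and the phoneme strings are assumptions made up for this illustration; they are not taken from the actual tool or dictionary.

```python
def draft_transcript(prompt, lexicon):
    """Return (transcript, missing, ambiguous) for one prompt.

    lexicon maps a lowercase word to a list of pronunciation strings.
    """
    transcript, missing, ambiguous = [], [], []
    for word in prompt.lower().split():
        entries = lexicon.get(word, [])
        if not entries:
            missing.append(word)
            transcript.append("<" + word + "?>")  # placeholder to fix by hand
            continue
        if len(entries) > 1:
            ambiguous.append(word)  # reviewer has to pick the spoken variant
        transcript.append(entries[0])  # first entry is only a preliminary pick
    return " ".join(transcript), missing, ambiguous


# Toy lexicon: the pronunciation strings are illustrative placeholders,
# not checked dictionary entries.
lexicon = {
    "das": ["d a s"],
    "personal": ["p e r z o n a l", "p oe r s e n e l"],
}
text, missing, ambiguous = draft_transcript("Das Personal fehlt", lexicon)
print(missing)    # ['fehlt']
print(ambiguous)  # ['personal']
```

Editing, playback and rating would sit on top of this in the interactive part; they are left out here because they depend on the UI and on MARY TTS.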
Large parts of the necessary code already exist as part of my dictionary editor (which will become obsolete as soon as I have finished this transcript tool), but I will still have to find the time to implement all this.
I will report here about my progress.
Coordinating efforts: Binh has already started rating submissions and has done a very good job there. Since we do not have much infrastructure right now, maybe agreeing on a simple common format for those efforts would already help? E.g. we could decide to use simple CSV, something along the lines of
submission,prompt,rating,comment
and simply post those here?
Or shall we use a Google Docs/Drive (whatever it is called right now) spreadsheet?
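If we go with CSV, reading and writing the proposed four columns needs nothing beyond Python's standard `csv` module. A rough sketch, where the helper names and the example row are invented for illustration and the rating is assumed to be an integer:

```python
import csv
import io

# Column names as proposed in the post
FIELDS = ["submission", "prompt", "rating", "comment"]


def write_ratings(rows, handle):
    """Write rating rows (dicts keyed by FIELDS) with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_ratings(handle):
    """Read rating rows back, converting the rating to an int."""
    rows = []
    for row in csv.DictReader(handle):
        row["rating"] = int(row["rating"])
        rows.append(row)
    return rows


# Hypothetical example row; note the comma inside the comment
rows = [{
    "submission": "example-01",
    "prompt": "a0001",
    "rating": 4,
    "comment": "slight noise, transcript ok",
}]
buffer = io.StringIO()  # stands in for a real file
write_ratings(rows, buffer)
buffer.seek(0)
loaded = read_ratings(buffer)
print(loaded[0]["rating"])  # 4
```

Using the `csv` module rather than splitting on commas by hand matters here, because free-text comments will contain commas and need quoting.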
Reading that reminds me of one important thing.
People should avoid using digits (1, 2, etc.) in their prompts. Use the words instead: "Eins", "Zwei", etc. Numbers cause a heck of a lot of problems. For example, in the current prompts:
2,5 (zwei Komma fünf) became 25 (because all "," and "." have to be deleted from the prompts; this is done by a script).
The other reason has already been mentioned.
You get several possible pronunciations for a year, for example.
So it is better to avoid digits in prompts altogether and use number words instead.
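To make the "2,5 became 25" problem concrete, here is a small sketch. The cleanup script itself is not shown in this thread, so `strip_punctuation` is only an assumed stand-in for what it does, and `has_digits` is a hypothetical check one could run over prompts before recording.

```python
import re


def strip_punctuation(prompt):
    """Assumed stand-in for the cleanup script: delete every ',' and '.'."""
    return prompt.replace(",", "").replace(".", "")


def has_digits(prompt):
    """Flag prompts that contain digits and should be spelled out instead."""
    return re.search(r"\d", prompt) is not None


raw = "Das kostet 2,5 Euro."
print(strip_punctuation(raw))  # Das kostet 25 Euro
print(has_digits(raw))         # True
print(has_digits("Das kostet zwei Komma fünf Euro."))  # False
```

Flagging such prompts for a human to rewrite seems safer than converting digits to words automatically, since, as with years, the digits alone do not say which reading the speaker used.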