VoxForge
Hello,
First of all, great job with this project! I think that this project will be one of the most popular projects in the open source world!
I searched the forums a little and found nothing regarding Hebrew, so I guess no one has worked on Hebrew yet ...
Therefore, I am willing to help open and maintain a section for Hebrew (with everything needed, speech recordings etc.).
However, I really don't know what I need to do in order to start such a thing, so any help will be appreciated.
Regards,
Ofir
--- (Edited on 12/23/2007 3:16 am [GMT-0600] by ofir) ---
You are welcome Ofir.
I suppose for a start we need around 2 hours of any transcribed speech from, say, 5 speakers. It's OK to start with such a database.
Much later, of course, it will require some knowledge of the ASR domain, but not now.
--- (Edited on 12/23/2007 6:34 am [GMT-0600] by nsh) ---
I saw on this site a Java applet which helps record the speech. Is it possible to use it?
If not, what instructions do I need to give to the volunteers in order to produce the best recordings?
So, just to summarize, I need two hours of recordings from 5 different people (two hours altogether, or two hours from each one of them [10 hours total]?).
I am sorry if I am being a little annoying; I just want to ensure everything is done exactly how it is meant to be done.
--- (Edited on 12/23/2007 7:03 am [GMT-0600] by ofir) ---
> I saw on this site a Java applet which helps record the speech. Is it possible to use it?
Sure, though we would need a list of prompts. But contributions are not processed automatically, so to bootstrap it's probably easier to record speech with a microphone on a local machine.
> If not, what instructions do I need to give to the volunteers in order to produce the best recordings?
Nothing special is required. They can just read some generic text, say, a newspaper article or digits.
>I need two hours of recordings from 5 different people.
Yes, two hours total is enough. Say 25 minutes from each one.
--- (Edited on 12/23/2007 7:22 am [GMT-0600] by nsh) ---
Hi Ofir,
I created a new Hebrew "discussion forum" and a new Hebrew Speech Submission "forum" (which permits attachments so you can include your zipped speech audio files.)
The important thing to remember when looking for text to read, is that it must be from out-of-copyright texts. This is because any recordings of a copyrighted text are considered to be a "derivative work" of the original text, and therefore subject to the Copyrights of the author/owner of the original text. If you cannot find any out-of-Copyright text, you can create your own prompts, or just translate some of the VoxForge prompts.
See this thread for more information on how to create prompts and recordings for a new language.
Ken
--- (Edited on 12/23/2007 9:29 pm [GMT-0500] by kmaclean) ---
I just want to confirm that I instruct the volunteers correctly...
What they need:
1. Mike
2. Some text (not copyrighted, I will suggest one)
What they need to do:
1. Speak clearly to the mike (without any background noise)
2. Read the text over and over again until 25 minutes are reached.
3. Upload the file to this forum: http://www.voxforge.org/home/downloads/speech/hebrew
1. Do the lists above cover everything needed for recording?
2. Do they need to register in order to upload the recordings to the forum?
3. What is the estimated file size produced by a 25 minute recording?
4. Are there any parameters regarding the recording quality?
--- (Edited on 12/24/2007 6:32 am [GMT-0600] by ofir) ---
Hi ofir,
Here are the steps that are required (adapted from this post):
The best way to start is to find some texts that you can copy, segment them into 10-15 word "sentences", record one speech audio file for each "sentence", and upload the files to VoxForge.
Step 1
You need one of the following:
- public domain texts,
- out-of-Copyright texts, or
- your own Copyrighted texts, released under an Open Source License (preferably GPL).
To start, it is OK to read one text that provides good monophone coverage. But to obtain the best possible speech recognition results, you need lots of different texts using as many different words as possible in as many different contexts, in order to improve your triphone coverage.
Step 2
Segment the texts you have chosen into 10-15 word "sentences". Put these into one text file, one "sentence" per line. The first column must be the name of the audio file containing the speech (without the ".wav" suffix). For example, a prompt file might contain the following entries:
en16-01 He has to break it into several parts.
en16-02 They have to start in the same way.
en16-03 He is interested in exponentials.
"en16-01", "en16-02" and "en16-03" correspond to the names of three wav files (i.e. en16-01.wav, en16-02.wav and en16-03.wav) containing speech. For example, the en16-01.wav audio file will contain speech corresponding to the prompt line: "He has to break it into several parts".
It does not matter what you call your wav files, as long as the first word of each prompt line corresponds to an actual audio file.
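As a rough illustration of the prompt file layout described in Step 2, the following Python sketch reads the prompt lines and lists any entry that has no matching wav file. It is not part of the VoxForge tooling; the function names `parse_prompts` and `missing_audio` are made up for this example, and it assumes the name and the prompt text are separated by whitespace.

```python
from pathlib import Path


def parse_prompts(text):
    """Map each audio file name (first column) to its prompt text."""
    prompts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        name = parts[0]
        sentence = parts[1] if len(parts) > 1 else ""
        prompts[name] = sentence
    return prompts


def missing_audio(prompts, audio_dir):
    """Return the prompt names that have no matching .wav file in audio_dir."""
    audio_dir = Path(audio_dir)
    return [name for name in prompts
            if not (audio_dir / f"{name}.wav").is_file()]


# The three example entries from this post.
example = """en16-01 He has to break it into several parts.
en16-02 They have to start in the same way.
en16-03 He is interested in exponentials.
"""
prompts = parse_prompts(example)
print(prompts["en16-01"])
```

Calling `missing_audio(prompts, "some/folder")` before uploading would show which prompt lines still need a recording.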
Step 3
Record your audio using Audacity (see Step 2 - Record your Speech with Audacity for details). Basically you need to create uncompressed (e.g. wav or aiff) 48kHz-16bit mono audio recordings, or lossless compressed (e.g. FLAC).
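If you want to double-check a recording before uploading it, something like the Python sketch below could help. It only uses the standard `wave` module, so it covers uncompressed wav files and not aiff or FLAC; the function name `check_wav` is hypothetical, and the default values simply restate the 48kHz, 16-bit, mono requirement from this step.

```python
import wave


def check_wav(path, rate=48000, sample_bytes=2, channels=1):
    """Return a list of problems; an empty list means the file matches."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sampling rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getsampwidth() != sample_bytes:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bits, "
                f"expected {8 * sample_bytes}")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}")
    return problems
```

An empty list from `check_wav("en16-01.wav")` would mean the file has the expected sampling rate, sample width, and channel count. Files that are not plain PCM wav will make `wave.open` raise an error instead.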
Step 4
Next, create your license and readme file, and package everything up into a zip or tar.gz file (see Step 3 - Upload your Speech Files to VoxForge for details), and upload it to the Hebrew Forum on VoxForge.
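For the packaging part of Step 4, a small Python sketch along these lines would put everything from one folder into a single zip file. The function name `package_submission` is made up for this example, and it assumes you have already placed your wav files, prompt file, license, and readme together in one folder; the exact file names VoxForge expects are described in the upload instructions, not here.

```python
import zipfile
from pathlib import Path


def package_submission(folder, archive_path):
    """Zip every file in folder (non-recursively); return the names stored."""
    folder = Path(folder)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            # Skip sub-folders and the archive itself if it lives in folder.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            archive.write(path, arcname=path.name)
        return archive.namelist()
```

For example, `package_submission("my_recordings", "my_submission.zip")` (both names are placeholders) would return the list of file names stored in the archive, which is an easy way to confirm nothing was left out.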
>1. Speak clearly to the mike (without any background noise)
Yes - see Step 2 - Record your Speech with Audacity for more details.
>2. Read the text over and over again until 25 minutes are reached.
Reading the same text over and over again is not optimal ... it is better to have a variety of texts so that words are read in different contexts.
>3. Upload the file to this forum: http://www.voxforge.org/home/downloads/speech/hebrew
yes
>2. Do they need to register in order to upload the recordings to the forum?
yes.
>3. What is the estimated file size produced in a 25 minutes recording?
It is better to submit smaller recordings, around 5 minutes each. The VoxForge submission system is not designed to receive large files (there is a 50 MB limit in the submission system).
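To put a rough number on the file size question: uncompressed audio takes sampling rate × bytes per sample × channels bytes per second, so 48kHz, 16-bit, mono comes to 96,000 bytes per second. A 25 minute recording is therefore about 144 MB, a 5 minute recording about 28.8 MB, and roughly 8.7 minutes fit under the limit if "50 meg" is taken to mean 50 million bytes (that reading is an assumption). The Python sketch below just does this arithmetic; the function names are made up for the example, and the small wav header is ignored.

```python
def pcm_size_bytes(minutes, rate_hz=48000, bits=16, channels=1):
    """Size of the raw audio data, ignoring the small file header."""
    return int(minutes * 60 * rate_hz * (bits // 8) * channels)


def max_minutes(limit_bytes, rate_hz=48000, bits=16, channels=1):
    """Longest recording (in minutes) whose audio data fits in limit_bytes."""
    return limit_bytes / (rate_hz * (bits // 8) * channels) / 60


print(pcm_size_bytes(25) / 1e6)  # megabytes for a 25 minute recording
print(pcm_size_bytes(5) / 1e6)   # megabytes for a 5 minute recording
print(max_minutes(50e6))         # minutes that fit under 50 million bytes
```

A lossless format such as FLAC will usually come out smaller than these figures, but by how much depends on the recording.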
If you wish to submit large recordings, you might try one of the following approaches (this is the process we use for people submitting uncompressed audiobook chapters, and it works for large files of 50 MB and over).
If you use FTP (if you don't know what that means, ignore this option), go to the instructions by clicking this link.
If you don't use FTP, use one of these free sites to send your recordings:
http://www.yousendit.com/ files up to 100MB, holds files for 7 days
http://www.gigasize.com/ files up to 1.5GB, holds files for 90 days - no registration required
http://www.mailbigfile.com/ files up to 100MB, holds files for 5 days - no registration required
http://www.mediafire.com/ files up to 100MB, no time limit - no registration required
Click here for the instructions.
>4. Are there any parameters regarding the recording quality?
You need uncompressed (wav or aiff), 48kHz sampling rate at 16 bits per sample, mono, audio recordings ... therefore no lossy compressed recordings (e.g. no MP3). You can also use a lossless compressed format such as FLAC.
Hope this clarifies things.
Ken
--- (Edited on 12/24/2007 9:32 am [GMT-0500] by kmaclean) ---
Thanks for your reply.
Now I know exactly what the steps for submitting recordings are.
However, since I am going to write an article in an online magazine, I am afraid that many users will not even try to record themselves, because the instructions are somewhat complicated for most of them.
So, do you have any suggestions I can give that might ease the process?
--- (Edited on 12/24/2007 10:22 am [GMT-0600] by ofir) ---
Well, a bit complicated, I agree. I'm, for example, doing it a bit differently: I give a speaker a text and he records it in a single wave file. He just reads the text and sends me the wave file (around 20 MB at 16 kHz).
Then I segment it into chunks at pauses with adintool from Julius and transcribe it (it's easy to listen to the chunks and break the text on newlines). Chunking takes around 30 minutes as well. That's all. The sound is ready for the database.
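To give an idea of what "chunking by pauses" means, here is a minimal Python sketch. It does not use or reproduce adintool; it swaps in a plain energy threshold: the recording is cut into 10 ms frames, a frame counts as silent when its RMS level is below a threshold, and a new chunk starts after a long enough run of silent frames. The function names, the threshold of 500, and the 0.3 second minimum pause are all assumptions for illustration and would need tuning for real recordings.

```python
import array
import math
import sys
import wave


def frame_rms(path, frame_ms=10):
    """Return (rms_per_frame, frame_seconds) for a 16-bit mono wav file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is little-endian
    step = max(1, rate * frame_ms // 1000)  # samples per frame
    rms = []
    for i in range(0, len(samples), step):
        frame = samples[i:i + step]
        rms.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return rms, step / rate


def split_on_pauses(rms, frame_sec, threshold=500.0, min_pause_sec=0.3):
    """Return (start, end) times in seconds of chunks split at long pauses."""
    min_pause = max(1, round(min_pause_sec / frame_sec))
    chunks = []
    start = None      # first loud frame of the current chunk
    last_loud = None  # most recent loud frame in the current chunk
    for i, level in enumerate(rms):
        if level >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_pause:
            # The pause is long enough: close the current chunk.
            chunks.append((start * frame_sec, (last_loud + 1) * frame_sec))
            start = None
    if start is not None:
        chunks.append((start * frame_sec, (last_loud + 1) * frame_sec))
    return chunks
```

With a recording called, say, `speaker1.wav` (a placeholder name), `split_on_pauses(*frame_rms("speaker1.wav"))` would return the start and end time of each chunk, which can then be matched against the lines of the text by listening.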
--- (Edited on 12/24/2007 2:57 pm [GMT-0600] by nsh) ---
I guess it all depends on how much work you want to do, and how much you want your speech submitters to do. However, the easier it is for your users to submit speech, the likelier it is that they actually will do so. So nsh's approach will likely get you the most speech, but will involve some work on your part.
Constraints:
You might just get users to submit large unsegmented speech files (with the text they read and a GPL notice for their recording) using the AudioBook submission instructions (these were modified from the LibriVox audiobook chapter submission process). VoxForge would then be the central repository for large transcribed Hebrew speech files.
Then you could download these files at your leisure, and segment them (as discussed in nsh's post) for submission to the Hebrew Speech Submission Forum. Once there is enough speech, then we can look at creating a pronunciation dictionary, acoustic models, etc.
The Speech Submission Java applet has just been modified to permit localization (hardcoded) for other languages. We are starting with Dutch, then will look at Russian, German and/or French. Once completed, we could do the same for Hebrew.
Ken
--- (Edited on 12/26/2007 12:32 pm [GMT-0500] by kmaclean) ---
--- (Edited on 12/26/2007 12:44 pm [GMT-0500] by kmaclean) ---