jaiger-20070214 - Phoneme 18

Re: jaiger-20070214 - Phoneme 18

User: kmaclean
Date: 2/22/2007 8:32 am

Views: 229
Rating: 17

Hi Joe,

thanks again for the submissions!

Note that the Metrics page is still only 'Alpha' ... one of the things I need to look at is how to deal with silence in the submissions, and how this affects the metrics. Currently, the metrics scripts read the duration of each wav file from its WAV header. But because I ask users to put in 1 second of silence before and after each utterance (to help get an idea of noise levels in their recording environments), the duration in the wav header overestimates the duration of the speech submission by 1-2 seconds per prompt line. For a total submission of 40 prompt lines, this means that it might be overestimated by 40 to 80 seconds!
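To make the header-based measurement concrete, here is a rough Python sketch. It is not the actual VoxForge metrics script; the function names are made up for illustration, and it assumes plain PCM WAV files that Python's standard `wave` module can read.

```python
import wave


def wav_duration_seconds(source):
    """Duration taken from the WAV header only: frame count / frame rate.

    `source` may be a file path or an open binary file object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def padding_overestimate(prompt_lines, pad_low=1.0, pad_high=2.0):
    """Range (in seconds) by which padding silence inflates a submission.

    pad_low / pad_high are the 1-2 seconds of extra silence per prompt
    line mentioned above; they are estimates, not measured values.
    """
    return prompt_lines * pad_low, prompt_lines * pad_high


# A 40-line submission could be overestimated by 40 to 80 seconds.
low, high = padding_overestimate(40)
```

Because only the header is read, this is cheap, but it cannot tell speech from padding.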

Not sure how I am going to address this (maybe just remove 1.5 seconds in the metrics calculations for each line of speech), but all this to say that while you have basically hit 1% of our overall 140 hour goal (which is awesome regardless ...), it may get revised down a bit once I figure out the best way to address this.
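The flat deduction could look something like the sketch below. The 1.5 second figure and the 140 hour goal come from this post; the function names and the clamp at zero are assumptions added for illustration.

```python
def adjusted_total_seconds(header_durations, per_line_deduction=1.5):
    """Sum per-line durations after removing a flat silence allowance.

    header_durations: one header-derived duration (seconds) per prompt line.
    Each line is clamped at zero so a very short file cannot go negative.
    """
    return sum(max(d - per_line_deduction, 0.0) for d in header_durations)


def percent_of_goal(total_seconds, goal_hours=140):
    """Share of the overall goal reached, as a percentage."""
    return 100.0 * total_seconds / (goal_hours * 3600)
```

For example, three lines of 5.0, 4.0 and 1.0 seconds would count as 3.5 + 2.5 + 0 = 6.0 seconds of speech instead of 10.0.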

all the best,

Ken

--- (Edited on 2/22/2007 9:32 am [GMT-0500] by kmaclean) ---

Notice: many prompts in "English Speech Files" were adapted from the prompt files contained in the CMU_ARCTIC speech synthesis database, which were in turn derived from out-of-copyright texts from Project Gutenberg, by the FestVox project at the Language Technologies Institute at Carnegie Mellon University.

Re: jaiger-20070214 - Phoneme 18

User: jaiger
Date: 2/22/2007 9:02 am

Views: 175
Rating: 9

Ken,

I understand this limitation of the metrics as implemented today. That doesn't deter my goal-setting though.

As to the technical issue of silence detection and removal from the statistics, is there some software we can use to estimate the silence before and after the recordings?

-joe

--- (Edited on 2/22/2007 10:02:47 [GMT-0500] by jaiger) ---


Re: jaiger-20070214 - Phoneme 18

User: kmaclean
Date: 2/22/2007 9:49 am

Views: 320
Rating: 16

Hi Joe,

Actually there are a few options:

1. Currently, I use the 'Main' directory as VoxForge's 'Normalized' audio directory - i.e. it is where I put all audio that has been converted to a standard format (same sample rate, bits per sample, channels, etc.). One option might be to simply use the Sox audio tool (which is used to downsample all the audio) to remove most of the beginning and end silence from the audio in the Main directory. The Metrics are calculated using the audio in the 'Main' directory.

2. I could also tell people to stop putting in the 1 second of silence before and after their utterance, and just tell them to include a 3-5 second recording of silence to get an idea of the type of noise that they have in their recordings.

3. As you suggested, there should be some way to determine the length of the silence. I don't know of any software off-hand, but even a very bad hack would likely do the trick: use sox (or Julius' adinrec tool) to create a temporary file with the silence removed, compare its length to that of the original file, and calculate the silence from the difference.

4. Search for software with this specific feature.
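As a rough illustration of option 3, the sketch below estimates leading and trailing silence directly from the samples rather than going through sox or adinrec. It is a crude stand-in, not a tested detector: the function name and the amplitude threshold of 500 are assumptions, and it only handles 16-bit mono PCM.

```python
import array
import sys
import wave


def edge_silence_seconds(source, threshold=500):
    """Return (leading, trailing) silence in seconds for a 16-bit mono WAV.

    A sample counts as "silent" when its absolute value is at or below
    `threshold` (an arbitrary guess; real recordings need tuning).
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        # Nothing above the threshold: treat the whole file as silence.
        return len(samples) / rate, 0.0
    leading = loud[0] / rate
    trailing = (len(samples) - 1 - loud[-1]) / rate
    return leading, trailing
```

Subtracting both values from the header duration gives an estimate of the speech length. A per-sample threshold is easily fooled by clicks or a noisy room, so a windowed energy measure would probably be more robust.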

For options 3 and 4, I would likely have to change the current approach to Metrics calculation, which runs through all the audio every night (not too onerous since it is only reading the headers). If the software has to listen to the audio to determine the silence, then I would need to create a file (or database table) to hold this information (likely as part of the downsampling script), because having to read the entire audio database in real time to detect silence would not scale very well ...
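One way the stored-silence idea might work is sketched below, using SQLite purely as an example. The table name, columns and function names are all hypothetical; the point is only that the expensive silence measurement happens once per file (say, at downsampling time) and the nightly metrics run just sums cached numbers.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) per-file cache table if it is missing."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audio_metrics (
               path TEXT PRIMARY KEY,
               header_seconds REAL NOT NULL,
               silence_seconds REAL NOT NULL
           )"""
    )


def record_file(conn, path, header_seconds, silence_seconds):
    """Store or refresh the measurements for one audio file."""
    conn.execute(
        "INSERT OR REPLACE INTO audio_metrics VALUES (?, ?, ?)",
        (path, header_seconds, silence_seconds),
    )


def total_speech_seconds(conn):
    """Nightly metric: sum of cached speech durations, no audio is read."""
    row = conn.execute(
        "SELECT COALESCE(SUM(header_seconds - silence_seconds), 0.0) "
        "FROM audio_metrics"
    ).fetchone()
    return row[0]
```

With this layout a re-processed file simply overwrites its own row, so the nightly total stays consistent without re-reading the whole audio database.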

Lots of possible approaches, but for now, I need to focus on Audio Submission.

thanks for keeping me on my toes!

Ken

--- (Edited on 2/22/2007 10:49 am [GMT-0500] by kmaclean) ---
