Audio and Prompts Discussions

I've written a VoxForge Updater for iPhone - Devs, a couple questions
User: Jim
Date: 12/24/2009 10:58 pm
Views: 8464
Rating: 12

Hey VoxForge Devs, 

I've got a mostly put-together iPhone application that will let lots of users do a reading.

I'm tying this application into all of my other apps, and users who finish the training I specify will get some free flite voices for use in all my other applications.  My purpose, of course, is to get my hands on some high-quality voice databases.  I figure this will be the best way to get a lot of updates to the VoxForge project.

My questions - What error correction do I need to do for you guys?  Do your scripts auto-detect when someone has fed them bogus audio that doesn't follow the words in the prompts?  Or do I need a QA process in my upload sequence?

Also - should I direct the users directly to the FTP on the site, or set up a cron job to do a nightly dump of wav files to the FTP?

Lastly - I need some guidance on the most effective way to have users do the prompts.  

I was going to have them do: Comma Gets a Cure, The Rainbow Passage, and Arthur the Rat.  I was also going to have them do a daily PhoneMe prompt corresponding with the date.

Is this an effective enough reading for hopefully a LOT of individual users?

Thanks!

Jim

 

--- (Edited on 12/24/2009 10:58 pm [GMT-0600] by Visitor) ---

Re: I've written a VoxForge Updater for iPhone - Devs, a couple questions
User: Robin
Date: 12/25/2009 4:50 pm
Views: 238
Rating: 16

Great idea!

The current setup for processing audio still requires manual intervention by Ken. I don't think we automatically filter out bogus audio submissions. Ideally the system would be entirely automated in the future, but I assume that checking whether audio corresponds 100% with the prompts will not be possible any time soon. So it might be interesting to distribute QA, for instance by requiring each user to submit some speech and also evaluate the speech of someone else (e.g. flag submissions that are incomprehensible or where words have been inserted or deleted).

Ken knows exactly what our current setup is like.

I think the amount of reading (or listening) you will require your users to do should be proportional to the reward. Obviously for us, the more speech the better, but users should choose your apps and not apps from a developer who will not contribute speech to VoxForge.

Just out of curiosity: will your users submit their speech while reading from the screen (i.e. phone in their hands in front of their face), or will they repeat prompts that they listen to (i.e. phone next to their ear)? I assume that will make a difference in recording quality, which might also affect the way such speech is used in the future (e.g. for a phone application or a GPS application). I'm not sure if it will be a big difference, but if it is, it would be good to tag it appropriately.

Thanks for contributing to VoxForge in this manner. It will be very important in the future to get this type of speech!

--- (Edited on 12/25/2009 4:50 pm [GMT-0600] by Robin) ---

--- (Edited on 12/25/2009 4:51 pm [GMT-0600] by Robin) ---

Re: I've written a VoxForge Updater for iPhone - Devs, a couple questions
User: polerizer
Date: 12/31/2009 8:56 pm
Views: 1630
Rating: 14

I'm rather new to VoxForge, but would it be possible to use some sort of SRE (speech recognition engine) to roughly validate the contents of submitted files quickly? Windows' speech trainer lets users read books, etc. with the next word highlighted, so I assume it is improving its model while interpreting too. That'd be cool to see in open source apps (a voice trainer), but at the very least, couldn't the existing model check that the user didn't record random bogus audio? Just a thought...
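
A rough sketch of that idea, assuming some recognizer has already produced a transcript for each recording (the recognizer itself is not shown, and the function names and the 0.5 threshold are made up for illustration): compare the transcript with the prompt by word-level edit distance and flag the file if too many words differ.

```python
def word_errors(prompt, hypothesis):
    """Word-level Levenshtein distance between prompt and recognizer output.

    Returns (number of word edits, number of words in the prompt).
    """
    ref = prompt.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the prompt words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # prompt word missing
                           cur[j - 1] + 1,            # extra word spoken
                           prev[j - 1] + (r != h)))   # match or substitution
        prev = cur
    return prev[-1], len(ref)


def looks_bogus(prompt, hypothesis, max_error_rate=0.5):
    """Flag a recording whose transcript differs too much from the prompt.

    max_error_rate is an assumed threshold, not a VoxForge setting.
    """
    errors, n_words = word_errors(prompt, hypothesis)
    return n_words == 0 or errors / n_words > max_error_rate
```

A recognizer will make mistakes of its own, so a check like this could only catch gross mismatches, not single misread words.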

--- (Edited on 12/31/2009 8:56 pm [GMT-0600] by polerizer) ---

Re: I've written a VoxForge Updater for iPhone - Devs, a couple questions
User: kmaclean
Date: 1/5/2010 1:07 pm
Views: 379
Rating: 14

Hi Jim,

Sorry for the delay in getting back to you. I had problems with the new Web server I implemented for the VoxForge site, and that, coupled with the holiday season, did not leave me much free time...

>I figure this will be the best way to get a lot of updates to the
>VoxForge project.

Thank you for helping out!

>My questions - What error correction do I need to do for you guys?
>Do your scripts auto-detect when someone has fed them bogus
>audio that doesn't follow the words in the prompts?  Or do I need a
>QA process in my upload sequence?

We do some rudimentary automated error-checking... the nightly script creates an acoustic model (monophone-based) using only the submitted speech, and then tries to force align (i.e. use speech audio to create time-stamps for the prompts) all speech for a submission.  This only catches large discrepancies from the prompts. 

A better approach would be to use a current VoxForge acoustic model to try to force align the speech, and issue warnings if there are more than 2 retries for a prompt.

In addition, I also open all the audio files in a submission in Audacity and look at the waveforms... this catches early starts, low volume, clipping, etc.
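
Some of those waveform checks could be roughed out in code before upload. The sketch below is only an illustration, not the VoxForge process: it assumes 16-bit PCM WAV input, the function name is hypothetical, and every threshold is a guess that would need tuning against real recordings.

```python
import array
import sys
import wave


def check_wav(path, low_peak=0.1, clip_level=0.999, quiet=0.02, min_lead_s=0.1):
    """Return a list of warnings for a 16-bit PCM WAV file.

    All thresholds are illustrative guesses, not VoxForge's criteria.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            return ["not 16-bit PCM"]
        rate = w.getframerate()
        channels = w.getnchannels()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return ["empty file"]

    # Absolute sample levels scaled to the range 0..1
    levels = [abs(s) / 32768.0 for s in samples]
    warnings = []
    if max(levels) < low_peak:
        warnings.append("low volume")
    if any(v >= clip_level for v in levels):
        warnings.append("clipping")
    # Early start: sound begins before min_lead_s of quiet has passed.
    first_loud = next((i for i, v in enumerate(levels) if v > quiet), None)
    if first_loud is not None and first_loud / (rate * channels) < min_lead_s:
        warnings.append("early start")
    return warnings
```

An empty list means none of these particular checks fired. It does not mean the audio matches the prompt, so listening to a sample of submissions would still be needed.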

Usually, if there are multiple submissions with the same username, you can listen to one or two submissions, and look at the waveform, and can be pretty certain the audio is OK.

Anonymous submissions, or single submissions, are a different matter.  Sometimes you need to edit the audio to match the prompts, or just discard the submission completely.

>Also - should I direct the users directly to the FTP on the site, or set
>up a cron job to do a nightly dump of wav files to the FTP?

If you already have an FTP server, then it would be best that we get a nightly dump.  If you don't have an FTP server, then we can accommodate you... One question: why FTP and not HTTP?

>Lastly - I need some guidance on the most effective way to have
>users do the prompts.

>I was going to have them do: Comma Gets a Cure, The Rainbow
>Passage, and Arthur the Rat.  I was also going to have them do a
>daily PhoneMe prompt corresponding with the date.

That is lots of prompts... as stated in my other post, 10-20 prompts are all most people are usually willing to donate... if you can get them to donate more, that would be awesome!

>Is this an effective enough reading for hopefully a LOT of
>individual users?

From an acoustic model creation perspective, this is excellent - these prompts cover all the monophones in the English language. 

The downside is that they do not cover all the triphones.  To get better coverage of triphones, you need readings from lots of varied text (preferably GPL compatible).  Project Gutenberg is a good resource, or for more current language, the Enron email dataset.
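
To make the monophone/triphone distinction concrete, here is a small sketch that counts the distinct phones and phone triples a prompt set exercises. The three-word lexicon and its phone symbols are a toy example made up for illustration; a real check would load a full pronunciation dictionary.

```python
def triphones(phones):
    """Return the set of (left, centre, right) triples in one phone sequence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def coverage(prompts, lexicon):
    """Collect the distinct monophones and triphones a prompt set exercises.

    lexicon maps a lower-case word to its list of phones. Words missing
    from the lexicon are skipped, and word boundaries are ignored, so a
    triphone can span two words.
    """
    mono, tri = set(), set()
    for prompt in prompts:
        phones = []
        for word in prompt.lower().split():
            phones.extend(lexicon.get(word, []))
        mono.update(phones)
        tri.update(triphones(phones))
    return mono, tri


# Toy lexicon for illustration only.
LEXICON = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}

mono, tri = coverage(["the cat sat", "the cat"], LEXICON)
print(len(mono), "monophones,", len(tri), "triphones")
```

In this toy case the second prompt adds no triphones the first did not already contain, which is why repeated readings of the same few passages stop improving triphone coverage and varied text helps.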

For the short/medium term, I would recommend you use your current approach, but allow for the prompts to be updatable as you get more users.

Ken

--- (Edited on 1/5/2010 2:07 pm [GMT-0500] by kmaclean) ---

Re: I've written a VoxForge Updater for iPhone - Devs, a couple questions
User: Jim
Date: 10/16/2010 12:03 am
Views: 175
Rating: 16

Hey Ken,

I've been busy with some other stuff, but this application is mostly done and I can submit it in the next few weeks.

I've got a personal favor to ask the devs - is it possible for my app to submit only a Gzipped file that's not Tar'd first?  Gzip exists on the iPhone, but I'd have to port TAR, and I'd rather not have to go there.  I don't 'think' tar affects the compression, so the files will still be compressed.
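
For illustration, this is what gzip-only packaging of one recording looks like, sketched with Python's standard gzip module rather than iPhone code (the function name is hypothetical). Tar only bundles files and does not compress, so skipping it should not hurt compression. The catch is that a plain .gz holds a single stream with no directory, so a submission with several WAV files and a prompts file would need one .gz per file or some other way of bundling them.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None):
    """Compress one file to <src_path>.gz (or dst_path) and return the output path."""
    dst_path = dst_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the file through the compressor
    return dst_path
```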

Thanks!

Jim 


--- (Edited on 10/16/2010 12:03 am [GMT-0500] by Visitor) ---

Re: I've written a VoxForge Updater for iPhone - Devs, a couple questions
User: kmaclean
Date: 10/16/2010 10:39 am
Views: 2542
Rating: 14

>is it possible for my app to submit only a Gzipped file that's not
>Tar'd first?

Yes.

Thanks again for this!

Ken 

 

--- (Edited on 10/16/2010 10:39 am [GMT-0500] by Visitor) ---
