Why is there so much speech sitting in the waiting list?

User: bendauphinee
Date: 5/3/2010 11:50 am

Views: 7466
Rating: 14

I thought that once it was rated, it would be incorporated. Is nobody rating speech? Or am I incorrect about how this works?

--- (Edited on 5/3/2010 11:50 am [GMT-0500] by bendauphinee) ---

Re: Why is there so much speech sitting in the waiting list?

User: kmaclean
Date: 5/3/2010 12:49 pm

Views: 739
Rating: 15

>I thought that once it was rated, it would be incorporated. Is nobody rating

>speech? Or am I incorrect about how this works?

I am still working on some updates to the backend acoustic model training scripts...

Ken

--- (Edited on 5/3/2010 1:49 pm [GMT-0400] by kmaclean) ---

Re: Why is there so much speech sitting in the waiting list?

User: AT
Date: 7/18/2013 5:13 am

Views: 129
Rating: 9

Having just submitted a set, this still appears to be a problem: There are donations "awaiting processing" from 2010!

I'm concerned - I'm new around here but had thought the repository looked like it could grow to be a really useful development tool with a bit more audio data. This amount of lag will put people off donations and make the project appear zombiefied though...

Is the work anything I could help with to push it through? Or something which could be temporarily suspended to clear the backlog of audio samples?

--- (Edited on 7/18/2013 5:13 am [GMT-0500] by AT) ---

Re: Why is there so much speech sitting in the waiting list?

User: kmaclean
Date: 7/18/2013 8:25 am

Views: 194
Rating: 9

>Having just submitted a set, this still appears to be a problem: There are

>donations "awaiting processing" from 2010!

This started out as a site to collect English language speech audio. I was asked to add other languages... which I did.

English (like most of the languages) is up to date. The newest languages are not, but they will be.

If you need access to the raw data so you can create your own acoustic model, I can provide you with a link, but you will need to validate the speech yourself.

Ken

--- (Edited on 7/18/2013 9:25 am [GMT-0400] by kmaclean) ---

Re: Why is there so much speech sitting in the waiting list?

User: AT
Date: 7/19/2013 5:31 am

Views: 123
Rating: 8

Hmm I'm not sure I understand - so the acoustic models are still being updated, and the new speech is appearing in the repository (my sample has found its way to SpeechCorpus/Trunk/Audio/Main I see)... buuuut, the submission process is still marking loads of entries as "awaiting processing" for some reason?

Is there just an error somewhere that's preventing the removal of speech from the pending list even though it's being dealt with, then?

You're quite right, Ken - I was looking to play around with the raw acoustic data. Is there an easy tool with which to download chunks of the repository rather than the http://www.repository.voxforge1.org/downloads/ web interface?

Sorry if I went in a little hot there, by the way - I've seen my fair share of really interesting but unfortunately stalled open projects... No offence was meant - I'm sure the update took a fair amount of effort!

--- (Edited on 7/19/2013 5:31 am [GMT-0500] by AT) ---

Re: Why is there so much speech sitting in the waiting list?

User: kmaclean
Date: 7/19/2013 7:33 am

Views: 313
Rating: 8

>so the acoustic models are still being updated

only the English acoustic models

>buuuut, the submission process is still marking loads of entries as

>"awaiting processing" for some reason?

I have not got around to processing them - we get submissions that will crash the acoustic model creation scripts for a variety of reasons. The verification process is only semi-automated, and very tedious...

>Is there just an error somewhere that's preventing the removal from speech

>from a pending list even though it's being dealt with, then?

> Is there an easy tool with which to download chunks of the repository

>rather than the http://www.repository.voxforge1.org/downloads/ web

>interface?

Use wget - you're talking 20-30 gigs of data here... and the web hoster has an unlimited download data plan.
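A minimal sketch of what such a wget call could look like, written as a small Python helper that only builds the command line. The flags are standard GNU wget options; the `*.tgz` accept pattern and the `voxforge_audio` destination directory are assumptions for illustration and are not confirmed anywhere in this thread.

```python
import shlex


def build_wget_command(url, dest_dir, accept="*.tgz", wait_seconds=1):
    """Build a wget argument list for a resumable, recursive download.

    The accept pattern and destination directory are hypothetical defaults;
    adjust them to whatever the download area actually contains.
    """
    return [
        "wget",
        "--recursive",                     # follow links below the start URL
        "--no-parent",                     # never climb above the start directory
        "--no-host-directories",           # do not create a directory named after the host
        "--continue",                      # resume partially downloaded files
        f"--wait={wait_seconds}",          # pause between requests to be polite
        f"--accept={accept}",              # only keep files matching this pattern
        f"--directory-prefix={dest_dir}",  # where downloaded files are stored
        url,
    ]


cmd = build_wget_command(
    "http://www.repository.voxforge1.org/downloads/",
    "voxforge_audio",
)
# Print the command so it can be inspected before anything is downloaded.
print(shlex.join(cmd))
```

To actually start the download, the list can be passed to `subprocess.run(cmd, check=True)`. Pointing the URL at a subdirectory of the download area, rather than its root, is one way to fetch a chunk instead of the full 20-30 gigs.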

Ken

BTW. I'm travelling for the next 4 weeks with limited access to Internet...

--- (Edited on 7/19/2013 8:33 am [GMT-0400] by kmaclean) ---

Re: Why is there so much speech sitting in the waiting list?

User: camdixon
Date: 8/26/2013 1:18 pm

Views: 2165
Rating: 7

Hello,
I don't mean to hijack this thread, but wanted to add something to it.

I wrote to Ken earlier in the forums and he responded. I can no longer find that exchange with the search function, and the notification email does not include a link to the post on the forums. (Original post at bottom of this message)

I am a developer, and I have thought of a way to help voxforge, at least with the English models. I think an approval tool would help get the items in the queue processed. Some submissions might still need manual alterations, but it would speed things up. The tool would be similar to the prompt page that voxforge uses to collect the information on the website.

It might need to be an applet but at least have functionality to:

pull and display text prompt user was given when submitting speech

button to play the recording

approve button

flag for admin button

deny / discard button.

It would also need to display the language it is in, and possibly the dialect region if you ever got that specific.

You could do this in conjunction with the scripts, or have people volunteer their time by reviewing ones the scripts have flagged.

To up confidence ratings, you could also have multiple users (2-3) approve / flag / discard samples, and then make sure you have the confidence ratings to pass these into building the model. This way you are sure multiple humans agree it matches the prompts.
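A minimal sketch of the multi-reviewer idea, assuming each reviewer casts one of three votes ("approve", "flag", "discard") per sample. The function name, the vote labels, and the thresholds are all hypothetical; nothing here reflects how voxforge actually processes submissions.

```python
from collections import Counter


def consensus(votes, min_votes=2, min_agreement=1.0):
    """Combine reviewer votes for one sample into a single decision.

    votes: list of "approve", "flag" or "discard" strings (hypothetical labels).
    Returns "pending" until enough votes are in, the winning label when
    reviewers agree strongly enough, and "flag" (escalate to an admin) otherwise.
    """
    if len(votes) < min_votes:
        return "pending"
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return "flag"


print(consensus(["approve"]))             # not enough reviewers yet
print(consensus(["approve", "approve"]))  # unanimous, so the sample can be used
print(consensus(["approve", "discard"]))  # disagreement, so an admin should look
```

Requiring unanimity by default is the conservative choice; lowering `min_agreement` (for example to 0.6 with three reviewers) trades some certainty for throughput.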

I would like it to display only 5-10 prompts at a time (oldest in queue) to add to the models.

That way a reviewer can submit without scrolling, which makes it faster to approve and process samples.

Once the base app is made, it could be improved with features such as AJAX, so that it automatically pulls more samples once the user completes what is on the screen, without them having to press refresh.

You could also compare each person's approval decisions against other people's and score them on agreement (these are easily computed statistics, commonly used for "coding behavior" in psychology). Then, as people approve more, you would know how good they are at it from whether they consistently agree with other people.
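One agreement statistic of this kind is Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The post does not name a specific statistic, so treat the choice of kappa, the function name, and the labels below as assumptions; this is a sketch, not a proposal for the actual implementation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers who rated the same samples in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must rate the same non-empty set of samples")
    n = len(labels_a)
    # Fraction of samples on which the two reviewers gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each reviewer's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both reviewers used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


reviewer_1 = ["approve", "approve", "discard", "discard"]
reviewer_2 = ["approve", "discard", "approve", "discard"]
print(cohens_kappa(reviewer_1, reviewer_1))  # identical ratings
print(cohens_kappa(reviewer_1, reviewer_2))  # agreement no better than chance
```

A reviewer whose kappa against other reviewers stays high over many samples could be given more weight, which is roughly the "how good they are at it" measure described above.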

Original Post

__________________

I am an open source enthusiast, and I work as a developer. I was wanting to help build the appropriate open source acoustic models. I have contributed many samples, and I was reading an older post on the forums that said it would be a good idea to have a way to approve the voice recordings that people submit through the online page.

If possible, I would like to help build that ability to alleviate what I am assuming to be the bottleneck on building better and better models.

Please let me know if I can help develop something to help voxforge.org

Thanks,

~Cameron

--- (Edited on 8/26/2013 1:18 pm [GMT-0500] by camdixon) ---
