VoxForge
Hi, I'm very interested in helping this project in a long-term fashion. I have a background in both audio recording and open source software, and I am studying computational linguistics at school right now. I am quite excited to help this project grow but would like to know the most productive method for my involvement.
Should I spend time recording speech? Are AudioBooks being included in the corpus? Can Creative Commons books be included or are the licences incompatible?
Should I spend time vetting recordings for transcription errors?
Should I spend time writing processing scripts?
Should I be poking all of my friends who speak other languages (French, German, Arabic, etc.) to contribute?
Where is this project at and what are the major boulders that need an extra hand to push up the hill?
--- (Edited on 5/9/2014 2:13 pm [GMT-0500] by EricHedekar) ---
Dear Eric
The most valuable help would be to write some code :) but data collection could also help.
> Should I spend time recording speech?
No
> Are AudioBooks being included in the corpus?
Yes
> Can Creative Commons books be included or are the licences incompatible?
> Should I spend time vetting recordings for transcription errors?
> Should I spend time writing processing scripts?
Yes
> Should I be poking all of my friends who speak other languages (French, German, Arabic, etc.) to contribute?
> Where is this project at and what are the major boulders that need an extra hand to push up the hill?
--- (Edited on 5/12/2014 23:25 [GMT+0400] by nsh) ---
My opinion: scripts, for sure. I was working on a script to add to a separate page. The script extracted and played the recordings for users on that web page (after some screening scripts ran), which would REALLY help VoxForge, as it would allow distributed, user-accepted processing. Basically, at the moment you need a username and password to log in and process the voice samples submitted via the site, I believe. Let me know if you're interested in adding to or working on the script.
It seems the available sources are not the problem; processing them all is.
I'm not sure which languages need more voice files or which are a priority. The more the merrier?
--- (Edited on 12/7/2014 1:39 am [GMT-0600] by camdixon) ---
--- (Edited on 12/7/2014 1:41 am [GMT-0600] by camdixon) ---
Eric,
You should also see this recent post on the forum:
http://www.voxforge.org/home/forums/message-boards/audio-discussions/why-is-there-so-much-speech-sitting-in-the-waiting-list#ZfuMlZ8N17bSi2LNKtGMxQ
There, Ken (one of the organizers, I believe) explains why so many speech samples are awaiting processing. He says some submissions crash the acoustic model creation scripts.
">buuuut, the submission process is still marking loads of entries as
>"awaiting processing" for some reason?
I have not got around to processing them - we get submissions that will crash the acoustic model creation scripts for a variety of reasons. The verification process is only semi-automated, and very tedious..."
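For anyone who wants to pick up the scripting side, here is a rough sketch of the kind of pre-screening check discussed in this thread. It is not the actual VoxForge verification process; the function names, the allowed sample rates, the 16-bit mono requirement, and the minimum duration are all assumptions made up for illustration. It only uses Python's standard wave module to flag files that are unreadable or have unexpected parameters before they reach any model-building step.

```python
import wave
from pathlib import Path


def screen_wav(path, min_seconds=0.5, allowed_rates=(8000, 16000, 44100, 48000)):
    """Return a list of problems found in one WAV file (empty list = passed).

    The thresholds and allowed rates are illustrative assumptions,
    not VoxForge's real acceptance criteria.
    """
    try:
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
            frames = wf.getnframes()
            width = wf.getsampwidth()
            channels = wf.getnchannels()
    except (wave.Error, EOFError, OSError) as exc:
        # Not a valid WAV file, truncated header, or unreadable file.
        return [f"unreadable: {exc}"]

    problems = []
    if rate not in allowed_rates:
        problems.append(f"unexpected sample rate: {rate}")
    if width != 2:
        problems.append(f"expected 16-bit samples, got {8 * width}-bit")
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    duration = frames / rate if rate else 0.0
    if duration < min_seconds:
        problems.append(f"too short: {duration:.2f} s")
    return problems


def screen_submission(directory):
    """Screen every .wav file in a directory; map file name -> list of problems."""
    return {p.name: screen_wav(p) for p in sorted(Path(directory).glob("*.wav"))}
```

Calling screen_submission on an unpacked submission folder gives a per-file report, so only the files with an empty problem list would need to go on to manual listening.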
--- (Edited on 12/7/2014 1:46 am [GMT-0600] by camdixon) ---