VoxForge
Hi all,
I am not a technical guy, but I do see a lot of shortcomings in community participation. It might sound unpleasant, but this is my take on things:
1. Forums need a usability boost: The forums are in bad shape. I don't know what forum software you guys are using, but it's just not something many people could use or understand. Most forum software has a 'search' box where people can check whether somebody has asked something before, so things are more organised. It should be visible all the time.
2. FAQ: The FAQ should be visible at all times.
It should answer all the often-repeated queries, such as:
i. How can I record the sounds?
ii. Which software do I need to record the sounds?
iii. How many sound clips are needed?
iv. from which countries?
v. what accents?
vi. any particular mike or hardware which would be useful?
and so on & so forth. This would make for huge gains.
3. There needs to be a blog which is accessible from the top itself so people know what new improvements are happening.
4. Break the requirements into small, doable targets shown in the form of a graph. Also say what improvements we would get when we reach each target.
5. Make frequent releases of the corpus, and blog about the resulting improvements in the various open source speech recognition engines.
6. Lastly, give alternative ways to do ftp submissions to the site. Give some generic instructions for people using console-based or graphical ftp clients to upload stuff. Perhaps there could be a way to tag them so they are attached to the job number automatically.
Feel free to suggest & improve the suggestions :)
--- (Edited on 3/22/2008 12:06 am [GMT-0500] by shirish) ---
Hello shirish,
#3 I think that it would be a good idea to have a VoxForge weblog. It would be good for marketing. Interested people could get a quick impression of what is happening and what is new. What about voxforge.wordpress.com? Or alternatively blog.voxforge.org? We could have a VoxForge weblog with several authors. I would like to become an author of such a weblog. Of course, there are the VoxForge forums, but a weblog could provide additional benefit. And hopefully, a weblog wouldn't have the cache/reload problems that WebGUI has.
#5 I think that there are frequent releases of the speech corpus. But I do agree that a weblog would be good for the marketing. Ken, you could tell the people via a weblog what improvements have been made. It would be a good decision to communicate more details about the progress of the VoxForge project. Why is the AcousticModel-2008-03-21 better than the acoustic model from the previous day? What improvements have been made? What is the benefit of the recent nightly build? I don't know the answer, but VoxForge does really need more marketing. The people need to know that VoxForge continuously improves the products (acoustic models, speech corpus, lexicon). Most people are just consumers. But we should help the consumers to become ASR/TTS developers. We have to teach the people about the value of the VoxForge project. People need to understand. But if we don't tell them, they won't.
#6 There is an alternative way to submit via FTP.
I do have several further thoughts.
#7 Releasing a VoxForge dictionary (of course GPL licensed) that is compatible with the Pronunciation Lexicon Specification. This dictionary should be updated on a regular basis. GPL, PLS lexicons (English, German, Dutch) could be products that VoxForge has to offer. Recently, I submitted a first version of the German lexicon (GPL, PLS). This way should be continued.
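To make the PLS idea concrete, here is a minimal sketch in Python that writes a two-word lexicon in the W3C Pronunciation Lexicon Specification 1.0 layout (`lexicon`, `lexeme`, `grapheme`, `phoneme`). The function name `build_pls`, the sample words, and the use of IPA are assumptions for illustration only; an actual VoxForge lexicon would use the project's own word list and phone set.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the W3C PLS 1.0 and XML specifications.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries, lang="en-US", alphabet="ipa"):
    """Return a PLS 1.0 document as a string.

    `entries` maps each written word (grapheme) to one pronunciation
    (phoneme). Hypothetical helper, not part of any VoxForge tool.
    """
    # Serialize PLS elements without a prefix (default namespace).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {
            "version": "1.0",
            "alphabet": alphabet,
            f"{{{XML_NS}}}lang": lang,  # written out as xml:lang
        },
    )
    for word, pronunciation in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = word
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = pronunciation
    return ET.tostring(lexicon, encoding="unicode")


# Illustrative entries only; the IPA strings are examples, not VoxForge data.
sample = {"hello": "həˈloʊ", "world": "wɝld"}
pls_document = build_pls(sample)
```

Printing `pls_document` gives a single `lexicon` element that can be saved as a `.pls` file and edited further in any text editor.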
#8 Making the speech corpus compatible with the Speech Synthesis Markup Language. VoxForge should focus on VoiceXML related standards. It is the right decision that VoxForge employs the GPL. But VoxForge should offer products that are compatible with XML (or to be more specific: with PLS for the dictionaries, and SSML for the prompts).
GPL is the legal standard. PLS (for the dictionaries) and SSML (for the prompts) should become VoxForge's technical standards. Those standards are easy to implement; you just need a simple text editor like Notepad++. I am planning to employ SSML in my future submissions. But first, I have to learn about the details of this specification (and related specifications, for example, the RDF/XML Syntax Specification). It is not easy to get involved, but in the end, it is just copy and paste.
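As a rough illustration of what SSML prompts could look like, the sketch below wraps a list of prompt sentences in an SSML 1.0 `speak` element, one `s` (sentence) element per prompt. The function name `prompts_to_ssml` and the sample prompts are made up for this example; how VoxForge would actually map its prompt files to SSML is an open question in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the W3C SSML 1.0 and XML specifications.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def prompts_to_ssml(prompts, lang="en-US"):
    """Return an SSML 1.0 document with one <s> element per prompt.

    Hypothetical helper for illustration, not part of any VoxForge tool.
    """
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    for text in prompts:
        ET.SubElement(speak, f"{{{SSML_NS}}}s").text = text
    return ET.tostring(speak, encoding="unicode")


# Made-up prompts, standing in for the lines of a prompts file.
ssml_document = prompts_to_ssml(
    ["The quick brown fox jumps over the lazy dog.", "Please call back later."]
)
```

The result is plain XML, so it can be checked by hand or validated against the SSML schema before being attached to a submission.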
#9 It would be great to have the VoxForge tutorials as screencasts. Screencasts are easy to understand, especially for beginners, and would help beginners get comfortable with command line interfaces. You would just have to watch the screencasts to get an impression of how Cygwin, Julius, and HTK work. The different tools (CMU Sphinx, HTK) all use command line interfaces, so as long as graphical user interfaces aren't available, screencasts could be a good graphical solution. We could use a weblog to upload screencasts.
Those were a lot of thoughts. VoxForge should focus on products that can be used out of the box. But unfortunately, the final product (a complete speech recognition software) is not ready yet. So VoxForge should offer pre-products like tutorial-screencasts, PLS-dictionaries, SSML-prompts. These pre-products could be used immediately by ASR/TTS developers. And for the marketing of those pre-products a weblog would be an excellent solution.
Good products + marketing = success.
Greetings, Ralf
--- (Edited on 2008-03-22 6:31 am [GMT-0500] by ralfherzog) ---
--- (Edited on 3/22/2008 1:16 pm [GMT-0500] by Visitor) ---
Hi shirish,
thanks for the feedback!
>1 . Forums need a usability shot :- The forums are in a bad bad shape. I
>don't know what forum software are u guys using, but its just not something
>which many people could use/understand.
We use WebGUI as our CMS software. I'm not sure how that is relevant though, since with templates and style sheets, you can create anything you want with it (and this is the same with any other CMS) - it's quite flexible.
Every new forum package I've used required a little time/effort to get used to.
>Most of the forum softwares have a 'search' query thing where people can
>know if somebody has asked something before, so things are more
>organised. It should be visible all the time.
Each forum has a search link, for example:
Audio and Prompts Discussions
Add • Unsubscribe • Search
There is a "search all forums" box at the bottom of the Forums page, and you can Google search the entire site from the Home page (bottom right-hand side).
I agree, these might be better placed at the top of the page.
>2. FAQ :- This FAQ should be visible all the times.
It's in the links section on the About page. I think the VoxForge menu is busy enough as it is :)
>An FAQ which answers all the oft. repeated queries such as :-
>i. How can I record the sounds?
This is on the Home page, under the section titled:
How Can You Help?
not sure why we would need another faq entry.
> ii. Which softwares do I need to record the sounds?
Again, this is on the Home page (the "How Can You Help" section), where it states all possible options for submitting speech to VoxForge:
Record yourself reading some text, and upload your recordings to VoxForge using one of the following approaches:
- your computer (using a Java applet which provides you with a list of prompts to read, and a "one-click" uploader; mirrors);
- your telephone (free long-distance telephone service providers).
Other Options:
- Voice2Type (commercial initiative to collect telephony speech, with collected speech to be donated to VoxForge under GPL);
- Submit an audio book chapter (if you recorded one through the LibriVox project);
- Submit a "Short" reading to MojoMove:
> iii. How many sound clips are needed?
If you tried the Java speech submission applet, you would see that it asks for 10 prompts (and you can submit as many times as you wish). You can also record with Audacity, using a tutorial that walks you through the steps to submit blocks of 40 prompts.
> iv. from which countries?
The speech submission page defaults to English and has links for other languages.
> v. what accents?
No limits - we are working on creating a new section to collect info on people's mother tongue (to help categorize non-native speech)
> vi. any particular mike or hardware which would be useful?
None - we are actually looking for submissions from various types of hardware/microphone combinations.
>3. There needs to be a blog which is accessible from the top itself so
>people know what new improvements are happening.
The news page does this... it can be formatted to look like a blog.
>4. Break the requirements into small doable targets which are in the form of
>graph. Also tell what improvements would one have when we touch that
>target.
I've asked for, and received, many different opinions on what types of speech, and how much of it, we should be trying to collect. For reference, the CMU Sphinx acoustic model was trained using 140 hours of 1996 and 1997 Hub4 training data (a proprietary, closed-source speech corpus). We are treading new ground here - at least from an open source speech recognition perspective - and there are a lot of unknowns.
>5. Make frequent releases of the corpus done & interact & blog the
>resulting improvements made in the various open source speech
>recognition engines.
If you'd like to help get this done, it would be greatly appreciated. I was "non-technical" too when I started this project.
>6. Lastly, give alternative ways to do ftp submissions to the site. Give some
>generic instructions for people using console-based or graphical ftp clients
>to upload stuff.
See this link: FTP Client Howtos, though FTP is not the preferred method of submitting speech. Please use the Speech Submission Java applet, or if using your own audio recorder (like Audacity), use the speech submission forum.
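For anyone who prefers to script an FTP upload rather than use a graphical client, a minimal sketch with Python's standard `ftplib` could look like the following. The host name `ftp.example.org` and the archive name are placeholders, not the real VoxForge server details; take the actual host, login, and target directory from the FTP Client Howtos.

```python
from ftplib import FTP
from pathlib import Path


def upload_submission(ftp, archive_path):
    """Upload one archive in binary mode over an already-connected FTP session.

    `ftp` is an ftplib.FTP instance (or any object with a compatible
    storbinary method). Returns the remote file name that was used.
    """
    path = Path(archive_path)
    with path.open("rb") as handle:
        # STOR stores the file on the server under the same base name.
        ftp.storbinary(f"STOR {path.name}", handle)
    return path.name


# Usage sketch. Host and file name are hypothetical placeholders:
#
#     with FTP("ftp.example.org") as ftp:
#         ftp.login()  # anonymous login; pass user/passwd if required
#         upload_submission(ftp, "myusername-20080325.tgz")
```

Keeping the connection setup outside `upload_submission` means the same function can send several archives in one session.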
thanks,
Ken
--- (Edited on 3/25/2008 11:55 am [GMT-0400] by kmaclean) ---
>#3 I think that this would be a good idea to have a VoxForge weblog. [...] I
>would like to become an author of such a weblog.
I think anything that helps promote VoxForge is a good thing :)
However, as I stated in my post to shirish, (in my mind) a weblog is just a fancy forum with different formatting. WebGUI has weblog formatting options (which could be configured to look just like, or very close to, a wordpress site). I just have not used them.
Unfortunately I really don't have the time to commit to blogging about VoxForge. I spend quite a bit of time answering forum/dev questions, and working on improving the backend scripts and the speech submission applet.
I think what we really need are apps (that use VoxForge acoustic models) that solve users' problems and meet a user need. Once we achieve that, then users might start seeing the value that VoxForge could offer (i.e. a place to upload speech to improve speech recognition on their app). What those apps are, I'm not really sure, but blogging would take me away from trying to figure this out.
Ralf, if you would like to set up a blog, I can help you. It can either be a page on the VoxForge server (with caching issues ... :) ) or on another site, each has their advantages/disadvantages. But I cannot commit to provide ongoing contributions to a blog.
>#8 Making the speech corpus compatible with the Speech Synthesis Markup Language. VoxForge should focus on VoiceXML related standards.
If you want to do this, you are more than welcome.
Speech synthesis is a separate "domain" from speech recognition, and creating a speech corpus that is compatible with SSML might be a good thing in the long run (3-5 years+), but since Sphinx, ISIP, Julius and HTK use their own (but very similar) formats, I'm not sure I understand how that will help us get FOSS speech recognition going now - there is no FOSS project, that I know of, that uses them now.
>#9 It would be great to have the VoxForge tutorials as screencasts.
It is much easier to make changes to a web page than it would be to have to re-record an entire screencast to correct an error. Remember, the target market for the tutorials is developers wanting to learn how to create an acoustic model. The VF tutorials go a long way to improving what was already in existence (for HTK/Julius at least) when I created them. If you want to take it to the next level, and create screencasts, then you are most welcome to do so. :)
For the newbie interested in learning more about speech recognition, I would recommend learning a scripting language first. It could be any scripting language (Perl, Python, Ruby, ...), but most speech recognition stuff uses Perl. Get a good understanding of how Perl works (the basics - try Beginning Perl, or one of the books located here), and that will go a long way to helping you follow the VoxForge tutorials.
Ken
--- (Edited on 3/25/2008 1:06 pm [GMT-0400] by kmaclean) ---
Hi all,
First of all, thanks for responding. I have to be frank. Two motives brought me here:
1. The search for nice voice-recognition software.
2. The iPod touch entry, although I don't think I'll be able to upload any entries in the near future (till the end of the month, at least) as my microphone just got busted. :(
About kmaclean's reply:
>See this link: FTP Client Howtos, though FTP is not the preferred method of submitting speech. Please use the Speech Submission Java applet, or if using your own audio recorder (like Audacity), use the speech submission forum.
It hasn't been made clear why FTP is not the preferred method. I like to involve the browser as little as possible.
--- (Edited on 3/25/2008 1:45 pm [GMT-0500] by shirish) ---
>It hasn't been made clear why FTP is not preferred method.
Because the FTP site is not searchable by Google crawlers, whereas the forums are. The forums on the Listen to User Submitted Speech Files page are the central repositories for all user speech submissions.
Ken
--- (Edited on 3/25/2008 3:21 pm [GMT-0400] by kmaclean) ---