VoxForge
Very interesting discussion!
>I've tried some things with sox and compand that removed the "background"
>noise, but unfortunately in the process it also clipped part of my speech (and it
>didn't work consistently among speakers).
Julius has a feature called "spectral subtraction" that (I've never used it myself) appears to remove noise from speech input using a pre-estimated noise spectrum read from a file. Is there something equivalent for Sphinx?
>Why doesn't/hasn't someone taken something like WSJ1 and
>added/adapted it using all of these other speech files (i.e., ones from
>VoxForge, CMU, etc) that are available.
The source audio for the WSJ acoustic models can only be purchased from LDC - it is closed source. However, it seems like any acoustic models derived therefrom are freely distributable.
You could theoretically merge the WSJ and VoxForge acoustic models to create the "Super Acoustic Model" you were referring to, but the GPL license on the VoxForge corpus would prevent its *distribution*. This is because there is no freely distributable source audio for WSJ1, and a collective work that includes a GPL work (like the VoxForge corpus) must be distributed under the GPL. However, nothing stops you from merging them and using them only within your organization (as long as you don't distribute the resulting AM).
Ken
--- (Edited on 2/28/2008 11:19 pm [GMT-0500] by kmaclean) ---
--- (Edited on 2/28/2008 11:20 pm [GMT-0500] by kmaclean) ---
> Julius has something called "spectral subtraction" that seems (I've never used it myself) to be used to remove noise from speech input using pre-estimated noise spectrum from file. Is there something equivalent for Sphinx?
Ken: There seems to be no equivalent for this. Actually, it's only about three lines of code to subtract the noise estimate, and there are better free methods available; they just need to be integrated.
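For readers unfamiliar with the technique, here is a minimal sketch of basic magnitude spectral subtraction in Python with NumPy. It is not Julius's implementation or any Sphinx code. The frame size (256 samples), the over-subtraction factor `alpha`, the spectral `floor`, and the use of a separate speech-free recording for the noise estimate are all illustrative assumptions. The core step is the `np.maximum(mag - alpha * noise_mag, floor * mag)` line; the rest is framing and resynthesis.

```python
import numpy as np

def _hann_frames(x, n_fft, hop):
    """Yield (start index, windowed frame) pairs.

    A periodic Hann window at 50% overlap sums to 1, so overlap-adding
    unmodified frames reproduces the input (except the first and last half-frame).
    """
    window = np.hanning(n_fft + 1)[:-1]
    for start in range(0, len(x) - n_fft + 1, hop):
        yield start, x[start:start + n_fft] * window

def estimate_noise(noise, n_fft=256):
    """Average magnitude spectrum of a speech-free (noise-only) recording."""
    mags = [np.abs(np.fft.rfft(f)) for _, f in _hann_frames(noise, n_fft, n_fft // 2)]
    return np.mean(mags, axis=0)

def spectral_subtract(x, noise_mag, n_fft=256, alpha=1.0, floor=0.02):
    """Basic magnitude spectral subtraction with overlap-add resynthesis."""
    out = np.zeros(len(x))
    for start, frame in _hann_frames(x, n_fft, n_fft // 2):
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the scaled noise estimate, but never go below a small fraction
        # of the original magnitude; the floor keeps magnitudes positive and
        # reduces "musical noise" artifacts.
        clean = np.maximum(mag - alpha * noise_mag, floor * mag)
        # Reuse the noisy phase and overlap-add the frame into the output.
        out[start:start + n_fft] += np.fft.irfft(clean * np.exp(1j * phase), n=n_fft)
    return out
```

Usage would look like `spectral_subtract(call_samples, estimate_noise(noise_only_samples))`. In a live telephony setting there is usually no separate noise file, so the estimate would more likely come from leading non-speech frames of each call; that choice is left open here.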
> My application will be accepting incoming calls from numerous speakers (i.e., an unlimited number of different speakers), probably the majority on cell phones, and probably the majority while driving, hence a large amount of background noise (radio, road noise, passengers talking, etc.). For this to be successful, I will need to find a way of maintaining a 90% or better recognition rate even under those conditions. This makes cleaning the incoming audio stream important.
Another thing I forgot: you probably need to start with a more representative test set. The one I evaluated is certainly not optimal. Only once you have a test set that is big enough can you proceed with algorithm optimization. The test set must include noisy calls; I suppose it's just a matter of enabling recording on the server.
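The thread talks about "recognition rate" without defining it. A common way to score a test set is word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between each reference transcript and the recognizer's output, divided by the total number of reference words; one minus WER is often reported as word accuracy. A minimal pure-Python sketch, assuming that reading of "recognition rate" (the function names are hypothetical):

```python
def word_errors(ref: str, hyp: str) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))          # distance from empty ref prefix
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]

def word_error_rate(pairs) -> float:
    """WER over (reference, hypothesis) pairs; assumes at least one reference word."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words
```

Scoring the whole test set at once (total errors over total words) rather than averaging per-utterance rates keeps short utterances from dominating the result.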
> Wouldn't that increase the recognition rate overall, or is adaption limited to increasing the recognition rate for only one speaker?
Adaptation increases the recognition rate for your environment by reducing the mismatch with the environment the original model was trained in, so it's certainly a good thing.
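Adaptation re-estimates model parameters from a small amount of in-domain data. One standard technique is maximum a posteriori (MAP) adaptation of Gaussian means, where each mean is pulled from its speaker-independent value toward the adaptation data in proportion to how much data that Gaussian saw: mu_new = (tau * mu_0 + sum_t gamma_t * x_t) / (tau + sum_t gamma_t). The sketch below shows that single update for one Gaussian. It is not SphinxTrain code; it assumes the per-frame occupancies gamma_t are already available (in practice they come from a forward-backward pass over the adaptation data), and the prior weight `tau` is an illustrative value.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, occupancy, tau=10.0):
    """MAP re-estimate of one Gaussian mean (hypothetical helper).

    prior_mean: (D,) mean from the speaker-independent model
    frames:     (T, D) feature vectors from the adaptation data
    occupancy:  (T,) posterior probability that each frame belongs to this Gaussian
    tau:        prior weight; larger values keep the mean closer to the original model
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    occupancy = np.asarray(occupancy, dtype=float)
    weighted_sum = occupancy @ frames   # sum_t gamma_t * x_t, shape (D,)
    count = occupancy.sum()             # sum_t gamma_t
    return (tau * prior_mean + weighted_sum) / (tau + count)
```

A Gaussian that sees no adaptation data keeps its original mean, while one with plenty of data moves close to the data average; this is why adaptation can help with a new environment without needing a full training corpus.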
> I also really need to figure out how to build good quality large language models, but I believe I may be able to figure that one out on my own (especially if you have any links handy to information on the subject).
OK, I'll look for links on noise reduction. Please keep us updated on your progress too. There are many things that have to work properly. Another one, for example, is the confidence score, which you must use in your app to decide whether a hypothesis is correct.
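How the confidence score is used depends on the application. A minimal sketch, assuming the decoder reports a per-word confidence between 0 and 1 (the `Word` type and both thresholds are hypothetical and would need tuning on the test set): the application accepts, asks for confirmation, or rejects based on the least confident word in the hypothesis.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    confidence: float  # hypothetical: score in [0, 1] reported by the decoder

def decide(words, accept_at=0.8, confirm_at=0.5):
    """Return 'accept', 'confirm' or 'reject' based on the least confident word."""
    if not words:
        return "reject"
    lowest = min(w.confidence for w in words)
    if lowest >= accept_at:
        return "accept"
    if lowest >= confirm_at:
        return "confirm"  # e.g. ask the caller "Did you say ...?"
    return "reject"       # e.g. ask the caller to repeat
```

Using the minimum rather than the average means a single doubtful word (often the one that matters, like a name or number) is enough to trigger a confirmation.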
--- (Edited on 2/29/2008 1:44 am [GMT-0600] by nsh) ---
> Another thing I forgot: you probably need to start with a more representative test set. The one I evaluated is certainly not optimal. Only once you have a test set that is big enough can you proceed with algorithm optimization. The test set must include noisy calls; I suppose it's just a matter of enabling recording on the server.
Yes, I understand. I have actually started creating a set of audio that is more representative of my environment. At the beginning of this thread, though, I was only getting a 70% recognition rate in a controlled environment, so IMHO there was no need to go any further at that point. Now that I am getting 99% on my control set, I can move on to a more realistic test suite.
--- (Edited on 2/29/2008 8:30 am [GMT-0600] by Visitor) ---
>You could theoretically merge the WSJ and VoxForge acoustic models to create the "Super Acoustic Model"
Can you please explain how to merge the WSJ and VoxForge acoustic models, or any two acoustic models?
Or could you send me the details at: [email protected]
Thanks in advance.
--- (Edited on 7/3/2008 4:03 am [GMT-0500] by Lily) ---
Hi Lily,
>how to merge the WSJ and VoxForge or any two acoustic models
I guess what I was thinking of when I wrote that was not a true merging of acoustic models, but adapting the WSJ model using VoxForge speech data, as described in the Adapt Speaker Independent Acoustic Model to Your Voice tutorial:
--- (Edited on 7/11/2008 11:47 pm [GMT-0400] by kmaclean) ---
Hi nsh,
I was trying to find a way to increase the accuracy of pocketsphinx and came across this discussion. I am new to speech-to-text and have decided to use pocketsphinx for my project. I tried to download the files you mentioned from the following link:
http://www.mediafire.com/?camj5ujy1xw
but I didn't find any files there; in fact, I got an error. Could you please provide me with the files for my testing?
Thanks,
Johny :-)
johnsmith019 at yahoo dot com
--- (Edited on 3/8/2009 2:29 pm [GMT-0500] by ) ---
Sorry, I deleted the files. As I recall, there was nothing important there. What are you interested in specifically? If you want an adaptation demo, you can get it here:
http://www.mediafire.com/download.php?jerg1xddz4
--- (Edited on 3/8/2009 2:46 pm [GMT-0500] by nsh) ---
Hi,
As I said, I am very new to STT technology and I still don't know much of the terminology, so I can't say exactly that I need an "adaptation demo". But from reading this discussion, you made some changes that increased the accuracy to 99.3%. So I thought using the files you tweaked would give the same accuracy on my system, which is why I asked for them.
I am really sorry for being so unprepared with the theory when asking for a solution :-(
Could you please guide me.
Also, I wanted to learn about the various components of pocketsphinx. Are there any web links for that?
I really appreciate your help.
Thanks,
Johny :-)
--- (Edited on 3/9/2009 2:58 am [GMT-0500] by ) ---
> So, I thought using the files u tweaked will give the same accuracy when used here on my system.
No, it doesn't work this way :)
> Also, I wanted to learn about the various components of pocketsphinx, any weblinks for that??
We have some loosely organized links here. Basically, you need to start by reading a book.
http://www.dev.voxforge.org/projects/Main/wiki/TheoryAndAlgorithms
You first need to establish a baseline test for the system you are trying to create. Then we can try to improve the performance as much as possible. But the solution is different for each particular case.
--- (Edited on 3/9/2009 6:00 am [GMT-0500] by nsh) ---