VoxForge
I'm interested in performing a thorough evaluation of various speech recognition engines, however I currently don't have the means to do so.
Could the VoxForge corpus be used to evaluate various systems, such as the Google Chrome Voice API, Bing, Nuance NDEV, etc.?
I understand that the open source toolkits have evaluation functions that are used for evaluating the acoustic & language models of their own systems.
Could HTK/Julian/Sphinx etc. be used to evaluate another system?
This would require some code to compare the transcriptions in the VoxForge corpus to the output transcriptions of the various ASR systems.
Any advice or assistance that you could offer would be greatly appreciated.
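For reference, the comparison described above usually comes down to word error rate (WER): the word-level edit distance between the reference transcription and the recognizer output, divided by the number of reference words. Below is a minimal sketch in Python, not a replacement for a proper scoring tool. The function name `word_error_rate` and the upper-casing normalization are assumptions made for illustration; real evaluations typically also normalize punctuation, numbers and hesitations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumption: case-insensitive comparison on whitespace-separated words.
    ref = reference.upper().split()
    hyp = hypothesis.upper().split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # Dynamic-programming edit distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "HE CRIED AND SWUNG THE CLUB WILDLY"
    hyp = "he cried and swung a club"  # hypothetical recognizer output
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Because the score only needs a reference text and a hypothesis text, the same comparison works for any engine that returns a plain transcription.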
--- (Edited on 10/22/2013 7:59 pm [GMT-0500] by ) ---
> I'm interested in performing a thorough evaluation of various speech recognition engines, however I currently don't have the means to do so.
> Could HTK/Julian/Sphinx etc. be used to evaluate another system?
--- (Edited on 10/23/2013 21:30 [GMT+0400] by nsh) ---
Thank you for your recommendation of the NIST toolsets. (Previously this website was offline.)
SCTK & SCLITE seem best suited to perform the evaluation:
http://www.nist.gov/itl/iad/mig/tools.cfm
What format are the VoxForge transcription files in?
SCLite accepts 'trn', 'txt', 'stm' & 'ctm'.
http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/infmts.htm
The following example output doesn't appear to be in any of the 'trn', 'stm' or 'ctm' formats:
jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY
What tasks am I interested in evaluating?
Real-time continuous speech in telephone calls, with channels segmented between speakers (no diarization required).
This will be spontaneous speech (not read), with a large vocabulary.
--- (Edited on 10/23/2013 7:43 pm [GMT-0500] by ) ---
> The following example output, doesn't appear to be in either 'trn', 'stm' or 'ctm':
Yes, it has to be converted. It can be done with a simple script.
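A rough sketch of such a script in Python, assuming each VoxForge prompt line is an utterance ID followed by the transcription (as in the example quoted above), and that the target is the 'trn' layout, where the transcription comes first and the utterance ID follows in parentheses. The function name `prompts_to_trn` is made up for illustration, and the ID is copied unchanged; it may need reshaping to match whatever utterance ID convention you tell sclite to expect.

```python
def prompts_to_trn(lines):
    """Convert 'UTT_ID WORD WORD ...' lines to 'WORD WORD ... (UTT_ID)' lines."""
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The first whitespace-separated token is taken to be the utterance ID.
        parts = line.split(None, 1)
        utt_id = parts[0]
        text = parts[1].strip() if len(parts) > 1 else ""
        converted.append(f"{text} ({utt_id})")
    return converted


if __name__ == "__main__":
    sample = ["jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY"]
    for trn_line in prompts_to_trn(sample):
        print(trn_line)
```

The recognizer output has to be written in the same layout with the same IDs, so that each hypothesis can be matched to its reference.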
> Real-time continuous speech, in telephone calls, with channels segmented between speakers (no diarization required.) This will be spontaneous speech (not read,) with a large vocabulary.
So you can automatically rule out Google and Bing, which are unable to recognize narrowband telephone calls, and the default Nuance engine too. You can compare only the specialized Nuance engine and CMUSphinx; both should be equal in accuracy, differing only in price and license. That's it, I saved you the month it would take to run a comparison.
--- (Edited on 10/25/2013 01:37 [GMT+0400] by nsh) ---
> Yes, it has to be converted. It can be done with a simple script.
I've been searching for a script, however I haven't been able to locate anything on SPHINX / FESTIVAL / etc.
> So you can automatically sort out Google/Bing which are unable to recognize narrowband telephone calls and Nuance default engine too. You can compare only specialized Nuance engine and CMUSphinx, both should be equal in accuracy except the price/license. That's it, I saved you a month to run a comparison.
I appreciate your expert advice; it is helping me head in the right direction!
I was interested in training Bing/Nuance with narrowband telephone calls (rather than the wideband microphone recordings.)
I understand that the LM (and pronunciation dictionary) will need to be adjusted to fit my domain. Is the MITLM toolkit best suited to interpolate the LMs?
--- (Edited on 10/27/2013 7:38 pm [GMT-0500] by julian) ---
> I've been searching for a script, however I haven't been able to locate anything on SPHINX / FESTIVAL / etc.
> I was interested in training Bing/Nuance with narrowband telephone calls (rather than the wideband microphone recordings.)
You cannot train the Bing engine, and I also doubt you have access to a Nuance engine that you can train.
> I understand that the LM (and pronunciation dictionary) will need to be adjusted to fit my domain. Is the MITLM toolkit best suited to interpolate the LMs?
SRILM is better than MITLM.
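To illustrate what interpolation means here: each word's probability is a weighted mix of the in-domain model and a background model, P(w) = lambda * P_domain(w) + (1 - lambda) * P_background(w). The toy sketch below uses unigram probabilities held in plain dictionaries; the function name `interpolate` and the example numbers are invented for illustration. The toolkits apply the same idea to full n-gram models, and the weight is normally tuned on held-out in-domain text.

```python
def interpolate(p_domain, p_background, lam):
    """Linearly interpolate two unigram distributions with weight lam on p_domain."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    vocab = set(p_domain) | set(p_background)
    # A word missing from one model contributes probability 0 from that model.
    return {
        word: lam * p_domain.get(word, 0.0) + (1.0 - lam) * p_background.get(word, 0.0)
        for word in vocab
    }


if __name__ == "__main__":
    # Made-up probabilities, for illustration only.
    domain = {"call": 0.5, "agent": 0.5}
    background = {"call": 0.25, "the": 0.75}
    mixed = interpolate(domain, background, lam=0.5)
    for word in sorted(mixed):
        print(word, mixed[word])
```

If both inputs are proper distributions, the mixture still sums to one for any weight between 0 and 1.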
--- (Edited on 10/29/2013 01:09 [GMT+0300] by nsh) ---
>> I've been searching for a script, however I haven't been able to locate anything on SPHINX / FESTIVAL / etc.
> You can not train Bing engine, I also doubt you have access to Nuance engine which you can train.
Both Nuance & Bing have developer environments (freely available, though they lack customisation) as well as commercially available systems (that can be trained).
--- (Edited on 10/29/2013 6:08 pm [GMT-0500] by julian) ---