Evaluation of various Speech Recognition Engines

General Discussion

User: julian
Date: 10/22/2013 7:58 pm

Views: 7246
Rating: 9

I'm interested in performing a thorough evaluation of various speech recognition engines; however, I currently don't have the means to do so.

Could the VoxForge corpus be used to evaluate various systems, such as the Google Chrome Voice API, Bing, Nuance NDEV, etc.?

I understand that the open source toolkits have evaluation functions that are used for evaluating the acoustic & language models of their own systems.

Could HTK/Julian/Sphinx etc. be used to evaluate another system?

This would require some code to compare the transcriptions in the VoxForge corpus to the output transcriptions of the various ASR systems.

Any advice or assistance that you could offer would be greatly appreciated.

--- (Edited on 10/22/2013 7:59 pm [GMT-0500] by ) ---

Re: Evaluation of various Speech Recognition Engines

User: nsh
Date: 10/23/2013 12:30 pm

Views: 110
Rating: 9

> I'm interested in performing a thorough evaluation of various speech recognition engines, however I currently don't have the means to do so.

It is very unlikely that you would be able to do a proper evaluation without taking engine capabilities into account. Engines differ in the features they offer, and comparing them on the same task often doesn't make sense, since each engine is developed with a specific application in mind.

The most critical factors for application developers are not accuracy but, for example, adaptation capabilities (you cannot measure that with Google) and the ability to support a new vocabulary (also not possible with Google/Bing). At the same time, Google has a very strong focus on web query language, and its accuracy on generic language might be significantly worse.

It's better to describe the task you want to implement and the resources you have; then you can get expert advice on which solution to use. Black-box evaluation doesn't make sense for engines.

> Could the VoxForge corpus be used to evaluate various systems, such as the Google Chrome Voice API, Bing, Nuance NDEV, etc.?

The VoxForge corpus can be used as test data, though it's better to use domain-specific data for that.

> Could HTK/Julian/Sphinx etc. be used to evaluate another system?

The engines are used to decode speech, not to evaluate other engines. For evaluation there is dedicated software from NIST.
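To make concrete what such scoring software measures: the usual figure is word error rate (WER), the word-level edit distance between the reference transcription and the engine output, divided by the number of reference words. The sketch below is only an illustration in plain Python with hypothetical function names; it is not the NIST tool, and its alignment and text normalization are far simpler than what a real scoring package does, so use it for sanity checks rather than for reported numbers.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the hypothesis into the reference (Levenshtein distance)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from empty reference
    for i, ref_word in enumerate(r, start=1):
        cur = [i]  # distance to empty hypothesis: i deletions
        for j, hyp_word in enumerate(h, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def wer(pairs) -> float:
    """WER over an iterable of (reference, hypothesis) string pairs."""
    pairs = list(pairs)
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


if __name__ == "__main__":
    reference = "HE CRIED AND SWUNG THE CLUB WILDLY"
    hypothesis = "HE CRIED AND SWUNG A CLUB"  # made-up engine output
    print(wer([(reference, hypothesis)]))
```

Both sides should be normalized the same way (case, punctuation, number spelling) before scoring, otherwise formatting differences between engines are counted as recognition errors.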

--- (Edited on 10/23/2013 21:30 [GMT+0400] by nsh) ---

Re: Evaluation of various Speech Recognition Engines

User: julian
Date: 10/23/2013 7:43 pm

Views: 146
Rating: 10

Thank you for recommending the NIST toolsets. (Previously this website was offline.)

SCTK & SCLITE seem best suited to perform the evaluation:
http://www.nist.gov/itl/iad/mig/tools.cfm

What format are the VoxForge transcription files in?

SCLite accepts 'trn', 'txt', 'stm' & 'ctm'.
http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/infmts.htm

The following example output doesn't appear to be in any of the 'trn', 'stm' or 'ctm' formats:

jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY

What tasks am I interested in evaluating?

Real-time continuous speech in telephone calls, with channels separated by speaker (no diarization required).

This will be spontaneous speech (not read), with a large vocabulary.

--- (Edited on 10/23/2013 7:43 pm [GMT-0500] by ) ---

Re: Evaluation of various Speech Recognition Engines

User: nsh
Date: 10/24/2013 4:37 pm

Views: 160
Rating: 4

> The following example output, doesn't appear to be in either 'trn', 'stm' or 'ctm':

Yes, it has to be converted. It can be done with a simple script.
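As a rough sketch of such a script, assuming each VoxForge prompt line is an utterance id followed by the words (as in the quoted example) and that the target is the 'trn' layout of the words followed by the id in parentheses. The function names are hypothetical, and the id is copied through unchanged; whether the scoring tool accepts ids containing slashes is not something this sketch checks, so they may need to be rewritten consistently in both the reference and the hypothesis files.

```python
def prompt_to_trn(line: str) -> str:
    """Turn 'utt_id WORD WORD ...' into 'WORD WORD ... (utt_id)'."""
    parts = line.split()
    if not parts:
        raise ValueError("empty prompt line")
    utt_id, words = parts[0], parts[1:]
    return f"{' '.join(words)} ({utt_id})"


def convert_prompts(lines):
    """Convert every non-blank prompt line, preserving order."""
    return [prompt_to_trn(line) for line in lines if line.strip()]


if __name__ == "__main__":
    example = "jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY"
    print(prompt_to_trn(example))
```

The same conversion has to be applied to each engine's output, using identical utterance ids, so that reference and hypothesis lines can be paired up.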

> Real-time continuous speech, in telephone calls, with channels segmented between speakers (no diarization required.) This will be spontaneous speech (not read,) with a large vocabulary.

So you can automatically rule out Google/Bing, which are unable to recognize narrowband telephone calls, and the default Nuance engine too. You can compare only the specialized Nuance engine and CMUSphinx; both should be equal in accuracy, differing only in price/license. That's it, I saved you a month of running a comparison.

--- (Edited on 10/25/2013 01:37 [GMT+0400] by nsh) ---

Re: Evaluation of various Speech Recognition Engines

User: julian
Date: 10/27/2013 7:38 pm

Views: 162
Rating: 9

> Yes, it has to be converted. It can be done with a simple script.

I've been searching for a script; however, I haven't been able to locate anything for SPHINX / FESTIVAL / etc.

> So you can automatically sort out Google/Bing which are unable to recognize narrowband telephone calls and Nuance default engine too. You can compare only specialized Nuance engine and CMUSphinx, both should be equal in accuracy except the price/license. That's it, I saved you a month to run a comparison.

I appreciate your expert advice; it is helping me head in the right direction!

I was interested in training Bing/Nuance with narrowband telephone calls (rather than the wideband microphone recordings).

I understand that the LM (and pronunciation dictionary) will need to be adjusted to fit my domain. Is the MITLM toolkit best suited to interpolate the LMs?

--- (Edited on 10/27/2013 7:38 pm [GMT-0500] by julian) ---

Re: Evaluation of various Speech Recognition Engines

User: nsh
Date: 10/28/2013 5:09 pm

Views: 146
Rating: 8

> I've been searching for a script, however I haven't been able to locate anything on SPHINX / FESTIVAL / etc.

Such scripts are easier to write yourself in your favorite scripting language.

> I was interested in training Bing/Nuance with narrowband telephone calls (rather than the wideband microphone recordings.)

You cannot train the Bing engine. I also doubt you have access to a Nuance engine that you can train.

> I understand that the LM (and pronunciation dictionary) will need to be adjusted to fit my domain. Is the MITLM toolkit best suited to interpolate the LMs?

SRILM is better than MITLM.
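For reference, the idea behind interpolating LMs is a weighted mix of the probabilities from a domain model and a general model. The toy sketch below shows this for unigram distributions only; the function name, the example words and the weight are all made up for illustration. Real toolkits apply the same idea to full n-gram models with back-off, and the weight is normally tuned on held-out in-domain text.

```python
def interpolate(p_domain: dict, p_general: dict, lam: float) -> dict:
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_general(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    vocab = set(p_domain) | set(p_general)
    return {
        w: lam * p_domain.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


if __name__ == "__main__":
    # Hypothetical unigram probabilities, each distribution sums to 1.
    domain = {"call": 0.5, "hold": 0.5}
    general = {"call": 0.2, "search": 0.8}
    mixed = interpolate(domain, general, lam=0.75)
    for word in sorted(mixed):
        print(word, round(mixed[word], 3))
```

Because both inputs sum to 1 and the weights sum to 1, the mixed distribution also sums to 1, and words seen only in the general model keep a non-zero probability.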

--- (Edited on 10/29/2013 01:09 [GMT+0300] by nsh) ---

Re: Evaluation of various Speech Recognition Engines

User: julian
Date: 10/29/2013 6:08 pm

Views: 2603
Rating: 7

>> I've been searching for a script, however I haven't been able to locate anything on SPHINX / FESTIVAL / etc.

>Such scripts are easier to write yourself in your favorite scripting language

I'm surprised that the functionality is not included in SPHINX, given that it's viewed as easy to do. I noticed that SRILM has this functionality available.

> You can not train Bing engine, I also doubt you have access to Nuance engine which you can train.

Both Nuance & Bing have developer environments (freely available; however, they lack customisation) as well as commercially available systems (that can be trained).

--- (Edited on 10/29/2013 6:08 pm [GMT-0500] by julian) ---
