VoxForge
In a paper entitled "Lexicon-Free Conversational Speech Recognition with Neural Networks" by Maas, Xie, Jurafsky, and Ng, the authors describe a novel approach to creating acoustic models using the Kaldi speech toolkit without the use of a pronunciation dictionary:
We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding tasks. The system naturally handles out of vocabulary words and spoken word fragments. We demonstrate our approach using the challenging Switchboard telephone conversation transcription task, achieving a word error rate competitive with existing baseline systems.
They also state:
Our method yields a complete first-pass LVCSR system with about 1,000 lines of code — roughly an order of magnitude less than high performance HMM-GMM systems. Operating entirely at the character level yields a system which does not require assumptions about a lexicon or pronunciation dictionary, instead learning orthography and phonics directly from data.
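As a rough illustration of what "operating entirely at the character level" can look like, here is a minimal sketch that assumes CTC-style per-frame outputs (CTC comes up later in this thread). It shows only the simplest best-path step: merge repeated frame labels, then drop the blank symbol. It is not the beam search with a character language model that the abstract describes, and the blank symbol `_` and the function name are made up for the example.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this sketch, not taken from the paper


def collapse_frames(frame_labels):
    """Collapse per-frame character labels into a transcript.

    Consecutive repeats are merged first, then blanks are removed,
    so a blank between two identical letters keeps both letters.
    """
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != BLANK)


if __name__ == "__main__":
    # One label per acoustic frame, as a network's argmax output might look.
    print(collapse_frames("hh_e_ll_lo_"))
```

Because the output alphabet is just characters, nothing in this step needs a lexicon, which is why out-of-vocabulary words fall out naturally.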
--- (Edited on 5/26/2015 12:24 pm [GMT-0400] by kmaclean) ---
There's a flood of these end-to-end papers at the moment, all claiming "a word error rate competitive with existing baseline systems", yet when you read the paper you find that they've made up their own baseline or, in the case of this paper, are comparing with a GMM-HMM baseline that's 15 years old.
But writing a paper that says "lexicon free speech recognition gives 45% increase in word error rate - a pronunciation dictionary really helps!" doesn't sell.
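For anyone following along, word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and a "45% increase" is a relative change between two such rates. A minimal sketch, with hypothetical function names and made-up numbers rather than figures from the paper:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_increase(baseline_wer, new_wer):
    """Relative change in WER, e.g. 0.45 means 45% worse than baseline."""
    return (new_wer - baseline_wer) / baseline_wer


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Illustrative values only: a 20% baseline against a 29% system.
    print(relative_increase(0.20, 0.29))
```

The point of the second function is that "competitive" depends entirely on which baseline goes in the denominator.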
Okay, I admit I'm a grumpy old reviewer - I'd have rejected this paper and the many like it (only because they say "comparable" when it's not - CTC is interesting) and as the founder of the use of RNNs in ASR I believe I have the right to be a grumpy old reviewer.
Tony
--- (Edited on 26-May-2015 8:02 pm [GMT+0100] by TonyR) ---
If a key element of science is reproducibility of results, Kaldi sets a high barrier. I have tried to get Kaldi running on a single machine about three different times in three years, and have yet to get it to operate satisfactorily. The hardware requirements are certainly formidable, and it seems little attention is given to installation on single non-CUDA-capable machines for the purposes of familiarity and training.
In the meantime, HTK and Julius are doing a remarkably solid and consistent job as we small horsepower participants endeavour to learn and contribute.
--- (Edited on 2015-06-12 4:35 am [GMT-0400] by colbec) ---