VoxForge
Has anyone tried using the VoxForge corpus to build a Sphinx model, as described here?
https://bakerstreetsystems.com/blog/post/training-cmu-sphinx-speech-recognition-software-ubuntu-1404
The benefit of Sphinx over Julius is that it doesn't need a grammar, so the model can be used immediately to transcribe audio similar to what it was trained on. The downside is that its accuracy is somewhat worse than Julius's, because it doesn't have a grammar to guide it.
That said, there are gigabytes of corpus data, and I'm not sure how I'd scale the training method presented in the blog post. I'd imagine the process would have to be broken into incremental steps: chunking off subsections of the corpus and using map_adapt to merge each one into the master model.
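To make the chunking idea concrete, here is a rough Python sketch of just the splitting step. It assumes the usual SphinxTrain control-file layout, as I understand it: a .fileids list of paths without extensions, and a .transcription file with lines of the form "<s> text </s> (utterance_id)". The names chunk_corpus and render_control_files are made up for illustration, and the sample utterances are placeholders. It doesn't attempt the map_adapt step at all, since I haven't verified how that would behave when run chunk by chunk.

```python
import posixpath
from typing import Iterator, List, Tuple

# (file id without extension, transcript text); hypothetical in-memory form
Utterance = Tuple[str, str]


def chunk_corpus(utterances: List[Utterance], chunk_size: int) -> Iterator[List[Utterance]]:
    """Yield consecutive slices of at most chunk_size utterances."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(utterances), chunk_size):
        yield utterances[start:start + chunk_size]


def render_control_files(chunk: List[Utterance]) -> Tuple[str, str]:
    """Return (fileids text, transcription text) for one chunk.

    Assumes the utterance id in the transcription is the basename
    of the file id, which is the convention I've seen used.
    """
    fileids = "".join(f"{fid}\n" for fid, _ in chunk)
    transcription = "".join(
        f"<s> {text} </s> ({posixpath.basename(fid)})\n" for fid, text in chunk
    )
    return fileids, transcription


if __name__ == "__main__":
    # Placeholder data, not real corpus entries.
    sample = [
        ("speaker_a/utt1", "hello"),
        ("speaker_a/utt2", "world"),
        ("speaker_b/utt3", "foo"),
    ]
    for i, chunk in enumerate(chunk_corpus(sample, 2)):
        fileids, transcription = render_control_files(chunk)
        print(f"--- chunk {i} ---")
        print(fileids, end="")
        print(transcription, end="")
```

Each chunk's two strings could then be written out as that chunk's control files before running whatever training or adaptation step is used on it.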
--- (Edited on 4/27/2016 7:13 pm [GMT-0500] by Cerin) ---