VoxForge
An excellent site for HTK-based speech recognition for the Portuguese language is Falab Brazil, which includes:
Software:
Acoustic Models and Language:
Phonetic Dictionary:
Speech Corpus:
Text Corpora:
Scripts for training of Acoustic and Language Models:
Hi,
I checked the scripts. Surprisingly, the bigram language model that I built with the HTK LM toolkit gives better accuracy than the bigram that I built with the SRILM toolkit: at least 10 percent better!
Here is my command for building the bigram in SRILM:
ngram-count -text sentences.txt -order 2 -wbdiscount 1 -wbdiscount 2 -lm bigram.txt
sentences.txt has 405 sentences.
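For reference, here is roughly what Witten-Bell discounting (the method selected by SRILM's -wbdiscount options) does to bigram estimates on a small corpus. This is only a minimal Python sketch, assuming the interpolated form of Witten-Bell; it is not SRILM's implementation, which uses backoff by default and differs in detail. The function name witten_bell_bigram is hypothetical, chosen just for this illustration.

```python
from collections import Counter, defaultdict


def witten_bell_bigram(sentences):
    """Return prob(w, h): an interpolated Witten-Bell estimate of P(w | h).

    Illustrative sketch only; not SRILM's or HTK's actual implementation.
    """
    unigrams = Counter()            # counts of predicted tokens
    bigrams = Counter()             # counts of (history, word) pairs
    followers = defaultdict(set)    # distinct words seen after each history

    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[1:])  # <s> is never predicted, so skip it
        for h, w in zip(tokens, tokens[1:]):
            bigrams[(h, w)] += 1
            followers[h].add(w)

    history_counts = Counter()
    for (h, _w), c in bigrams.items():
        history_counts[h] += c
    total = sum(unigrams.values())

    def prob(w, h):
        p_uni = unigrams[w] / total          # maximum-likelihood unigram
        c_h = history_counts[h]
        if c_h == 0:
            return p_uni                     # unseen history: unigram only
        t_h = len(followers[h])              # number of distinct followers
        # Mass t_h / (c_h + t_h) is reserved for the lower-order model.
        return (bigrams[(h, w)] + t_h * p_uni) / (c_h + t_h)

    return prob


if __name__ == "__main__":
    p = witten_bell_bigram(["a b", "a c"])
    for w in ["a", "b", "c", "</s>"]:
        print(w, p(w, "a"))
```

With only a few hundred sentences, most histories have few distinct followers, so the amount of probability mass moved to the unigram model depends heavily on the discounting method. That alone can account for noticeable differences between two toolkits trained on the same text.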
Here is my command for building the bigram in HTK:
HLstatsCommand = ['HLStats -b Dictionary\bgwTel.txt -o dictionary\wordslist.txt labels\wordsMlf.mlf']
I built my acoustic model with HTK, using left-to-right HMMs with 16 Gaussian mixtures for triphones.