VoxForge
Hello,
I am a new user of VoxForge, HTK and Julius.
I am trying to build a French acoustic model, but I am blocked at step 8 on this line:
HVite -A -D -T 1 -l '*' -o SWT -b SENT-END -C config -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I words.mlf -S train.scp dict monophones1> HVite_log
and I get this error:
ERROR [+8220] LatticeFromLabels: Word t not defined in dictionary
FATAL ERROR - Terminating program HVite
The contents of my hmm7 directory and the other files I use are:
hmm7/hmmdefs : https://dl.dropbox.com/u/93726840/hmmdefs
hmm7/macros : https://dl.dropbox.com/u/93726840/macros
dict : https://dl.dropbox.com/u/93726840/dict
prompts : https://dl.dropbox.com/u/93726840/prompts
lexicon : https://dl.dropbox.com/u/93726840/lexicon
wlist : https://dl.dropbox.com/u/93726840/wlist
words.mlf : https://dl.dropbox.com/u/93726840/words.mlf
train.scp : https://dl.dropbox.com/u/93726840/train.scp
and files *.lab and *.mfc : https://dl.dropbox.com/u/93726840/interim_files.zip
Can you help me resolve this problem?
I have searched the net but found no solution.
Thanks a lot.
You just need to read the message
>ERROR [+8220] LatticeFromLabels: Word t not defined in dictionary
It says that the word "T" is not defined in the dictionary. Your dictionary is called "dict". If you open this file in a text editor, you will see there is no line for the word "T". That is the problem you are looking for. You need to add a line for the word "T" to the dictionary, as is already present in the lexicon file.
Try to read the messages the software gives you.
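A quick way to test this kind of diagnosis is to list every word used in words.mlf that has no entry in dict. The following is only a rough sketch: the function names are made up for illustration, and it assumes a word-level MLF with one word per line and no timestamps, and a dictionary whose first field on each line is the word.

```python
def mlf_words(mlf_text):
    """Collect the set of word labels used in a word-level HTK MLF."""
    words = set()
    for line in mlf_text.splitlines():
        line = line.strip()
        # Skip the header, the quoted label-name lines and the "." terminators.
        if not line or line == "#!MLF!#" or line == "." or line.startswith('"'):
            continue
        # Assumption: the first field is the word (no start/end times).
        words.add(line.split()[0])
    return words


def dict_words(dict_text):
    """Collect the first field of every non-empty dictionary line."""
    words = set()
    for line in dict_text.splitlines():
        fields = line.split()
        if fields:
            words.add(fields[0])
    return words


def missing_words(mlf_text, dict_text):
    """Words that appear in the MLF but have no dictionary entry."""
    return sorted(mlf_words(mlf_text) - dict_words(dict_text))


if __name__ == "__main__":
    # Tiny made-up example; in practice read words.mlf and dict from disk.
    demo_mlf = '#!MLF!#\n"*/sample1.lab"\nTOUTE\nCETTE\nT\n.\n'
    demo_dict = "CETTE [CETTE] s e t sp\nTOUTE [TOUTE] t ou t sp\n"
    print(missing_words(demo_mlf, demo_dict))
```

If the list comes back empty while HVite still complains about a missing word, the labels HVite is actually reading are probably not the ones in words.mlf.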
Hello, thanks for the reply.
I have already tried adding the word T to dict like this:
T [T] t t t t
but I get the same error...
In the dict file I have some words that start with the letter T:
TA [TA] t t a sp
TALONS [TALONS] t t a l o on s s sp
TEL [TEL] t t e e el sp
TENTA [TENTA] t t e e en t a sp
TENTACULES [TENTACULES] t t e e en t a c u l e e es sp
TERRE [TERRE] t t e e e e e r rr e e sp
TOUS [TOUS] t t ou o s s s sp
TOUT [TOUT] t'u sp
TOUTE [TOUTE] t t ou o t e e sp
TOUTES [TOUTES] t t ou o t e e es sp
TRAITE [TRAITE] t t r a ai ai t e e sp
TRAVAIL [TRAVAIL] t t r a v a a ai ai il i sp
TU [TU] t t u sp
That is why I don't understand.
Does this error mean that the LETTER T is missing, or that a word containing the letter T is missing?
Another question: to construct the phonemes I use this command:
espeak -v fr -q -X "T"
Translate 't'
22 t (_ []
1 t [t]
5 _) t [t]
26 _) t (_ [te]
t'e
but I am not sure I am taking the right phoneme.
For this example, what is the phoneme?
I took column 2, so t t t t.
Is that correct?
Or what other tools can I use to construct phonemes (on the command line if possible)?
Thanks a lot.
OK, after some more investigation, the problem is different. Label names are case sensitive. Given the entries in train.scp, HVite tries to load labels with lowercase names:
mfcc/sample1.mfc
mfcc/sample2.mfc
mfcc/sample3.mfc
mfcc/sample4.mfc
mfcc/sample5.mfc
However, your words.mlf has uppercase labels for utterances:
#!MLF!#
"*/SAMPLE1.lab"
TOUTE
CETTE
So it falls back to the .lab files, which contain phonemes like t and i rather than words. That is why it fails to proceed and reports that there is no word t in the dictionary.
You need to make sure that all the label names are in lowercase. The easiest way is to fix the words.mlf file so that it contains lowercase label names:
#!MLF!#
"*/sample1.lab"
TOUTE
CETTE
For reference, check the tutorial example:
http://www.voxforge.org/uploads/Ni/jv/NijveUP5RI5o3gSCeoRlmg/words.mlf
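The case mismatch between train.scp and words.mlf can also be checked mechanically. This is a minimal sketch, not part of the tutorial: the helper names are invented, and it assumes that each train.scp line is a single feature-file path and that each quoted MLF line ends in a file name with an extension.

```python
import posixpath


def scp_basenames(scp_text):
    """Base names (no directory, no extension) of the files listed in train.scp."""
    names = []
    for line in scp_text.splitlines():
        line = line.strip()
        if line:
            names.append(posixpath.splitext(posixpath.basename(line))[0])
    return names


def mlf_label_names(mlf_text):
    """Base names taken from the quoted label-name lines of an MLF."""
    names = set()
    for line in mlf_text.splitlines():
        line = line.strip()
        if line.startswith('"') and line.endswith('"'):
            base = posixpath.basename(line.strip('"'))
            names.add(posixpath.splitext(base)[0])
    return names


def unmatched_utterances(scp_text, mlf_text):
    """Utterances in train.scp with no case-sensitive match in the MLF."""
    labels = mlf_label_names(mlf_text)
    return [name for name in scp_basenames(scp_text) if name not in labels]


def lowercase_label_names(mlf_text):
    """Lowercase only the quoted label-name lines; the words stay untouched."""
    fixed = []
    for line in mlf_text.splitlines():
        if line.strip().startswith('"'):
            line = line.lower()
        fixed.append(line)
    return "\n".join(fixed) + "\n"


if __name__ == "__main__":
    scp = "mfcc/sample1.mfc\nmfcc/sample2.mfc\n"
    mlf = '#!MLF!#\n"*/SAMPLE1.lab"\nTOUTE\nCETTE\n.\n"*/sample2.lab"\nTU\n.\n'
    print(unmatched_utterances(scp, mlf))
    print(unmatched_utterances(scp, lowercase_label_names(mlf)))
```

Note that lowercase_label_names lowercases the whole quoted pattern, including any directory part, which is only appropriate when the feature files themselves use lowercase paths.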
Hello,
I have made the modification in the words.mlf file:
SAMPLE --> sample
and this step now runs without error, but the next step fails:
HERest -A -D -T 1 -C config -I aligned.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
HERest -A -D -T 1 -C config -I aligned.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
HTK Configuration Parameters[10]
Module/Tool Parameter Value
# NUMCEPS 12
# CEPLIFTER 22
# NUMCHANS 26
# PREEMCOEF 0.970000
# USEHAMMING TRUE
# WINDOWSIZE 250000.000000
# SAVEWITHCRC TRUE
# SAVECOMPRESSED TRUE
# TARGETRATE 100000.000000
# TARGETKIND MFCC_0_D_N_Z
HERest ML Updating: Transitions Means Variances
System is SHARED
99 Logical/99 Physical Models Loaded, VecSize=25
2 MMF input files
Pruning-On[250.0 150.0 3000.0]
Processing Data: sample1.mfc; Label sample1.lab
Utterance prob per frame = -5.294599e+01
Processing Data: sample2.mfc; Label sample2.lab
ERROR [+7332] CreateInsts: Cannot have Tee models at start or end of transcription
thanks
> ERROR [+7332] CreateInsts: Cannot have Tee models at start or end of transcription
The Tee model here is the model named "sp". You have "sp" at the end of a transcription. The reason is that in your lexicon SENT-START and SENT-END are not transcribed as sil but as something else. Fix your lexicon and rerun the training from the initialization step:
SENT-END [] sil
SENT-START [] sil
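To confirm that the sentence markers are transcribed as plain sil, the dictionary can be scanned for those two entries. This is an illustrative sketch with a hypothetical function name; it assumes the usual layout of word, optional bracketed output symbol, then phones, and treats anything other than exactly "sil" as a problem.

```python
def sentence_marker_problems(dict_text):
    """Report SENT-START / SENT-END entries that are not transcribed as just 'sil'."""
    found = {"SENT-START": [], "SENT-END": []}
    for line in dict_text.splitlines():
        fields = line.split()
        if fields and fields[0] in found:
            # Drop the optional [output symbol] field, e.g. "[]".
            phones = [f for f in fields[1:]
                      if not (f.startswith("[") and f.endswith("]"))]
            found[fields[0]].append(phones)

    problems = []
    for word, pronunciations in found.items():
        if not pronunciations:
            problems.append(f"{word}: missing")
        for phones in pronunciations:
            if phones != ["sil"]:
                problems.append(f"{word}: {' '.join(phones)}")
    return problems


if __name__ == "__main__":
    # Made-up dictionary fragment where SENT-END wrongly ends in sp.
    demo = "SENT-END [] sil sp\nSENT-START [] sil\nTU [TU] t u sp\n"
    for problem in sentence_marker_problems(demo):
        print(problem)
```

An empty result means both markers map to sil only, so a trailing "sp" in the aligned transcriptions would have to come from somewhere else.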