VoxForge
Thanks for your explanation.
It raised one question for me.
The transcript file I use to train the acoustic model contains the word "komsularini". Do I need to split it with a space in the transcript file, as in:
<s> komsu larini </s>
so that I can use it in my language model as "komsu +larini"?
In other words, if I split the word in my language model, should I also split it in the transcript file used for acoustic training?
--- (Edited on 8/29/2009 7:50 am [GMT-0500] by ercani) ---
What do you mean by "For example it will be useful to get the word error rate."?
If I train the acoustic model on
<s> komsu larini </s> (with a space; normally this word is written without one),
will it be more efficient to recognize the split word
"komsu +larini" in the language model? Is that what you mean?
--- (Edited on 8/29/2009 10:39 am [GMT-0500] by ercani) ---
With a subword language model, the output of the engine will also be in subword units:
<s> komsu +larini </s>
To use word error rate tools, you need to compare it with reference prompts in the same form:
<s> komsu +larini </s>
It will not work if the word is joined in the reference. The same is true for the dictionary, which should contain the subwords so that the trainer can detect them. The subword dictionary can then be used for both decoding and training.
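As a rough illustration of keeping the transcript and the reference prompts in the same subword form, here is a minimal Python sketch. The SPLITS table and both function names are hypothetical and not part of any Sphinx tool; a real setup would need a proper morphological split list for the language.

```python
# Hypothetical split table: whole word -> (stem, suffix).
SPLITS = {"komsularini": ("komsu", "larini")}

def to_subwords(line, mark_suffix=True):
    """Split known words into stem and suffix; leave other tokens untouched."""
    out = []
    for token in line.split():
        if token in SPLITS:
            stem, suffix = SPLITS[token]
            out.append(stem)
            # The language model form marks the suffix with a leading '+'.
            out.append("+" + suffix if mark_suffix else suffix)
        else:
            out.append(token)
    return " ".join(out)

def join_subwords(line):
    """Glue '+suffix' tokens back onto the preceding token to recover whole words."""
    out = []
    for token in line.split():
        if token.startswith("+") and out:
            out[-1] += token[1:]
        else:
            out.append(token)
    return " ".join(out)
```

Applying to_subwords to both the training transcript and the reference keeps them comparable with the decoder output; join_subwords is only needed if you later want to score at the whole-word level.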
--- (Edited on 8/29/2009 11:41 am [GMT-0500] by nsh) ---
Thanks for your reply.
If I split the word when training the acoustic model, I am afraid I will get spurious words when recognizing sentences.
I can keep the subword units in the vocabulary alongside the other words, but might that have a bad effect on recognizing whole words?
--- (Edited on 8/29/2009 11:54 am [GMT-0500] by ercani) ---
Hi,
I got an error while building the feature files. It is related to phonemes, as follows:
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone () occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (SIL) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (a) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (aa) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (b) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (c) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (ch) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (d) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (e) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (ea) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (f) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (g) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (gh) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (h) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (i) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (ii) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (iy) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (j) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (k) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (l) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (m) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (n) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (o) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (oe) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (p) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (r) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (rh) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (s) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (sh) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (t) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (u) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (ue) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (ug) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (uu) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (v) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (y) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (z) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
WARNING: This phone (zh) occurs in the phonelist (/home/kapil/Work/ercan/myam/etc/myam.phone), but not in any word in the transcription (/home/kapil/Work/ercan/myam/etc/myam_train.transcription)
-------------
My phone list is as follows:
a
aa
b
c
ch
d
e
ea
f
g
gh
h
i
ii
iy
j
k
l
m
n
o
oe
p
r
rh
s
sh
t
u
ue
ug
uu
v
y
z
zh
SIL
----------
My transcription is as follows:
<s> üç iki dört altı yedi sekiz dokuz bir </s> (1)
<s> müzik çal ömer danış gök yüzünde kuş olsam seni görür inerdim </s> (119)
<s> müzik çal ramazan garip ses yanlızlığım </s> (124)
As you can see, there are some letters from the Turkish alphabet: ç - ü - ğ - ş - ı - ö
These letters are not in my phoneme list because they are not ASCII. Could this be causing those errors? If so, what do you recommend for handling "ç - ü - ğ - ş - ı - ö"?
Meanwhile, my dictionary file for the above transcript is as follows:
müzik m ue z i k
çal ch a l
ramazan r a m a z a n
ömer oe m e r
danış d a n iy sh
görür g oe r ue r
I have uploaded those files to:
http://rapidshare.com/files/275702188/etc.rar.html
Please let me know your comments about it.
--- (Edited on 9/4/2009 4:02 pm [GMT-0500] by ercani) ---
--- (Edited on 9/4/2009 4:16 pm [GMT-0500] by ercani) ---
Since all phones from your phoneset are reported missing, it is most likely not related to UTF-8 characters. I suspect your transcription file has an incorrect format; for example, there may be a space after the closing parenthesis at the end of a line.
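One way to look for this kind of formatting problem is a small check like the Python sketch below. The line shape in LINE_RE is only an assumption based on the transcript lines quoted in this thread, and bad_transcript_lines is a made-up helper name, not a SphinxTrain tool.

```python
import re

# Assumed line shape, based on the transcript lines quoted in this thread:
#   <s> words... </s> (utterance_id)
# with nothing (no space, no CR) after the closing parenthesis.
LINE_RE = re.compile(r"<s> .+ </s> \([^()\s]+\)")

def bad_transcript_lines(text):
    """Return (line_number, repr(line)) for every line that does not match LINE_RE."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line == "":
            continue  # ignore blank lines, e.g. after a final newline
        if not LINE_RE.fullmatch(line):
            # repr() makes trailing spaces and stray '\r' characters visible.
            problems.append((number, repr(line)))
    return problems
```

Read the file in binary mode, decode it as UTF-8, and pass the text in; any line with trailing whitespace or a leftover carriage return will show up in the returned list.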
--- (Edited on 9/4/2009 4:15 pm [GMT-0500] by nsh) ---
All your files are UTF-16 with CR+LF, CR terminators. They must be UTF-8 with only LF.
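If it helps, the conversion can be sketched in a few lines of Python. This assumes the files start with a UTF-16 byte order mark (without one, Python falls back to the platform byte order); to_utf8_lf and convert_file are just illustrative names.

```python
def to_utf8_lf(raw: bytes) -> bytes:
    """Decode UTF-16 bytes, normalise line endings to LF, and encode as UTF-8."""
    # With a BOM present, "utf-16" picks the byte order from it and drops the BOM.
    text = raw.decode("utf-16")
    # Handle CR+LF first, then any remaining bare CR.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")

def convert_file(src_path, dst_path):
    """Read src_path as raw bytes and write the converted bytes to dst_path."""
    with open(src_path, "rb") as src:
        data = to_utf8_lf(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

Working on bytes in binary mode avoids any automatic newline translation by the platform, so the output contains only LF terminators.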
Training on Windows is like shooting yourself in the foot. Moreover, many interesting things aren't available on Windows. I strongly recommend that you install Linux.
--- (Edited on 9/4/2009 4:47 pm [GMT-0500] by nsh) ---