VoxForge
I started testing Julius a few days ago, but I have a problem: I don't understand the syntax of the sample.dfa file. My configuration can only recognize two words right now (but that works great, 100% accuracy ;)). Is there a tutorial or README about this?
Another question: the dict file from the Ubuntu package contains many words, but some of them have 'sp' phonemes, and Julius (the latest nightly build) says these are not supported. When I remove that phoneme, the words are recognized well.
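One way to apply the workaround described above is to drop `sp` only where it appears as a whole phone, so nothing else on the line is touched. This is a minimal sketch: the helper name `strip_sp` is hypothetical, and it assumes dictionary lines of the form `WORD [WORD] phone phone ... sp` whose fields are separated by whitespace.

```python
def strip_sp(line: str) -> str:
    """Remove every standalone 'sp' phone from a whitespace-separated dict line."""
    # Split on runs of whitespace, drop tokens that are exactly 'sp',
    # and rejoin with single spaces (the original column alignment is lost).
    return " ".join(tok for tok in line.split() if tok != "sp")


if __name__ == "__main__":
    print(strip_sp("SEARCH          [SEARCH]        s er ch sp"))  # → SEARCH [SEARCH] s er ch
```

Working on whole tokens rather than the substring `" sp"` avoids accidentally matching other text, at the cost of collapsing the padding between fields (assumed here not to matter to Julius).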
--- (Edited on 23-11-2008 4:55 pm [GMT+0100] by dano) ---
Hi Dano,
>But I have a problem. I don't understand the syntax of the sample.dfa-file.
Sorry, I don't either...
>Is there a tutorial or a readme about this?
You might need to look at the code... why do you need to understand this?
>in the dict file from the Ubuntu package are many words, but some of
>them have 'sp' phonemes.
It's on my todo list: ticket #294.
Ken
--- (Edited on 11/25/2008 12:30 pm [GMT-0500] by kmaclean) ---
Because when I now add more words with the same number, they aren't recognized, and I don't know why.
I could write a Python script to convert it to the right syntax.
This file has no sp phones, but it needs to be in the same format as sample.dict, right?
http://spraakherkenning.googlepages.com/dict2
So, like this?
2 [SEARCH] s er ch
--- (Edited on 26-11-2008 3:25 pm [GMT+0100] by dano) ---
So I wrote the script:

# Convert the VoxForge dict into Julius sample.dict format:
# drop the 'sp' phones and prefix every entry with category 2.
with open('dict', 'r') as fileone, open('dict2', 'w') as filetwo:
    for line in fileone:
        # Remove the short-pause phone that Julius rejects.
        line = line.replace(" sp", "")
        if "[" not in line:
            continue  # skip blank or malformed lines
        # Keep everything from "[WORD]" onwards, e.g. "[SEARCH]  s er ch",
        # and write each converted entry exactly once.
        filetwo.write("2 " + line[line.index("["):])
Here is the resulting file:
http://spraakherkenning.googlepages.com/dict3
--- (Edited on 26-11-2008 3:48 pm [GMT+0100] by dano) ---
Let me know if anything else needs to be changed :)
--- (Edited on 26-11-2008 3:51 pm [GMT+0100] by dano) ---
Never mind, I found the problem: I forgot to add the actual line to the dict file.
--- (Edited on 26-11-2008 3:54 pm [GMT+0100] by dano) ---
Hi Dano,
>Because when I add now more words with the same number
I'm a little confused here... are you adding words directly to your ".dict" file? If so, I don't think this will work.
Julius assumes that you are compiling from ".grammar" and ".voca" files using the mkdfa.pl script, which then generates your ".dfa", ".term" and ".dict" files. If you have new words you want to recognize, you need to add them to your ".voca" file and then recompile with mkdfa.pl.
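To make the .voca-to-.dict relationship concrete, here is a rough Python sketch of that step. It is not mkdfa.pl itself: it assumes (as I understand mkdfa.pl to behave, but unverified) that each `% CATEGORY` header starts a new category whose ID is its position in the file counting from 0, and that every other non-blank line is a word followed by its phones. `voca_to_dict` is a hypothetical name.

```python
def voca_to_dict(voca_text: str) -> list[str]:
    """Turn .voca text into .dict lines of the form 'ID [WORD] phones'.

    Assumption: category IDs are assigned in order of the '%' headers.
    """
    lines = []
    category_id = -1  # incremented at each '% NAME' header
    for raw in voca_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("%"):
            category_id += 1  # a new category begins here
            continue
        word, *phones = line.split()
        lines.append(f"{category_id} [{word}] {' '.join(phones)}")
    return lines


SAMPLE_VOCA = """% NS_B
<s> sil
% NS_E
</s> sil
% WORD
SEARCH s er ch
OPEN ow p ax n
"""

if __name__ == "__main__":
    for entry in voca_to_dict(SAMPLE_VOCA):
        print(entry)
```

The point of the sketch is that the category number in each .dict line is derived from the .voca file, which is why the two have to be regenerated together rather than edited separately.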
The ".dfa" file (whose internal format I don't really understand...) is generated from the contents of your ".grammar" and ".voca" files at compile time, i.e. it assumes that the only words it needs to recognize are the ones that were in the ".voca" file when it was compiled.
The .dfa file describes a "deterministic finite automaton" (DFA)... Google it for more information.
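For readers new to the term: a DFA is a set of states plus a transition table in which each (state, input symbol) pair leads to exactly one next state, and an input is accepted if it ends in an accepting state. The sketch below is a generic DFA over the three category names from the sample.voca in this thread, accepting `NS_B WORD NS_E` (silence, one word, silence). It only illustrates the concept; the state numbers and table are hypothetical, and this is not the layout of Julius's .dfa file.

```python
# Hypothetical transition table: (current_state, category) -> next_state.
TRANSITIONS = {
    (0, "NS_B"): 1,
    (1, "WORD"): 2,
    (2, "NS_E"): 3,
}
START_STATE = 0
ACCEPTING_STATES = {3}


def accepts(categories: list[str]) -> bool:
    """Return True if the DFA accepts the given sequence of word categories."""
    state = START_STATE
    for category in categories:
        key = (state, category)
        if key not in TRANSITIONS:
            return False  # no transition defined: reject immediately
        state = TRANSITIONS[key]
    return state in ACCEPTING_STATES


if __name__ == "__main__":
    print(accepts(["NS_B", "WORD", "NS_E"]))          # True
    print(accepts(["NS_B", "WORD", "WORD", "NS_E"]))  # False
```

Because the table maps each (state, category) pair to a single next state, the automaton never has to choose between alternatives, which is what "deterministic" means here.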
Hope that clears things up.
Ken
--- (Edited on 11/26/2008 11:07 am [GMT-0500] by kmaclean) ---
When I add words to sample.dict and sample.voca, it works.
sample.voca:
% NS_B
<s> sil
% NS_E
</s> sil
% WORD
SEARCH s er ch
OPEN ow p ax n
MUSIC m y uw z ix k
AFRAID ax f r ey d
IMAGE ix m ae jh
sample.dict:
0 [<s>] sil
1 [</s>] sil
3 [SEARCH] s er ch
3 [OPEN] ow p ax n
3 [MUSIC] m y uw z ix k
3 [AFRAID] ax f r ey d
3 [IMAGE] ix m ae jh
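When both files are edited by hand, as here, it is easy for them to drift apart. A small check like the following compares the words and phones in the two listings above and ignores the category numbers. It is a sketch: `parse_voca`, `parse_dict` and `compare` are hypothetical helpers, and they assume exactly the two layouts shown (`WORD phones` under `%` headers, and `ID [WORD] phones`).

```python
def parse_voca(text: str) -> dict[str, str]:
    """Map each word in .voca text to its phone string, skipping '%' headers."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("%"):
            continue
        entries[fields[0]] = " ".join(fields[1:])
    return entries


def parse_dict(text: str) -> dict[str, str]:
    """Map each [WORD] in .dict text to its phone string, ignoring the category ID."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        word = fields[1].strip("[]")  # "[SEARCH]" -> "SEARCH"
        entries[word] = " ".join(fields[2:])
    return entries


def compare(voca_text: str, dict_text: str) -> list[str]:
    """List words that are missing from one file or have different phones."""
    voca = parse_voca(voca_text)
    dic = parse_dict(dict_text)
    problems = []
    for word in sorted(voca.keys() | dic.keys()):
        if word not in dic:
            problems.append(f"{word}: missing from .dict")
        elif word not in voca:
            problems.append(f"{word}: missing from .voca")
        elif voca[word] != dic[word]:
            problems.append(f"{word}: phones differ ({voca[word]!r} vs {dic[word]!r})")
    return problems


SAMPLE_VOCA = """% NS_B
<s> sil
% NS_E
</s> sil
% WORD
SEARCH s er ch
OPEN ow p ax n
MUSIC m y uw z ix k
AFRAID ax f r ey d
IMAGE ix m ae jh
"""

SAMPLE_DICT = """0 [<s>] sil
1 [</s>] sil
3 [SEARCH] s er ch
3 [OPEN] ow p ax n
3 [MUSIC] m y uw z ix k
3 [AFRAID] ax f r ey d
3 [IMAGE] ix m ae jh
"""

if __name__ == "__main__":
    print(compare(SAMPLE_VOCA, SAMPLE_DICT) or "files agree")
```

This only catches word and phone mismatches; it says nothing about whether the category numbers agree with the compiled .dfa, which is what recompiling with mkdfa.pl takes care of.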
----
EDIT: Ah, now I understand: I had compiled it manually ;)
--- (Edited on 26-11-2008 7:09 pm [GMT+0100] by dano) ---