VoxForge
Hi,
I have been working on my own version of the grammar and dictionary while following along with the tutorial. I reached the end without any problems, but now I am hitting an issue that, after hours of looking, has really confounded me.
I hope someone out here can help.
See the output below. Please ignore the missing-triphone errors; where I am stuck is the last 5-6 lines of the output.
Please Help
------------------------------------------------------------------------
$ julian -input mic -C julian.jconf
include config: julian.jconf
###### check configurations
###### initialize input device
Audio cycle buffer length: 768000 bytes
AD-in thread created
###### build up system
Reading in HMM definition...(ascii)...limit check passed
defined HMMs: 162
logical names: 636 in HMMList
base phones: 44 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...409 added as logical...done
reading [sample2.dfa] and [sample2.dict]...
Reading in dictionary...
line 20: triphone "*-z+ow" or biphone "z+ow" not found
line 20: triphone "z-ow+l" not found
> 2 [XOLO] z ow l ow
line 92: triphone "z-ih+r" not found
line 92: triphone "ih-r+ow" not found
> 16 [ZERO] z ih r ow
line 99: triphone "eh-v+ax" not found
line 99: triphone "v-ax+n" not found
> 17 [SEVEN] s eh v ax n
line 130: triphone "*-z+eh" or biphone "z+eh" not found
line 130: triphone "z-eh+d" not found
> 18 [Z] z eh d
////// Missing phones:
*-z+eh or biphone z+eh
*-z+ow or biphone z+ow
eh-v+ax
ih-r+ow
v-ax+n
z-eh+d
z-ih+r
z-ow+l
//////////////////////
errors are ignored
130 words...done
Reading in DFA grammar...done
- Gram #0: read
[grammars]
# 0: [active ] 130 words, 19 categories, 87 nodes (new) "sample2"
gram "sample2" registered
- Grammar update check
Mapping dict item <-> DFA terminal (category)...Error: wrong format: terminal number is not digit in dict! []
Error: no such terminal symbol "" in DFA grammar:
129: " @0.000000 [
]"
Error in dict <-> DFA mapping
Terminated
----------------------------------------------------------------------------------------------------------------------
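One guess, not confirmed anywhere in this thread: item 129 is the last of the 130 words and its terminal field is empty, which is what a blank or truncated final line in the .dict file might look like to Julian. Below is a minimal Python sketch for spotting such lines. It assumes entries have the shape echoed in the log (`<number> [WORD] phones...`); `find_bad_dict_lines` is a made-up helper name, not part of Julius.

```python
def find_bad_dict_lines(lines):
    """Return (line_number, text) for entries that do not look like
    '<category number> [OUTPUT] phone phone ...' (assumed format)."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # A usable entry needs a numeric id, an output string and at least one phone.
        # Blank lines split into zero fields and are flagged as well.
        if len(fields) < 3 or not fields[0].isdigit():
            bad.append((number, line.rstrip("\n")))
    return bad


# Small in-memory sample; the last "entry" is a stray blank line.
sample_dict = ["2\t[XOLO]\tz ow l ow\n", "16\t[ZERO]\tz ih r ow\n", "\n"]

for number, text in find_bad_dict_lines(sample_dict):
    print(f"line {number}: suspicious entry {text!r}")
```

To check a real file, pass the open file object instead of the sample list, for example `find_bad_dict_lines(open("sample2.dict"))`.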
>Mapping dict item <-> DFA terminal (category)...Error: wrong format: terminal number is not digit in dict! []
>Error: no such terminal symbol "" in DFA grammar:
>129: " @0.000000 [
>]"
>Error in dict <-> DFA mapping
>Terminated
Were there any errors when you compiled your grammar with mkdfa.pl?
No, there were no errors during compilation. Nevertheless, I decided to start fresh and redo all the steps, and this particular problem disappeared. Instead I am getting a new error: "skippable sp should not repeat".
See below the output from running Julian....
croma@Sams-PC ~/voxforge/newproj
$ julian -debug -input mic -C julian.jconf >julian.out
include config: julian.jconf
###### check configurations
###### initialize input device
Audio cycle buffer length: 768000 bytes
AD-in thread created
###### build up system
Reading in HMM definition...(ascii)...limit check passed
defined HMMs: 162
logical names: 638 in HMMList
base phones: 44 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...410 added as logical...done
Reading in dictionary...
line 20: triphone "*-z+ow" or biphone "z+ow" not found
line 20: triphone "z-ow+l" not found
> 2 [XOLO] z ow l ow
line 92: triphone "z-ih+r" not found
line 92: triphone "ih-r+ow" not found
> 16 [ZERO] z ih r ow
line 99: triphone "eh-v+ax" not found
line 99: triphone "v-ax+n" not found
> 17 [SEVEN] s eh v ax n
errors are ignored
130 words...done
Reading in DFA grammar...done
Mapping dict item <-> DFA terminal (category)...done
Error: skippable sp should not repeat
Terminated
Additional Information,
I did a bit of digging around the Julian code; here is the calling sequence:
- main()
- main_recognition_loop()
- final_fusion()
- multigram_exec()
- extract_cpair()
So while extracting the category-pair constraints from the DFA grammar and setting the category-pair matrix for the given DFA, it comes across the sp model in a back-to-back situation.
So my guess is that there is some issue with my grammar and somehow consecutive words are creating 2 sp's in a row (i.e. the former word has an sp at the end and the latter word has an sp at the beginning).
- Is this true?
I see 2 possibilities,
1: There is a problem with my grammar. But why wasn't this caught earlier, during grammar compilation? So this seems unlikely.
2: It is possible that even though I am ignoring the missing phones (see the output earlier in the thread) by setting the ignore flag, those errors cannot really be skipped because they cause undesirable side effects.
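To see which dictionary words the ignored errors actually affect, the Julian output can be scanned for the "not found" lines and the `> id [WORD] phones` line that follows each group. This is only a sketch based on the log format shown earlier in the thread; the function name and regular expressions are mine, and other Julius versions may print these messages differently.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed format).
MISSING = re.compile(r'^line (\d+): .*not found$')
ENTRY = re.compile(r'^> (\d+) \[(.+?)\] (.+)$')


def words_with_missing_phones(log_lines):
    """Return (dict_line, word, phones) for each word reported with missing phones."""
    results = []
    pending_line = None
    for raw in log_lines:
        line = raw.strip()
        missing = MISSING.match(line)
        if missing:
            # Remember which dictionary line the complaint refers to.
            pending_line = int(missing.group(1))
            continue
        entry = ENTRY.match(line)
        if entry and pending_line is not None:
            results.append((pending_line, entry.group(2), entry.group(3).split()))
            pending_line = None
    return results


sample_log = '''line 92: triphone "z-ih+r" not found
line 92: triphone "ih-r+ow" not found
> 16 [ZERO] z ih r ow
line 99: triphone "eh-v+ax" not found
line 99: triphone "v-ax+n" not found
> 17 [SEVEN] s eh v ax n
'''.splitlines()

for dict_line, word, phones in words_with_missing_phones(sample_log):
    print(dict_line, word, " ".join(phones))
```

The resulting word list is a reasonable starting point for checking each pronunciation against the lexicon the acoustic model was trained with.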
I am pretty much stuck at this point, so any help is greatly appreciated.
>Error: skippable sp should not repeat
It might be that you have short pauses ("sp") in your pronunciation dictionary and you are also telling Julian to insert a short pause in the Julian config (-iwsp)...
Have you tried the newest version of Julius (v4.2)? The config should be the same (or very similar), and you run it as julius rather than julian.
Not using -iwsp to run Julian.
How do I find out if I have sp in my pronunciation dictionary?
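One way to check, sketched in Python: list every dictionary entry whose phone sequence contains `sp`. This assumes the .dict entries look like the ones Julian echoes in the log (`<number> [WORD] phones...`). The function name is made up, and the `[PAUSE]` and `[HI]` entries are invented sample data, not taken from the poster's dictionary.

```python
def entries_with_sp(lines, pause="sp"):
    """Return (line_number, output_string) for entries whose phones include `pause`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # fields[0] is the category number, fields[1] the output string,
        # everything after that is the phone sequence (assumed layout).
        if pause in fields[2:]:
            hits.append((number, fields[1]))
    return hits


# Invented sample entries for illustration only.
sample_entries = ["2 [XOLO] z ow l ow", "3 [PAUSE] sp", "4 [HI] hh ay sp"]

for number, word in entries_with_sp(sample_entries):
    print(f"line {number}: {word} contains sp")
```

On a real dictionary, pass the open file instead of the sample list, for example `entries_with_sp(open("sample2.dict"))`. An empty result would suggest the repeated sp comes from somewhere else.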
Tried Julius 4.2 with the same jconf file as in the tutorial. The good news is that it doesn't complain about the skippable sp. The bad news is that it returns more words than I uttered, and most of the time the words it returns are not what I uttered.
Now I am completely lost...what should I try next?
Thanks in advance..
So here is the latest update...
I decided to eliminate the errors that I was ignoring with the -forcedict flag; these were about missing phones.
Then I could get julian working with julius 3.5.2
It turns out that the pronunciation of a couple of words in my .voca file was different from the VoxForge lexicon. This could probably have been checked much earlier to avoid so much trouble later.
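A check like that could look roughly like the Python sketch below, which compares each .voca pronunciation with the lexicon's. The file layouts are assumptions: .voca lines as `WORD phones...` with `%` category lines and `#` comment lines skipped, and lexicon lines as `WORD [WORD] phones...`. All function names are made up, and the lexicon pronunciations in the sample are invented for illustration rather than copied from the VoxForge lexicon.

```python
def read_voca(lines):
    """Map WORD -> phone list, skipping '% CATEGORY' lines, comments and blanks."""
    words = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(("%", "#")):
            continue
        words[fields[0].upper()] = fields[1:]
    return words


def read_lexicon(lines):
    """Map WORD -> list of pronunciations (a word may appear more than once)."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        # fields[1] is the bracketed output symbol; phones start at fields[2].
        lexicon.setdefault(fields[0].upper(), []).append(fields[2:])
    return lexicon


def mismatches(voca, lexicon):
    """Return (word, voca_phones, lexicon_pronunciations) where they disagree."""
    return [(word, phones, lexicon[word])
            for word, phones in voca.items()
            if word in lexicon and phones not in lexicon[word]]


sample_voca = ["% DIGIT", "ZERO z ih r ow", "SEVEN s eh v ax n", "ONE w ah n"]
# Invented lexicon entries, for illustration only.
sample_lexicon = ["ZERO [ZERO] z iy r ow", "SEVEN [SEVEN] s eh v ih n", "ONE [ONE] w ah n"]

for word, mine, theirs in mismatches(read_voca(sample_voca), read_lexicon(sample_lexicon)):
    print(word, "voca:", " ".join(mine), "| lexicon:", [" ".join(p) for p in theirs])
```

Words that are missing from the lexicon altogether are not reported here; they would need a separate check.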
Now the problem is that the recognition is very bad (almost random). I have a decent training corpus, with at least 5-6 instances of each word in my vocabulary in the prompts file.
The worst thing is that when I speak a single word it returns a handful of words. I have no clue how to improve this...
Any ideas?
I have both individual letters such as A B C D... and words such as Apple in my vocabulary. When I say Apple it seems to return a string of single letters (not necessarily accurate ones) instead of the whole word. Given that my utterance has no pause between the uttered phonemes, I don't understand why it is matching individual letters. Seems strange.
Probably best to follow the VoxForge tutorial, get that working, and then deviate from it one thing at a time...
HTK/Julius can have hard to debug errors that originate from an earlier step (that worked...) that affects things at a later step.