VoxForge
When I try to use http://www.keithv.com/software/giga/
I'm getting this error:
--- (Edited on 4/20/2016 9:55 am [GMT-0500] by Visitor) ---
I think the important line here is "head sil word "<s>" not exist in voca"
Check to see that you have the lines
<s> sil
</s> sil
somewhere in the file pointed to by the -v parameter. If not, insert them and try again. I don't think their position in the file matters; see the discussion in https://github.com/julius-speech/julius/issues/15. I have found that the output of language model generators varies, which can cause some confusion when it is loaded by Julius.
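A quick way to run that check is a few lines of Python. This is only a sketch: it assumes the dictionary is plain text with the word as the first whitespace-separated field on each line, and the function name is made up for illustration.

```python
REQUIRED = ("<s>", "</s>")  # sentence-boundary words named in the error message

def missing_boundary_words(dict_lines):
    """Return the required words that never appear as the first field of a line."""
    seen = {line.split()[0] for line in dict_lines if line.strip()}
    return [word for word in REQUIRED if word not in seen]

# In-memory example; for a real dictionary pass the lines of the file given to -v.
sample = ["<s> sil", "HELLO hh ah l ow"]
print(missing_boundary_words(sample))  # ['</s>']
```

An empty list means both entries are present. Anything it returns needs to be added to the dictionary before Julius will load it.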
--- (Edited on 2016-04-21 5:47 am [GMT-0400] by colbec) ---
Thanks, it worked. But now I'm getting this:
------
### read analyzed parameter
enter MFCC filename->
I plugged in the macros file:
------
### read analyzed parameter
enter MFCC filename->macros
and I'm getting this error:
input MFCC file: macros
Warning: rdparam: header says it has 2121206332 frames (more than 10 minutes)
Warning: rdparam: it may be a little endian MFCC
Warning: rdparam: now try reading with endian conversion
Error: rdparam: failed to read 39552 bytes
Error: rdparam: failed to read 39552 bytes
--- (Edited on 4/21/2016 12:10 pm [GMT-0500] by Visitor) ---
"enter MFCC filename->" is the kind of prompt I get when I forget to state that I intend to use the mic as input and Julius starts making assumptions. Are you sure that macros belongs here?
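The huge frame count in that warning looks like text being read as a binary header. HTK parameter files begin with a 4-byte big-endian sample count. The sketch below assumes the macros file is an HTK text file whose first four bytes are "~o", a newline, and "<". I have not seen the actual file, so treat those bytes as a guess.

```python
import struct

# Assumption: first four bytes of the text macros file ("~o", newline, "<").
first_bytes = b"~o\n<"

# Interpret them the way a binary HTK header would be read:
# a 4-byte big-endian signed integer giving the number of frames.
(frames,) = struct.unpack(">i", first_bytes)
print(frames)  # 2121206332
```

That matches the number in the warning, which suggests Julius was handed a text file where it expected a binary feature file.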
--- (Edited on 2016-04-21 2:55 pm [GMT-0400] by colbec) ---
Hi John,
I have just gone to follow the training recipe you linked to but can't seem to find a way to download the LDC's Gigaword text corpus. Is there a secret to this I don't know about?
--- (Edited on 5/16/2016 5:14 am [GMT-0500] by ) ---
--- (Edited on 5/16/2016 8:32 am [GMT-0400] by kmaclean) ---
I tried that link earlier. I couldn't see a download link on the page, so I created an account and logged in. Now I just get a nice big error message when I try to load that page again:
We're sorry, but something went wrong.
--- (Edited on 5/16/2016 7:45 am [GMT-0500] by ) ---
--- (Edited on 5/16/2016 7:46 am [GMT-0500] by ) ---
The LDC controls copyright on the corpus. You cannot get to it unless you buy a membership and DVD or become a student and get permission that way. In any case you are probably taking too big a bite at the problem; you don't need the entire corpus if you are content to try the already existing LMs (see the downloads on the link you already have) that have been generated by others.
To find out how to use an LM, create a simple LM from a Librivox book text with one of the freely available LM generators. Even better, convert a grammar that you know works to an LM: script a "corpus" based on the grammar, run an LM generator over it, and then set Julius to work on the resulting LM with the acoustic model you have from the grammar. Once you have some experience, go back to the LMs derived from the LDC corpus if necessary.
But first, find out what an LM looks like: generate a few and toss them away, just so you know what they consist of.
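To give a feel for what is inside one, here is a toy sketch that counts bigrams over a made-up three-sentence corpus and prints unsmoothed log10 probabilities. It is not a replacement for a real LM generator: real ARPA files also carry discounting and backoff weights, and the corpus and function name here are invented for illustration.

```python
from collections import Counter
import math

def bigram_logprobs(sentences):
    """Maximum-likelihood log10 P(w2 | w1), with <s> and </s> added; no smoothing."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])             # every word that has a successor
        bigrams.update(zip(words, words[1:]))    # adjacent word pairs
    return {pair: math.log10(n / histories[pair[0]]) for pair, n in bigrams.items()}

corpus = ["call steve", "call john", "phone steve"]  # made-up corpus
for (w1, w2), logprob in sorted(bigram_logprobs(corpus).items()):
    print(f"{logprob:.4f} {w1} {w2}")
```

Note how every sentence contributes a `<s>` and a `</s>` entry. Those are the same boundary words Julius expects to find in the dictionary.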
--- (Edited on 2016-05-16 9:10 am [GMT-0400] by colbec) ---
Thanks (once again!) for the advice. You are really helping me get an understanding of something I knew nothing about!
I downloaded a 2gram and 3gram bundle from the link above. I plugged the 2 arpa files into mkbingram using the following command:
mkbingram -nlr lm_giga_64k_vp_2gram.arpa -nlr lm_giga_64k_vp_3gram.arpa outfile.bingram
I then set up my test.jconf file to be:
## Language model file(s)
-d lm/outfile.bingram
## Word dictionary file
-v tutorial/sample.dict
## Acoustic HMM file
-h tutorial/hmm15/hmmdefs
-hlist tutorial/tiedlist
I run julius with this test.jconf but get:
STAT: include config: test.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found: "sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: rdhmmdef: no <SID> embedded
Stat: rdhmmdef: assign SID by the order of appearance
Stat: init_phmm: defined HMMs: 811
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names: 24402 in HMMList
Stat: init_phmm: base phones: 41 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 799 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
Stat: init_voca: read 18 words
ERROR: m_fusion: head sil word "<s>" not exist in voca
ERROR: m_fusion: failed to initialize dictionary
ERROR: Error in loading model
--- (Edited on 5/17/2016 5:15 am [GMT-0500] by ) ---
--- (Edited on 5/17/2016 5:20 am [GMT-0500] by ) ---