Clarification on Julius Language Model
User: Ravishanker
Date: 8/27/2010 11:26 am

Hi


The Julius man page says:

2-gram and reverse 3-gram language models are used. The Standard ARPA format is supported. In addition, a binary format N-gram is also supported for efficiency. The tool mkbingram can convert binary N-gram from the ARPA language models.

So I don't understand whether any N-gram model can be used, or only specifically a 2-gram plus a reverse 3-gram. I ask because there are 3-gram language models available on the CMU Sphinx page; can these now be used with Julius?
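One way to see which N-gram orders a downloaded ARPA model actually contains is to read its `\data\` header, which lists one `ngram N=count` line per order. Below is a minimal Python sketch, assuming a plain-text (not binary) ARPA file; `arpa_ngram_counts` is a hypothetical helper, and it does not call Julius or mkbingram.

```python
import re

def arpa_ngram_counts(lines):
    """Return {order: count} parsed from the \\data\\ header of an ARPA LM."""
    counts = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True          # header starts here
            continue
        if not in_data:
            continue                # skip any comments before \data\
        m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
        if m:
            counts[int(m.group(1))] = int(m.group(2))
        elif line:
            break                   # e.g. "\1-grams:" ends the header
    return counts

# A tiny made-up header for illustration (not a real model).
SAMPLE = """\\data\\
ngram 1=3
ngram 2=2
ngram 3=1

\\1-grams:
-0.5 <s>
"""

if __name__ == "__main__":
    counts = arpa_ngram_counts(SAMPLE.splitlines())
    print(counts, "highest order:", max(counts))
```

For a real file, pass `open(path)` instead of `SAMPLE.splitlines()`; the largest key is the model's N.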

Also, please give any pointers on where a language model for Julius can be found. Thanks a ton!

--- (Edited on 8/27/2010 11:26 am [GMT-0500] by Ravishanker) ---

Re: Clarification on Julius Language Model
User: Ravishanker
Date: 8/27/2010 12:19 pm

Also, any pointers to a text corpus for building language models would be highly appreciated. Thanks :)

--- (Edited on 8/27/2010 12:19 pm [GMT-0500] by Ravishanker) ---

Re: Clarification on Julius Language Model
User: kmaclean
Date: 8/29/2010 1:11 pm

>Also please give any pointers on where a language model for julius can be found

See Keith Vertanen's English Gigaword language model training recipe.

I don't think you need reverse 3-grams with the newest version of Julius.

Ken

--- (Edited on 8/29/2010 1:11 pm [GMT-0500] by Visitor) ---

Re: Clarification on Julius Language Model
User: Ravishanker
Date: 8/31/2010 1:59 pm

Thanks Ken :) Is there a corpus available for developing language models? For Keith's model, is there a way to get the source text corpus? Is any open-source text corpus available for building language models? Thanks :)

--- (Edited on 8/31/2010 1:59 pm [GMT-0500] by Ravishanker) ---

Re: Clarification on Julius Language Model
User: kmaclean
Date: 9/11/2010 10:39 pm

>For Keith's model is there a way to get the source text corpus.

LDC's Gigaword text corpus

>Any open source text corpus available for building language model?

As listed on the VoxForge dev page:

DIY corpus using Search Engines (like Google)
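Once a DIY text corpus has been collected, the first step toward an N-gram model is counting word sequences. A minimal Python sketch, assuming one sentence per string and simple lowercase whitespace tokenization; `count_ngrams` is a hypothetical helper, and turning these raw counts into a smoothed ARPA model is left to an LM toolkit.

```python
from collections import Counter

def count_ngrams(sentences, n):
    """Count n-grams, padding each sentence with <s> and </s> markers."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase + whitespace split; real corpora usually
        # need more careful normalization (punctuation, numbers, etc.).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

corpus = ["The cat sat", "the cat ran"]
bigrams = count_ngrams(corpus, 2)
print(bigrams.most_common(3))
```

The `<s>`/`</s>` padding matters because an ARPA model also stores probabilities for sentence starts and ends.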

--- (Edited on 9/11/2010 11:39 pm [GMT-0400] by kmaclean) ---