Speech Recognition Engines

configure julius for LCSR
User: cchen1103
Date: 8/5/2010 3:23 pm
Views: 6422
Rating: 3

I have downloaded Julius 4.1.5 and am trying to use it for dictation. The bundled sample uses a grammar, but I want to use an N-gram language model. I built my LM with the mkbingram tool from a 3-gram model in ARPA format:

 

mkbingram -nlr mylm.arpa lmf2r3.bin

The acoustic model was downloaded from the VoxForge Julius nightly build.

When I run Julius to transcribe my recording:

julius -C julian.jconf

>input file: myvoice.raw

I get "<s></s>"; basically, it does not recognize anything.

myvoice.raw is an 8 kHz, big-endian audio file.
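
Before blaming the recognizer, it can help to confirm that a headerless file really is what it is assumed to be. The following is a minimal Python sketch, not part of Julius: the helper name `raw_be16_to_wav` is hypothetical, and it assumes the file is 16-bit signed, mono, big-endian, as described above. It wraps the samples in a WAV header so the recording can be played back in any audio player.

```python
import wave


def raw_be16_to_wav(raw_path, wav_path, sample_rate=8000):
    """Wrap headerless 16-bit big-endian mono audio in a WAV container.

    Returns the duration in seconds. The format of the input is an
    assumption; a raw file carries no header that could confirm it.
    """
    with open(raw_path, "rb") as f:
        data = f.read()

    # Drop a trailing odd byte, if any, so every sample has two bytes.
    data = data[: len(data) - (len(data) % 2)]

    # WAV stores samples little-endian, so swap the two bytes of each sample.
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(sample_rate)  # declared, not detected
        w.writeframes(bytes(swapped))

    return (len(data) // 2) / sample_rate
```

For example, `raw_be16_to_wav("myvoice.raw", "myvoice.wav")` writes a playable copy. If the result sounds like noise, the byte order is probably wrong; if it plays too fast or too slow, the declared sampling rate is probably wrong.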

 

Could anyone point me to where the problem is?

 

Thanks.

 

Here is my configuration file:

######################################################################
#### Files
######################################################################
##
## Grammar definition file (DFA and dictionary)
##

#### There are three ways to specify the grammar files.
#### (1) and (2) can be used multiple times.

#### (1) Specify by common prefix of .dfa and .dict files. Comma-separated
#### prefixes can be specified for multiple grammar recognition
#-gram /cdrom/testrun/sample_grammars/vfr/vfr

#### (2) Or you can give Julian a text file which contains list of grammar
#### prefixes one per line.
#-gramlist file

#### (3) Classic way to specify a grammar.
#-dfa grammar/sample.dfa
#-v grammar/sample.dict
-v /home/ubuntu/julius4/models/acoustic/dict
-d /home/ubuntu/julius4/models/language/lmf2r3.bin

#### If you want to clear previously specified grammars, use this at the
#### point.
#-nogram

##
## Acoustic HMM file
##
# support ascii hmmdefs or binary format (converted by "mkbinhmm")
# format (ascii/binary) will be automatically detected
-h models/acoustic/hmmdefs

## triphone model needs HMMList that maps logical triphone to physical ones.
-hlist models/acoustic/tiedlist

######################################################################
#### Multiple grammar recognition
######################################################################
#-multigramout        # Output results for each grammar

######################################################################
#### Language Model
######################################################################
##
## word insertion penalty
##
-penalty1 5.0        # first pass
-penalty2 20.0        # second pass

######################################################################
#### Dictionary
######################################################################
##
## do not give up startup on error words
##
#-forcedict

######################################################################
#### Acoustic Model
######################################################################
##
## Context-dependency handling will be enabled according to the model type.
## Try below if julius wrongly detect the type of hmmdefs
##
#-no_ccd        # disable context-dependency handling
#-force_ccd        # enable context-dependency handling

##
## If julius go wrong with checking parameter type, try below.
##
#-notypecheck
#

##
## (PTM/triphone) switch computation method of IWCD on 1st pass
##
#-iwcd1 best N    # assign average of N-best likelihood of the same context
#-iwcd1 max    # assign maximum likelihood of the same context
-iwcd1 avg    # assign average likelihood of the same context (default)

######################################################################
#### Gaussian Pruning
######################################################################
## Number of mixtures to select in a mixture pdf.
## This default value is optimized for IPA99's PTM,
## with 64 Gaussians per codebook
#-tmix 2

## Select Gaussian pruning algorithm
## default: beam (standard setting), safe (others)
-gprune safe        # safe pruning, accurate but slow
#-gprune heuristic    # heuristic pruning
#-gprune beam        # beam pruning, fast but sensitive
#-gprune none        # no pruning

######################################################################
#### Gaussian Mixture Selection
######################################################################
#-gshmm hmmdefs        # monophone HMM for GMS
            # (OFF when not specified)
#-gsnum 24        # number of states to be selected on GMS

######################################################################
#### Search Parameters
######################################################################
#-b 400                 # beam width on 1st pass (#nodes) for monophone
#-b 800                 # beam width on 1st pass (#nodes) for triphone,PTM
-b 10000                # beam width on 1st pass (#nodes) for triphone,PTM,engine=v2.1
-b2 50                 # beam width on 2nd pass (#words)
#-sb 200.0        # score beam envelope threshold
#-s 500                 # hypotheses stack size on 2nd pass (#hypo)
#-m 2000                # hypotheses overflow threshold (#hypo)
#-lookuprange 5         # lookup range for word expansion (#frame)
#-n 1                   # num of sentences to find (#sentence)
-n 10                  #   (default for 'standard' configuration)
#-output 1              # num of found sentences to output (#sentence)
#-looktrellis        # search within only backtrellis words

######################################################################
#### Inter-word Short Pause Handling
######################################################################
##
## Specify short pause model name to be treated as special
##
-spmodel "sp"        # HMM model name

##
## For insertion of context-free short-term inter-word pauses between words
##  (multi-path version only)
##
-iwsp            # append a skippable sp model at all word ends
-iwsppenalty -70.0    # transition penalty for the appended sp models

######################################################################
#### Speech Input Source
######################################################################
## select one (default: mfcfile)
#-input mfcfile         # MFCC file in HTK parameter file format
-input rawfile         # raw wavefile (auto-detect format)
                        # WAV(16bit) or
                        # RAW(16bit(signed short),mono,big-endian)
                        # AIFF,AU (with libsndfile extension)
            # other than 16kHz, sampling rate should be specified
            # by "-smpFreq" option
#-input mic             # direct microphone input
            # device name can be specified via env. val. "AUDIODEV"
#-input netaudio -NA host:0    # direct input from DatLink(NetAudio) host
#-input adinnet -adport portnum # via adinnet network client
#-input stdin        # from standard tty input (pipe)

#-filelist filename    # specify file list to be recognized in batch mode

#-nostrip        # switch OFF dropping of invalid input segment.
                        # (default: strip off invalid segment (0 sequence etc.)
-zmean            # enable DC offset removal (invalid for mfcfile input)

######################################################################
#### Recording
######################################################################
#-record directory    # auto-save recognized speech data into the dir

######################################################################
#### GMM-based Input Verification and Rejection
######################################################################
#-gmm gmmdefs        # specify GMM definition file in HTK format
#-gmmnum 10        # num of Gaussians to be computed per mixture
#-gmmreject "noise,laugh,cough" # list of GMM names to be rejected

######################################################################
#### Too Short Input Rejection
######################################################################
#-rejectshort 200    # reject input shorter than specified millisecond

######################################################################
#### Speech Detection
######################################################################
#-pausesegment        # turn on speech detection by level and zero-cross
-nopausesegment    # turn off speech detection by level and zero-cross
            # (default: on for mic or adinnet, off for file)
-lv 1000        # threshold of input level (0-32767)
-headmargin 500    # head margin of input segment (msec)
-tailmargin 2000    # tail margin of input segment (msec)
-zc 60            # threshold of number of zero-cross in a second

######################################################################
#### Acoustic Analysis
######################################################################
-smpFreq 8000        # sampling rate (Hz)
-smpPeriod 1250        # sampling period (100 ns units) (= 10000000 / smpFreq)
#-fsize 400        # window size (samples)
#-fshift 160        # frame shift (samples)
#-delwin 2        # delta window (frames)
#-hifreq 4000        # cut-off hi frequency (Hz) (-1: disable)
#-lofreq 10        # cut-off low frequency (Hz) (-1: disable)
#-cmnsave filename    # save CMN param to file (update per input)
#-cmnload filename    # load initial CMN param from file on startup

######################################################################
#### Spectral Subtraction (SS)
######################################################################
#-sscalc        # do SS using head silence (file input only)
#-sscalclen 300        # length of head silence for SS (msec)
#-ssload filename       # load constant noise spectrum from file for SS
#-ssalpha 2.0        # alpha coef. for SS
#-ssfloor 0.5        # spectral floor for SS

######################################################################
#### Forced alignment
######################################################################
#-walign        # do forced alignment with result per word
#-palign        # do forced alignment with result per phoneme
#-salign        # do forced alignment with result per HMM state

######################################################################
#### Word Confidence Scoring
######################################################################
#-cmalpha 0.05        # smoothing coef. alpha

######################################################################
#### Output
######################################################################
#-separatescore        # output language and acoustic score separately
-progout        # output partial result per a time interval
-proginterval 300    # time interval for "-progout" (msec)
#-quiet            # output minimal result
#-demo            # = "-progout -quiet", suitable for dictation demo
#-debug            # output full message for debug
#-charconv from to    # output character set conversion (see manual for
            # available code set name)

######################################################################
#### Server module mode
######################################################################
#-module        # Run Julius on "Server module mode"
#-module 5530        # (when using another port number for connection)
#-outcode WLPSC        # select output message toward module (WLPSCwlps)

######################################################################
#### Misc.
######################################################################
#-help            # output help and exit
#-setting        # output engine configuration and exit
#-C jconffile        # expand other jconf file in its place

################################################################# end of file

--- (Edited on 8/5/2010 3:23 pm [GMT-0500] by cchen1103) ---

Re: configure julius for LCSR
User: nsh
Date: 8/8/2010 4:58 pm
Views: 174
Rating: 3

To recognize 8 kHz audio you need an acoustic model trained on 8 kHz audio. The VoxForge nightly model is trained on 16 kHz audio and is not compatible with a telephone-bandwidth signal. You can try decoding a 16 kHz recording first.
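
A rate mismatch like this can be checked mechanically. The following is a minimal Python sketch, not part of Julius: `read_jconf_options` and `check_sample_rate` are hypothetical helper names, the parser only handles simple one-option-per-line jconf files, and the model's training rate has to be supplied by hand because the sketch does not read it from the model files.

```python
def read_jconf_options(text):
    """Collect '-option value' pairs from jconf text, ignoring '#' comments."""
    options = {}
    for line in text.splitlines():
        tokens = line.split("#", 1)[0].split()
        if tokens and tokens[0].startswith("-"):
            # Options without an argument (e.g. -zmean) map to None.
            options[tokens[0]] = tokens[1] if len(tokens) > 1 else None
    return options


def check_sample_rate(options, model_rate_hz):
    """Return a list of problems found; the list is empty if all is consistent."""
    problems = []
    # Julius assumes 16 kHz input when -smpFreq is not given.
    freq = int(options.get("-smpFreq") or 16000)

    if options.get("-smpPeriod") is not None:
        # -smpPeriod is counted in 100 ns units: period = 10000000 / smpFreq.
        expected = 10_000_000 // freq
        if int(options["-smpPeriod"]) != expected:
            problems.append(
                f"-smpPeriod {options['-smpPeriod']} does not match "
                f"-smpFreq {freq} (expected {expected})"
            )

    if freq != model_rate_hz:
        problems.append(
            f"input is {freq} Hz but the acoustic model was trained "
            f"on {model_rate_hz} Hz audio"
        )
    return problems
```

Calling `check_sample_rate(read_jconf_options(open("julian.jconf").read()), 16000)` on the configuration above should report one problem, the 8000 Hz input against a 16000 Hz model, while the `-smpFreq` and `-smpPeriod` values agree with each other.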

--- (Edited on 8/9/2010 01:58 [GMT+0400] by nsh) ---

Re: configure julius for LCSR
User: kmaclean
Date: 8/21/2010 11:24 pm
Views: 2474
Rating: 2
HTK_AcousticModel-2010-08-21_8kHz_16bit_MFCC_O_D.zip (21-Aug-2010 05:53, 3.4M)

--- (Edited on 8/22/2010 12:24 am [GMT-0400] by kmaclean) ---
