Speech Recognition Engines

Help, Julius doesn't recognize anything and give warning
User: eld1e6o
Date: 1/30/2012 8:28 pm

Hello!

I'm trying to test Julius and Julius-voxforge. When I start recognition, Julius runs but I never get any recognition results. At startup I get the following messages, warnings and errors:

### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created

The program runs, but while I'm speaking I get these warnings:

Warning: strip: sample 287-302 has zero value, stripped
Warning: strip: sample 32-47 has zero value, stripped
Warning: strip: sample 251-266 has zero value, stripped
Warning: strip: sample 497-512 has zero value, stripped
Warning: strip: sample 563-579 has zero value, stripped
Warning: strip: sample 53-68 has zero value, stripped
Warning: strip: sample 196-212 has zero value, stripped
Warning: strip: sample 341-356 has zero value, stripped
Warning: strip: sample 606-621 has zero value, stripped

After a while, I get these warnings:

WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
Warning: input buffer overflow: some input may be dropped, so disgard the input

and the program continues in an infinite loop.


Also, I ran the program with the -record option, and when I open the recorded file with VLC I can hear the sound fine.

Does anybody have any idea what I can do to get it working?

Can anybody help me, please?

Thanks!

Diego

 

Output:

********************************************************************

STAT: include config: julian.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found: "sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: init_phmm: defined HMMs:  8002
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names:  9406 in HMMList
Stat: init_phmm: base phones:    44 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 1085 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [sample.dfa] and [sample.dict]...
Stat: init_voca: read 25 words
STAT: done
STAT: Gram #0 sample registered
STAT: Gram #0 sample: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sample: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sample: installed
STAT: Gram #0 sample: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 313 nodes
STAT: coordination check passed
STAT: multi-gram: beam width set to 200 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done

Input speech data will be stored to = ./grabaciones/
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.2.1 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    :
 -  Compiled by  : gcc -g -O2 -fPIC -fPIC

------------------------------------------------------------
Configuration of Modules

 Number of defined modules: AM=1, LM=1, SR=1

 Acoustic Model (with input parameter spec.):
 - AM00 "_default"
    hmmfilename=/usr/share/julius-voxforge/acoustic/hmmdefs
    hmmmapfilename=/usr/share/julius-voxforge/acoustic/tiedlist

 Language Model:
 - LM00 "_default"
    grammar #1:
        dfa  = sample.dfa
        dict = sample.dict

 Recognizer:
 - SR00 "_default" (AM00, LM00)

------------------------------------------------------------
Speech Analysis Module(s)

[MFCC01]  for [AM00 _default]

 Acoustic analysis condition:
           parameter = MFCC_0_D_N_Z (25 dim. from 12 cepstrum + c0, abs energy supressed with CMN)
    sample frequency = 16000 Hz
       sample period =  625  (1 = 100ns)
         window size =  400 samples (25.0 ms)
         frame shift =  160 samples (10.0 ms)
        pre-emphasis = 0.97
        # filterbank = 24
       cepst. lifter = 22
          raw energy = False
    energy normalize = False
        delta window = 2 frames (20.0 ms) around
         hi freq cut = OFF
         lo freq cut = OFF
     zero mean frame = OFF
           use power = OFF
                 CVN = OFF
                VTLN = OFF
    spectral subtraction = off
  cepstral normalization = real-time MAP-CMN
     base setup from = Julius defaults

 MAP-CMN:
      initial cep. data   = none
      beginning data weight = 100.00
    beginning data update = yes, from last inputs at each input

------------------------------------------------------------
Acoustic Model(s)

[AM00 "_default"]

 HMM Info:
    8002 models, 5950 states, 5950 mpdfs, 5950 Gaussians are defined
          model type = context dependency handling ON
      training parameter = MFCC_N_D_Z_0
       vector length = 25
    number of stream = 1
         stream info = [0-24]
    cov. matrix type = DIAGC
       duration type = NULLD
    max mixture size = 1 Gaussians
     max length of model = 5 states
     logical base phones = 44
       model skip trans. = exist, require multi-path handling
      skippable models = sp (1 model(s))

 AM Parameters:
        Gaussian pruning = safe  (-gprune)
  top N mixtures to calc = 2 / 0  (-tmix)
    short pause HMM name = "sp" specified, "sp" applied (physical)  (-sp)
  cross-word CD on pass1 = handle by approx. (use max. prob. of same LC)
   sp transition penalty = -70.0

------------------------------------------------------------
Language Model(s)

[LM00 "_default"] type=grammar

 DFA grammar info:
      9 nodes, 19 arcs, 11 terminal(category) symbols
      category-pair matrix: 104 bytes (1216 bytes allocated)

 Vocabulary Info:
        vocabulary size  = 25 words, 85 models
        average word len = 3.4 models, 10.2 states
       maximum state num = 24 nodes per word
       transparent words = not exist
       words under class = not exist

 Parameters:
   found sp category IDs =

------------------------------------------------------------
Recognizer(s)

[SR00 "_default"]  AM00 "_default"  +  LM00 "_default"

 Lexicon tree:
     total node num =    313
      root node num =     23
      leaf node num =     25

    (-penalty1) IW penalty1 = +5.0
    (-penalty2) IW penalty2 = +20.0
    (-cmalpha)CM alpha coef = 0.050000

     inter-word short pause = on (append "sp" for each word tail)
      sp transition penalty = -70.0
 Search parameters:
        multi-path handling = yes, multi-path mode enabled
    (-b) trellis beam width = 200 (-1 or not specified - guessed)
    (-bs)score pruning thres= disabled
    (-n)search candidate num= 1
    (-s)  search stack size = 500
    (-m)    search overflow = after 2000 hypothesis poped
            2nd pass method = searching sentence, generating N-best
    (-b2)  pass2 beam width = 200
    (-lookuprange)lookup range= 5  (tm-5 <= t <tm+5)
    (-sb)2nd scan beamthres = 200.0 (in logscore)
    (-n)        search till = 1 candidates found
    (-output)    and output = 1 candidates out of above
     IWCD handling:
       1st pass: approximation (use max. prob. of same LC)
       2nd pass: loose (apply when hypo. is popped and scanned)
     all possible words will be expanded in 2nd pass
     build_wchmm2() used
     lcdset limited by word-pair constraint
    short pause segmentation = off
    fall back on search fail = off, returns search failure

------------------------------------------------------------
Decoding algorithm:

    1st pass input processing = real time, on-the-fly
    1st pass method = 1-best approx. generating indexed trellis
    output word confidence measure based on search-time scores

------------------------------------------------------------
FrontEnd:

 Input stream:
                 input type = waveform
               input source = microphone
        device API          = default
              sampling freq. = 16000 Hz
             threaded A/D-in = supported, on
       zero frames stripping = on
             silence cutting = on
                 level thres = 2000 / 32767
             zerocross thres = 60 / sec.
                 head margin = 300 msec.
                 tail margin = 400 msec.
                  chunk size = 1000 samples
        long-term DC removal = off
          reject short input = off

----------------------- System Information end -----------------------

    *************************************************************
    * NOTICE: The first input may not be recognized, since      *
    *         no initial CMN parameter is available on startup. *
    * for MFCC01*
    *************************************************************

------
### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
Warning: strip: sample 106-122 has zero value, stripped
Warning: strip: sample 228-244 has zero value, stripped
Warning: strip: sample 351-366 has zero value, stripped
Warning: strip: sample 281-321 has zero value, stripped
Warning: strip: sample 396-411 has zero value, stripped
Warning: strip: sample 532-547 has zero value, stripped
Warning: strip: sample 26-44 has zero value, stripped
Warning: strip: sample 377-396 has zero value, stripped
Warning: strip: sample 415-431 has zero value, stripped
Warning: strip: sample 505-523 has zero value, stripped
Warning: strip: sample 606-623 has zero value, stripped
Warning: strip: sample 251-270 has zero value, stripped
Warning: strip: sample 51-66 has zero value, stripped
Warning: strip: sample 477-492 has zero value, stripped
Warning: strip: sample 619-634 has zero value, stripped
Warning: strip: sample 166-189 has zero value, stripped
Warning: strip: sample 298-318 has zero value, stripped
Warning: strip: sample 414-432 has zero value, stripped
Warning: strip: sample 530-546 has zero value, stripped
Warning: strip: sample 8-23 has zero value, stripped
Warning: strip: sample 604-620 has zero value, stripped
Warning: strip: sample 38-53 has zero value, stripped
Warning: strip: sample 67-83 has zero value, stripped
Warning: strip: sample 154-171 has zero value, stripped
Warning: strip: sample 183-200 has zero value, stripped
Warning: strip: sample 270-286 has zero value, stripped
Warning: strip: sample 351-369 has zero value, stripped
Warning: strip: sample 422-437 has zero value, stripped
Warning: strip: sample 468-487 has zero value, stripped
Warning: strip: sample 511-526 has zero value, stripped
Warning: strip: sample 588-605 has zero value, stripped
Warning: strip: sample 202-218 has zero value, stripped
Warning: strip: sample 259-274 has zero value, stripped
Warning: strip: sample 347-362 has zero value, stripped
Warning: strip: sample 544-559 has zero value, stripped
Warning: strip: sample 29-44 has zero value, stripped
Warning: strip: sample 170-185 has zero value, stripped
Warning: strip: sample 284-299 has zero value, stripped
Warning: strip: sample 294-309 has zero value, stripped
Warning: strip: sample 315-331 has zero value, stripped
Warning: strip: sample 531-546 has zero value, stripped
Warning: strip: sample 116-133 has zero value, stripped
Warning: strip: sample 243-260 has zero value, stripped
Warning: strip: sample 341-358 has zero value, stripped
Warning: strip: sample 413-428 has zero value, stripped
Warning: strip: sample 110-131 has zero value, stripped
Warning: strip: sample 19-34 has zero value, stripped
Warning: strip: sample 152-168 has zero value, stripped
Warning: strip: sample 286-302 has zero value, stripped
Warning: strip: sample 557-572 has zero value, stripped
Warning: strip: sample 55-70 has zero value, stripped
Warning: strip: sample 155-170 has zero value, stripped
Warning: strip: sample 420-437 has zero value, stripped
Warning: strip: sample 337-352 has zero value, stripped
Warning: strip: sample 618-635 has zero value, stripped
Warning: strip: sample 117-135 has zero value, stripped
Warning: strip: sample 254-270 has zero value, stripped
Warning: strip: sample 98-115 has zero value, stripped
Warning: strip: sample 445-460 has zero value, stripped
Warning: strip: sample 580-597 has zero value, stripped
Warning: strip: sample 70-95 has zero value, stripped
Warning: strip: sample 135-150 has zero value, stripped
Warning: strip: sample 202-235 has zero value, stripped
Warning: strip: sample 332-366 has zero value, stripped
Warning: strip: sample 463-498 has zero value, stripped
Warning: strip: sample 595-627 has zero value, stripped
Warning: strip: sample 89-120 has zero value, stripped
Warning: strip: sample 229-256 has zero value, stripped
Warning: strip: sample 590-611 has zero value, stripped
Warning: strip: sample 0-16 has zero value, stripped
Warning: strip: sample 76-92 has zero value, stripped
Warning: strip: sample 121-138 has zero value, stripped
Warning: strip: sample 202-217 has zero value, stripped
Warning: strip: sample 246-264 has zero value, stripped
Warning: strip: sample 329-345 has zero value, stripped
Warning: strip: sample 585-604 has zero value, stripped
Warning: strip: sample 622-646 has zero value, stripped
Warning: strip: sample 63-79 has zero value, stripped
Warning: strip: sample 101-118 has zero value, stripped
Warning: strip: sample 191-209 has zero value, stripped
Warning: strip: sample 230-249 has zero value, stripped
Warning: strip: sample 321-338 has zero value, stripped
Warning: strip: sample 360-378 has zero value, stripped
Warning: strip: sample 451-469 has zero value, stripped
Warning: strip: sample 490-507 has zero value, stripped
Warning: strip: sample 582-599 has zero value, stripped
Warning: strip: sample 61-79 has zero value, stripped
Warning: strip: sample 44-61 has zero value, stripped
Warning: strip: sample 173-191 has zero value, stripped
Warning: strip: sample 304-321 has zero value, stripped
Warning: strip: sample 435-453 has zero value, stripped
Warning: strip: sample 567-585 has zero value, stripped
Warning: strip: sample 138-153 has zero value, stripped
Warning: strip: sample 407-423 has zero value, stripped
Warning: strip: sample 542-559 has zero value, stripped
Warning: strip: sample 606-626 has zero value, stripped
Warning: strip: sample 41-58 has zero value, stripped
Warning: strip: sample 105-125 has zero value, stripped
WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
Warning: input buffer overflow: some input may be dropped, so disgard the input

--- (Edited on 1/30/2012 8:28 pm [GMT-0600] by ) ---

Re: Help, Julius doesn't recognize anything and give warning
User: TonyR
Date: 1/31/2012 1:25 am

For the message "Warning: strip: sample 287-302 has zero value, stripped", see the command-line/config option -nostrip ("disable stripping off zero samples").
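
To see whether a recording really contains such runs of zero samples, a rough check along these lines may help. This is only a sketch, not Julius's actual stripping code: the function name find_zero_runs and the min_len=16 threshold are assumptions (the shortest ranges in the posted warnings span 16 samples).

```python
def find_zero_runs(samples, min_len=16):
    """Return (start, end) index pairs, inclusive, for runs of zero-valued
    samples that are at least min_len samples long.

    min_len=16 is an assumption taken from the posted log, not from Julius.
    """
    runs = []
    start = None  # index where the current run of zeros began, if any
    for i, value in enumerate(samples):
        if value == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # A run that reaches the end of the buffer is closed here.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples) - 1))
    return runs


# Two normal samples, sixteen zeros, one normal sample.
samples = [120, -45] + [0] * 16 + [300]
print(find_zero_runs(samples))  # prints [(2, 17)]
```

If a file saved with -record shows many runs like this, the zeros are coming from the capture side rather than from Julius.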

For the message "WARNING: adin_thread_process: too long input", you are giving Julius more than 20 s of audio (320000 samples at 16000 Hz) and it didn't find a short pause (sp) to segment on (assuming you set this).
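
The 20 s figure follows directly from the numbers in the log. As a quick check (the variable names are only illustrative):

```python
SAMPLE_RATE_HZ = 16000       # "capture audio at 16000Hz" in the log
MAX_INPUT_SAMPLES = 320000   # limit quoted in the "too long input" warning

max_seconds = MAX_INPUT_SAMPLES / SAMPLE_RATE_HZ
print(f"Longest unsegmented input: {max_seconds:.0f} s")  # prints "Longest unsegmented input: 20 s"
```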
Julius is very difficult to set up correctly. I suggest that you first record short, easy-to-recognise sentences to files and get recognition working well on those before moving on. You will need to understand all of the beam pruning parameters and most of the other config variables.
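
Before feeding recorded files to Julius, it may be worth confirming that each one matches what the setup expects. The sketch below uses Python's standard wave module. The function names are hypothetical, the 16000 Hz rate comes from the log above, and the mono, 16-bit and under-20-seconds checks are assumptions about what this configuration wants rather than documented requirements.

```python
import wave


def describe_wav(path):
    """Read the basic format of a WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }


def looks_suitable(info, expected_rate=16000, max_seconds=20.0):
    """Assumed checks: mono, 16-bit, expected rate, shorter than max_seconds."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate_hz"] == expected_rate
        and info["duration_s"] < max_seconds
    )
```

Calling looks_suitable(describe_wav("test1.wav")) on each recording (the file name is just an example) gives a quick yes/no before starting on the recognition parameters.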
Good luck,
Tony

-- 

Dr Tony Robinson
Founder Cantab Research Ltd
http://www.cantabResearch.com

--- (Edited on 31-January-2012 7:25 am [GMT+0000] by TonyR) ---
