VoxForge
Hello
I finished doing the samples and the Julia script for Julius's models, and when running Julius I get the following:
john@john-VirtualBox:~/voxforge/howto$ julius -input mic -C sample.jconf
STAT: include config: sample.jconf
pass1_best: <s> DIAL EIGHT
sentence1: <s> DIAL ONE </s>
pass1_best: <s> DIAL SIX
sentence1: <s> DIAL TWO </s>
pass1_best: <s> DIAL SIX
sentence1: <s> DIAL THREE </s>
pass1_best: <s> DIAL SIX
sentence1: <s> DIAL FOUR </s>
even though I might say 'cat' or 'dog' to it.
I also noticed it wouldn't let me say the full sentence.
This is the step I'm trying to accomplish:
http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/how-to/run-julius
And I did not see:
----------------------- System Information end -----------------------
Notice for feature extraction (01),
*************************************************************
* Cepstral mean normalization for real-time decoding: *
* NOTICE: The first input may not be recognized, since *
* no initial mean is available on startup. *
*************************************************************
------
### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 1536 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
<<< please speak >>>
when the command was submitted.
Does anyone know what I might be missing?
--- (Edited on 12/21/2020 12:12 pm [GMT-0600] by maglinvinn) ---
I figured out how to enable debugging output and disable quiet mode.
pass1_best: <s> DIAL SIX
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ah l | s ih k s
pass1_best_score: -3556.129395
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 16 generated, 16 pushed, 5 nodes popped in 124
sentence1: <s> DIAL THREE </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ah l | th r iy | sil
cmscore1: 1.000 1.000 1.000 1.000
score1: -7631.721191
pass1_best: <s> DIAL EIGHT
pass1_best_wordseq: 0 3 5
pass1_best_phonemeseq: sil | d ay ah l | ey t
pass1_best_score: -2927.570312
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 16 generated, 16 pushed, 5 nodes popped in 88
sentence1: <s> DIAL EIGHT </s>
wseq1: 0 3 5 1
phseq1: sil | d ay ah l | ey t | sil
cmscore1: 1.000 1.000 0.963 1.000
score1: -7340.739258
First I said "Dial three", and it worked.
Second I said "Three", and it spit out DIAL EIGHT.
Halp. lol.
--- (Edited on 12/21/2020 12:44 pm [GMT-0600] by maglinvinn) ---
I think maybe it's the grammar file... it's forcing sentences of a specific structure.
http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/how-to/data-prep/step-1
The grammar built in this step seems to allow only a very limited number of valid phrases.
Logically, a large number of options would be necessary, but the grammar file this step produces only has 3 or so.
This implies that my grammar file needs a notable upgrade. Does anyone have tutorial links for that?
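To make the "limited number of valid phrases" point concrete, here is a rough Python sketch that expands a small grammar into every sentence it allows. The rule and category names (S, SENT, DIAL_V, DIGIT and so on) and the word lists are assumptions modeled on the DIAL output above, not a copy of the actual tutorial files, and this is not how Julius itself loads a grammar.

```python
from itertools import product

# Hypothetical grammar, loosely modeled on the rule/vocabulary split
# used by Julius grammars. The names and words are assumptions.
RULES = {
    "S": [["NS_B", "SENT", "NS_E"]],
    "SENT": [["DIAL_V", "DIGIT"]],
}
WORDS = {
    "NS_B": ["<s>"],
    "NS_E": ["</s>"],
    "DIAL_V": ["DIAL"],
    "DIGIT": ["ONE", "TWO", "THREE", "FOUR", "SIX", "EIGHT"],
}


def expand(symbol):
    """Return every word sequence that the given symbol can produce."""
    if symbol in WORDS:
        return [[word] for word in WORDS[symbol]]
    results = []
    for alternative in RULES[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


SENTENCES = [" ".join(words) for words in expand("S")]

if __name__ == "__main__":
    for sentence in SENTENCES:
        print(sentence)
    # A bare digit is not a sentence under this grammar.
    print("<s> THREE </s>" in SENTENCES)
```

With a grammar like this, every valid sentence is DIAL plus one digit, so a lone "Three", "cat" or "dog" has nothing to match. Accepting more phrases would mean adding more SENT alternatives and word categories.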
Is there a way, with Julius, just to see the phonetic output it hears, regardless of what it's trying to map it to?
--- (Edited on 12/21/2020 12:54 pm [GMT-0600] by maglinvinn) ---
>Is there a way, with Julius, just to see the phonetic output it
> hears, regardless of what it's trying to map it to?
I think Julius uses the grammar to constrain the search space when determining the most likely series of phonemes it has heard.
So it does not strictly recognize each phoneme and then find the matching word from the grammar. Rather, it uses the grammar to reduce how many different combinations of phonemes it has to look at, and then returns the best set.
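A toy illustration of that idea in Python, in case it helps. This is only a sketch: Julius scores audio against acoustic models in a two-pass search, whereas here a plain edit distance over phoneme lists stands in for the acoustic score. The pronunciations for DIAL, THREE, SIX and EIGHT are copied from the log earlier in the thread; the DIAL-plus-digit grammar and the phonemes used for "cat" are assumptions.

```python
# Pronunciations copied from the phoneme sequences in the log above.
LEXICON = {
    "DIAL": ["d", "ay", "ah", "l"],
    "THREE": ["th", "r", "iy"],
    "SIX": ["s", "ih", "k", "s"],
    "EIGHT": ["ey", "t"],
}

# Assumed grammar: every sentence is DIAL followed by one digit.
ALLOWED = [["DIAL", digit] for digit in ("THREE", "SIX", "EIGHT")]


def edit_distance(a, b):
    """Levenshtein distance between two phoneme lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]


def phonemes(sentence):
    """Flatten a word sequence into its phoneme sequence."""
    return [p for word in sentence for p in LEXICON[word]]


def decode(heard):
    """Return the allowed sentence closest to the heard phonemes."""
    return min(ALLOWED, key=lambda s: edit_distance(heard, phonemes(s)))


if __name__ == "__main__":
    print(decode(["th", "r", "iy"]))  # only "three" was spoken
    print(decode(["k", "ae", "t"]))   # "cat" (assumed phonemes)
```

Whatever goes in, the result is always one of the allowed DIAL sentences, because those are the only candidates the search ever considers. Which digit wins in the real decoder depends on the acoustic scores, which this toy does not model.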
--- (Edited on 12/22/2020 6:50 pm [GMT-0500] by kmaclean) ---
The answers you seek are here; it's the grammar file:
http://voxforge.org/home/forums/message-boards/speech-recognition-engines/quickstart-vocabulary-limit#equ-prgsDz7w_g6amJD-Dw
--- (Edited on 1/11/2021 12:01 am [GMT-0600] by Tom J.) ---
Nice! Was looking for this also, thank you man.
--- (Edited on 11/23/2021 2:42 am [GMT-0600] by chaples199) ---