VoxForge
I've searched the forums but couldn't find a complete answer, so I'm posting this question again.
I created a small dictionary like this:
0 [<s>] sil
1 [</s>] sil
2 [OPEN] ow p ax n
3 [NOTEPAD] n ow t p ae dx
4 [WINAMP] w ih n ae m p
5 [PAINT] p ey n t
6 [CLOSE] k l ow z
7 [WINDOW] w ih n d ow
8 [DEBUG] d iy b ah g
When using it with the "quickstart" downloaded from the homepage, I already got a few errors. After searching for answers for a while, I decided to download the newest AM build. The number of errors was reduced, but a few still remained:
Error: voca_load_htkdict: line 9: triphone "d-iy+b" not found
Error: voca_load_htkdict: line 9: triphone "iy-b+ah" not found
Error: voca_load_htkdict: the line content was: 8 [DEBUG] d iy b ah g
Error: voca_load_htkdict: begin missing phones
Error: voca_load_htkdict: d-iy+b
Error: voca_load_htkdict: iy-b+ah
Error: voca_load_htkdict: end missing phones
I want to know whether these errors are unavoidable because of the limited data, or whether there is a mechanism that can resolve them automatically.
Would it be a good idea to test the model with a big list of transcriptions, like the CMU lexicon?
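For what it's worth, that kind of check can be run offline before the dictionary ever reaches the recognizer. Below is a minimal Python sketch. It assumes, based only on the error messages above, that every word-internal phone needs the exact triphone "left-phone+right", while the first and last phone of a word are satisfied by either a biphone or any triphone that extends it. The function names and the idea of holding the model names in a set are illustrative and not part of Julius or HTK.

```python
def parse_dict_line(line):
    """Split a line like '8 [DEBUG] d iy b ah g' into (output string, phones)."""
    fields = line.split()
    return fields[1], fields[2:]


def missing_models(phones, known):
    """Return the model names a word needs that are absent from `known`.

    `known` is a set of model names (e.g. read from the model's triphone
    list). The rules for word-edge phones are an assumption inferred from
    the error log, not taken from the Julius source.
    """
    missing = []
    last = len(phones) - 1
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i < last else None
        if left and right:
            # Word-internal phone: the exact triphone must exist.
            name = f"{left}-{p}+{right}"
            if name not in known:
                missing.append(name)
        elif left:
            # Last phone: accept biphone 'l-p' or any triphone 'l-p+*'.
            bi = f"{left}-{p}"
            if bi not in known and not any(m.startswith(bi + "+") for m in known):
                missing.append(bi + "+*")
        elif right:
            # First phone: accept biphone 'p+r' or any triphone '*-p+r'.
            bi = f"{p}+{right}"
            if bi not in known and not any(m.endswith("-" + bi) for m in known):
                missing.append("*-" + bi)
        elif p not in known:
            # Single-phone word: needs the monophone itself.
            missing.append(p)
    return missing


if __name__ == "__main__":
    known = {"d+iy", "b-ah+g", "ah-g"}  # toy model list for illustration
    word, phones = parse_dict_line("8 [DEBUG] d iy b ah g")
    print(word, missing_models(phones, known))
```

Running the same function over a large lexicon would show which words the current acoustic model cannot cover as transcribed.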
--- (Edited on 2/15/2010 4:22 pm [GMT-0600] by danijel) ---
>Error: voca_load_htkdict: the line content was: 8 [DEBUG] d iy b ah g
Not sure where you got your pronunciation for this, but the VoxForge Lexicon (used in the creation of the VoxForge acoustic models in the Nightly Builds) uses this:
DEBUG [DEBUG] d ix b ah g
--- (Edited on 2/17/2010 10:36 pm [GMT-0500] by kmaclean) ---
CMU dictionary has this:
DEBUG D IY0 B AH1 G
But that still doesn't change the fact that, as I understand it, I can only model words whose triphones occurred in the VoxForge prompts?
One solution was to tie missing triphones to others that sound "similar", but that's obviously not very elegant.
The problem is that I want to make an app that allows ordinary users to add pronunciations for words that they want recognized. This makes the whole process much more difficult...
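To make the tying workaround mentioned above concrete, here is a rough Python sketch of how a substitute could be chosen. It is a crude heuristic of my own for illustration: pick an existing model with the same centre phone, preferring one that shares the left or right context. It also assumes the HTK-style convention that a model-list line of the form "logical physical" maps an unseen name onto an existing model. The function names are hypothetical.

```python
def center(model):
    """Base phone of a name like 'd-iy+b', 'iy+b', 'd-iy' or 'iy'."""
    if "-" in model:
        model = model.split("-", 1)[1]
    return model.split("+", 1)[0]


def contexts(model):
    """Return (left, right) context phones, or None where absent."""
    left = model.split("-", 1)[0] if "-" in model else None
    right = model.split("+", 1)[1] if "+" in model else None
    return left, right


def pick_substitute(missing, known):
    """Choose a known model with the same centre phone as `missing`.

    Candidates score one point per matching context; ties are broken
    alphabetically so the result is deterministic. Returns None when
    no model shares the centre phone.
    """
    base = center(missing)
    left, right = contexts(missing)
    best, best_score = None, -1
    for cand in sorted(known):
        if center(cand) != base:
            continue
        c_left, c_right = contexts(cand)
        score = int(left is not None and c_left == left)
        score += int(right is not None and c_right == right)
        if score > best_score:
            best, best_score = cand, score
    return best


def tied_line(missing, known):
    """Format a 'logical physical' line, or None if nothing fits."""
    sub = pick_substitute(missing, known)
    return f"{missing} {sub}" if sub else None


if __name__ == "__main__":
    known = {"d-iy+p", "t-iy+b", "iy", "b-ah+g"}  # toy model list
    print(tied_line("d-iy+b", known))
```

This only picks a stand-in by name; it says nothing about how acoustically close the two contexts really are, which is why the approach feels inelegant.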
--- (Edited on 2/18/2010 3:20 am [GMT-0600] by Visitor) ---
Try pocketsphinx instead :)
--- (Edited on 2/19/2010 03:45 [GMT+0300] by nsh) ---
>The problem is that I want to make an app that allows ordinary users to
>add pronunciations for words that they want recognized.
We currently use the CMU unstressed dictionary from the xvoice site, which I assumed was the same as CMU v0.6. Which, as you are finding out, is not the case...
If you describe what you are doing, we can get a better idea whether it might be worthwhile for us to move the current VoxForge phone list to the CMU v0.7 pronunciation dictionary - which is where we should be at regardless (Ticket #468).
Ken
--- (Edited on 2/21/2010 9:50 pm [GMT-0500] by kmaclean) ---
I'm currently developing this:
http://code.google.com/p/voice-remote-android/
I want to allow users to build simple grammars using my GUI and attach them to certain "actions" in the operating system. Nothing too ambitious, but it has to be simple to use.
I intend to add the dictionary that you mentioned to the program, so whenever the user chooses a word that's already been transcribed, it will automatically use the correct pronunciation.
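As a sketch of that lookup step, the Python snippet below turns a CMU-style entry such as "DEBUG D IY0 B AH1 G" into the "WORD [WORD] phones" layout used earlier in this thread by dropping the stress digits and lowercasing the phones. The function name is made up for illustration, and the handling of variant markers like "(1)" is an assumption about the input file. Note that this does not map CMU phones onto the VoxForge phone set (which uses symbols such as ix, ax and dx), so the output may still reference phones or triphones the acoustic model lacks.

```python
import re


def cmu_to_julius(line):
    """Convert 'DEBUG  D IY0 B AH1 G' to 'DEBUG [DEBUG] d iy b ah g'."""
    word, *phones = line.split()
    # Drop an alternate-pronunciation suffix such as 'READ(1)' (assumed format).
    word = re.sub(r"\(\d+\)$", "", word)
    # Remove the trailing stress digit from vowels and lowercase every phone.
    phones = [re.sub(r"\d$", "", p).lower() for p in phones]
    return f"{word} [{word}] {' '.join(phones)}"


if __name__ == "__main__":
    print(cmu_to_julius("DEBUG  D IY0 B AH1 G"))
```

Entries converted this way could be kept in an in-memory map keyed by word, so the GUI only has to ask the user for a pronunciation when the lookup fails.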
But sooner or later they will want to recognize words not in the dictionary (e.g. application names) and that makes the whole situation kinda tricky.
As I understand it, it's not possible to have all the triphones in the model, so I need some easy way of dealing with the missing ones.
I usually work with hybrid ANN/HMM systems. It's so much easier there...
--- (Edited on 2/22/2010 1:48 pm [GMT-0600] by danijel) ---
>But sooner or later they will want to recognize words not in the
>dictionary (eg. application names) and that makes the whole situation
>kinda tricky.
I think you need to look at grapheme-to-phoneme conversion (g2p).
Sequitur G2P (GPL) can be trained with any flavour of pronunciation dictionary.
Ken
--- (Edited on 2/25/2010 7:08 pm [GMT-0500] by kmaclean) ---
I'm not quite clear on what the fix is for these errors:
STAT: reading [ofsample.dfa] and [ofsample.dict]...
Error: voca_load_htkdict: line 6: triphone "ah-l+er" not found
Error: voca_load_htkdict: line 6: triphone "l-er+*" or biphone "l-er" not found
Error: voca_load_htkdict: the line content was: 3 [COLOR] k ah l er
Error: voca_load_htkdict: begin missing phones
Error: voca_load_htkdict: ah-l+er
Error: voca_load_htkdict: l-er+* or biphone l-er
Error: voca_load_htkdict: end missing phones
Error: init_voca: error in reading ofsample.dict: 1
Does this mean that somewhere along the line that triphone wasn't generated and included in the hmmdefs?
--- (Edited on 5/27/2010 6:51 pm [GMT-0500] by Visitor) ---
>Error: voca_load_htkdict: the line content was: 3 [COLOR] k ah l er
It means that your training corpus does not contain the word "[COLOR] k ah l er".
--- (Edited on 6/9/2010 9:45 pm [GMT-0400] by kmaclean) ---