General Discussion

NATO alphabet and voice recognition
User: colbec
Date: 2/7/2009 8:40 am
Views: 6561
Rating: 13

Just wondering about the applicability of the NATO alphabet to voice recognition. It was originally intended to help with clear communication over radio; do the same principles apply to voice recognition, or are there other alphabets that suit VR better?

I have tried using the NATO set, but I find that if I replace some words with my own choices the accuracy rate goes up. One word that constantly gives me accuracy problems is 'ONE' in the set of digits (zero, ..., nine): Julius constantly hears 'NINE'. If I replace 'one' with 'ace', my recognition accuracy immediately goes up.
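One rough way to look for word pairs such as ONE/NINE before recording anything is to compare their phone sequences. The sketch below is only an illustration: the pronunciations are assumed CMUdict-style entries with stress marks removed, the function names are hypothetical, and plain edit distance over phones is a crude proxy that ignores how acoustically similar two different phones are and how well a given model was trained.

```python
from itertools import combinations

# Assumed pronunciations (CMUdict style, stress removed); check them
# against whatever lexicon the model is really built from.
PRONUNCIATIONS = {
    "ONE": ["W", "AH", "N"],
    "NINE": ["N", "AY", "N"],
    "ACE": ["EY", "S"],
}


def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # delete phone_a
                            curr[j - 1] + 1,      # insert phone_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_pairs(lexicon):
    """Return (distance, word_a, word_b) tuples, most similar pairs first."""
    pairs = [
        (phone_edit_distance(lexicon[a], lexicon[b]), a, b)
        for a, b in combinations(lexicon, 2)
    ]
    return sorted(pairs)


if __name__ == "__main__":
    for distance, word_a, word_b in closest_pairs(PRONUNCIATIONS):
        print(distance, word_a, word_b)
```

Under these assumed pronunciations ONE and NINE differ by 2 phone edits, while ACE is 3 edits from both, which is at least consistent with the swap helping. It does not show that the swap is the cause.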

Starting from first principles, what would be your ideal VR alphabet for the 26 chars in English?

--- (Edited on 2/7/2009 8:40 am [GMT-0600] by colbec) ---

Re: NATO alphabet and voice recognition
User: nsh
Date: 2/7/2009 6:57 pm
Views: 135
Rating: 14

> Starting from first principles, what would be your ideal VR alphabet for the 26 chars in English?

The NATO one? It was designed to improve the accuracy of recognition the way you described.

> I have tried using the NATO set but find if I start replacing some words with my own choices the accuracy rate goes up.

Well, I think it's just an issue with the specific acoustic model. Could you provide the recordings you are trying to recognize? Accuracy on digits should be very high, around 98%, so it's unlikely to depend on a specific word.

--- (Edited on 2/7/2009 6:57 pm [GMT-0600] by nsh) ---

Re: NATO alphabet and voice recognition
User: colbec
Date: 2/8/2009 7:29 am
Views: 94
Rating: 16

nsh: Unfortunately lack of bandwidth (I am on dialup here, not by choice I can assure you) prevents me from offering a complete set of wav files. If your intention is to listen to the quality of the recordings to judge their suitability for use in this context I can send perhaps a sample of two or three of the files if that would assist. I am using a Logitech USB headset under Linux Alsa, so far very reliable but perhaps not as noise-cancelling as it might be. My recording environment is however quite quiet.

I do agree that lack of depth in the model may be an issue. Re-recording my prompts sometimes produces different accuracy results. I am working on this.

However I am still not persuaded that an alphabet designed for use while artillery is going off or over a crackly radio is necessarily ideal for use in a voice recognition context. One of the principles that is different is that in the one case there is human to human communication, in the other it is human to machine.

--- (Edited on 2/8/2009 7:29 am [GMT-0600] by colbec) ---

Re: NATO alphabet and voice recognition
User: nsh
Date: 2/8/2009 8:34 am
Views: 80
Rating: 15

Hm, are you building your own model? It's not quite clear whether you are using the VoxForge one. If you are using the VoxForge model, a few recordings that aren't recognized correctly are enough to check the issue.

--- (Edited on 2/8/2009 8:34 am [GMT-0600] by nsh) ---

Re: NATO alphabet and voice recognition
User: colbec
Date: 2/8/2009 8:43 am
Views: 106
Rating: 11

Yes, I am using my own model. For each possible combination of words I generate two prompts (two separate recordings), plus recordings of a number of other unused words to exercise the triphones.

--- (Edited on 2/8/2009 8:43 am [GMT-0600] by colbec) ---

Re: NATO alphabet and voice recognition
User: nsh
Date: 2/8/2009 9:13 am
Views: 116
Rating: 14

> For each possible combination of words I generate two prompts (two separate recordings) plus recordings of a number of other non-used words to exercise the triphones.

Hm, it's not quite clear if you built it properly. How big is the vocabulary? What number of states do you use? How many states per word? What is the number of triphones? What is the total size of the database? What is the error rate on a test set?

 

I think it's not easy to discuss particular words while we don't know anything about the model.

--- (Edited on 2/8/2009 9:13 am [GMT-0600] by nsh) ---

Re: NATO alphabet and voice recognition
User: colbec
Date: 2/8/2009 10:18 am
Views: 2937
Rating: 14

Vocabulary is about a hundred words. Each word is guaranteed to be exercised at least twice in the prompts, and each phone is used at least 4 times. I have no data on triphones. Success rate is quite high, over 95%.
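Coverage figures like these, including the missing triphone count, can be tallied from the prompts and the lexicon. The following is a minimal sketch, assuming prompts are plain strings of space-separated words and the lexicon is a dict mapping each word to its phone list. The function name and data layout are hypothetical, and only word-internal triphones are counted (written in the HTK-style `left-phone+right` form), so cross-word contexts and word-edge biphones are ignored.

```python
from collections import Counter


def coverage(prompts, lexicon):
    """Count how often each word, phone and word-internal triphone occurs."""
    word_counts = Counter()
    phone_counts = Counter()
    triphone_counts = Counter()
    for prompt in prompts:
        for word in prompt.split():
            phones = lexicon[word]  # raises KeyError for unknown words
            word_counts[word] += 1
            phone_counts.update(phones)
            # Slide a window of three phones across the word.
            for left, mid, right in zip(phones, phones[1:], phones[2:]):
                triphone_counts[f"{left}-{mid}+{right}"] += 1
    return word_counts, phone_counts, triphone_counts


if __name__ == "__main__":
    # Illustrative data only; pronunciations are assumed.
    lexicon = {
        "ONE": ["w", "ah", "n"],
        "NINE": ["n", "ay", "n"],
        "ACE": ["ey", "s"],
    }
    prompts = ["ONE NINE", "NINE ACE"]
    words, phones, triphones = coverage(prompts, lexicon)
    for name, count in sorted(triphones.items()):
        print(name, count)
```

Sorting any of the three counters by count shows which units fall below a chosen minimum, for example words seen fewer than twice or phones seen fewer than 4 times.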

I agree, the model is probably not large enough. I have tried to adapt the latest VoxForge model and have reached the stage where I am attempting the 'forced realignment' in step 4, but evidently I am using a word, 'ORDINAL', that the HVite process does not like. The word is in the obelisk_lexicon, runs fine, and is recognized in my own model, but perhaps it also needs to be in the VF set I am trying to adapt and is not covered there.
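If the guess above is right and the word is simply absent from the dictionary used for alignment, that can be checked before running HVite. This is a hedged sketch, not part of any VoxForge tooling. It assumes each lexicon line starts with the word followed by whitespace, and that each prompt line starts with a recording ID followed by the words. Both layouts and the function name are assumptions, so adjust them to the actual files.

```python
def missing_words(prompt_lines, lexicon_lines, has_id=True):
    """Return the sorted prompt words that have no lexicon entry."""
    known = set()
    for line in lexicon_lines:
        fields = line.split()
        if fields:
            known.add(fields[0].upper())  # first field is the word itself

    missing = set()
    for line in prompt_lines:
        fields = line.split()
        words = fields[1:] if has_id else fields  # skip the recording ID
        for word in words:
            if word.upper() not in known:
                missing.add(word.upper())
    return sorted(missing)


if __name__ == "__main__":
    # Illustrative lines only; real files would be read with open().
    lexicon_lines = ["ONE [ONE] w ah n", "NINE [NINE] n ay n"]
    prompt_lines = ["*/sample1 ONE ORDINAL NINE", "*/sample2 NINE ORDINAL"]
    print(missing_words(prompt_lines, lexicon_lines))
```

Any word reported here would need an entry, with a pronunciation that uses only phones the target model knows, before alignment is attempted.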

--- (Edited on 2/8/2009 10:18 am [GMT-0600] by colbec) ---
