Getting the phonemes directly instead of words
User: brmassa
Date: 3/11/2010 5:13 am
Views: 5360
Rating: 2

Guys,

Instead of using VoxForge/Julius to write down the words, is it possible to make it write the phonemes and the exact time each was found in the audio file? My idea is to create automatic lip sync for an animation I'm doing, so I need to find out which phoneme occurs at what time in order to make the character speak.

I need it to print (in a file) something like:

(time:phoneme)
1s100ms:hh
1s230ms:aw
1s745ms:s
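
For illustration, here is a minimal Python sketch that writes cues in this format. It assumes each phoneme's start time is already known in milliseconds. The names `format_cue` and `write_cues` are made up for the example and do not come from Julius or VoxForge.

```python
def format_cue(start_ms, phoneme):
    """Render one cue as '<seconds>s<milliseconds>ms:<phoneme>'."""
    seconds, millis = divmod(start_ms, 1000)
    return f"{seconds}s{millis}ms:{phoneme}"


def write_cues(cues, out):
    """Write (start_ms, phoneme) pairs to a text stream, one cue per line."""
    for start_ms, phoneme in cues:
        out.write(format_cue(start_ms, phoneme) + "\n")


if __name__ == "__main__":
    import sys

    # Example values taken from the listing above.
    write_cues([(1100, "hh"), (1230, "aw"), (1745, "s")], sys.stdout)
```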

regards,

massa

--- (Edited on 3/11/2010 5:13 am [GMT-0600] by Visitor) ---

Re: Getting the phonemes directly instead of words
User: kmaclean
Date: 3/22/2010 8:15 pm
Views: 137
Rating: 3

>is it possible to make it write the phonemes and the exact time it was found in the audio file?

I have not tried this myself, but for a small grammar you could run two passes. First, do an initial recognition pass to determine which words are being uttered. Then do a second pass using forced alignment (which needs to know beforehand which words to match to the utterance) to generate time stamps for the phonemes with the -palign option. From the Julian manual:

      -palign
              Do viterbi alignment per phoneme (model) units from the recognition result.  The phoneme boundary frames and the average acoustic scores per frame are calculated.

This will be very slow...
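
As a rough sketch of the post-processing step, the Python below turns per-phoneme alignment rows into start and end times. It assumes each row has the shape `[first_frame last_frame] score unit` and that frames are 10 ms apart. Both are assumptions to check against your own Julius output and configuration. The sample rows are invented for the example and are not real recognizer output.

```python
import re

# Assumed row shape: "[  18   25]  -2.745620  hh"
ROW = re.compile(r"\[\s*(\d+)\s+(\d+)\]\s+(-?\d+(?:\.\d+)?)\s+(\S+)")

FRAME_SHIFT_MS = 10  # assumption: one frame every 10 ms


def parse_alignment(text, frame_shift_ms=FRAME_SHIFT_MS):
    """Return (start_ms, end_ms, unit) tuples for every row that matches ROW."""
    segments = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match is None:
            continue  # skip headers, separators, and unrelated log lines
        first, last, _score, unit = match.groups()
        start_ms = int(first) * frame_shift_ms
        end_ms = (int(last) + 1) * frame_shift_ms  # last frame is inclusive
        segments.append((start_ms, end_ms, unit))
    return segments


# Invented sample rows, for illustration only.
SAMPLE = """\
[   0   17]  -2.473118  sil
[  18   25]  -2.745620  hh
[  26   40]  -2.601234  aw
"""

if __name__ == "__main__":
    for start_ms, end_ms, unit in parse_alignment(SAMPLE):
        print(start_ms, end_ms, unit)
```

If the acoustic model uses context-dependent units, the `unit` field may hold a triphone label rather than a bare phoneme, so an extra mapping step may be needed before driving the animation.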

--- (Edited on 3/22/2010 9:15 pm [GMT-0400] by kmaclean) ---

Re: Getting the phonemes directly instead of words
User: brmassa
Date: 4/12/2010 10:24 pm
Views: 239
Rating: 1

kmaclean,

Thanks for your reply! So it's going to be a slow operation? Hmmm.

Note that I don't want to extract the words, only the phonemes (because the actors will read a known script). If a dictionary is needed, it would basically be a "fake" one with one word per phoneme.
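
A minimal sketch of generating such a one-word-per-phoneme dictionary in Python. The line layout used here (entry name, bracketed output string, phoneme sequence) is an assumption about the dictionary format and should be checked against the Julius documentation. `phoneme_dictionary` is a hypothetical helper, not part of any tool.

```python
def phoneme_dictionary(phonemes):
    """Build dictionary text with one single-phoneme 'word' per phoneme.

    Assumed line layout: <name> TAB [<output>] TAB <phoneme sequence>
    """
    lines = []
    for ph in sorted(set(phonemes)):  # drop duplicates, keep a stable order
        lines.append(f"{ph}\t[{ph}]\t{ph}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Phoneme labels must match the ones the acoustic model was trained with.
    print(phoneme_dictionary(["hh", "aw", "s"]), end="")
```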

I was hoping that it would be even faster than normal speech recognition, since I could skip the dictionary search phase and directly print each phoneme it finds (as well as the time/frame at which it was found).

regards,

massa

--- (Edited on 4/12/2010 10:24 pm [GMT-0500] by Visitor) ---

Re: Getting the phonemes directly instead of words
User: traylerphi
Date: 5/17/2010 2:32 pm
Views: 153
Rating: 2

Sorry if this is a backwards question, but I have been searching everywhere and this thread seems to be the closest thing going...

Is there a way to get Julius to just output whatever phonemes it THINKS it is hearing without trying to do anything with them at all?  No words, no grammar... like if I speak some random gibberish can it give me the phonemes for it?

Or is there a [linux] tool other than Julius out there for this?

Any advice here would be greatly appreciated!

--- (Edited on 5/17/2010 2:32 pm [GMT-0500] by Visitor) ---

Re: Getting the phonemes directly instead of words
User: nsh
Date: 5/17/2010 2:45 pm
Views: 2111
Rating: 2

It's easy with CMUSphinx. Read

http://cmusphinx.sourceforge.net/wiki/phonemerecognition

--- (Edited on 5/17/2010 23:45 [GMT+0400] by nsh) ---
