Audio and Prompts Discussions

Sequitur G2P
User: kmaclean
Date: 4/16/2008 8:31 pm
Views: 33219
Rating: 28
Sequitur G2P is a GPL-licensed, trainable grapheme-to-phoneme converter (i.e. it automatically figures out the pronunciation of new words that are not in your pronunciation dictionary). From their web site:
Sequitur G2P is a data-driven grapheme-to-phoneme converter developed at RWTH Aachen University - Department of Computer Science by Maximilian Bisani.
The method used in this software is described in
M. Bisani and H. Ney: "Joint-Sequence Models for Grapheme-to-Phoneme Conversion". Submitted for publication in Speech Communication
Has anyone used this software, or is anyone familiar with the approach? How does it differ (if at all) from rule-based TTS text-to-phoneme approaches (using Festival or eSpeak)?

thanks,

Ken

--- (Edited on 4/16/2008 9:31 pm [GMT-0400] by kmaclean) ---

Re: Sequitur G2P
User: kmaclean
Date: 4/17/2008 9:58 pm
Views: 3952
Rating: 29

To get G2P to compile and run properly on 64-bit Fedora Core 6 (AMD64), I had to install the Python numarray package as follows:

# yum install python-numarray 

--- (Edited on 4/17/2008 10:58 pm [GMT-0400] by kmaclean) ---

Re: Sequitur G2P
User: nsh
Date: 4/21/2008 7:51 am
Views: 427
Rating: 40

Well, I checked it with cmudict 0.6, holding out 1/10th for testing. The performance is indeed a bit better than with CART:

    total string errors:     4735 (36.60%)
    total symbol errors:     7269 (7.50%)

Stock festvox will give you around 50%/10% on the same two measures. The model is much bigger though (15 MB). Still, it makes sense to use this package since it's simple to train and test.

As for the better performance, I can just quote Alan:

Actually the L2S task is a common test in CMU's Machine Learning classes
as it is such a well-defined task. Results have typically been a little
better than the standard CART methods (typically if some component takes
into account previous predictions you can get improvements). The reason
these have never been implemented with Festival is the time required to port
their prediction engines into the system.

For English the best improvements would actually come from a better
lexicon; CMU was created by different people with quite a varied view of
its phoneme use. We have looked at research to try to automatically fix
it, but have not come up with any good solution yet. Redoing it with our new
lex learner tools for bootstrapping lexicons is probably the best
solution but that's pretty boring to do.
Indeed, I tried unilex and its performance is way better (more than 90% are correct). So,
strange as it may seem, a proper lexicon matters much more than the ML method.
http://www.cstr.ed.ac.uk/projects/unisyn/ 

--- (Edited on 4/21/2008 7:51 am [GMT-0500] by nsh) ---

Re: Sequitur G2P
User: kmaclean
Date: 4/21/2008 5:14 pm
Views: 451
Rating: 39

Hi nsh,

Excellent!

thanks,

Ken 

--- (Edited on 4/21/2008 6:14 pm [GMT-0400] by kmaclean) ---

Re: Sequitur G2P
User: kmaclean
Date: 4/28/2008 9:52 pm
Views: 393
Rating: 32

Here are my tests (so far):

Models trained using the following script:

#!/bin/sh
time python ./g2p.py --train train.lex --devel 5% --write-model model-1
time python ./g2p.py --model model-1 --ramp-up --train train.lex --devel 5% --write-model model-2
time python ./g2p.py --model model-2 --ramp-up --train train.lex --devel 5% --write-model model-3
time python ./g2p.py --model model-3 --ramp-up --train train.lex --devel 5% --write-model model-4

time python ./g2p.py --model model-4 --ramp-up --train train.lex --devel 5% --write-model model-5

Where train.lex contains the VoxForgeDict, but without the return word in brackets (i.e. "[" and "]"). The following HTK command was used to create this file:

HDMan train.lex VoxForgeDict 
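The five training commands in the script above follow one pattern: the first run trains from scratch, and each later run starts from the previous model with --ramp-up. A minimal Python sketch that only builds those command lines is given here; the function name and its defaults are made up for illustration, and only the flags already shown in the script are used.

```python
import shlex


def rampup_commands(lexicon="train.lex", steps=5, devel="5%"):
    """Build the g2p.py command lines for model-1 .. model-<steps>.

    Hypothetical helper: it mirrors the shell script in this post and
    does not run anything by itself.
    """
    commands = []
    for n in range(1, steps + 1):
        cmd = ["python", "./g2p.py"]
        if n > 1:
            # Every model after the first starts from the previous one.
            cmd += ["--model", f"model-{n - 1}", "--ramp-up"]
        cmd += ["--train", lexicon, "--devel", devel,
                "--write-model", f"model-{n}"]
        commands.append(cmd)
    return commands


for cmd in rampup_commands():
    # Print each command as a shell line (shlex.join needs Python 3.8+).
    print(shlex.join(cmd))
```

Each list could be handed to subprocess.run in turn; keeping the runs sequential matters because every step needs the model file written by the step before it.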

model-3 (1.4MB) test results:

g2p.py --model model-3 --test train.lex &

[...]

None
    total: 130747 strings, 832976 symbols
    successfully translated: 130747 (100.00%) strings, 832976 (100.00%) symbols
        string errors:       55576 (42.51%)
        symbol errors:       88273 (10.60%)

            insertions:      7375 (0.89%)
            deletions:       8976 (1.08%)
            substitutions:   71922 (8.63%)
    translation failed:      0 (0.00%) strings, 0 (0.00%) symbols
    total string errors:     55576 (42.51%)
    total symbol errors:     88273 (10.60%)
   
stack usage:  1135 
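As a sanity check on how to read this report, the short Python sketch below recomputes the percentages from the raw counts. It assumes that symbol errors are the sum of insertions, deletions and substitutions, and that each percentage is relative to the totals on the first line of the report; the numbers are consistent with that reading, but it is an inference from the output rather than something documented here.

```python
# Counts copied from the model-3 report above.
strings_total = 130747
symbols_total = 832976
string_errors = 55576
insertions, deletions, substitutions = 7375, 8976, 71922

# Assumption: symbol errors are the sum of the three edit types.
symbol_errors = insertions + deletions + substitutions


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print(symbol_errors)                           # 88273
print(percent(string_errors, strings_total))   # 42.51
print(percent(symbol_errors, symbols_total))   # 10.6
```

So the string error rate counts words with at least one wrong symbol, while the symbol error rate counts individual phoneme edits.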

model-4 (6.2MB) test results:

g2p.py --model model-4 --test train.lex &

[...]

None
    total: 130747 strings, 832972 symbols
    successfully translated: 130747 (100.00%) strings, 832972 (100.00%) symbols
        string errors:       28297 (21.64%)
        symbol errors:       40176 (4.82%)

            insertions:      3304 (0.40%)
            deletions:       4708 (0.57%)
            substitutions:   32164 (3.86%)
    translation failed:      0 (0.00%) strings, 0 (0.00%) symbols
    total string errors:     28297 (21.64%)
    total symbol errors:     40176 (4.82%)
   
stack usage:  1826

model-5 (16.5MB) test results:

g2p.py --model model-5 --test train.lex &

[...]

None
    total: 130747 strings, 832972 symbols
    successfully translated: 130747 (100.00%) strings, 832972 (100.00%) symbols
        string errors:       15466 (11.83%)
        symbol errors:       20629 (2.48%)

            insertions:      2007 (0.24%)
            deletions:       2734 (0.33%)
            substitutions:   15888 (1.91%)
    translation failed:      0 (0.00%) strings, 0 (0.00%) symbols
    total string errors:     15466 (11.83%)
    total symbol errors:     20629 (2.48%)
   
stack usage:  1867

I think I can improve results even more by removing the duplicate entries in the pronunciation dictionary (i.e. words that have more than one possible pronunciation have additional entries, with an incrementing number in parentheses following the headword of the most likely pronunciation).

For example, in the VoxForge pronunciation dictionary, there are the following entries:

ZYUGANOV        z y uw g aa n aa v
ZYUGANOV'S      z y uw g aa n aa v z
ZYUGANOV'S(2)   z y uw g aa n aa f s
ZYUGANOV'S(3)   z uw g aa n aa v z
ZYUGANOV'S(4)   z uw g aa n aa f s
ZYUGANOV(2)     z y uw g aa n aa f
ZYUGANOV(3)     z uw g aa n aa v
ZYUGANOV(4)     z uw g aa n aa f

When running g2p.py in test mode (i.e. "g2p.py --model model-5 --test train.lex &"), it compares the pronunciations that it generates from the trained model with the actual pronunciations in the dictionary, and reports the following errors:

ZYUGANOV        z y uw g aa n aa v      (0 errors)
ZYUGANOV'S      z y uw g aa n aa v z    (0 errors)
ZYUGANOV'S(2)   z y uw g aa n aa f s    (0 errors)
ZYUGANOV'S(3)   z y uw g aa n aa v z    (1 errors)
ZYUGANOV'S(4)   z y uw g aa n aa v/f z/s        (3 errors)
ZYUGANOV(2)     z y uw g aa n aa f      (0 errors)
ZYUGANOV(3)     z y uw g aa n aa v      (1 errors)
ZYUGANOV(4)     z y uw g aa n aa f      (1 errors)
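The per-entry error counts in this listing match a plain edit distance between the generated and the dictionary phoneme strings. The sketch below assumes that is how the errors are counted, and it reads "v/f z/s" as generated/dictionary symbol pairs; both points are interpretations of the output, not something confirmed in this thread.

```python
def edit_distance(predicted, reference):
    """Levenshtein distance between two lists of phoneme symbols."""
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(predicted, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # extra predicted symbol
                            curr[j - 1] + 1,      # missing reference symbol
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# ZYUGANOV'S(4): the dictionary has no "y" and ends in "f s".
predicted = "z y uw g aa n aa v z".split()
reference = "z uw g aa n aa f s".split()
print(edit_distance(predicted, reference))  # 3

# ZYUGANOV(3): only the extra "y" differs.
print(edit_distance("z y uw g aa n aa v".split(),
                    "z uw g aa n aa v".split()))  # 1
```

Under this reading the model is penalised for the alternate entries simply because it produces a pronunciation close to the main one for each of them.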

Removing the alternate pronunciations should improve results; the cleaned-up pronunciation dictionary would therefore only include the following entries:

ZYUGANOV        z y uw g aa n aa v
ZYUGANOV'S      z y uw g aa n aa v z

A quick Perl script and a recompilation of the models are required to test this theory.  I'll post the results when completed.
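For illustration, here is what such a clean-up could look like in Python rather than Perl. It is only a sketch: it assumes every alternate entry has a headword ending in a number in parentheses, as in the ZYUGANOV examples, and the function name is made up.

```python
import re

# Matches a headword that ends in a variant marker such as "(2)".
VARIANT = re.compile(r"\(\d+\)$")


def drop_alternates(lines):
    """Keep only entries whose headword has no "(n)" suffix."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if VARIANT.search(fields[0]):
            continue  # alternate pronunciation, drop it
        kept.append(line)
    return kept


sample = [
    "ZYUGANOV        z y uw g aa n aa v",
    "ZYUGANOV'S      z y uw g aa n aa v z",
    "ZYUGANOV'S(2)   z y uw g aa n aa f s",
    "ZYUGANOV'S(3)   z uw g aa n aa v z",
    "ZYUGANOV'S(4)   z uw g aa n aa f s",
    "ZYUGANOV(2)     z y uw g aa n aa f",
    "ZYUGANOV(3)     z uw g aa n aa v",
    "ZYUGANOV(4)     z uw g aa n aa f",
]
for line in drop_alternates(sample):
    print(line)
```

On the eight sample entries this keeps only the two unnumbered ones, which is the cleaned-up list shown above.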

Ken 

 

--- (Edited on 4/28/2008 10:52 pm [GMT-0400] by kmaclean) ---

--- (Edited on 4/28/2008 10:55 pm [GMT-0400] by kmaclean) ---

Re: Sequitur G2P
User: nsh
Date: 4/29/2008 2:33 pm
Views: 367
Rating: 26

Great, Ken. Though I'd like to draw attention to two details:

1. Since you are going to use the model to extend the dictionary, you must also test it on an independent part. Usually one splits the data into training and test sets. For example, you can take every 10th word in the dictionary and exclude it from training. This will give you more realistic results. In ML, ten-fold cross-validation is used to get an even more reliable result, but it's probably too complicated here.

2. As far as I understand, the VoxForge dictionary was created mostly automatically. If so, there is probably no sense in training on the automatically generated part of it.
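The hold-out split from point 1 can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and it splits by entry position, so if numbered alternates such as WORD(2) are still in the dictionary, variants of one word can land on both sides of the split.

```python
def split_lexicon(entries, every=10):
    """Hold out every `every`-th entry for testing, train on the rest."""
    train, test = [], []
    for index, entry in enumerate(entries):
        if index % every == every - 1:
            test.append(entry)   # 10th, 20th, 30th, ... entry
        else:
            train.append(entry)
    return train, test


# Toy entries standing in for dictionary lines (not real data).
entries = [f"WORD{n} w er d" for n in range(1, 31)]
train, test = split_lexicon(entries)
print(len(train), len(test))  # 27 3
```

The two lists would then be written to separate files, with the training file passed to --train and the held-out file passed to --test, the two options used earlier in this thread.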

Btw, cmudict 0.7 was released some time ago, did you know? 

--- (Edited on 4/29/2008 2:33 pm [GMT-0500] by nsh) ---

Re: Sequitur G2P
User: kmaclean
Date: 4/29/2008 3:04 pm
Views: 418
Rating: 28

Hi nsh,

>Usually one use a split on train and test data here. For example you can take each 10th word in a dictionary and exclude them from training. This will give you more natural results.

Aaah, I thought my results were too good to be true  :)

Can I still run the Sequitur G2P model trainer against the whole VoxForge dictionary, and use that model to create pronunciations (since that is what I have been running on my computer for the past 18+ hours), but then, for testing, split the dictionary as you described to get more reliable/accurate test results?

>As far as I understand voxforge dictionary was created mostly automatically.

I generate a draft pronunciation using Festival (I never did create a new Festival model using the VoxForge dictionary), but then validate manually.  I'm not a linguist, so it is only my best guess based on similar words.

>Btw, cmudict 0.7 was released some time ago, did you know? 

No, I did not ... but when I just checked the download page, it only allowed me to download version 0.6.  However, on the CMU Pronouncing Dictionary web page, you can use a form to look up pronunciations using version 0.7 ... is version 0.7 open source?

thanks,

Ken 

--- (Edited on 4/29/2008 4:04 pm [GMT-0400] by kmaclean) ---

Re: Sequitur G2P
User: nsh
Date: 4/29/2008 3:40 pm
Views: 655
Rating: 33

> Can I still run the Sequitur G2P model trainer against the whole VoxForge dictionary, and use that for the creation of pronunciations (since that is what I have been running on my computer for the past 18+ hours), but then for testing, split the dictionary as you described to get more reliable/accurate test results?

Yes, this way you'll get better numbers :).

> No I did not ...

They didn't provide it as a download, but it's announced on the SourceForge project page and available in svn:

http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/cmudict/

--- (Edited on 4/29/2008 3:40 pm [GMT-0500] by nsh) ---

Re: Sequitur G2P
User: tndkrj
Date: 2/18/2009 6:41 pm
Views: 202
Rating: 9

Would anybody happen to know how to access the resulting grapheme-phoneme pair inventory (list) that is formed after creating the model?


I tried opening the model-3 files after running the Sequitur trainer, but it only gave me a bunch of binary data which I couldn't interpret in any way.


I think there must be a way to access this, but I can't figure it out.

 

And by the way, don't you run the trainer against the dictionary that has been split (let's say the list with every 10th word taken out) and then test it on the remaining 10% of the dictionary?

(The other way round from what you described up there.)

Any opinions would be helpful!! Thank you!

--- (Edited on 2/18/2009 6:41 pm [GMT-0600] by tndkrj) ---

Re: Sequitur G2P
User: nsh
Date: 2/18/2009 7:16 pm
Views: 212
Rating: 9

> Would anybody happen to know how to access the resulting grapheme-phoneme pair inventory(list) that is formed after creating the model?

Do you want to get the transcription variants for each letter? I don't think such a thing exists in the model itself. What do you need it for? There are easier ways to get this mapping, by hand for example.

> don't you run the trainer against the dictionary that has been split (lets say the list with 10th word taken out.) and then test it with the other remainder of the 10% dictionary?

Are you asking me or Ken?

--- (Edited on 2/18/2009 7:16 pm [GMT-0600] by nsh) ---
