Italian

italian phonemes
User: NadaDeNada
Date: 10/15/2009 10:21 am
Views: 13358
Rating: 13

Hi,

I'm looking for a tool to break the words in my dictionary down into Italian phonemes.

Re: italian phonemes
User: kmaclean
Date: 10/15/2009 12:40 pm
Views: 292
Rating: 14

Hi NadaDeNada,

>i looking for a tool to decompose in italian phonemes my dictionary words.

Some open-source text-to-speech engines, such as Festival or eSpeak, let you do this. If an engine can synthesize Italian, it should also have a command that outputs the phonemes of a word.

For example, in Festival you can look up the pronunciation of a word with the "lex.lookup" command:

 festival> (lex.lookup "internet")
 ("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))

Festival lists the phonemes of the word grouped into syllables. The number after each syllable marks its lexical stress (1 = stressed, 0 = unstressed), and "nil" is the part-of-speech field. Ignore the parentheses, the numbers and "nil", and you have Festival's view of the phonemes that make up the word you entered. For "internet", Festival says the phonemes are: "ih n t er n eh t".
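The same flattening can be done in a script. Below is a minimal Perl sketch, assuming the entry layout shown above ("word", part of speech, then syllables of the form ((phones) stress)). The sub name festival_entry_to_phones is made up for illustration; it is not part of Festival.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten one Festival lexical entry, such as the output
# of (lex.lookup "internet"), into a space-separated phoneme string.
# Assumes the layout ("word" POS ((phones) stress) ...) and that phone
# names contain no parentheses.
sub festival_entry_to_phones {
    my ($entry) = @_;
    my @phones;
    # Each syllable looks like ((ih n t) 1): capture the phone list and
    # skip the stress digit that follows it.
    while ($entry =~ /\(\(([^()]*)\)\s*\d+\)/g) {
        my $syllable = $1;
        push @phones, split ' ', $syllable;
    }
    return join ' ', @phones;
}

my $entry = '("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))';
print festival_entry_to_phones($entry), "\n";    # prints: ih n t er n eh t
```

Matching whole ((phones) stress) groups, rather than just deleting parentheses and digits, keeps phone names that contain digits intact.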

For the Italian version of Festival, see the FESTIVAL speaks Italian! page.

Ken

Re: italian phonemes
User: NadaDeNada
Date: 10/16/2009 3:53 am
Views: 500
Rating: 15

Thank you very much! :-)

Re: italian phonemes
User: occimanete
Date: 11/17/2009 1:28 am
Views: 298
Rating: 16

Look, I don't know how to upload a file to this site, but here is the Perl script I use to normalize the Festival dictionary.

1) First, find the festlex_IFD.tar file (containing 500,000 words in Festival format).

2) Untar it and look inside the folders for lex.out (30 MB).

3) Run the Perl script (at the end of this post) as:

 perl cleandict.pl lex.out normalized.dict

You now have the 500K Italian words with their phonetic transcriptions. You may also find the phone table in festival/lib/italian_scm/italian_phoneset.scm useful.

Someone has already formatted it as a phone list of the kind you'll use in HTK. Look around for it, or just get mine here:

a
a1
b
d
dz
dZZ
e
e1
EE
f
g
i
i1
j
JJ
k
l
LL
m
n
nf
ng
o
o1
OO
p
r
s
SIL
SS
t
ts
tSS
u
u1
v
w
z

I think I got it from the user "nsh". Anyway, here is the cleandict.pl script as well:

#!/usr/local/bin/perl
#
# -- Script used to clean the dictionary taken from Festival and turn it
# into a simple list of words and phonemes, e.g.: fonemi f OO n e1 m i
#
# Example: perl cleandict.pl lessico_italiano_500K.dic it-500kNorm.dic
#
# Save this file as UTF-8: "use utf8" makes the accented letters in the
# character class below real characters, so each decoded input character
# is compared directly and written back as Latin-1, without the
# encode()/decode() round trip that could garble accented letters.
#
# TODO: in the end everything should be converted to plain ASCII.
#
use strict;
use warnings;
use utf8;

if (@ARGV != 2) {
    print "usage: $0 Festival-like.dic Normalized.dic\n\n";
    exit(0);
}

my ($srcdic, $dstdic) = @ARGV;

# ISO-8859-1 is Latin-1: decode the input and encode the output with it.
open(my $SRCDIC, '<:encoding(iso-8859-1)', $srcdic) or die "cannot read $srcdic: $!";
open(my $DSTDIC, '>:encoding(iso-8859-1)', $dstdic) or die "cannot write $dstdic: $!";

my $nlinee = 0;    # number of lines (words) processed
while (my $linea = <$SRCDIC>) {
    $nlinee++;

    my $got  = 0;  # 1 while inside the quoted headword
    my $deep = 0;  # current parenthesis nesting depth

    for my $i (split //, $linea) {
        if ($i eq '"') {
            # The opening quote starts the headword; the closing quote
            # ends it and is replaced by a separating space.
            if ($got == 0) {
                $got = 1;
            } else {
                $got = 0;
                print $DSTDIC ' ';
            }
        }
        elsif ($i eq '(') {
            # A syllable's phone list opens at depth 3: print a space first.
            print $DSTDIC ' ' if $deep == 3;
            $deep++;
        }
        elsif ($i eq ')') {
            $deep--;
        }
        elsif ($i =~ /[ 1-9a-zA-Zèéìíùúàáòó]/) {
            # Keep headword letters and anything nested deeper than 3 (the
            # phone names); on a standard Festival entry this drops the
            # part of speech and the syllable stress digits.
            print $DSTDIC $i if $got || $deep > 3;
        }
    }
    print $DSTDIC "\n";
}

close($SRCDIC);
close($DSTDIC);

print "processed $nlinee words\n";
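The script prints one space for the closing quote and another before the first syllable, so on a standard Festival entry each output line can end up with a double space after the word. A possible tidy-up pass is sketched below; tidy_dictionary is a hypothetical name, not part of the script above, and it assumes whitespace-separated "word ph1 ph2 ..." lines.

```perl
use strict;
use warnings;

# Hypothetical post-processing step for the output of cleandict.pl:
# collapse runs of whitespace, drop blank lines and exact duplicates,
# keeping the first occurrence of each line in its original order.
sub tidy_dictionary {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        my $clean = join ' ', split ' ', $line;   # split ' ' also trims both ends
        next if $clean eq '' || $seen{$clean}++;
        push @out, $clean;
    }
    return @out;
}

my @tidy = tidy_dictionary("fonemi  f OO n e1 m i\n", "\n", "fonemi  f OO n e1 m i\n");
print "$_\n" for @tidy;    # prints "fonemi f OO n e1 m i" once
```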
Re: italian phonemes
User: occimanete
Date: 11/17/2009 1:32 am
Views: 259
Rating: 12

Here is the phone table again, properly formatted. Note that I've already inserted the SIL (silence) phoneme in this particular phone list, so watch out if you don't mean to use it that way.

a
a1
b
d
dz
dZZ
e
e1
EE
f
g
i
i1
j
JJ
k
l
LL
m
n
nf
ng
o
o1
OO
p
r
s
SIL
SS
t
ts
tSS
u
u1
v
w
z
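Before using the normalized dictionary with HTK, it can help to check that every phone it uses appears in the phone list. Below is a minimal sketch, assuming the "word ph1 ph2 ..." format with whitespace-separated phones. The sub unknown_phones and the sample entries are made up for illustration; real code would read both lists from the phone-list and dictionary files.

```perl
use strict;
use warnings;

# Hypothetical check: return the phones used in a normalized dictionary
# ("word ph1 ph2 ...") that are missing from a phone list (one per line).
sub unknown_phones {
    my ($phone_lines, $dict_lines) = @_;
    # Build the set of known phones, ignoring whitespace and blank lines.
    my %known = map { $_ => 1 } grep { length } map { s/\s+//gr } @$phone_lines;
    my %missing;
    for my $line (@$dict_lines) {
        my ($word, @phones) = split ' ', $line;
        next unless defined $word;    # skip blank lines
        for my $ph (@phones) {
            $missing{$ph}++ unless $known{$ph};
        }
    }
    return sort keys %missing;
}

# Made-up sample data: a partial phone list and two dictionary lines.
my @phone_list = ("a", "e1", "f", "i", "m", "n", "OO", "SIL");
my @dict = ("fonemi f OO n e1 m i", "casa k a1 s a");
print join(' ', unknown_phones(\@phone_list, \@dict)), "\n";   # prints "a1 k s"
```

Returning the missing phones in sorted order makes the report stable across runs, which is convenient when comparing it against the phone table by eye.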

Re: italian phonemes
User: kmaclean
Date: 11/17/2009 9:02 am
Views: 4906
Rating: 13

Hi occimanete,

Thanks!

Ken
