VoxForge
After months I am still investigating.
What should I do to use èéìíùúàáòó in the pronunciation dictionary, or wherever else I have to?
To explain better: I have a lexicon from which I get words and pronunciations in UTF-8, because I use the accented vowels mentioned above. Any file I derive from it becomes UTF-8 as well. If I use it as non-UTF-8, I lose the vowels. I read somewhere on this site that I need to encode them as an ASCII sequence of letters, so I don't understand.
Let's take as an example an extract of my lexicon:
CAFè [CAFè] k a f E1
CAFFè [CAFFè] k a f f E1
Should I somehow encode the vowel in the word CAFè as something like E1, to get CAFE1 and CAFFE1?
So should I use these words spelled with this coding everywhere?
I am really lost.
Thanks, folks.
Michele
PS. This lexicon is based on the phonetic list I explained how to use in a reply to this post.
There is no such problem. Phones should be plain ASCII. Words can be UTF-8, for example.
> Should I somehow encode the vowel in the word CAFè as something like E1, to get CAFE1 and CAFFE1?
No
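A minimal Python sketch of that rule, assuming the lexicon layout shown in the extract above (word, optional bracketed output symbol, then space-separated phones). The helper names split_entry and non_ascii_phones are hypothetical and not part of HTK; the point is only that the word may carry accents while every phone stays plain ASCII.

```python
def split_entry(line):
    """Split 'WORD [OUTSYM] p1 p2 ...' into (word, phones)."""
    fields = line.split()
    word = fields[0]
    rest = fields[1:]
    # Skip the optional bracketed output symbol, e.g. [CAFè].
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        rest = rest[1:]
    return word, rest


def non_ascii_phones(line):
    """Return the phones on this line that contain non-ASCII characters."""
    _, phones = split_entry(line)
    return [p for p in phones if not p.isascii()]


for entry in ["CAFè [CAFè] k a f E1", "CAFFè [CAFFè] k a f f E1"]:
    word, phones = split_entry(entry)
    print(word, phones, "bad phones:", non_ascii_phones(entry))
```

Both sample entries pass: the accented letter appears only in the word, and the phone E1 is already ASCII, so no recoding of the word is needed.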
So then, what do you think the problem could be in the following situation?
HDMan -A -D -T 1 -m -w wlist_prz -n monophones1_prz -i -l dlog_prz dict_prz lexicon/onemarket_lexicon
writes the following to the 'dlog_prz' log file:
No HTK Configuration Parameters Set
Output dictionary dict_prz opened
Source dictionary lexicon/onemarket_lexicon opened
Dictionary dict_prz created - 198 words processed, 198 missing
wlist_prz contains the list of words from the prompts in uppercase (no accented words are present in there), as in the tutorial. onemarket_lexicon contains the 500.000K festival dictionary normalized into the lexicon form for HTK...
I don't really know what the matter is, then. I believe both files are sorted, as I ran the Unix sort utility to be sure. Still the same problem. The files are in UTF-8, but you just said that doesn't matter for HTK.
Any ideas?
Well, you need to learn to provide the data required to reproduce the problem you are asking about. We can't guess what mistake you made on your local machine.
You can always share the files on some sharing service and give us a link.
Hi nsh,
http://www.wikifortio.com/785813/wlist_prz
and here is the
http://www.wikifortio.com/796263/onemarket_lexicon
(it's 23MB+).
If it's useful, this is the original prompts file:
http://www.wikifortio.com/832033/prompts_prz
The command is the same as before, and so is the result.
thanks. Oc
Your onemarket_lexicon is incorrectly sorted. Use
LANG= LC_ALL= sort onemarket_lexicon > onemarket_lexicon.sorted
to sort it properly.
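To illustrate why the locale matters: with LANG and LC_ALL emptied (and assuming no other LC_* variable is set), sort falls back to plain byte order rather than the collation rules of the user's language. The Python sketch below approximates that ordering on made-up entries; byte_order_sort is a hypothetical helper name, not an HTK or coreutils function.

```python
def byte_order_sort(lines):
    """Sort lines by their UTF-8 bytes, approximating sort in the C locale."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


# Made-up sample words, for illustration only.
entries = ["CAFè", "CAFFè", "CAFE", "cafe", "CAFÉ"]
for e in byte_order_sort(entries):
    print(e)
```

In byte order all uppercase ASCII letters come before lowercase ones, and accented letters come after every ASCII character. A locale-aware sort would typically interleave them instead, which is a plausible reason the lexicon looked sorted to the eye but not to HDMan.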
Now it passes that step. Great shell insight.
Thank you.
Michele
Here I go again, one step forward and two backward.
Some of my phones, which are partly made of numerical values, are truncated to just the alphabetical part, so for example a and a1 both become a. The dlog for HDMan tells me that the resulting dictionary contains no such phones at all.
dlog_prz
Following '4.6 Strings and Names' in the HTKBook, I tried quoting every phone in the source dictionary (i.e. the lexicon) and passing the -C HDMan.config option to HDMan, where HDMan.config is:
QUOTEHCAR=\"
I also tried quoting onemarket.phones (quoted). Here are the wlist_prz and the onemarket.phone.quoted.
The final result in dict_prz has no such phones, or better said, they are truncated.
Occimanete
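You just need to remove the 'RS cmu' line from global.ded; that command removes stress marking, which is what strips the trailing digits from your phones. Read the HTKBook about that.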
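For illustration only: the truncation described above (a and a1 both becoming a) looks like what you get when trailing digits are treated as stress markers and stripped. The Python sketch below mimics that effect and drops the RS line from an edit script. strip_stress and drop_rs_lines are hypothetical helper names, and the sample script contents are an assumption, not the poster's actual global.ded.

```python
import re


def strip_stress(phone):
    """Mimic the observed truncation: remove trailing digits from a phone."""
    return re.sub(r"\d+$", "", phone)


def drop_rs_lines(ded_text):
    """Return the edit script without any 'RS ...' command lines."""
    kept = [ln for ln in ded_text.splitlines()
            if not ln.strip().upper().startswith("RS ")]
    return "\n".join(kept)


# Distinct phones such as a and a1 collapse once digits are stripped.
print([strip_stress(p) for p in ["a", "a1", "E1", "k"]])

# Assumed edit script contents, for illustration only.
sample_ded = "AS sp\nRS cmu\nMP sil sil sp"
print(drop_rs_lines(sample_ded))
```

With the RS line gone, phones such as a1 and E1 should reach the output dictionary unchanged, which matches the outcome reported in the follow-up.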
Thanks again.
I had read it, but did not understand what the book meant by stress marking; there was no example of it.
Now it works, in fact. Resourceful as usual.
Occimanete