German

Re: retroflex nasal; dictionary acquisition project crashes
User: kmaclean
Date: 5/31/2008 12:47 pm
Views: 454
Rating: 76

Hi Ralf,

>3.  Do you plan to release the results coming from the dictionary acquisition project under the Pronunciation Lexicon Specification

You certainly are persistent with respect to PLS   :)

Ken 

Your goals: HTK, Sphinx, Julius. My goals: PLS, SSML
User: ralfherzog
Date: 5/31/2008 10:56 pm
Views: 333
Rating: 22
Hello Ken,

Persistent: that is true.  But I do know that you have different priorities.  XML-related standards are nearly everywhere.  For example, OpenOffice.org uses the OpenDocument format, which is based on XML.  And I am convinced that in the long term XML-related standards are the way to go.  Maybe not today, but in a few years.  At least, I don't have a better idea at the moment.

The success of the Internet is based on standards that come from the World Wide Web Consortium.  The Internet as an "application" is a very complicated thing.  Speech recognition is very complicated, too.  So why not follow the path of success?  The whole world is about standards.  Standards, standards, standards.  Standards are everywhere.  And the W3C provides us with standards.  So the result is that I don't have to care about difficult applications like HTK.  I just have to care about the W3C standards.

You did the right thing by collecting speech under the GPL.  But what are our next targets?  I would like to submit many more speech samples (prompts) in English and German using SSML, even if there isn't any demand for them at the moment.

Maybe HTK, Sphinx, and Julius have their own standards.  But where should I start?  Until now, I haven't found the time to get into the details of those tools.  I started to read the HTK book.  It is really not easy.  I do understand the concept of XML, but HTK and the others are really very complicated.  So why go the complicated way if there is an easy one?  An alternative would be for someone to release screencasts that explain HTK or Sphinx.  If HTK or Sphinx were easier to understand, maybe I would think differently.

I would like to see some results.  One goal could be to develop a German pronunciation lexicon (PLS, GPL; possibly IPA).  This goal is achievable (thanks to Timo) and understandable.  I understand the concept of the GPL, and I understand the value of XML.  But difficult tools like Sphinx are something for specialists like nsh.
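For readers who have not seen PLS, here is a minimal sketch of what such a lexicon file looks like, generated with Python's standard library. It assumes the W3C PLS 1.0 layout (a `<lexicon>` root with `alphabet="ipa"` and `xml:lang`, containing `<lexeme>` entries with a `<grapheme>` and a `<phoneme>`). The function name `build_pls` and the three German entries are illustrative only; they are not output of the dictionary acquisition project.

```python
import xml.etree.ElementTree as ET

# PLS 1.0 namespace and the built-in xml:lang attribute name
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace


def build_pls(entries, lang="de-DE"):
    """Turn {grapheme: IPA string} pairs into a PLS 1.0 <lexicon> element."""
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    # One <lexeme> per word, sorted so the output is stable between runs
    for grapheme, ipa in sorted(entries.items()):
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = ipa
    return lexicon


# Illustrative entries only (hypothetical sample data)
entries = {"Haus": "haʊ̯s", "Buch": "buːx", "ich": "ɪç"}
lexicon = build_pls(entries)
print(ET.tostring(lexicon, encoding="unicode"))
```

Saved with `ET.ElementTree(lexicon).write(path, encoding="utf-8", xml_declaration=True)`, such a file could then be referenced from an SSML document through its `<lexicon uri="...">` element.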

So I would suggest that you follow your goals (HTK, Sphinx, Julius), and I will follow mine (PLS, SSML).

Greetings, Ralf
Re: Your goals: HTK, Sphinx, Julius. My goals: PLS, SSML
User: kmaclean
Date: 6/1/2008 1:07 pm
Views: 152
Rating: 17

Hi Ralf,

>And I am convinced that in the long-term XML related standards are the way to go.  Maybe not today, but in a few years. 

I agree. 

>I would like to submit much more speech samples (prompts) in the English and in the German language employing the SSML, even if there isn't any demand at the moment. 

Please note that SSML is only a markup language for directing what a text-to-speech engine says (and how it says it).  I don't think it is used as a format for describing transcribed audio submitted for the creation of acoustic models. 
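To make the distinction concrete, here is a minimal sketch that builds a small SSML 1.0 document with Python's standard library. Every element in it is an instruction to a synthesizer: `<break>` inserts a pause and `<phoneme>` overrides a word's pronunciation with an IPA string. There is nowhere to record who spoke a prompt, which microphone was used, or which WAV file belongs to which transcription. The function name `build_ssml`, the sample sentence, and the IPA string are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# SSML 1.0 namespace and the built-in xml:lang attribute name
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", SSML_NS)


def build_ssml(before, word, ipa, after, lang="en-US"):
    """Say `before`, pause briefly, say `word` as `ipa`, then say `after`."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    speak.text = before
    # A pause instruction for the synthesizer
    ET.SubElement(speak, f"{{{SSML_NS}}}break", time="300ms")
    # A pronunciation override: the engine should say `word` as the IPA string
    phoneme = ET.SubElement(speak, f"{{{SSML_NS}}}phoneme", alphabet="ipa", ph=ipa)
    phoneme.text = word
    phoneme.tail = after
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("The German word", "Buch", "buːx", " means book.")
print(ssml)
```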

>I started to read the HTK book.  It is really not easy.

The HTK book is a difficult read... I have only read the first few chapters, and now only use it as a reference - I don't have the math skills to understand all the formulas and how they interact.  But if you look at HTK as a "black-box", and only focus on the minimum command set required to compile an acoustic model, then you can do quite a bit with trial and error - which essentially was my approach when I started out... :)

You might be interested in the W3C VoiceXML standard, which uses SSML for its spoken prompts and SRGS (the Speech Recognition Grammar Specification) for its grammars, and is complemented by CCXML for call control.  This doc: "Voice Browsers, Introduction" provides a good overview of how they all should work together. 

The JVoiceXML project has implemented a working VoiceXML browser, which essentially provides a VoiceXML dialog manager front-end to Sphinx and Festival.  They may also provide bindings to Asterisk (an IP PBX).  Note that JVoiceXML uses the JSAPI and JTAPI "standards" to accomplish this.  

Ken 

Re: Your goals: HTK, Sphinx, Julius. My goals: PLS, SSML
User: ralfherzog
Date: 6/1/2008 8:10 pm
Views: 113
Rating: 15
Hello Ken,

OK, maybe SSML is not what I am looking for.  I will take a look at the Introduction and Overview of W3C Speech Interface Framework; thanks for the link.  I am looking for "a format for describing transcribed audio."  Perhaps it is VoiceXML; I am not sure at the moment, but I will read about it.

When I submit to VoxForge, there is the readme file with information about the gender, pronunciation dialect, microphone type, audio card, operating system, sampling rate, etc.  This information, which is currently part of the readme file, should become part of a VoiceXML file.  So my goal is to merge the information from the prompts file and the readme file into one single VoiceXML file.

As for HTK: of course it is possible to do some trial and error.  I did that, but I didn't succeed.  A few months ago, I tried to follow the instructions in the VoxForge HTK tutorial, but I ran into too many errors.  I solved one problem, and then the next problem occurred.  I stopped trying.  Maybe I will try it again.

Greetings, Ralf
Re: Your goals: HTK, Sphinx, Julius. My goals: PLS, SSML
User: kmaclean
Date: 6/1/2008 9:39 pm
Views: 5361
Rating: 20

Hi Ralf,

>I will take a look into the Introduction and Overview of W3C Speech Interface Framework

Good link; I hadn't seen that particular one...

>I am looking for "a format for describing transcribed audio."

I am not sure there is any such format on the W3C site.  The LDC (Linguistic Data Consortium) might have something.  With XML, one could be created fairly easily.  But in VoxForge's case, quite a few scripting changes would be required on the acoustic model creation backend to implement such a thing.
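As a rough illustration of how such an XML format could be created, here is a minimal Python sketch that merges a readme and a prompts file into one document. It assumes the readme consists of `Key: value` lines and that each prompts line is a file ID followed by its transcription; both layouts are assumptions here. The `<submission>`, `<metadata>`, `<field>`, and `<recording>` element names are hypothetical: this is not a W3C or LDC schema, and it is not VoiceXML.

```python
import xml.etree.ElementTree as ET


def parse_readme(text):
    """Collect 'Key: value' lines; lines without a colon or a value are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            info[key.strip()] = value.strip()
    return info


def parse_prompts(text):
    """Split each 'FILE_ID WORD WORD ...' line into (file_id, transcription)."""
    prompts = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            prompts.append((parts[0], parts[1]))
    return prompts


def build_submission(readme_text, prompts_text):
    """Merge speaker/recording metadata and transcriptions into one XML tree."""
    root = ET.Element("submission")
    metadata = ET.SubElement(root, "metadata")
    for key, value in parse_readme(readme_text).items():
        ET.SubElement(metadata, "field", name=key).text = value
    recordings = ET.SubElement(root, "recordings")
    for file_id, transcription in parse_prompts(prompts_text):
        ET.SubElement(recordings, "recording", file=file_id).text = transcription
    return root


# Hypothetical sample input
readme = "Gender: Male\nPronunciation dialect: German\nSampling Rate: 16000\n"
prompts = "de-0001 GUTEN MORGEN\nde-0002 VIELEN DANK\n"
print(ET.tostring(build_submission(readme, prompts), encoding="unicode"))
```

The hard part Ken points to is not producing this file but changing the backend scripts that currently read the separate readme and prompts files.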

>  Perhaps it is VoiceXML, I am not sure at the moment, I will read about it.

VoiceXML is a language for describing spoken dialogs... think of spoken interactive voice response (IVR) systems in a telephone environment (which is what VoiceXML was originally designed for).  For example, when I called my ISP, I used to use keypad sequences to get routed to the help desk.  Now I call their number, just say "Internet technical support" into my phone, and get routed to the help desk queue. 

A VoiceXML browser "abstracts" away the differences between the various implementations of:

  • speech recognition engines (like Sphinx, Julius, etc.),
  • text-to-speech engines (like Festival, Flite), and
  • telephony engines (VoIP, Asterisk being the prime open source example, or PSTN-based)
and lets you describe a call flow using a standard language (VoiceXML). 
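The abstraction idea can be sketched in Python with hypothetical interfaces: the call flow is written once against `Recognizer`, `Synthesizer`, and `Telephony`, and any engine that implements those methods can be plugged in. These names are invented for illustration; they are not VoiceXML, JSAPI, or JTAPI APIs, and a real VoiceXML browser interprets a VoiceXML document rather than Python code. The fake engines stand in for something like Sphinx, Festival, and Asterisk so the ISP routing example above can run.

```python
from typing import Optional, Protocol


# Hypothetical engine interfaces (not a real library API)
class Recognizer(Protocol):
    def recognize(self, audio: bytes, phrases: list[str]) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Telephony(Protocol):
    def play(self, audio: bytes) -> None: ...
    def listen(self) -> bytes: ...
    def transfer(self, queue: str) -> None: ...


def run_menu(prompt: str, routes: dict[str, str], asr: Recognizer,
             tts: Synthesizer, phone: Telephony) -> Optional[str]:
    """One call-flow step: ask, recognize one of the route phrases, transfer."""
    phone.play(tts.synthesize(prompt))
    heard = asr.recognize(phone.listen(), list(routes))
    queue = routes.get(heard)
    if queue is None:
        phone.play(tts.synthesize("Sorry, I did not understand."))
        return None
    phone.transfer(queue)
    return queue


# Stand-ins for real engines so the flow can run without any hardware
class FakeRecognizer:
    def __init__(self, answer: str):
        self.answer = answer

    def recognize(self, audio: bytes, phrases: list[str]) -> str:
        return self.answer if self.answer in phrases else ""


class FakeSynthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FakePhone:
    def __init__(self):
        self.played: list[bytes] = []
        self.transferred_to: Optional[str] = None

    def play(self, audio: bytes) -> None:
        self.played.append(audio)

    def listen(self) -> bytes:
        return b""

    def transfer(self, queue: str) -> None:
        self.transferred_to = queue


routes = {"internet technical support": "help-desk", "billing": "billing"}
phone = FakePhone()
run_menu("How can I help you?", routes,
         FakeRecognizer("internet technical support"), FakeSynthesizer(), phone)
print(phone.transferred_to)  # help-desk
```

Swapping in a different recognizer or phone back end only requires another class with the same methods; `run_menu` itself does not change, which is the point of the abstraction.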

There have been a few open source implementations of VoiceXML that implemented the text-to-speech and telephony components.  But most attempts to implement the speech recognition portion failed, because it is very difficult to do.  JVoiceXML is amazing since they got the speech rec component working (though I have not tried it out myself).  I think using JSAPI was an excellent way to avoid having to work out the details of a particular speech rec or TTS engine, but I am not sure where Sun's JSAPI licensing currently stands.  

>I solved one problem, and then the next problem occurred.  I stopped trying.  Maybe I will try it again.

Don't give up yet, if that is what you are interested in.  It takes some effort.  A bit of understanding of a scripting language is also very helpful.  

Ken 
