Other languages
User: Visitor
Date: 10/11/2006 2:57 am
Views: 24095
Rating: 62
How about corpora from other languages?
Re: Other languages
User: kmaclean
Date: 10/11/2006 9:01 am
Views: 563
Rating: 43

That is the plan, but I want to get the processes and scripts down cold for English before tackling other languages.  What other languages might you be interested in?

thanks,

Ken 

Re: Other languages
User: Visitor
Date: 11/6/2006 8:47 am
Views: 487
Rating: 39

Italian, please!!!

Ask for help if you need it. I'm not a programmer.


Re: Other languages
User: postgraduate
Date: 11/16/2006 1:35 pm
Views: 557
Rating: 32

Hi,

I'm interested in speech recognition. There are many English-language speech recognition projects, and I would like to adapt one of them to Lithuanian speech. I am interested in this project, so it would be exciting to consult with you.

Re: Other languages
User: Luc Delorme
Date: 8/2/2007 10:00 am
Views: 349
Rating: 32

In French! I'm not a programmer either, but if you need any help, I can pitch in too, by contributing my own voice and by promoting contributions on French-speaking forums related to the open source movement, etc.


Re: Other languages
User: Robin
Date: 8/12/2007 8:31 am
Views: 388
Rating: 39

Perhaps I have good news for all the non-programmers out there. Setting up any new language actually requires quite a bit of work that can be done without any programming skills.

You can, for instance, figure out whether a phonetic dictionary exists for your language. If so, it's probably a good idea to start a new thread (called "language X") and post the link to the dictionary there.

If it doesn't exist, unfortunately it needs to be made!

What also needs to be made are prompt files containing short, non-copyrighted sentences such as those in the English prompt files. Preferably they should use modern spelling rules. It's also nice if they contain all the sounds of your language. You can even record some if you want and store them yourself, ready to be submitted when VoxForge is ready for "language X" (but make sure the quality is okay, because we don't want you to waste your time).
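For anyone who does want to script the "all the sounds of your language" check, here is a minimal sketch in Python. It assumes a phonetic dictionary with one entry per line in the form `word ph1 ph2 ...`; that layout, the function names, and the toy entries are all made up for illustration and are not VoxForge's own format or tooling.

```python
from collections import Counter

def parse_dictionary(lines):
    """Map each word to its phoneme list (assumed layout: 'word ph1 ph2 ...')."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            lexicon[parts[0].lower()] = parts[1:]
    return lexicon

def phoneme_coverage(prompts, lexicon):
    """Count phonemes used by the prompts; collect words missing from the lexicon."""
    counts = Counter()
    unknown = set()
    for prompt in prompts:
        for word in prompt.lower().split():
            word = word.strip(".,!?;:\"'")
            if not word:
                continue
            if word in lexicon:
                counts.update(lexicon[word])
            else:
                unknown.add(word)
    return counts, unknown

# Toy data, invented for illustration only (not taken from any real dictionary).
toy_dictionary = [
    "the dh ax",
    "cat k ae t",
    "sat s ae t",
    "dog d ao g",
    "fish f ih sh",
]
toy_prompts = ["The cat sat.", "The dog ran."]

lexicon = parse_dictionary(toy_dictionary)
counts, unknown = phoneme_coverage(toy_prompts, lexicon)

# Phonemes that appear in the dictionary but in none of the prompts.
all_phonemes = {p for phones in lexicon.values() for p in phones}
missing = all_phonemes - set(counts)

print("phoneme counts:", dict(counts))
print("phonemes not covered:", sorted(missing))
print("words not in dictionary:", sorted(unknown))
```

Phonemes listed as not covered suggest which kinds of sentences still need to be written, and words reported as missing from the dictionary would need a pronunciation added before the prompt can be used.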

Finally, promotion is always good! If you have a blog, for instance, you can write a little about VoxForge and post a link (if you know how, use interesting keywords such as "open source", "speech recognition", etc., in your language). Your blog or forum post might bring someone else to VoxForge to help you out with your language. You never know.

Robin 

Re: Other languages
User: V
Date: 3/18/2008 5:37 pm
Views: 366
Rating: 46

Hi!

I am trying to figure out what is available to start putting together a Hungarian corpus, but I am not sure I understand your technical terms.

* phonetic dictionary: a dictionary where every word [wurd] is written like this? Does this correspond to the pronunciation dictionary used in the tutorial or to the lexicon?

* prompts: are books allowed? Is it necessary to segment them like the prompts file in the tutorial?

* what about audiobooks + text?

* licensing: what source material is allowed besides public domain? I presume GPL is fine, but what about CC (and which type of CC?), or MIT-like licences?

I am sure that once you answer me, I'll have some more questions! :)

 Cheers,

Re: Other languages
User: nsh
Date: 3/19/2008 2:31 am
Views: 306
Rating: 38

> phonetic dictionary: a dictionary where every word [wurd] is written like this? Does this correspond to the pronunciation dictionary used in the tutorial or to the lexicon?

 yes

> prompts: are books allowed? is it necessary to segment them as the prompts file in the tutorial?

 yes, it's necessary to segment them and it's the biggest problem with books
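As a rough illustration of what segmenting a book involves, the sketch below splits running text into sentences and then into short chunks of a few words each. The sentence-splitting rule, the `max_words` limit, and the `prompt0001`-style IDs are assumptions made for this example, not the layout of the tutorial's prompts file; real books also need manual cleanup (abbreviations, numbers, dialogue) that this does not attempt.

```python
import re

def segment_into_prompts(text, max_words=12, prefix="prompt"):
    """Split text into short prompt lines, each prefixed with a hypothetical ID."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    prompts = []
    for sentence in sentences:
        words = sentence.split()
        # Long sentences are cut into chunks of at most max_words words.
        for start in range(0, len(words), max_words):
            chunk = " ".join(words[start:start + max_words])
            if chunk:
                prompts.append(chunk)
    return [f"{prefix}{i:04d} {p}" for i, p in enumerate(prompts, 1)]

# Sample text invented for illustration.
sample = (
    "It was a dark night. The wind howled across the empty moor and "
    "nobody dared to go outside until morning came. Silence!"
)
result = segment_into_prompts(sample, max_words=8)
for line in result:
    print(line)
```

Cutting on a fixed word count can break a sentence at an unnatural point, which is part of why book text is harder to use than sentences written as prompts in the first place.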

> what about audiobooks + text?

OK, but we prefer raw WAV, not MP3

>licensing: what source material is allowed besides public domain? I presume GPL is fine, but what about CC (and which type of CC?), or MIT-like licences?

GPL is better, though speech under any other free license is also suitable to start. See the discussion on this forum.

Basically, to start you can just record yourself (10 minutes) and a few of your friends (5 x 10 minutes). As for the dictionary, you can build it with text2pho from

  http://tkltrans.sourceforge.net/

Re: Other languages
User: kmaclean
Date: 3/19/2008 12:04 pm
Views: 3765
Rating: 37

Hi V,

One clarification:

> Does this correspond to the pronunciation dictionary used in the tutorial or to the lexicon?

The pronunciation dictionary used in the Tutorial and How-to is based on the ISIP Switchboard corpus (contains around 27,500 words). The QuickStart and nightly AM builds, by contrast, are based on version 0.6 of the CMU Pronunciation Dictionary (contains around 130,000 words). Unfortunately, the Switchboard and CMU pronunciation dictionaries use slightly different phoneme syntax. This is enough to make them incompatible from a Grammar and Acoustic Model testing perspective (see ticket #52).
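To see this kind of mismatch for yourself, one option is to compare the phoneme symbols used by two dictionaries. The sketch below assumes both files have one entry per line, with the word first and the phonemes after it; the two toy excerpts are invented to show the idea (different case, stress digits) and do not reproduce the actual Switchboard or CMU entries.

```python
def phoneme_set(lines):
    """Collect every phoneme symbol used (assumed layout: 'word ph1 ph2 ...')."""
    phones = set()
    for line in lines:
        parts = line.split()
        phones.update(parts[1:])  # everything after the word itself
    return phones

def compare_phoneme_sets(dict_a, dict_b):
    """Return the symbols found only in dict_a and only in dict_b."""
    a, b = phoneme_set(dict_a), phoneme_set(dict_b)
    return sorted(a - b), sorted(b - a)

# Toy excerpts, invented for illustration only.
toy_dict_a = ["hello hh ax l ow", "cat k ae t"]
toy_dict_b = ["HELLO HH AH0 L OW1", "CAT K AE1 T"]

only_a, only_b = compare_phoneme_sets(toy_dict_a, toy_dict_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

If either list is non-empty, a grammar or acoustic model built against one dictionary will refer to phoneme names the other does not define, so the symbols would need to be mapped onto a common set before mixing them.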

Ken 

Re: Other languages
User: bunte
Date: 9/28/2007 8:09 am
Views: 434
Rating: 34

Hi everyone.

I am very interested in starting a project to develop open source speech recognition for Swedish. We have quite a lot of tools and corpora, but unfortunately not much time to handle them at the moment. Is there a timeline for when people will be able to donate their recordings etc. in other languages? If I (for example) arrange a phone number etc. and take care of the recordings and so on, would it then be possible to use information and resources from these web pages for such a project? I am very interested in creating speaker recognition for Swedish as well.

Best regards

Jonas

Gothenburg University, Sweden 
