French

How to best help as french speaker?
User: berteh
Date: 1/2/2013 7:04 am
Views: 13251
Rating: 15

Hello.

I recently discovered your initiative and would like to thank all contributors so far to this endeavour. I just hope you make it one day!

I think I could spend some time helping the project. I have no knowledge of acoustic modelling, but I am a native French speaker, a decent English reader (with a French accent, though), and I have some software development, tinkering, and scripting skills.

What's the best way to help?

  1. recording sentences (if so, do you have a preferred or needed list, or will any GPL/PD book do?)
  2. timing/sub-titling/splitting/re-encoding some audio stream you already have? (or from gutenberg.org, e.g.)
  3. anything else?

(Point 2 has the advantage of being doable on public transportation, in a noisy environment.)

Have a nice day, and a happy new year!

Berteh.

 

Re: How to best help as french speaker?
User: nsh
Date: 1/2/2013 10:02 am
Views: 818
Rating: 20

> timing/sub-titling/splitting/re-encoding some audio stream you already have

This is a more efficient way to collect audio data. The top task is to collect more transcribed data and to make sure it can be automatically aligned with the existing tools. The automatic alignment part of CMUSphinx needs extensive testing.

 

Re: How to best help as french speaker?
User: arbae
Date: 5/2/2014 3:35 am
Views: 139
Rating: 9

>> timing/sub-titling/splitting/re-encoding some audio stream you already have

>This is a more efficient way to collect audio data.

 

While I am willing to add subtitles to audio streams, I don't know:

_whether any French audio stream without noise or music will do. Example: are older films (at least 10 years old) OK?

Example: are public TV news broadcasts OK?

Example: are radio podcasts OK?

_how to upload it and in which format (using which freeware, if needed).

Re: How to best help as french speaker?
User: nsh
Date: 5/2/2014 6:27 pm
Views: 138
Rating: 12

> _if any french audio stream without noise or music will do. example : Are outdated films (that have at least 10 years) ok?

Yes

> example : Are public TV news ok ?

Yes

> example : Are podcasts from radio ok ?

Yes

> _how to upload it and in which format (using which freeware, if needed).

Well, first of all try to collect something to upload. Once you have enough data (at least 200-300 hours) then we can discuss how to share it.
You do not need to convert the format, you can just store the data in the original format.
Re: How to best help as french speaker?
User: nsh
Date: 5/2/2014 6:38 pm
Views: 145
Rating: 15

To give you a better idea of what is needed, here is the set we are using for UK English:

wget http://downloads.bbc.co.uk/podcasts/radio4/rla76/rla76_20110920-0930b.mp3 -O '2011, Eliza Manningham-Buller: Securing Freedom, 3.mp3'
wget http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/2011_reith3.pdf -O '2011, Eliza Manningham-Buller: Securing Freedom, 3.pdf'
wget http://downloads.bbc.co.uk/podcasts/radio4/rla76/rla76_20110913-0930b.mp3 -O '2011, Eliza Manningham-Buller: Securing Freedom, 3.mp3'
wget http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/2011_reith3.pdf -O '2011, Eliza Manningham-Buller: Securing Freedom, 3.pdf'
wget http://downloads.bbc.co.uk/podcasts/radio4/reith/reith_20110906-0940a.mp3 -O '2011, Eliza Manningham-Buller: Securing Freedom, 3.mp3'
wget http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/2011_reith3.pdf -O '2011, Eliza Manningham-Buller: Securing Freedom, 3.pdf'
wget http://downloads.bbc.co.uk/podcasts/radio4/rla76/rla76_20110628-0915c.mp3 -O '2011, Aung San Suu Kyi: Liberty, 1.mp3'
wget http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/1974_reith1.pdf -O '2011, Aung San Suu Kyi: Liberty, 1.pdf'
wget http://downloads.bbc.co.uk/podcasts/radio4/reith/reith_20100622-0940a.mp3 -O '2010, Martin Rees: Scientific Horizons, 4.mp3'
wget http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/20100622_reith.pdf -O '2010, Martin Rees: Scientific Horizons, 4.pdf'
wget http://downloads.bbc.co.uk/podcasts/radio4/reith/reith_20100615-0945a.mp3 -O '2010, Martin Rees: Scientific Horizons, 3.mp3'
wget http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/20100615_reith.pdf -O '2010, Martin Rees: Scientific Horizons, 3.pdf'
wget http://downloads.bbc.co.uk/podcasts/radio4/reith/reith_20100608-0940a.mp3 -O '2010, Martin Rees: Scientific Horizons, 2.mp3'
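The list above pairs each recording with its transcript. A minimal sketch of how such a list could be generated from a small manifest, so that every audio file gets a matching transcript name. The `MANIFEST` structure and the `wget_commands` helper are hypothetical names introduced here for illustration; the two entries are copied from the list above. The duplicate-title check is an added precaution, not something described in this thread.

```python
import shlex

# Hypothetical manifest: (title, audio URL, transcript URL).
# The two entries are copied from the wget list in this thread.
MANIFEST = [
    ("2010, Martin Rees: Scientific Horizons, 4",
     "http://downloads.bbc.co.uk/podcasts/radio4/reith/reith_20100622-0940a.mp3",
     "http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/20100622_reith.pdf"),
    ("2010, Martin Rees: Scientific Horizons, 3",
     "http://downloads.bbc.co.uk/podcasts/radio4/reith/reith_20100615-0945a.mp3",
     "http://downloads.bbc.co.uk/rmhttp/radio4/transcripts/20100615_reith.pdf"),
]


def wget_commands(manifest):
    """Build one wget command per file, naming audio and transcript after the same title."""
    seen_titles = set()
    commands = []
    for title, audio_url, transcript_url in manifest:
        # Refuse duplicate titles so one download cannot overwrite another.
        if title in seen_titles:
            raise ValueError("duplicate title: " + title)
        seen_titles.add(title)
        for url in (audio_url, transcript_url):
            extension = url.rsplit(".", 1)[-1]  # "mp3" or "pdf"
            output_name = title + "." + extension
            commands.append(
                "wget {} -O {}".format(shlex.quote(url), shlex.quote(output_name))
            )
    return commands


if __name__ == "__main__":
    for command in wget_commands(MANIFEST):
        print(command)
```

Keeping the title in one place means the audio and its transcript always share a base name, which makes it easier to match them up later for alignment.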
Re: How to best help as french speaker?
User: kmaclean
Date: 5/3/2014 2:32 pm
Views: 445
Rating: 12

Bonjour arbae,

> Are outdated films (that have at least 10 years) ok?

No, these cannot be used on VoxForge because of Copyright issues.

You can create your own Acoustic Models from any audio stream as Nick is doing, as long as you don't distribute the source audio.  There is an argument that an acoustic model is a derivative work of the copyrighted source material, and therefore even the AM cannot be distributed, but whether such an argument would hold up in court is unknown to me.

For this reason, any audio we collect here at VoxForge is licensed by the author (who owns the Copyright) with a GPL compatible license, so we can redistribute freely.

Your best bet would be to use French Project Gutenberg recordings for timing/sub-titling/splitting/re-encoding, since they are in the public domain and therefore have no Copyright restrictions.

thanks,

Ken

Re: How to best help as french speaker?
User: arbae
Date: 6/18/2014 3:13 am
Views: 143
Rating: 12

Bonjour kmaclean.

>You can create your own Acoustic Models from any audio stream as Nick is doing, as long as you don't distribute the source audio.

Please provide a link to the thread you are talking about.

>Your best bet would be to use French Project Gutenberg recordings for timing/sub-titling/splitting/re-encoding since they are in the public domain, and therefore have no Copyright restrictions,

I'm rather scientifically minded, so when I took a look at what was there, I was a little bored, because the titles were not classified by genre and had no associated keywords.


I was thinking about the radio station France Info: you can listen to it over the internet and on AM radio in France.

There are also podcasts in some categories.

example :

audio source : http://rf.proxycast.org/905031126442057728/18998-18.06.2014-ITEMA_20642896-0.mp3

audio transcript :
http://www.franceinfo.fr/emission/nouveau-monde/2013-2014/comment-facebook-dresse-notre-portrait-psy-06-18-2014-06-50

I have found two problems, though: I haven't asked them whether this is allowed, and their transcripts are often only about 90% exact, not 100%.

What do you think about that?

Re: How to best help as french speaker?
User: kmaclean
Date: 6/18/2014 6:27 am
Views: 129
Rating: 10

>Please provide a link to the thread you are talking about.

Copyright law of France.

Basically, you can create your own acoustic models from any speech source you want, but unfortunately, if the source audio is protected by Copyright, we cannot host it on VoxForge.

>There are also podcasts in some categories.

If the works are in the public domain, then we can host them on VoxForge.  If they are protected by Copyright, you will need to get a license from the author which is compatible with the GPL.

Re: How to best help as french speaker?
User: nsh
Date: 6/18/2014 6:54 am
Views: 405
Rating: 13

> There are also podcasts in some categories. http://rf.proxycast.org/905031126442057728/18998-18.06.2014-ITEMA_20642896-0.mp3

This seems to be a good source for training. Is it possible to get 200 hours of audio like this? We could train a model then.

 

Re: How to best help as french speaker?
User: arbae
Date: 7/31/2014 1:56 am
Views: 423
Rating: 11

>as Nick is doing

I would like a link to the thread where Nick describes how to "create your own Acoustic Models" please.

>Is it possible to get 200 hours of audio like this?

With an average of 5 minutes per article, you would need 2,400 links.
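That estimate is simple arithmetic: 200 hours is 12,000 minutes, and 12,000 / 5 = 2,400. A minimal sketch of the calculation, where `clips_needed` is a name made up for illustration and the inputs are the figures mentioned in this thread:

```python
import math


def clips_needed(target_hours, avg_clip_minutes):
    """Number of clips of a given average length needed to reach a target duration."""
    return math.ceil(target_hours * 60 / avg_clip_minutes)


print(clips_needed(200, 5))  # 2400
print(clips_needed(300, 5))  # 3600
```

The second call uses the upper end of the 200-300 hour range mentioned earlier in the thread.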

In 2009 and 2010, everything was recorded. Nowadays, only chronicles ("chroniques" in French) are podcast. If you wait for new recordings, I believe you would have enough audio in about 4 months.

Furthermore, the RSS feeds are broken: they show the title of each chronicle with its date, but they do not link to its page. They do still link to the MP3s for those pages, but each feed shows only about 15 MP3s. This means that the remaining MP3s must be retrieved by going to the page with the transcript and getting the MP3 podcast from there.
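For the MP3s that the feeds do expose, the links can be pulled out with the Python standard library. This is only a sketch: it assumes the feeds follow the usual RSS 2.0 layout, with the audio attached as an `<enclosure url="...">` element, which has not been checked against the real France Info feeds. `SAMPLE_FEED`, its example.org URL, and `list_enclosures` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration only; the real feed layout is an assumption.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example chronicle</title>
    <item>
      <title>Episode A</title>
      <pubDate>Wed, 18 Jun 2014 06:50:00 +0200</pubDate>
      <enclosure url="http://example.org/audio/a.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Episode B (no audio attached)</title>
    </item>
  </channel>
</rss>
"""


def list_enclosures(feed_xml):
    """Return (title, mp3_url) pairs for every item that carries an MP3 enclosure."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item has no attached audio
        url = enclosure.get("url")
        if url and url.lower().endswith(".mp3"):
            title = item.findtext("title", default="").strip()
            pairs.append((title, url))
    return pairs


if __name__ == "__main__":
    for title, url in list_enclosures(SAMPLE_FEED):
        print(title, url)
```

This only covers the roughly 15 episodes a feed lists; the older episodes would still have to be found through the transcript pages.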


Is it possible to build an acoustic model without hosting the source audio files on this site? I could also ask them about their license by mail.
