French

Transcription dictionary
User: samuel buffet
Date: 9/15/2008 4:30 pm
Views: 21445
Rating: 21

Hi everyone,

I have a few questions about the dictionary.


i/ Is bigger better?

ii/ Can I use a source licensed under the GNU Free Documentation License to create my dictionary? (I guess yes.)

iii/ What is supposed to count as a *word* in that dictionary?

e.g.: il l'a pris

Is it 3 words (il + l'a + pris) or 4 words (il + l' + a + pris)?


Samuel,

Re: Transcription dictionary
User: kmaclean
Date: 9/15/2008 6:53 pm
Views: 228
Rating: 19

Hi Samuel,

>i/ Is bigger better?

yes

>ii/ Can I use a source licensed under the GNU Free Documentation License to create my dictionary? (I guess yes.)

yes

>iii/ What is supposed to count as a *word* in that dictionary?

>e.g.: il l'a pris

>Is it 3 words (il + l'a + pris) or 4 words (il + l' + a + pris)?

In English, contractions are usually considered to be a single word, so "l'a" would be one word. Did you check the Lium dictionary?
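To make the two options in Samuel's question concrete, here is a minimal Python sketch. It is only an illustration, not how the Lium or VoxForge dictionaries are actually built, and the function names are hypothetical. Splitting on whitespace keeps a contraction like "l'a" as a single dictionary entry (3 words), while additionally splitting after the apostrophe gives separate entries for "l'" and "a" (4 words).

```python
import re


def whitespace_words(sentence: str) -> list[str]:
    """Treat each whitespace-separated token as one word, so contractions stay whole."""
    return sentence.split()


def apostrophe_split_words(sentence: str) -> list[str]:
    """Also split each token right after an apostrophe, keeping the apostrophe
    on the elided part (e.g. "l'a" -> "l'" + "a")."""
    words = []
    for token in sentence.split():
        # Zero-width split after every apostrophe; drop empty pieces,
        # e.g. the trailing "" produced when a token ends with "'".
        words.extend(part for part in re.split(r"(?<=')", token) if part)
    return words


sentence = "il l'a pris"
print(whitespace_words(sentence))        # ['il', "l'a", 'pris']
print(apostrophe_split_words(sentence))  # ['il', "l'", 'a', 'pris']
```

Whichever convention is chosen, it presumably has to be applied consistently to both the prompts (transcriptions) and the dictionary, so that every transcribed word has a matching pronunciation entry.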

You are very lucky... you should be able to build on the Lium dictionary (65K words).  From the note at the bottom of their licensing page:

Unless explicitly stated otherwise, any tool downloadable from the "Ressources Sphinx" and "Speech recognition tools" sections of the "Speech Project" part of this site is distributed under the revised BSD license. In short, the tools and their source code are completely free of use, for commercial or non-commercial projects, as long as the accompanying copyright notice is included.

Ken

 

Re: Transcription dictionary
User: samuel buffet
Date: 9/16/2008 12:46 am
Views: 364
Rating: 15

Hi Ken,

Yes, I knew about the Lium resource. I used their model a year ago with Sphinx-4.

So I currently have:

a 65k-word dictionary from Lium => BSD

a 240k-word dictionary I extracted by script from Wiktionary => GNU Free Documentation License.

I'll post it in the Listen section when it's finished.

Samuel,

 

Re: Transcription dictionary
User: kmaclean
Date: 9/17/2008 2:03 pm
Views: 223
Rating: 20

Hi Samuel,

>a 240k-word dictionary I extracted by script from Wiktionary => GNU Free Documentation License.

I've always been under the impression that the GNU Free Documentation License and the GPL were incompatible, but I may be wrong... Here is what Wikipedia says (this may be an older revision of the article):

GPL incompatible in both directions

The GNU FDL is incompatible in both directions with the GPL: that is GNU FDL material cannot be put into GPL code and GPL code cannot be put into a GNU FDL manual.[10] Because of this, code samples are often dual-licensed so that they may appear in documentation and can be incorporated into a free software program.[citation needed]

At the June 22nd and 23rd 2006 international GPLv3 conference in Barcelona, Eben Moglen hinted that a future version of the GPL could be made suitable for documentation:[11]

I can't find anything on the FSF FAQ to answer this question definitively one way or another...

>I'll post it on the Listen section when finished.

thanks!

Ken

Re: Transcription dictionary
User: samuel buffet
Date: 9/18/2008 5:45 am
Views: 208
Rating: 19

Hi Ken

 

So if the GPL and the GFDL are not compatible, the dictionary has to be GFDL.

If the dictionary is GFDL, the acoustic model can't be GPL (as it would be considered a derived work of the dictionary).

It's sad, because it could have been a dictionary resource for a lot of languages.

The English Wiktionary alone had around 192,665 English words in the last dump, 4 months ago.

 

Samuel,

Re: Transcription dictionary
User: dano
Date: 9/18/2008 1:48 pm
Views: 195
Rating: 18

Hi Ken / Samuel

I always thought that you can't license it under the GPL, but that you can use it for, for example, an acoustic model. You can also create with closed source (for example Photoshop) or open source with another license open source content. Correct me if I'm wrong :) You can also create an open source acoustic model from closed source speech corpora, and that's why you made VoxForge! I don't see the problem here :)

 

Daniël

Re: Transcription dictionary
User: kmaclean
Date: 9/19/2008 11:51 pm
Views: 1746
Rating: 16

Hi Daniël,

I have to read the GFDL license in more detail, but...

As a general rule, the GFDLv1.2 says:

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License[...]

And section 5 of the GPLv3 says:

You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
[...]

c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy.

(Note that in this context, the source code of the Program refers to the pronunciation dictionary.)

Therefore, these first two passages seem to say that the GFDL and the GPL are incompatible, because you can only release a modified version of the Wiktionary under the GFDL (I am assuming that taking the words and pronunciations from the Wiktionary would be considered a modification of the original document...), but the GPL also requires any combined work based on a GPL dictionary to be released under the GPL.

The GFDL contains another passage that seems to allow it to be combined with works under other licenses, as long as the new aggregate work does not limit the users' rights:

GFDLv1.2 (section 7) says:

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

This might allow a GFDL work to be combined with a GPL work, and the combination licensed under the GPL, since it could be argued that users' rights would not be limited if a GFDL document were licensed under the GPL... this needs to be validated.

However, the GPLv3 (section 5) has a similarly worded passage that permits combined works that don't limit a user's rights, but makes an exception in the case of taking two separate works (like one under the GFDL and another under the GPL) and combining them to form a new, larger work (as in the case of creating a larger pronunciation dictionary):

A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

It's too confusing for me... I'll send a letter to the FSF and ask them about this.

Note that the turnaround time for questions like this is around 6 to 8 weeks...

 

With respect to your specific questions:

>I always thought that you can't license it under the GPL, but that you can use it for, for example, an acoustic model.

Under Copyright law, which is where the GPL gets its legal 'force', an acoustic model would be considered a derivative work, and subject to the Copyright of the original work it was derived from.

>You can also create with closed source (for example Photoshop) or open source with another license open source content.

I'm not sure what you mean here...

>You can also create an open source acoustic model from closed source speech corpora, and that's why you made VoxForge!

The corpora used to create the "open source acoustic models" you are referring to must have had permissive licenses that let users create derivative works like acoustic models. Under a strict interpretation of copyright law, this would not be permitted without a license to do so, since a derivative work is covered by the same protections as the original work.

On the other hand, Google is somehow able to display segments of copyrighted works in their search results... This seems to be a grey area (to me at least...), and the same principle might also apply to using words and pronunciations from the GFDL Wiktionary to create a big GPL pronunciation dictionary. But I don't have the money or the team of lawyers that Google has to find out... :)

Ken


 

Re: Transcription dictionary
User: dano
Date: 9/20/2008 2:24 pm
Views: 445
Rating: 19

Hi Ken,


Thank you for your great answer :)

>I'm not sure what you mean here...

I mean that if you use Photoshop or Microsoft Word to create things, you can publish what you made under an open source license, even in their own 'closed' format.

>>I always thought that you can't license it under the GPL, but that
>>you can use it for, for example, an acoustic model.

>Under Copyright law, which is where the GPL gets its legal 'force', an acoustic model would be considered a derivative work, and subject to the Copyright of the original work it was derived from.

OK, the pronunciation dictionary definitely can't be licensed under the GPL. I hope that the acoustic model can :)

Btw I have updated my little application:

http://spraakherkenning.googlepages.com/Charles0.2-prototype.zip


Daniël

Re: Transcription dictionary
User: kmaclean
Date: 9/29/2008 7:06 pm
Views: 231
Rating: 17

Copy of the email I sent to the FSF:

Dear Sir/Madam,

I am the Web Admin for the VoxForge (www.voxforge.org) project.  We are collecting speech from people from all around the world in order to create a speech corpus that can be used to generate "acoustic models" for Free and Open Source speech recognition engines.  All submitted speech is licensed under the GPL.

Part of the process in the creation of an acoustic model requires that the pronunciations of all the words in the corpus be known beforehand (there are rule-based, and statistical, approaches to determining the pronunciation of an unknown word, but manually created pronunciations are more accurate...).  We have a pronunciation dictionary that was originally licensed under BSD, and our modifications have been licensed under the GPL.

We are working on increasing the number of words in the VoxForge pronunciation dictionary, which currently has more than 130,000 pronunciations. The English Wiktionary has around 192,665 English words, and we would like to merge these pronunciations with the VoxForge pronunciation dictionary, but we are not sure about the compatibility between the GFDL, which is what the English Wiktionary is licensed under, and the GPL, which is used by VoxForge.

Question: can we combine the pronunciations from the English Wiktionary (licensed under the GFDL) with the pronunciations from the VoxForge pronunciation dictionary (licensed under the GPL), and distribute the combined work under the GPL?

 

My reading of the GPL and GFDL leaves me more confused than before I started...:

As a general rule, the GFDLv1.2 says:

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License[...]

And section 5 of the GPLv3 says:

You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
[...]

c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy.

(This assumes that the source code of the Program refers to the pronunciation dictionary.)

Therefore, these first two passages seem to say that the GFDL and the GPL are incompatible, because you can only release a modified version of the Wiktionary under the GFDL (I am assuming that taking the words and pronunciations from the Wiktionary would be considered a modification of the original document...), but the GPL also requires any combined work based on a GPL dictionary to be released under the GPL.

The GFDL contains another passage that seems to allow it to be combined with works under other licenses, as long as the new aggregate work does not limit the users' rights:

GFDLv1.2 (section 7) says:

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

This might allow a GFDL work to be combined with a GPL work, and the combination licensed under the GPL, since it could be argued that users' rights would not be limited if a GFDL document were licensed under the GPL...

However, the GPLv3 (section 5) has a similarly worded passage that permits combined works that don't limit a user's rights, but makes an exception in the case of taking two separate works (like one under the GFDL and another under the GPL) and combining them to form a new, larger work (as in the case of creating a larger pronunciation dictionary):

A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

Any clarification on this would be greatly appreciated,

thanks,

Ken MacLean

--
Free Speech... Recognition
http://www.voxforge.org

Re: Transcription dictionary
User: kmaclean
Date: 9/29/2008 7:09 pm
Views: 208
Rating: 19

Email auto-reply from FSF:

from FSF Licensing Questions via RT <[email protected]>
reply-to [email protected]
to kmaclean   voxforge  org
date Mon, Sep 29, 2008 at 8:11 PM
subject [gnu.org #379498] AutoReply concerning licensing question: Using GFDL derived work in GPL Pronunciation Dictionary
mailed-by gnu.org


This message has been automatically generated in response to a
licensing question you sent to the Free Software Foundation, with subject:
       "Using GFDL derived work in GPL Pronunciation Dictionary".

There is no need to reply to this message right now.  Your request has
been assigned an ID of [gnu.org #379498].

Please include the string:
        [gnu.org #379498]
in the subject line of all future correspondence about this issue.  To do
so, you may reply to this message.


This address is answered primarily by volunteers, overseen by one staff
member of FSF.  We have very limited resources to answer requests.  We
urge you to first read the following material:

Licensing FAQ page:    http://www.fsf.org/licenses/gpl-faq.html
Text of the GNU GPL:   http://www.fsf.org/copyleft/gpl.html
Text of the GNU LGPL:  http://www.fsf.org/copyleft/lgpl.html
Our license list page: http://www.fsf.org/licenses/license-list.html

If one of these web pages answers your questions, likely you will not receive
a reply from us.  We sadly don't have the ability to re-answer questions
that we've answered already on our website.

When we are able to answer questions, highest priority is given to Free
Software developers in the community who are actively working on Free
Software projects and need licensing help.

We do offer consulting services for companies who are working to develop
products that incorporate Free Software so that they can do so in ways
that comply with the terms of the GPL and other Free Software licenses.
If you are interested in this service, please write a separate message to
<[email protected]>.

                       Sincerely,
                       FSF GPL Compliance Lab Office
