
Re: [tlug] Open source license (wikipedia)
> > My real question, of course, is can I train a machine learning
> > model on that text data, and release it under a more liberal
> > license? Assuming the model is effectively a one-way hash, and
> > cannot reproduce the original data.
>
> It really depends on exactly what the model does.
I was lucky enough to be at an NLP conference last week where I asked
some people this same question, and got confident replies that what I
want to do is fine. Again, people were saying that the impossibility of
reconstructing the original is the key.
> > This is your litmus test. Can you reliably reconstruct the original
> > text? If so, it is a derivative work. If not, then it isn't.
>
> That's in the ballpark, but I'm pretty sure that's not the litmus
> test. The test is the reverse, ie, more like "if you know the
> original content, can you recognize something that probably has copied
> the expression of it?"
The models I have in mind pass that test too.
Word embeddings [1] that use multiword expressions or n-grams might be a
more interesting grey area when "n" is high enough (because the text for
each embedding is stored). (But I'll hazard a guess that n-grams up to
at least 4 or 5 are going to be okay.) ...oh, just realized: one-way
hashing of the n-gram text will still allow the embeddings to work, and
then it passes your other test too.
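To make that last point concrete, here's a minimal sketch (my own
illustration, not anyone's actual pipeline) of the idea: replace each
n-gram with a one-way hash before building the co-occurrence statistics
that embeddings are trained on. The statistics survive; the surface text
is never stored. Function names and the toy sentence are made up for the
example.

```python
import hashlib
from collections import defaultdict

def hashed_ngrams(tokens, n):
    """Replace each n-gram with a truncated SHA-256 digest.

    The raw text of the n-gram is never kept -- only its one-way hash,
    so the original expression cannot be read back out of the model.
    """
    for i in range(len(tokens) - n + 1):
        gram = " ".join(tokens[i:i + n])
        yield hashlib.sha256(gram.encode("utf-8")).hexdigest()[:16]

def cooccurrence(hashed_tokens, window=2):
    """Symmetric co-occurrence counts over the hashed tokens.

    This is the kind of statistic embedding training consumes; it works
    identically whether the keys are words or opaque hashes.
    """
    hashed_tokens = list(hashed_tokens)
    counts = defaultdict(int)
    for i, h in enumerate(hashed_tokens):
        lo = max(0, i - window)
        hi = min(len(hashed_tokens), i + window + 1)
        for j in range(lo, hi):
            if i != j:
                counts[(h, hashed_tokens[j])] += 1
    return counts

tokens = "the quick brown fox jumps over the lazy dog".split()
grams = list(hashed_ngrams(tokens, 2))   # 8 hashed bigrams
counts = cooccurrence(grams)
```

Hashing is deterministic, so identical n-grams still map to the same
vocabulary entry and the distributional structure is preserved; what's
lost is exactly the ability to reconstruct the original wording.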
Darren
[1]: https://en.wikipedia.org/wiki/Word_embedding