"AI companies claim their tools couldn't exist without training on copyrighted material. It turns out, they could — it's just really hard. To prove it, AI researchers trained a new model that's less powerful but much more ethical. That's because the LLM's dataset uses only public domain and openly licensed material."

tl;dr: If you use public domain data (i.e. you don't steal from authors and creators), you can train an LLM about as good as what was cutting edge a couple of years ago. The hard part is curating the data, but once it has been curated, in principle everyone can reuse it without going through the painful part again.
So the whole "we have to violate copyright and steal intellectual property" is (as everybody already knew) total BS.
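The curation step mentioned above boils down to filtering a raw corpus by license metadata before training. Here's a minimal sketch of what that could look like; the field names (`text`, `license`) and the allowlist of license tags are assumptions for illustration, not the researchers' actual pipeline:

```python
# Hypothetical sketch of license-based corpus curation: keep only documents
# whose license metadata marks them as public domain or openly licensed.
# The allowlist and record schema are assumptions, not the real pipeline.

OPEN_LICENSES = {"public-domain", "cc0", "cc-by", "cc-by-sa"}

def filter_openly_licensed(docs):
    """Return only documents whose license tag is on the allowlist."""
    return [d for d in docs if d.get("license", "").lower() in OPEN_LICENSES]

corpus = [
    {"text": "An 1890 novel...", "license": "public-domain"},
    {"text": "A modern bestseller...", "license": "all-rights-reserved"},
    {"text": "A wiki article...", "license": "CC-BY-SA"},
]

clean = filter_openly_licensed(corpus)
print(len(clean))  # 2 of the 3 documents survive curation
```

The expensive real-world work is getting trustworthy license metadata in the first place; once someone has done that labeling, the filter itself is trivial, which is why a curated open dataset can be shared and reused by everyone.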

engadget.com/ai/it-turns-out-y…


in reply to miki

Just to build on your idea (I still believe we can do better): we could force social networks to act as intermediaries between end users and the AI bros. If the AI bros love subscription capitalism so much, let them subscribe to, say, Meta to scrape Facebook, with Meta obligated to distribute the fee among the users on the platform.

...with an option to opt out of scraping.
The world where I can sell my kidney but can't sell my personal data is kinda weird.
