"AI companies claim their tools couldn't exist without training on copyrighted material. It turns out, they could — it's just really hard. To prove it, AI researchers trained a new model that's less powerful but much more ethical. That's because the LLM's dataset uses only public domain and openly licensed material."
tl;dr: If you use public domain data (i.e. you don't steal from authors and creators), you can train an LLM just as good as what was cutting edge a couple of years ago. The hard part is curating the data, but once that has been done, in principle everyone can reuse it without going through the painful part again.
So the whole "we have to violate copyright and steal intellectual property" argument is (as everybody already knew) total BS.
engadget.com/ai/it-turns-out-y…
It turns out you can train AI models without copyrighted material
It's just a pain in the ass. Will Shanklin (Engadget)
Shane Celis
in reply to j_bertolotti
miki
in reply to Shane Celis
Stargazer
in reply to miki
Not sure if it is that good. The central organizations that currently exist for, say, music or films are assholes, and I don't see why this one would be any different.
Moreover, I wonder how one pays royalties to social network users (whose posts are being crawled as well). That's millions of people.
miki
in reply to Stargazer
@stargazer @shanecelis It is the worst idea, except for all the others.
Good AI is more beneficial for humanity than bad AI, and giving authors *some* money is better than giving them no money, as we do now.
Stargazer
in reply to miki
Just to iterate on your idea (I still believe we can do better), we can force social networks to act as intermediaries between end users and AI bros. If the AI bros love subscription capitalism so much, let them subscribe to, say, Meta to scrape Facebook, with Meta obligated to distribute the resulting fees among the platform's users.
...with an option to opt-out from scraping.
The world where I can sell my kidney but can't sell my personal data is kinda weird.