"AI companies claim their tools couldn't exist without training on copyrighted material. It turns out, they could — it's just really hard. To prove it, AI researchers trained a new model that's less powerful but much more ethical. That's because the LLM's dataset uses only public domain and openly licensed material."

tl;dr: If you use public domain data (i.e. you don't steal from authors and creators), you can train an LLM about as good as what was cutting edge a couple of years ago. The hard part is curating the data, but once it has been curated, in principle everyone can reuse it without going through the painful part again.
So the whole "we have to violate copyright and steal intellectual property" is (as everybody already knew) total BS.
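The curation step mentioned above boils down to filtering a raw corpus by license metadata before training. Here's a minimal sketch of what that could look like; the field names (`text`, `license`) and the allowlist of license tags are assumptions for illustration, not the researchers' actual pipeline:

```python
# Hypothetical sketch of license-based corpus curation: keep only documents
# whose license metadata marks them as public domain or openly licensed.
# The allowlist and record schema are assumptions, not the real pipeline.

OPEN_LICENSES = {"public-domain", "cc0", "cc-by", "cc-by-sa"}

def filter_openly_licensed(docs):
    """Return only documents whose license tag is on the allowlist."""
    return [d for d in docs if d.get("license", "").lower() in OPEN_LICENSES]

corpus = [
    {"text": "An 1890 novel...", "license": "public-domain"},
    {"text": "A modern bestseller...", "license": "all-rights-reserved"},
    {"text": "A wiki article...", "license": "CC-BY-SA"},
]

clean = filter_openly_licensed(corpus)
print(len(clean))  # 2 of the 3 documents survive curation
```

The expensive real-world work is getting trustworthy license metadata in the first place; once someone has done that labeling, the filter itself is trivial, which is why a curated open dataset can be shared and reused by everyone.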

engadget.com/ai/it-turns-out-y…


in reply to miki

Just to build on your idea (I still believe we can do better): we could force social networks to act as intermediaries between end users and the AI bros. If the AI bros love subscription capitalism so much, let them subscribe to, say, Meta to scrape Facebook, with Meta obligated to distribute the fee among the users on the platform.

...with an option to opt out of scraping.
The world where I can sell my kidney but can't sell my personal data is kinda weird.
