This is not going to be very popular, but the use of data for corpus training is not stealing. It wasn't stealing when people copied music either. IP maximalism is not helpful; it wasn't then, and it isn't now. The use of data for these purposes is explicitly allowed in EU law, and probably falls under fair use.

It's also nothing new. All sorts of vital accessibility tools (voice recognition, voice synthesis) or other things such as spell checking rely on corpora.

#AI #IP

in reply to modulux

Sure. I don't know how the arguments are playing out in the EU, but from what I've watched here in the US, the typical argument is that the outputs -- or at least, some subset of the possible outputs -- of generative AI tools constitute derivative works of the data that they were trained on. And derivative works require permission from the copyright holder in order to commercialize them in most cases.
in reply to Preston Maness ☭

Yep, that's sensible. Certainly if they incorporate sufficient similarity to the work in the corpus. Where it gets iffy is in trying to create a purported right to determine which algorithms are allowed to run on data created by one person. I don't think this maximalism is bad just on a whim. By the same logic I should get permission from a copyright holder to change the equalisation settings on a song, for example. Or remove advertising from a website. The copyright conceit stretches in very dangerous directions when pulled on.
in reply to Alex

@yo Indeed I wouldn't, but copyright is often held by corporate entities or, where the law does not directly allow that, they control the exercise of the exclusionary rights it confers. Do I think that a musician will try to charge me for re-equalising their song at home? It sounds very unlikely. Their label, however? I wouldn't be at all surprised by something like that if it were legally permitted. It could be sold as a bonus.

Generally I think that corpus research has at least some justification in the common good. But that notion is so hazy that I don't think we can draw this distinction successfully in law. Without continued data mining, certain things become either very difficult or impossible: updating spell checking databases, search engines, all kinds of very basic things.

in reply to Alex

@yo Agreed. I do buy CDs whenever possible, for example. I admit part of it is the desire for convenience and autonomy on my part. I prefer to rip and keep my copies of everything locally rather than relying on streaming, which requires constant payments, and which might always disappear or become unavailable. I also prefer to choose which media player I use and how I get to things, rather than having to use a specific one.