Thing is, the fact that AI could do it for you basically means that it has been done before and the AI was trained on it.
What you actually wanted to say is: "I spent some time rebuilding someone else's work because I wasn't able to find it on Google."
I know this is overdramatized, but also not totally wrong.
Matt Campbell reshared this.
André Polykanine
in reply to Toni Barth • • •
Toni Barth
in reply to André Polykanine • • •
wizzwizz4
in reply to Toni Barth • • •@menelion The part I find the most distressing is when someone has a really good idea, and then they try to get AI to do it, the AI claims to have implemented their idea (but hasn't: it can't), and then they think there's a problem with the idea.
These systems are the polar opposite of creativity.
Toni Barth
in reply to wizzwizz4 • • •
Erion
in reply to Toni Barth • • •You are partially correct, but this is an oversimplification of how an AI model, for example an LLM, works. It can, and does, use data that it got during its training phase, but that's not the entire story; otherwise it would just be a database that regurgitates what it was trained on. On top of the trained data there's zero-shot learning, for example figuring out a dialect of a language it hasn't been trained on, based on the statistical probability of weights from the trained data, as well as combining existing patterns into new patterns and thus coming up with new things, which is arguably part of creativity.
What it can't do, though, and this is very likely what you mean, is go outside its pre-trained patterns. For example, if you have a model that was trained on dragons and another model that was trained on motorcycles, combining the two can produce a story where a dragon rides a motorcycle, even though that story was not part of the training data. What it can't do is come up with a new programming language, because that specific pattern does not exist. So the other part of creativity, where you'd think outside the box, is a no-go. But a lot of people's boxes are different, and they are very likely not as vast as what the models were trained on, and that's how an AI model can be inspiring.
This is why a lot of composers feel that AI is basically going to take over eventually: it will have such a vast amount of patterns that a director, trailer library editor, or other content creator will be satisfied with the AI's results. The model's box will be larger than any human's.
André Polykanine and Winter blue tardis reshared this.
wizzwizz4
in reply to Erion • • •@erion @menelion Most of the generative capabilities of an LLM come from linear algebra (interpolation), and statistical grammar compression. We can bound the capabilities of a model by considering everything that can be achieved using these tools: I've never seen this approach overestimate what a model is capable of.
"Zero-shot learning" only works as far as the input can be sensibly embedded in the parameter space. Many things, such as most mathematics, can't be viewed this way.
Erion
in reply to wizzwizz4 • • •It never will, because modern LLMs are far more capable.
They rely on non-linear activation functions (like ReLU, GELU, etc.) after the linear transformations. If the network were purely linear, it could only learn linear relationships, regardless of its depth. The non-linearities are what allow the network to learn complex, non-linear mappings and interactions between inputs.
There's also scaling, arguably an internal world model, and context awareness (which is definitely not something linear). If anything, that approach would underestimate a model.
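[Editorial aside: a minimal numpy sketch of the point about non-linear activations, using random toy matrices rather than a real model. Stacked purely linear layers collapse to a single matrix, while a ReLU in between breaks that collapse.]

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4, 4))
x = rng.normal(size=4)

# Two stacked purely linear layers are equivalent to one linear layer,
# no matter how deep the stack gets.
deep_linear = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(deep_linear, collapsed))  # True

# A ReLU between the layers breaks the collapse: the composed map can no
# longer be written as a single matrix multiplication.
relu = lambda v: np.maximum(v, 0.0)
nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(nonlinear, collapsed))  # False (in general)
```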
André Polykanine and Winter blue tardis reshared this.
wizzwizz4
in reply to Erion • • •Why do LLMs freak out over the seahorse emoji?
vgel.me
Erion
in reply to wizzwizz4 • • •
André Polykanine and Winter blue tardis reshared this.
wizzwizz4
in reply to Erion • • •@erion A model having "a hundred or more layers" doesn't make anything I said less true. "Chain-of-thought reasoning" isn't reasoning. I absolutely can dismiss claims of "a model's intelligence", because I have not once lost this argument when concrete evidence has come into play, and people have been saying for years that I should.
Can you give me an example of something you think a "smaller model that came out in the last year" can do, that you think I would predict it can't?
Erion
in reply to wizzwizz4 • • •Take a Gemma model, for example, say the 2B one. Any linear prediction will not be able to anticipate how emergent capabilities behave when faced with a complex task, simply because they don't work linearly: they appear after crossing a scale threshold rather than improving with each additional parameter.
You can see this in how Gemma 2B can outperform larger models, which on your account should not be possible. The model's intelligence is not a simple, additive function of its vector size, but a complex product of the billions of highly non-linear interactions created by the full architecture, making a purely linear prediction inadequate.
wizzwizz4
in reply to Erion • • •
Erion
in reply to wizzwizz4 • • •Humor me. I'd love to know for example how you can predict architectural efficiency and training alignment, which are both necessary to measure a model's intelligence accurately.
I am sure the quality of the linear space for example can be a good indication, but it's not enough.
Winter blue tardis
in reply to Erion • • •
Winter blue tardis
in reply to Winter blue tardis • • •
wizzwizz4
in reply to Winter blue tardis • • •Sensitive content
@tardis I asked for a concrete example of something a model could do, that according to @erion's understanding of my understanding it shouldn't be able to. In response, I got vague talk about "emergent capabilities", "architectural efficiency", and "training alignment", and equivocation about the word "non-linear".
I can't argue about empty signifiers.
Winter blue tardis
in reply to wizzwizz4 • • •Sensitive content
Zach Bennoui
in reply to Toni Barth • • •It's not totally wrong, but I feel like maybe it's a slight oversimplification. LLMs don't just outright copy the training data; that's why it's called generative AI. That doesn't mean they will never reproduce anything in the training set, but they are very good at synthesizing multiple concepts from that data and turning them into something that technically didn't exist before.
If you look at something like Suno, which is using an LLM architecture under the hood, you're able to upload audio and have the model try to "cover" that material. If I upload myself playing a chord progression/melody that I made up, the model is able to use its vast amount of training data to reproduce that chord progression/melody in whatever style.