A LLM
Also a LLM
Somewhere between giving you easy answers to problems overrepresented in their training set and giving you nonsense lies the problem you are trying to solve.
How do you determine where your problem lies in this space? How do you determine where the answer lies in this space?
When we test these LLMs in spaces we are deeply familiar with, on problems that are well represented in the training sets, we can easily be lulled into a false sense of trust.
We can add constraints and suchlike, but these tricks don't really assure us that the solution is correct. They just tell us that it found a minimum that passes our constraints. That feels like a pretty low bar unless the constraints are very, very comprehensive. Even then it does not answer other critical questions, such as how maintainable or fragile the code is.
It seems like no one is really trying to quantify this. Folks are going to be building real systems using these tools, but we will only understand how bad it is years after these things are in production.
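As a minimal sketch of what that low bar looks like (an editorial illustration with a hypothetical max_of function, not any model's actual output), consider a solution that satisfies every stated constraint while still being wrong in general:

```python
# Sketch: a "maximum of a list" function that passes its constraints.
def max_of(xs):
    # Looks plausible, but silently assumes all inputs are non-negative.
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best

# The constraints: all pass, so the solution appears correct.
assert max_of([1, 2, 3]) == 3
assert max_of([5]) == 5
assert max_of([2, 9, 4]) == 9

# Just outside the tested space, it quietly fails:
print(max_of([-3, -1, -2]))  # prints 0; the true maximum is -1
```

The constraints all pass, so the answer looks correct; the failure lives just outside the space the constraints cover.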
It will be interesting to read books twenty years from now on this topic.
-
@shafik It's very hard for me to shake the feeling that the current state of development is actually very poor, but looks good because we quantify it by "shipping", essentially. And these tools aren't actually that amazing; it's more that we've impressed ourselves with a very low local maximum.
-
@shafik For example, it seems the "killer aspect" of LLMs is code generation. It's a remarkably inefficient way to do it, but it demonstrably works (to some degree) in a way that people find helpful. (And ignoring the externalities is a mistake.)
To me, this opens up an avenue of research into code reuse techniques without the bad parts. We know that LLMs are helpful ("work" is too strong a term). But more importantly, we know what we can consider failures: frameworks, DSLs, modelling languages, and so forth.
"How do you determine where your problem lies in this space, if at all?" is a very good question. And I think the maintenance over time with this kind of code generation is going to bring the same bitrot we have now.
-
This is also my feeling, but coming up with a way to demonstrate it feels hard.
All the obvious problems are in the training set, and all the problems they are obviously unable to answer feel silly. Problems that don't lie in either of these spaces seem like a tremendous amount of work to construct.
It feels like a large epistemology problem, something Wittgenstein would love to work on but I am not Wittgenstein.
-
@shafik I feel as though I'm trying to prove a negative all the time when talking to people who are overcome with the rush of euphoria when using these things.
-
@shafik idk, I keep seeing gen AI stumble on the stupidest of things that are *just* outside of common patterns (and sounding very sure about it, too).
-
@shafik and then I have to explain to my colleagues why gen AI is wrong in this particular instance.