Spent the day talking to works council members about "AI".
-
@glyph@mastodon.social @tante@tldr.nettime.org this is the thing that drives me a little batty: "AI", or (mis)applied statistics, is just... well, statistics. And all these "AI experts" never even try to use any sort of metric, much less a statistically rigorous method, to gauge if the damn thing works or not...
@aud @tante @glyph well they do have metrics, it's just that they're generally ad-hoc and terrible metrics
and even when they aren't, Goodhart's Law ensures that relying on them turns the exercise into farce relatively soon.
arguably that kind of farce is the entire history of the false spring: "simply scale it up" worked surprisingly well, then worked surprisingly well again, and therefore we can extrapolate that it will work forever and [financial irresponsibility] and oops now it's not working anymore oh shit oh fuck uhhhh AGENTS, we're doing agents now! Yea, that's the ticket. (and so on)
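[Editor's note: a toy sketch of the Goodhart's Law point above, with entirely made-up numbers and a hypothetical "keyword hits" proxy — not any real evaluation. Once a proxy metric becomes the optimization target, maximizing it stops tracking the quality it was meant to measure.]

```python
def true_quality(answer_len, keyword_hits):
    # What we pretend actually matters: relevance, penalized for bloat.
    return keyword_hits - 0.01 * answer_len

def proxy_metric(answer_len, keyword_hits):
    # The ad-hoc metric someone chose to report: keyword matches only.
    return keyword_hits

# Candidate outputs as (answer_len, keyword_hits). The 500-word answer
# games the proxy by stuffing keywords into a bloated response.
candidates = [(50, 2), (80, 3), (500, 5)]

best_by_proxy = max(candidates, key=lambda c: proxy_metric(*c))
best_by_truth = max(candidates, key=lambda c: true_quality(*c))

print(best_by_proxy)  # (500, 5): the proxy rewards the bloated answer
print(best_by_truth)  # (80, 3): the actually better answer loses the benchmark
```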
-
Spent the day talking to works council members about "AI". And it's kinda wild hearing their stories from the field: management is 100% in "AI can do everything" fantasy land and makes huge plans for how to use "AI" to cut workers, while real projects that supposedly can do 50% of a specific task end up being able to do 8%. And they still go live. It's fucking bonkers. CEOs are really not okay.
@tante unfortunately and increasingly, management is most interested in whatever looks good in PowerPoint rather than in how their product performs in the real world.
-
@aud @tante @glyph the addition of "vision heads" has always been the brightest example of this to me, and came sooner than the craze for "agents".
They ran out of runway to scale up on text alone but clearly adding more parameters was the thing that needed doing. Bolting an entire vision system to the side of the model sure does add a lot of parameters and keeps you on the curve of projected growth.
It doesn't really solve any problems in a way that might generate revenue, but it demos quite well and a good demo is all you've ever really needed to separate tech speculators from their cash, *particularly* the ones gambling on "AI" at any point in tech history.
-
@SnoopJ@hachyderm.io @tante@tldr.nettime.org @glyph@mastodon.social ah, I meant for the boosters who are "seeing huge gains"; it's always anecdotal and then any outside measurements of it contradict said anecdotal claims...
but also, yes, what you just said, x1000. Even the earlier "measurements" were horseshit: "we tested this by making it generate answers {for an extremely well-documented standardized test whose answers appear many times in the training corpus} and it got a grade of 45%!" which they claim is good, except that's actually a failing grade, which they never seem to mention...
-
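[Editor's note: a crude sketch of the contamination problem described above — if benchmark questions appear near-verbatim in the training corpus, a score measures memorization, not capability. The whitespace tokenization, 8-gram threshold, and all data here are hypothetical illustrations, not any real leak-detection pipeline.]

```python
def ngrams(text, n=8):
    # Split on whitespace and collect all contiguous n-word sequences.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(test_items, corpus, n=8):
    # Flag any test item sharing at least one n-gram with the corpus.
    corpus_grams = ngrams(corpus, n)
    flagged = sum(1 for item in test_items if ngrams(item, n) & corpus_grams)
    return flagged / len(test_items)

corpus = ("question 42 from the 2019 exam what is the capital of france "
          "the answer is paris as everyone knows plus lots of other text")
test_items = [
    "question 42 from the 2019 exam what is the capital of france",  # leaked
    "a genuinely novel question nobody has asked before in this form here",
]
print(contamination_rate(test_items, corpus))  # 0.5: half the test set leaked
```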
But it was super fun to lead them through a "this is how you can force reasonable evaluation on 'AI' projects which kills most of them" framework and see how they felt empowered and able to actually do their job again.
@tante do you have a link to that framework?
Also: https://labornotes.org/2026/03/four-union-strategies-fight-ai
-
@tante yeah it's a real "YOU HAD ONE JOB" situation
-
@emma haven't formalized it fully so it's not written up anywhere. It's in my head and a few phrases on slides right now.
-
@SnoopJ@hachyderm.io @tante@tldr.nettime.org @glyph@mastodon.social and now we have "so many models to choose from", so we get to play double extra bonus round roulette! Don't just vary your prompts, change models! Infinite combinatorics! You'll never run out of parameters to fiddle with! Burn those tokens, burn em good!
-
@otherdog @tante I guess I'll drop the link again just for reference, in case you haven't seen it. I didn't do so above because I feel like I post this every single day now, to the point where the self-promotion feels shameful. But it remains painfully, almost nauseatingly relevant, so here you go: https://blog.glyph.im/2025/08/futzing-fraction.html
-
cwebber@social.coop shared this topic