Yesterday Cory Doctorow argued that refusal to use LLMs was mere "neoliberal purity culture".

algernon@come-from.mad-scientist.club

@pluralistic @clintruin @simonzerafa @tante

Which "couple million people" suffer harm when I run a model ON MY LAPTOP?

Anyone who's hosting a website and is getting hammered by the bots that seek content to train the models on. We are the ones who continue to get hurt.

Whether you run it locally or not makes little difference. The models were trained, training very likely involved scraping, and that scraping continues to be a problem to this day. Not because of ethical concerns, but technical ones: a constant 100 req/sec, 24/7, with waves of over 2.5k req/sec, may not sound like much in this day and age, but at around 2.5k req/sec (sustained for about a week!), my cheap VPS's two vCPUs are bogged down just dealing with the TLS handshakes, let alone serving anything.

That is a cost many seem to forget. It costs bandwidth, CPU, and human effort to keep things online under the crawler DDoS, and surviving it often requires cold, hard cash too.

Ask Codeberg or LWN how they fare under crawler load, and imagine someone who just wants to have their stuff online having to deal with similar abuse.

That is the suffering you enable when using any LLM, even locally.

onepict@chaos.social

@lrhodes Yes.

There's also a desperation among some enthusiastic folks to justify this and impose their view on the rest of us. That's what disquieted me: the defensive attitude, anticipating us stating and enforcing our boundaries.

But that's our culture in Tech, and I wish Tech as a whole would step back and take a minute, rather than reacting and negging.

https://dotart.blog/cobbles/ai-and-that-guy-at-the-bar

reflex@retrogaming.social

@shiri @pluralistic @mastodonmigration @tante Also, it's incredibly unclear to me how an LLM is a good use case for punctuation and grammar checking, something regular document editors have done incredibly well since the late '90s or so. Like, that's your use case? Not promoting Microsoft here, but Word has been fantastic at that since at least 2003.

Seems weird to use that as the case for an energy-sucking plagiarism machine.

pluralistic@mamot.fr

@bazkie @prinlu @FediThing @tante

First: checking for punctuation errors and other typos *in my own work* in a model running on *my own laptop* has nothing (not one single, solitary thing) in common with your example.

Nothing.

Literally, nothing.

But second: I literally license my work for commercial republication and it is widely republished in commercial outlets without any payment or notice to me.

pluralistic@mamot.fr

@FediThing @bazkie @prinlu @tante

No one is defending "creating knock offs of works." Why would you raise it here? Who has suggested that this is a good way to use LLMs or a good outcome from scraping?

pluralistic@mamot.fr

@FediThing @bazkie @prinlu @tante

The argument was literally, "It's not OK to check the punctuation in *your own work* if the punctuation checker was created by examining other people's work, because performing mathematical analysis on other people's work is *per se* unethical."

pluralistic@mamot.fr

@FediThing @bazkie @prinlu @tante

By this standard the OED is unethical.

bazkie@beige.party

@pluralistic but then you consented to that, right? you are in control of that.

also my example IS similar - after all, it's data scraped without consent, used to create another work. the typo-checker changes your blogpost based on my training data, in the same way my copycat blog changes 'my' works based on your training data.

sure, it's on a way different scale - deliberately, to more clearly show the principle - but it's the same thing.

csara@vmst.io

@tante I appreciate this post. I have gotten into similar discussions of purity culture around generative AI use (me being against using AI) and you articulate many of the feelings I have about it well.

pluralistic@mamot.fr

@bazkie

Should we ban the OED?

There is literally no way to study language itself without acquiring vast corpora of existing language, and no one in the history of scholarship has ever obtained permission to construct such a corpus.

pluralistic@mamot.fr

@FediThing @bazkie @prinlu @tante

Once again, you are replying to a thread that started when someone wrote that using an LLM to check the punctuation in your own work is ethically impermissible because no one should assemble corpora of other people's works for analytical purposes under any circumstances, ever.

bazkie@beige.party

@pluralistic @FediThing @prinlu @tante I'd say "because performing [automated, mass scale] mathematical analysis on other people's work [without their consent] [with the goal of augmenting one's own work] is *per se* unethical" - and in that case, it's a statement I would agree with.

bazkie@beige.party

@pluralistic @FediThing @prinlu @tante sure, but I'm responding here specifically to your statement that scraping for training isn't unethical per se.

pluralistic@mamot.fr

@bazkie @FediThing @prinlu @tante

You've literally just made the case against:

* Dictionaries
* Encyclopedias
* Bibliographies

And also the entire field of computational linguistics.

If that's your position, fine, we have nothing more to say to one another because I think that's a very, very bad position.

bazkie@beige.party

@pluralistic @FediThing @prinlu @tante you keep conveniently distorting the aspect of "mass automated non-consensual scraping with the goal of helping produce works" into "analytical purposes" and I find that in rather bad faith

cjpaloma@mstdn.social

@pluralistic @herrLorenz @tante Of course! Agreed.

The overlap ends around -when- reasons are "good" enough. Laws about how to treat other people are relatively easy.

But until enough people see rivers on fire, regulations on -doing certain things- aren't imposed, despite many people saying "hey, this isn't good" decades prior.

Not reining in/regulating until after -foreseeable- catastrophes results in all kinds of shit shows (from the MIC, to urban sprawl, to plastics, to tax laws, etc)

bazkie@beige.party

@pluralistic @FediThing @prinlu @tante I did not make that case, if you'd properly read my [additions] to the statement.

making dictionaries etc isn't automated on mass scales like feeding training data to LLMs is.

it's a very human job that involves a lot of expertise and takes a lot of time.

ursaashbear@mas.to

@pluralistic @Colman @FediThing @tante

This is...disappointing. To be fair, I'm disappointed in almost everyone in this thread for engaging in schoolyard shit-throwing, but you're much higher in status and your shit sticks. Have a conversation. Figure out where these views can commingle. Find common understanding, or you risk using your high status to fracture an already unstable alliance of people who want technology to operate safely and for the benefit of our shared humanity.

Do better.

zaire@fedi.absturztau.be

@tante the enshittification’s gone to his head i guess

i’ll say people who go out of their way to be unethical and complain about “purity culture” when confronted about it are fucking annoying at best

no more respect for that guy

skyfaller@jawns.club

@pluralistic This seems like whataboutism. Valid criticisms can come from people who don't behave perfectly, because otherwise no one would be able to criticize anything. Similarly, we can criticize society while participating in it.

The point I'd like to make (that doesn't seem to be landing) is that LLMs aren't just made by bad people, but are also made through harmful processes. Harm dealt mostly during creation may be better than continuing harm, but it is still harm.
@correl @FediThing @tante

Piero Bosio Social: Personal Web Site
