This is bad.
-
@joelle @nausicaa @astraluma @SnoopJ True, but that at least biases towards false negatives instead of false positives, which seems like a fair tradeoff?
-
@dave @SnoopJ @theorangetheme As code changes grow, that mitigation gets even harder, especially when those changes interact with a highly complex code base. There are times when `y = x + 1` would be a catastrophic error, say because someone else is doing pointer math elsewhere.
Beyond that, though, it's not clear to what degree, *if any*, extruded code can be copyrighted. If it can't be, what impact does that have on the project?
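(An aside on that `y = x + 1` point: the same hazard has analogues outside C. A hypothetical Python sketch, not the pointer scenario itself, of how two nearly identical lines can interact with shared state completely differently:)

```python
# Two functions that look almost identical but behave completely
# differently with respect to shared state -- the kind of thing a
# reviewer skimming a large generated diff can easily miss.

def bump_rebind(xs):
    # Rebinds the local name only; the caller's list is untouched.
    xs = xs + [1]
    return xs

def bump_inplace(xs):
    # Mutates the shared list in place; every other holder of a
    # reference to it sees the change.
    xs += [1]
    return xs

shared = [0]
bump_rebind(shared)
print(shared)   # [0] -- unchanged
bump_inplace(shared)
print(shared)   # [0, 1] -- mutated from afar
```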
@dave @SnoopJ @theorangetheme What happens if, as sometimes happens, the code extruded by a generator is a verbatim quotation of code in its training set, and that code carries a different license? I'm not a lawyer, so I don't understand these risks well enough to always know what is and isn't safe for me to accept, especially if slop extruders are involved.
-
@dave @SnoopJ @theorangetheme What happens if, as sometimes happens, the code extruded by a generator is a verbatim quotation of code in its training set, and that code carries a different license? I'm not a lawyer, so I don't understand these risks well enough to always know what is and isn't safe for me to accept, especially if slop extruders are involved.
@xgranade @dave @theorangetheme IANAL either but it is worth pointing out that generation and *distribution* are separate activities, and humans are still holding all the liability for the latter (which is also the only legally-enforceable part to begin with)
-
@xgranade @dave @theorangetheme IANAL either but it is worth pointing out that generation and *distribution* are separate activities, and humans are still holding all the liability for the latter (which is also the only legally-enforceable part to begin with)
@SnoopJ @dave @theorangetheme That's fair, yeah. My point is more that I don't understand the exact shape of the risk... if I redistribute code that was generated by an AI agent, what additional risk, if any, do I incur?
-
@nausicaa @astraluma As @joelle pointed out, Claude is also a name that real people have. @SnoopJ's cantrip is going to be less susceptible to false positives by filtering on "anthropic.com" as well.
@xgranade @astraluma @joelle @SnoopJ Fair. Given the current scale, I just clicked through to check the different commits, but that doesn't scale as well as SnoopJ's approach.
-
@xgranade @astraluma @joelle @SnoopJ Fair. Given the current scale, I just clicked through to check the different commits, but that doesn't scale as well as SnoopJ's approach.
@nausicaa @astraluma @joelle @SnoopJ That's fair, too; so far it's a small handful, and it's not too hard to manually validate that positives are actually true positives.
-
@SnoopJ @dave @theorangetheme That's fair, yeah. My point is more that I don't understand the exact shape of the risk... if I redistribute code that was generated by an AI agent, what additional risk, if any, do I incur?
@xgranade @dave @theorangetheme IMO the risk profile from a legal liability standpoint is exactly the same as if you'd written it by hand
that is, if you distribute a machine-generated copy of a protected work, that doesn't really factor into the ability of that work's owner to sue you for said distribution. the owner has as much standing (in the legalistic sense) as they would if you'd copied and pasted by hand
now the actual *trial* that might arise could have some differences, especially where a judge's discretion is involved (e.g. in awarding damages), but considering how things have gone in the courts so far, I feel reasonably confident in saying that a litigant with a big enough war chest to be a pain in the ass in court over it is going to get treated about the same?
(which might be a complicated way to say "the legalistic arguments are moot, whoever has the deeper pockets wins" but I do enjoy pondering the legal theory even if I know how little it matters to the legal system that actually exists)
-
I'm gonna be real with folks here. I fucked up, and bad, with my participation in the open-slopware list. As a result, I'm not the right person to do it, but there has to be some kind of accounting for what damage AI is doing to open source.
For all the whinging about "supply chains" over the past few years, it *is* a problem when your code suddenly depends on AI, even if only indirectly.
@xgranade As someone who doesn't know anything about open-slopware, what was bad about it?
-
@ireneista @glyph I hope it doesn't, if only because I want to be focusing on my spec fic and screenplays, but if it does come to that, I very, very much appreciate your support. ♥
-
@xgranade @dave @theorangetheme IMO the risk profile from a legal liability standpoint is exactly the same as if you'd written it by hand
that is, if you distribute a machine-generated copy of a protected work, that doesn't really factor into the ability of that work's owner to sue you for said distribution. the owner has as much standing (in the legalistic sense) as they would if you'd copied and pasted by hand
now the actual *trial* that might arise could have some differences, especially where a judge's discretion is involved (e.g. in awarding damages), but considering how things have gone in the courts so far, I feel reasonably confident in saying that a litigant with a big enough war chest to be a pain in the ass in court over it is going to get treated about the same?
(which might be a complicated way to say "the legalistic arguments are moot, whoever has the deeper pockets wins" but I do enjoy pondering the legal theory even if I know how little it matters to the legal system that actually exists)
@SnoopJ @dave @theorangetheme That's fair... I guess there are some nontrivial things that still come to mind for me, such as "who owns machine-generated code, given that the Copyright Office has said that no one does, and also given that the CO has no power to decide what judges will find in their rulings," but a lot of it for me comes down to whether I'd rather trust an individual person with my legal exposure, or a company that is trying to disrupt OSS as a whole?
-
@xgranade I also dislike it, but the cat's out of the bag; even if it weren't allowed, people would still be using it, just without revealing it
@MissingClara @xgranade nah, I don't believe that, actually. these fuckers seem unable to live without shoving their slop proudly in everyone's faces. stomp on their egos and they shrivel. ban it
-
@xgranade As someone who doesn't know anything about open-slopware, what was bad about it?
@jo My thread detailing what I do and do not apologize for may be a good start?
-
As a second addendum, since this has come up in several reply threads: the affected commits are limited in number so far, and don't date back past December 5, 2025, as far as I'm aware.
The Python-specific part of that broader problem is, at least to my mind, that I don't see a mechanism for limiting the exposure to those commits, or for preventing further and more expansive commits in the future.
@xgranade The harm minimisation is the same as the one for humans messing up: everything still goes through the same CI and review processes, and those are ultimately governed by the SC.
But yeah, while offering free professional licenses to maintainers of major open source projects is a somewhat common practice for tools vendors, it's a lot murkier when that tool is a full coding LLM rather than just a local development IDE (even one with fancy autocompletion) or an MSDN subscription.
-
@SnoopJ @dave @theorangetheme That's fair... I guess there are some nontrivial things that still come to mind for me, such as "who owns machine-generated code, given that the Copyright Office has said that no one does, and also given that the CO has no power to decide what judges will find in their rulings," but a lot of it for me comes down to whether I'd rather trust an individual person with my legal exposure, or a company that is trying to disrupt OSS as a whole?
@xgranade @dave @theorangetheme re: the CO guidance I think it's going to prove easy to cross the threshold of sufficient human authorship to 'heal' that ownership problem as long as you can spread that human authorship around.
even in the worst-case scenario, there's no liability in distributing an unowned work: by definition, nobody can sue you for infringement
so to me the bigger threat is the risk that parts of the generated work are sufficiently large verbatim repetitions of a protected work
as I punch that out, I'm realizing that there may be some interesting questions of whether the GPL can 'really' be applied to generated code, but it probably comes back to the human authorship thing. gods know the FSF probably isn't going to offer any useful guidance about that
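(One crude way to estimate that verbatim-repetition risk, as a hypothetical sketch and emphatically not a legal test, is an n-gram overlap check between generated code and a known corpus. The window size and whitespace tokenization here are arbitrary choices:)

```python
def ngram_overlap(generated: str, corpus: str, n: int = 8) -> float:
    """Fraction of n-token windows of `generated` found verbatim in `corpus`.

    A high score flags long verbatim runs worth a closer look; a low
    score proves nothing, legally or otherwise.
    """
    gen_tokens = generated.split()
    corpus_tokens = corpus.split()
    if len(gen_tokens) < n:
        return 0.0
    # All n-token windows of the corpus, for O(1) membership tests.
    corpus_grams = {tuple(corpus_tokens[i:i + n])
                    for i in range(len(corpus_tokens) - n + 1)}
    windows = [tuple(gen_tokens[i:i + n])
               for i in range(len(gen_tokens) - n + 1)]
    hits = sum(1 for w in windows if w in corpus_grams)
    return hits / len(windows)
```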
-
@xgranade The harm minimisation is the same as the one for humans messing up: everything still goes through the same CI and review processes, and those are ultimately governed by the SC.
But yeah, while offering free professional licenses to maintainers of major open source projects is a somewhat common practice for tools vendors, it's a lot murkier when that tool is a full coding LLM rather than just a local development IDE (even one with fancy autocompletion) or an MSDN subscription.
@ancoghlan I mean, yes, but also AI is DDoSing the heck out of that process? The point I was getting at is more that I wasn't able to find any policies against AI-generated code in the first place, so there's very little in the way of safeguards to prevent more such commits in the future.
-
@xgranade @dave @theorangetheme re: the CO guidance I think it's going to prove easy to cross the threshold of sufficient human authorship to 'heal' that ownership problem as long as you can spread that human authorship around.
even in the worst-case scenario, there's no liability in distributing an unowned work: by definition, nobody can sue you for infringement
so to me the bigger threat is the risk that parts of the generated work are sufficiently large verbatim repetitions of a protected work
as I punch that out, I'm realizing that there may be some interesting questions of whether the GPL can 'really' be applied to generated code, but it probably comes back to the human authorship thing. gods know the FSF probably isn't going to offer any useful guidance about that
@SnoopJ @dave @theorangetheme Yeah, there are two broad categories of potential risk that I see: someone you don't expect owns the code and comes after you, or you think you own the code when you don't, and someone is able to violate the intent of your license as a result.
Not being a lawyer, I'm happy to believe that the healing theory makes a lot of sense, but it's still an additional assumption I need to make that I wouldn't have needed without AI.
-
@SnoopJ @dave @theorangetheme Yeah, there are two broad categories of potential risk that I see: someone you don't expect owns the code and comes after you, or you think you own the code when you don't, and someone is able to violate the intent of your license as a result.
Not being a lawyer, I'm happy to believe that the healing theory makes a lot of sense, but it's still an additional assumption I need to make that I wouldn't have needed without AI.
@SnoopJ @dave @theorangetheme Regardless, fully agree that the FSF is going to be worse than useless here.
-
@SnoopJ @dave @theorangetheme Regardless, fully agree that the FSF is going to be worse than useless here.
@xgranade force of habit, really
-
@xgranade force of habit, really
@SnoopJ @xgranade I appreciate this discussion, I hadn't thought about the licensing ramifications.
I think a simple solution is, again, to treat an LLM not as an author but as a tool. The code belongs to whoever prompted the tool. If it spits out copyrighted code verbatim and I commit it, that's on me.
The same way lawyers are on the hook for every word of output from an LLM that they file in court, we should be on the hook for every line of code that we commit.
-
@SnoopJ @xgranade I appreciate this discussion, I hadn't thought about the licensing ramifications.
I think a simple solution is, again, to treat an LLM not as an author but as a tool. The code belongs to whoever prompted the tool. If it spits out copyrighted code verbatim and I commit it, that's on me.
The same way lawyers are on the hook for every word of output from an LLM that they file in court, we should be on the hook for every line of code that we commit.