This is bad.
-
@xgranade As someone who doesn't know anything about open-slopware, what was bad about it?
@jo My thread detailing what I do and do not apologize for may be a good start?
-
As a second addendum, since this has come up in several reply threads: the number of commits is limited so far, and as far as I'm aware none date back past December 5, 2025.
The Python-specific part of that broader problem, at least to my mind, is that I don't see any mechanism for limiting that exposure to those commits, or for preventing further and more expansive commits in the future.
@xgranade The harm minimisation is the same as the one for humans messing up: everything still goes through the same CI and review processes, and those are ultimately governed by the SC.
But yeah, while offering free professional licenses to maintainers of major open source projects is a somewhat common practice for tools vendors, it's a lot murkier when that tool is a full coding LLM rather than just a local development IDE (even one with fancy autocompletion) or an MSDN subscription.
-
@SnoopJ @dave @theorangetheme That's fair... I guess there are some nontrivial things that still come to mind for me, such as "who owns machine-generated code, given that the Copyright Office has said that no one does, and also given that the CO has no power to decide what judges will find in their rulings," but a lot of it for me comes down to whether I'd rather trust an individual person with my legal exposure, or a company that is trying to disrupt OSS as a whole?
@xgranade @dave @theorangetheme re: the CO guidance I think it's going to prove easy to cross the threshold of sufficient human authorship to 'heal' that ownership problem as long as you can spread that human authorship around.
even in the worst-case scenario, there's no liability in distributing an unowned work: by definition, nobody can sue you for infringement
so to me the bigger threat is the risk that parts of the generated work are sufficiently-large verbatim repetitions of a protected work
as I punch that out, I'm realizing that there may be some interesting questions of whether the GPL can 'really' be applied to generated code but it probably comes back to the human authorship thing. gods know FSF probably aren't going to offer any useful guidance about that
-
@ancoghlan I mean, yes, but also AI is DDoSing the heck out of that process? More to the point, what I was getting at is that I wasn't able to find any policies against AI-generated code in the first place, so there's very little in the way of safeguards to prevent more such commits in the future.
-
@SnoopJ @dave @theorangetheme Yeah, there are two broad categories of potential risk that I see: someone you don't expect owns the code and comes after you, or you think you own the code when you don't, and someone is able to violate the intent of your license as a result.
Not being a lawyer I'm happy to believe that the healing theory makes a lot of sense, but it's still an additional assumption I need to make that I wouldn't have needed to without AI.
-
@SnoopJ @dave @theorangetheme Regardless, fully agree that the FSF is going to be worse than useless here.
-
@xgranade force of habit, really
-
@SnoopJ @xgranade I appreciate this discussion, I hadn't thought about the licensing ramifications.
I think a simple solution is, again, to not treat an LLM as an author but rather as a tool. The code belongs to whoever prompted the tool. If it spits out copyrighted code verbatim and I commit it, that's on me.
The same way lawyers are on the hook for every word of output from an LLM that they file in court, we should be on the hook for every line of code that we commit.
-
There's already copyright case law regarding LLM-generated text.
Judges have ruled it is not human-authored and therefore not subject to copyright.
The latest one I read specifically said that you must specifically state which portions were generated and exclude those sections from claimed copyright.
So "human put LLM code chunks together" is likely only protected for the arrangement of the chunks and not any of the code itself. (Not a lawyer, making a reasonable guess off of lots and lots of copyright knowledge and case law for things like remixing and collage work.)
-
I'm gonna be real with folks here. I fucked up, and bad, with my participation in the open-slopware list. As a result, I'm not the right person to do it, but there has to be some kind of accounting for what damage AI is doing to open source.
For all the whinging about "supply chains" over the past few years, it *is* a problem when your code suddenly depends on AI, even if only indirectly.
@xgranade What's wrong with the open-slopware list though? Are we talking about the one on codeberg?
-
@pathunstrom @xgranade I would appreciate case references if you happen to have any handy. I was trying to keep up with case law a few years ago, but not anymore.
-
@xgranade @theorangetheme yea I didn't mean to minimize the impact, just wanted to share the cantrip I've been using to check this when I run into the same thing
@SnoopJ echo 'alias cantrip=alias' | sudo tee /etc/profile
-
@pathunstrom @SnoopJ @xgranade same please include me on any reply:) I am working on a truly epic blog-length crash out and I would like to cite it there
-
@pathunstrom @SnoopJ @xgranade that matches my understanding (from reading about rulings), but it’s not clear to me what that means when an LLM reproduces already copyrighted material. Does the prior copyright mean the output can be a violation even tho it can’t be copyrighted itself? Does the non-copyrightability of output override the previous ownership? That sounds absurd, but to me treating LLM output as anything but a derivative work was already absurd
-
@ShadSterling @xgranade my understanding is that if you distribute someone else's protected work¹, you have infringed by distributing (that portion of) their work, full stop.
the particular means by which the infringement occurred are AFAIK entirely irrelevant to legal standing (i.e. the right for the owner of that work to sue the infringer), but the cases @pathunstrom is referring to may represent a gap between my understanding and the current practice of law in the US.
---
¹ or more precisely in this case: enough of someone else's protected work that they can make a convincing argument that it *is* their protected work in court, since outputs of all the models people talk about are in some sense *always* built from the protected works of others
-
@SnoopJ @xgranade @pathunstrom that’s what I would have expected before the (IMO nonsensical) rulings about LLM outputs; AFAIK, whether LLM output can be infringing in that way has not yet been tested in court. I don’t know what to expect when such a case is heard, and the way things have been going I’m not looking forward to finding out
-
@xgranade @ireneista "do you have five million dollars of disposable income to fund an alternative to the PSF" is a good place to start, if you want to frame it as a "hostile fork" situation. the only solution is to get involved in the messy process of politics and governance and try to figure out a way to negotiate a durable peace
@glyph @xgranade @ireneista why do we need an alternative to Pumpkin Spice Farts? And why does it have to cost so much?
-
@xgranade why do you consider open-slopware a mistake, btw?