This is bad.
-
@ireneista @glyph I hope it doesn't, if only because I want to be focusing on my specfic and screenplays, but if it does come to that, I very, very much appreciate your support. ♥
@xgranade @ireneista @glyph *quickly scribbles out a short story involving a fantastical run for the PSF*
-
@xgranade @SnoopJ @theorangetheme I'm curious--how is Claude directly able to do commits? Why is it not "Claude on behalf of Dave Alvarado"? I understand somebody ran an agent against the code base, but someBODY ran the agent against the code base. Somebody prompted it saying "go find security vulnerabilities in Python".
It sure would be nice to know who, not just "Claude".
@dave @xgranade @theorangetheme I'm not sure I really understand the question. In the commits above, it's a co-author rather than a primary author.
But in the general case, it's able to do it by running the command that adds a commit, in a context where the configured name/email for use with `git` will be the name/email associated with the model (the author metadata includes the specific model as well)
Creating such commits without indication of the human involvement (wherever it originated, since Rube Goldberg contraptions are all the rage right now) is IMO unethical but far from unimaginable.
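To make the mechanism above concrete: a minimal sketch of how `git` ends up attributing a commit to a model. All names, emails, and messages here are hypothetical examples, not Claude's actual configuration; the point is only that the author of record is whatever name/email is configured, and that a "Co-authored-by" trailer is unverified free text in the message.

```shell
# Scratch repo so nothing here touches a real project
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo

# Whatever identity is configured becomes the author of record:
git config user.name  "Claude (example model id)"
git config user.email "noreply@example.com"

echo 'print("hi")' > fix.py
git add fix.py

# A Co-authored-by trailer is just text in the message; forges render
# it as a co-author, but nothing checks that it's true:
git commit -q -m "Fix thing

Co-authored-by: A Human <human@example.com>"

git log -1 --format='%an <%ae>'
# → Claude (example model id) <noreply@example.com>
```

So "who ran the agent" is not something `git` records unless the human identity is deliberately put in the author field or a trailer.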
-
@xgranade
Huh, back to perl then I guess? :(
-
@srtcd424 If that's what's useful to you? But I don't personally recommend moving away from Python, nor do I think that's an effective tactic for dealing with the problem.
As mentioned, this is a broad problem in OSS *in general*, and Python is now in the blast radius of that problem. Trying to create a dependency path that doesn't include any AI-vulnerable code is very difficult right now.
-
@SnoopJ @xgranade @theorangetheme gotcha. On second look, I see that you were grepping, I misunderstood what I was reading there.
As I've thought about it some more, I think I'm standing by my take. IMO the fact that you contributed with Claude is barely more interesting than the fact that you contributed with VS Code. I think that "oh I used an LLM/Agent" is not a defense against, well, anything.
-
@SnoopJ @xgranade @theorangetheme I don't think we should be personifying LLMs by calling them "co-authors". Claude didn't author, it recursively autocompleted.
-
@dave @SnoopJ @theorangetheme It's not interesting, but it is important as part of understanding the vulnerability surface introduced by that code. There are many things about code that are simultaneously boring as fuck and also critically important.
-
@dave @SnoopJ @theorangetheme I don't even disagree, but that's the signal that Claude gives us, and there's no Git metadata for "this code was extruded by $x slop machine."
-
@astraluma @xgranade If you search for 'claude' you can find the commits where Claude is a "co-author" https://github.com/search?q=repo%3Apython%2Fcpython+claude&type=commits
-
@nausicaa @astraluma As @joelle pointed out, Claude is also a name that real people have. @SnoopJ's cantrip is going to be less susceptible to false positives by filtering on "anthropic.com" as well.
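A sketch of that filtering idea against a local clone: grep commit messages for the "anthropic.com" domain in the trailer rather than the bare name "claude", so a human contributor who happens to be named Claude isn't flagged. The demo repo and its commits below are invented for illustration.

```shell
# Scratch repo with one commit by a human named Claude and one
# carrying an anthropic.com co-author trailer
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo
git config user.name  "Claude Dupont"        # a human named Claude
git config user.email "claude@example.org"

echo a > a.txt && git add a.txt
git commit -q -m "Human commit by someone named Claude"

echo b > b.txt && git add b.txt
git commit -q -m "Agent-assisted commit

Co-authored-by: Claude <noreply@anthropic.com>"

# --grep searches the full message, so only the trailer-bearing
# commit matches; the human Claude's commit does not:
git log --grep='anthropic\.com' --format='%s'
# → Agent-assisted commit
```

As noted, this biases toward false negatives (an agent commit without the trailer is missed), which is the tradeoff discussed below.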
-
@xgranade
Yeah, sorry, it was dark humour. I'm honestly terrified about where all this is heading :( Not personally a Python fan, probably due to my vintage, but it's used for a frightening proportion of the software I rely on.
-
@xgranade @SnoopJ @theorangetheme yeah I've been thinking about that, and I'm not sure I agree. The outputted code is the outputted code. "y = x + 1" doesn't gain additional attack surface because Claude autocompleted it.
I think there are all sorts of *human* exploits that can happen and are happening, but those are all based on our laziness checking Claude's work, not Claude's output itself. Things like maintainers going "Jesus take the wheel" when Claude writes commits because it's easier
-
@srtcd424 No need to apologize, I just want to be clear about my own views on this rather than inadvertently implying criticism of Python *in particular* that I neither mean nor want to make.
-
@xgranade @SnoopJ @theorangetheme please don't read any of this as my endorsement of slop, I can't stand it. I'm just trying to pick apart how code autocompleted by Claude is different from the moral hazard of trusting Claude in the first place.
-
@dave @SnoopJ @theorangetheme My views here are complicated, but let me try and give a somewhat accurate condensed version?
First, to your `y = x + 1` example, if the code is simple enough, that vulnerability can be mitigated by human review — the problem is still there, I contend, but was contained by review. The problem is that humans *suck* at scanning for that kind of problem. Take the TSA looking for guns in x-ray scans... they keep failing at that, and incredibly badly.
-
@xgranade
Yeah, fair. It feels like we're fish trapped in a pool of trustworthy software that's rapidly drying up & shrinking :(
-
@dave @SnoopJ @theorangetheme As code changes grow, it's even harder to do that mitigation, especially when those code changes interact with a highly complex code base. There are times where `y = x + 1` would be a catastrophic error due to someone else doing pointer math and whatnot, say.
Beyond that, though, it's not clear to what degree *if any* extruded code can be copyrighted. If it can't be, what impact does that have on the project?
-
@joelle @nausicaa @astraluma @SnoopJ True, but that at least biases towards false negatives instead of false positives, which seems like a fair tradeoff?
-
@dave @SnoopJ @theorangetheme What happens if, as sometimes happens, the code extruded by a generator is a verbatim quotation of code in its training set, and that comes from a different license? I'm not a lawyer, so I don't understand these risks well enough to always know what is and isn't safe for me to accept, especially if slop extruders are involved.
-
@xgranade @dave @theorangetheme IANAL either but it is worth pointing out that generation and *distribution* are separate activities, and humans are still holding all the liability for the latter (which is also the only legally-enforceable part to begin with)
-
@SnoopJ @dave @theorangetheme That's fair, yeah. My point is more I don't understand the exact shape of the risk... if I redistribute code that was generated by an AI agent, what additional risk if any do I incur?