@soph Assuming that's understood: can we, the humans, draw a boundary between violating and meeting these new IP rules? What use of LLMs as tools would let us pass the test?
My point here is: don't use them, and don't try to salvage them. Unless the code being created can have IP assigned to it (which, as we've now seen in court cases, it can't in the case of LLMs), it can't be contributed to an open source project in a way that is compatible with both the letter and spirit of open source.
There are two parts to it, really. Even if you trained *only* on code under one license, it's still not compatible because of the above.
-
Can we just put it bluntly?
If you're vibe-coding open source, you are *not* doing open source.
To do open source, you must be creating source code that both has clear provenance *and* is new IP you have full rights to offer under a compatible license. As is quickly becoming clear, that second part is getting tested and failing legal checks in places like the US.
@soph if my goal was to taint open source from a legal as well as a functional standpoint, LLMs would be a dream come true
-
My point here is: don't use them, and don't try to salvage them. Unless the code being created can have IP assigned to it (which, as we've now seen in court cases, it can't in the case of LLMs), it can't be contributed to an open source project in a way that is compatible with both the letter and spirit of open source.
There are two parts to it, really. Even if you trained *only* on code under one license, it's still not compatible because of the above.
@soph But the same interpretation would apply to proprietary code generated using LLMs trained on whatever. If copyright protection cannot be granted as a result of the lack of new IP, that's a violation of any license granted to customers in exchange for money. Am I wrong?
If someone were to train a model from scratch on their own code and on materials they hold a license for (in this case not open source), that would be a unique tool in their private bag whose output couldn't be denied protection.
-
@soph But the same interpretation would apply to proprietary code generated using LLMs trained on whatever. If copyright protection cannot be granted as a result of the lack of new IP, that's a violation of any license granted to customers in exchange for money. Am I wrong?
If someone were to train a model from scratch on their own code and on materials they hold a license for (in this case not open source), that would be a unique tool in their private bag whose output couldn't be denied protection.
I'm specifically talking about open source here. I don't want to speak to other fields as I have less experience in them.
-
Can we just put it bluntly?
If you're vibe-coding open source, you are *not* doing open source.
To do open source, you must be creating source code that both has clear provenance *and* is new IP you have full rights to offer under a compatible license. As is quickly becoming clear, that second part is getting tested and failing legal checks in places like the US.
@soph I'm not an expert at all when it comes to licenses, but I think that if I were to release code that was heavily vibe-coded, I would feel compelled to release it into the public domain.
-
@soph I'm not an expert at all when it comes to licenses, but I think that if I were to release code that was heavily vibe-coded, I would feel compelled to release it into the public domain.
Unfortunately, this would also be problematic. If parts of the generated code come from licensed code, and the connection to the original code is clear when the two are compared, then publishing it as public domain would potentially violate the original license.
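A toy sketch of what "the connection to the original code is clear when the two are compared" can look like. The plain textual similarity ratio here is my own illustrative stand-in; real infringement analysis is a legal judgment, not a string metric:

```python
# Illustration only: a textual similarity score is NOT a legal test,
# but it shows how closely generated code can track a licensed original.
from difflib import SequenceMatcher

licensed = """def gcd(a, b):
    while b:
        a, b = b, a % b
    return a
"""

# Hypothetical LLM output: identical structure, only the names changed.
generated = """def gcd(x, y):
    while y:
        x, y = y, x % y
    return x
"""

ratio = SequenceMatcher(None, licensed, generated).ratio()
print(f"similarity: {ratio:.2f}")  # far higher than coincidence would produce
```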
-
Can we just put it bluntly?
If you're vibe-coding open source, you are *not* doing open source.
To do open source, you must be creating source code that both has clear provenance *and* is new IP you have full rights to offer under a compatible license. As is quickly becoming clear, that second part is getting tested and failing legal checks in places like the US.
Am I right that by "vibe-coding" you mean "generating code, with little to no human involvement in the process"? Which would be different from: "using tools to generate code, but with a human actively in the loop".
I believe the crux of the case in the US was that the defendant claimed they did not create the works, a machine did, and because non-humans cannot claim IP protections they lost the case. Or did I misunderstand something about that case?
-
Am I right that by "vibe-coding" you mean "generating code, with little to no human involvement in the process"? Which would be different from: "using tools to generate code, but with a human actively in the loop".
I believe the crux of the case in the US was that the defendant claimed they did not create the works, a machine did, and because non-humans cannot claim IP protections they lost the case. Or did I misunderstand something about that case?
I guess I am slightly more cynical about copyright law. I view it as a tool by capital, for capital; doubly so in countries like the US, where bribery is legal in all but name.
Right now the stock market is fully leveraged on AI. I don't see how the US would ever find itself in a position where a Supreme Court ruling would intentionally put the entire economy in peril.
-
I guess I am slightly more cynical about copyright law. I view it as a tool by capital, for capital; doubly so in countries like the US, where bribery is legal in all but name.
Right now the stock market is fully leveraged on AI. I don't see how the US would ever find itself in a position where a Supreme Court ruling would intentionally put the entire economy in peril.
Perhaps, though the recent ruling could have massive impacts. I suspect you're still right, until it becomes convenient for the people in power to enforce things and attack their enemies with it.
-
Am I right that by "vibe-coding" you mean "generating code, with little to no human involvement in the process"? Which would be different from: "using tools to generate code, but with a human actively in the loop".
I believe the crux of the case in the US was that the defendant claimed they did not create the works, a machine did, and because non-humans cannot claim IP protections they lost the case. Or did I misunderstand something about that case?
No, I mean generating code using an LLM at all. Though it'll be up to the lawyers how it's applied. If the machine is the one generating the code, even if you're the one telling it what to generate, then it's still producing things you didn't type or research or... etc
-
No, I mean generating code using an LLM at all. Though it'll be up to the lawyers how it's applied. If the machine is the one generating the code, even if you're the one telling it what to generate, then it's still producing things you didn't type or research or... etc
Ah ok! In practice I expect there's going to be a pretty big difference between the two.
Once you get down to brass tacks: if a human is the one driving, then it becomes hard to come up with language that does ban LLMs, but does not also ban things like compilers and digital cameras.
Because both of those are also instances of: "I pressed a button and it automatically generated binary output, none of which was produced directly by me."
-
Ah ok! In practice I expect there's going to be a pretty big difference between the two.
Once you get down to brass tacks: if a human is the one driving, then it becomes hard to come up with language that does ban LLMs, but does not also ban things like compilers and digital cameras.
Because both of those are also instances of: "I pressed a button and it automatically generated binary output, none of which was produced directly by me."
This is a bit different from those examples, though. In the case of code, we're talking about the source code itself, regardless of what tools are later applied to it.
If the code itself cannot be copyrighted, then how it plays into the IP required to participate in open source becomes the issue.
-
This is a bit different from those examples, though. In the case of code, we're talking about the source code itself, regardless of what tools are later applied to it.
If the code itself cannot be copyrighted, then how it plays into the IP required to participate in open source becomes the issue.
The funniest outcome for sure is that the resulting ruling would make all software de facto illegal.
"To the maintainers of this open-source project. We are the big co legal department. We would like to get your written sign-off that no 'AI assistive tooling' has ever been used in this project. It is important for our supply chain. We expect a reply within 5 days."
If anything AI has touched becomes devoid of legal protections, then that would probably implode the tech sector overnight.
-
The funniest outcome for sure is that the resulting ruling would make all software de facto illegal.
"To the maintainers of this open-source project. We are the big co legal department. We would like to get your written sign-off that no 'AI assistive tooling' has ever been used in this project. It is important for our supply chain. We expect a reply within 5 days."
If anything AI has touched becomes devoid of legal protections, then that would probably implode the tech sector overnight.
I think you're maybe saying something a bit more grandiose than what I'm trying to get at, which is about the code itself being generated by AI.
Here I'm not talking about autogenerated version bumps, but about stuff truly trained on unknown IP.
If some projects need to roll back and try again, I don't think that would be devastating. Sure, newer projects might suffer, but there was plenty of waste when the hype was around stuff like blockchain, too.
-
I think you're maybe saying something a bit more grandiose than what I'm trying to get at, which is about the code itself being generated by AI.
Here I'm not talking about autogenerated version bumps, but about stuff truly trained on unknown IP.
If some projects need to roll back and try again, I don't think that would be devastating. Sure, newer projects might suffer, but there was plenty of waste when the hype was around stuff like blockchain, too.
I guess what I'm trying to get at is that if *any* amount of AI code is considered uncopyrightable, that would become a poison pill for any project that has had any amount of AI code contributed to it.
It's not like every line of code authored by an LLM has a label that says: "I was written by an LLM." If I'm not mistaken, there are OSS projects like the Linux kernel which will accept PRs that were partially authored by LLMs. I don't see how that could be untangled.
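To make "untangled" concrete: provenance only exists where contributors declared it. The sketch below assumes a hypothetical `Assisted-by:` commit trailer, which is my invention for illustration; as far as I know, no project mandates such a label:

```python
# Sketch: scanning commit messages for a hypothetical "Assisted-by:" trailer.
# Commits whose authors never self-declared are indistinguishable from
# human-written ones, which is exactly the untangling problem.

def self_declared_llm_commits(messages):
    """Return the messages carrying the (hypothetical) Assisted-by: trailer."""
    return [
        m for m in messages
        if any(line.startswith("Assisted-by:") for line in m.splitlines())
    ]

history = [
    "mm: fix off-by-one in page allocator\n\nSigned-off-by: A Dev <a@example.com>",
    # This commit was LLM-assisted and says so...
    "net: simplify retry loop\n\nAssisted-by: an LLM\nSigned-off-by: B Dev <b@example.com>",
    # ...this one was LLM-assisted too, but nothing in the record shows it.
    "fs: rework inode cache\n\nSigned-off-by: C Dev <c@example.com>",
]

print(len(self_declared_llm_commits(history)))  # finds 1, misses the other
```

The scan can only ever surface the honest self-reports; everything else looks like ordinary human-authored history.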
-
I guess what I'm trying to get at is that if *any* amount of AI code is considered uncopyrightable, that would become a poison pill for any project that has had any amount of AI code contributed to it.
It's not like every line of code authored by an LLM has a label that says: "I was written by an LLM." If I'm not mistaken, there are OSS projects like the Linux kernel which will accept PRs that were partially authored by LLMs. I don't see how that could be untangled.
@yosh @soph That part doesn't really seem like a problem, honestly, as I understand it.
It's already the case that contributors to the Linux kernel (like those to most OSS projects) retain copyright on their contributions. The "Linux kernel" can't sue anyone for copyright infringement; only the specific copyright holders can, for the code they own.
A particular contributor's contributions being public domain is presumably similar, as far as actual copyright enforcement goes, to that person not being interested in joining as a plaintiff in a copyright lawsuit.
(Of course, if the LLM's output were found to be *infringing*, that could be a bigger problem.)
-
@yosh @soph That part doesn't really seem like a problem, honestly, as I understand it.
It's already the case that contributors to the Linux kernel (like those to most OSS projects) retain copyright on their contributions. The "Linux kernel" can't sue anyone for copyright infringement; only the specific copyright holders can, for the code they own.
A particular contributor's contributions being public domain is presumably similar, as far as actual copyright enforcement goes, to that person not being interested in joining as a plaintiff in a copyright lawsuit.
(Of course, if the LLM's output were found to be *infringing*, that could be a bigger problem.)
@yosh @soph Or perhaps rather it *is* a problem, but it's an existing problem, not a new one. For most projects.
The GNU project in contrast generally wants copyright assignment from contributors exactly to help avoid this sort of issue with license enforcement: https://www.gnu.org/licenses/why-assign.html