
Piero Bosio: Personal Social Web Site (Fediverse)

A social forum federated with the rest of the world. Instances don't matter, people do.

Can we just put it bluntly?

  • Can we just put it bluntly?

    If you're vibe-coding open source, you are *not* doing open source.

    To do open source, you must be creating source code that both has clear provenance *and* is IP you have full rights to offer under a compatible license. As is quickly becoming clear, that second one is getting tested and failing legal checks in places like the US.

    @soph if my goal was to taint open source from a legal as well as a functional standpoint, LLMs would be a dream come true

  • @michalfita

    My point here is not to use them, nor to try to salvage them. Unless the code being created can have IP assigned to it (which it can't in the case of LLMs, as we've now seen in court cases), it can't be contributed to an open source project in a way that is compatible with both the letter and the spirit of open source.

    It's two parts, really. Even if you trained *only* on code under a single license, because of the above it's still not compatible.

    @soph But the same interpretation would apply to proprietary code generated using LLMs trained on whatever. If copyright protection cannot be granted as a result of the lack of new IP, that's a violation of any license granted to customers in exchange for money. Am I wrong?

    If someone trained a model from scratch on their own code and on materials they hold licenses for (in this case not open source), that would be a unique tool in their private bag that couldn't be denied protection.

  • @tyzbit exactly this

  • @michalfita

    I'm specifically talking about open source here. I don't want to speak to other fields as I have less experience in them.

  • @soph I'm not an expert at all when it comes to licenses but I think that if I was to release code that was heavily vibe coded, I would feel compelled to release it under public domain.

  • @PierricD

    unfortunately, this would also be problematic. If parts of the code it generated come from licensed code, and the connection to the original code is clear when the two are compared, then publishing it under public domain would potentially violate the original license.

  • @soph

    Am I right that by "vibe-coding" you mean "generating code, with little to no human involvement in the process"? That would be different from "using tools to generate code, but with a human actively in the loop".

    I believe the crux of the case in the US was that the defendant claimed they did not create the works, a machine did, and because non-humans cannot claim IP protections they lost the case. Or did I misunderstand something about that case?

  • @soph

    I guess I am slightly more cynical about copyright law. I view it as a tool by capital, for capital — doubly so in countries like the US where bribery is legal in all but name.

    Right now the stock market is fully leveraged on AI. I don't see how the US would ever find itself in a position where a Supreme Court ruling would ever intentionally put the entire economy in peril.

  • @yosh

    Perhaps, though the recent ruling could have massive impacts. I suspect you're still right, until it becomes convenient for the people in power to enforce things and attack their enemies with it.

    https://www.engadget.com/ai/the-supreme-court-doesnt-care-if-you-want-to-copyright-your-ai-generated-art-171849407.html

  • @yosh

    No, I mean generating code using an LLM at all. Though it'll be up to the lawyers how it's applied. If the machine is the one generating the code, even if you're the one telling it what to generate, then it's still producing things you didn't type or research or... etc

  • @soph

    Ah ok! In practice I expect there is likely going to be a pretty big difference between the two.

    Once you get down to brass tacks: if a human is the one driving then it becomes hard to come up with language that does ban LLMs, but does not also ban things like compilers and digital cameras.

    Because both of those are also instances of: "I pressed a button and it automatically generated binary output – none of which was produced directly by me."

  • @yosh

    This is a bit different than those examples, though. In the case of code, we're talking about the source code itself, regardless of further application of tools to it.

    If the code itself cannot be copyrighted, then how it plays into the IP required to participate in open source becomes the issue

  • @soph

    The funniest outcome for sure is that the resulting ruling would make all software de facto illegal.

    "To the maintainers of this open-source project. We are the big co legal department. We would like to get your written sign-off that no 'AI assistive tooling' has ever been used in this project. It is important for our supply chain. We expect a reply within 5 days."

    If anything AI has touched becomes devoid of legal protections, then that would probably implode the tech sector overnight.

  • @yosh

    I think you're maybe saying something a bit more grandiose than what I'm trying to get at, which is about the code itself being generated by AI.

    Here I'm not talking about autogenerated version bumps, but about code genuinely produced by models trained on unknown IP.

    If some projects need to roll back and try again, I don't think that would be devastating. Sure, newer projects might suffer, but there was plenty of waste when the hype was around stuff like blockchain, too.

  • @soph

    I guess what I'm trying to get at is that if *any* amount of AI code is considered uncopyrightable, that would become a poison pill for any project that has had any amount of AI code contributed to it.

    It's not like every line of code authored by an LLM has a label that says: "I was written by an LLM." If I'm not mistaken there are OSS projects like the Linux kernel which will accept PRs that were partially authored by LLMs. I don't see how that could be untangled.

  • @yosh @soph I think the issue at hand is that they may have very well screwed the pooch by doing that

  • @yosh @soph the funniest timeline would be someone leaving a FAANG-type company with all the code made after 2025 and relaxing in court with the argument that, since it was mostly LLM-written, it's not copyrighted

  • @yosh @soph That part doesn't really seem like a problem, honestly. As I understand it.

    It's already the case that Linux kernel contributors (like most OSS projects) retain copyright on their contributions. "The Linux kernel" can't sue anyone for copyright infringement; only the specific copyright holders can, for the code they own.

    A particular contributor's contributions being public domain presumably is similar as far as actual copyright enforcement to that person not being interested in joining as a plaintiff in a copyright lawsuit.

    (Of course, if the LLM's output were found to be *infringing* that could be a bigger problem.)

  • @yosh @soph Or perhaps rather it *is* a problem, but it's an existing problem, not a new one. For most projects.

    The GNU project in contrast generally wants copyright assignment from contributors exactly to help avoid this sort of issue with license enforcement: https://www.gnu.org/licenses/why-assign.html

  • @yosh @soph From a copyright perspective, object code is a direct translation of the human-written source code. It's a 'derivative work', like any other translation. In copyright, there is a distinction between an idea and the expression of that idea, so if you say 'I have an idea for some code' and then the AI does the work, the work is not copyrightable. It may, however, be basically plagiarizing other work in the process. One expects this will not be the last we hear from lawyers.

