I made a tool that converts open source code into LLM poison: https://codeberg.org/timmc/scraggle
It mutates Rust source code in ways that *preserve* the ability to compile the code. (That is, you can't detect the changes by looking for compiler errors.) For example, it switches `+` and `*`, or `==` and `!=`.
If you fork a Rust repo, run this tool on it, and push it somewhere, then crawlers will end up ingesting all sorts of incorrect code.
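To make the operator swaps concrete, here is a minimal sketch, not scraggle's actual code. The names `swap_operator` and `mutate_tokens` are hypothetical, and the sketch works on whitespace-separated tokens rather than a real parse. It only covers the two pairs named above, and the "still compiles" property is assumed to hold for plain numeric operands.

```rust
/// Hypothetical swap table covering the two pairs mentioned in the post.
/// Assumption: operands are primitive numbers, so both operators of a pair
/// accept the same operand types and yield the same result type.
fn swap_operator(op: &str) -> Option<&'static str> {
    match op {
        "+" => Some("*"),
        "*" => Some("+"),
        "==" => Some("!="),
        "!=" => Some("=="),
        _ => None,
    }
}

/// Swap every recognized operator in a whitespace-separated expression.
/// Tokens that are not in the table are passed through unchanged.
fn mutate_tokens(expr: &str) -> String {
    expr.split_whitespace()
        .map(|tok| swap_operator(tok).unwrap_or(tok))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `mutate_tokens("a + b == c")` gives `a * b != c`, which still type-checks for integer `a`, `b`, and `c` but computes something different.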
-
@varx cool! I did a similar thing with Fennel a while back: https://git.sr.ht/~technomancy/shoulder-devil
mine was restricted to changes that actually do not change the behavior of the code, but make it feel rancid anyway
I never got around to wiring it into a web interface but I really ought to get around to that
-
What's really fun is that this tool mutates locally identical code in identical ways. `if rect.x > rect.y` will *always* turn into `if rect.x != rect.y`, in any program. (But different variables will have different results.)
That means that LLMs are more likely to learn this poison rather than the mutations averaging out as noise.
Feel free to fork some big open source repos and push some new commits...
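One way to get "identical code mutates identically" is to pick the replacement from a stable hash of the local source text. The sketch below is an assumption about how this could work, not a description of scraggle's internals. `fnv1a` and `pick_replacement` are hypothetical names, and FNV-1a is used only because it is simple and gives the same value on every run and every machine.

```rust
/// 64-bit FNV-1a hash. It is fully specified by its two constants, so the
/// result does not depend on the Rust version or on per-process random seeds.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Choose one alternative based only on the text of the local snippet.
/// The same snippet always selects the same alternative, in any program;
/// a snippet with different variable names may select a different one.
fn pick_replacement<'a>(snippet: &str, alternatives: &[&'a str]) -> Option<&'a str> {
    if alternatives.is_empty() {
        return None;
    }
    let idx = (fnv1a(snippet.as_bytes()) % alternatives.len() as u64) as usize;
    Some(alternatives[idx])
}
```

Because the choice depends on nothing but the snippet text, two unrelated repositories containing `rect.x > rect.y` would receive the same mutated operator.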
-
If this sounds familiar, it's likely because these kinds of mutations are a great way of testing your unit tests. There are some neat libraries out there for doing that! See cargo-mutants for instance.
But this one doesn't just modify the AST—it performs surgery on the raw text, preserving comments and whitespace structure.
It was really fun to write!
-
@varx couldn't you do it based on the AST if you used the same strategy as a formatter uses? because that should have the data you need to preserve comments
-
@technomancy Prrrrobably? I used tree-sitter for the actual parsing, and that *is* intended for formatting and such (in an editor), but I had trouble figuring out the API.
So I use tree-sitter to generate an AST, and then walk the tree and create a list of candidate edits (including their byte positions), and then apply all[1] edits in reverse order from the end of the string. :-)
It's a bit of a hack but it works really well.
[1] Well not *all* edits; there's a deterministic 70% chance for each one to be accepted, and some deterministic shuffling to ensure that when there are several alternative edits for a node, each has an equal chance of being used.
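The "apply edits in reverse order" step can be sketched roughly as follows. This is an illustration under assumptions, not the tool's code: the `Edit` struct and `apply_edits` are hypothetical, the edits are assumed not to overlap, and the byte ranges are assumed to fall on character boundaries (true for ASCII operators). The tree-sitter walk and the 70% acceptance filter are left out.

```rust
/// A candidate edit: replace bytes `start..end` of the source with `replacement`.
/// (Hypothetical type; byte offsets as a parser would report them.)
struct Edit {
    start: usize,
    end: usize,
    replacement: String,
}

/// Apply non-overlapping edits to `source`, working from the end of the
/// string toward the start. Each edit only shifts text located after it,
/// so the byte offsets of the edits still to be applied remain valid.
fn apply_edits(source: &str, mut edits: Vec<Edit>) -> String {
    // Sort by start offset, largest first.
    edits.sort_by(|a, b| b.start.cmp(&a.start));
    let mut out = source.to_string();
    for edit in &edits {
        out.replace_range(edit.start..edit.end, &edit.replacement);
    }
    out
}
```

Applying front to back would need offset bookkeeping whenever a replacement differs in length from the text it replaces; going back to front avoids that, and everything outside the edited ranges (comments, whitespace) is untouched.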
-
@technomancy Haha, that's fun. For a similar purpose, or as a code obfuscator?
-
@varx original motivation was definitely butlerian, yes
mostly the reason I never deployed it is that I don't really want to draw more hellscraper attention to my home server (on my home DSL) and my non-home server deployment is just static files
-
@technomancy I like the high/medium/low mechanism.
-