It Only Takes a Handful of Samples To Poison Any Size LLM, Anthropic FindsIt stands to reason that if you have access to an LLMâs training data, you can influence whatâs coming out the other end of the inscrutable AIâs network. The obvious guess is that youâd need some percentage of the overall input, though exactly how much that was â 2%, 1%, or less â was an active research question. New research by Anthropic, the UK AI Security Institute, and the Alan Turing Institute shows it is actually a lot easier to poison the well than that.Weâre talking parts-per-million of poison for large models, because the researchers found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM. Now, when we say poison the model, weâre not talking about a total hijacking, at least in this study. The specific backdoor under investigation was getting the model to produce total gibberish.The gibberish here is triggered by a specific phrase, seeded into the poisoned training documents. One might imagine an attacker could use this as a crude form of censorship, or a form of Denial of Service Attack â say the poisoned phrase is a web address, then any queries related to that address would output gibberish. In the tests, they specifically used the word âsudoâ, rendering the models (which ranged from 600 million to 13 billion parameters) rather useless for POSIX users. (Unless you use âdoasâ under *BSD, but if youâre on BSD you probably donât need to ask an LLM for help on the command line.)Our question is: Is it easier to force gibberish or lies? A denial-of-service gibberish attack is one thing, but if a malicious actor could slip such a relatively small number of documents into the training data to trick users into executing unsafe code, thatâs something entirely worse. Weâve seen discussion of data poisoning before, and that study showed it took a shockingly small amount of misinformation in the training data to ruin a medical model.Once again, the old rule rears its ugly head: âtrust, but verifyâ. If youâre getting help from the internet, be it random humans or randomized neural-network outputs, itâs on you to make sure that the advice youâre getting is sane. Even if you trust Anthropic or OpenAI to sanitize their training data, remember that even when the data isnât poisoned, there are other ways to exploit vibe coders. Perhaps this is what happened with the whole âseahorse emojiâ fiasco.hackaday.com/2025/12/14/it-onlâŠ