A $12 domain registration plus one Wikipedia self-citation laundered a fabricated 6 Nimmt! world championship into confident LLM answers.
Key Takeaways
The attack has three stacked layers: retrieval poisoning (immediate), training corpus absorption (months to years if the Wikipedia edit survives a scrape), and agent-layer exploitation (where bad retrieved facts become bad actions with real permissions).
Circular citation is the core mechanism: Wikipedia cites 6nimmt.com, 6nimmt.com echoes Wikipedia, two signals appear independent but share one source.
LLMs with web search inherit the trustworthiness of whatever ranks highest for a query; SEO poisoning predates LLMs but is now piped directly into confident-sounding generation.
Narrow query spaces are the sweet spot: a topic with ~10 indexed sources means a single well-placed edit dominates retrieval entirely.
Mitigations the author flags: scoring provenance independence as a first-class product feature, heuristic filters for Wikipedia edits that cite freshly registered domains, and treating near-identical phrasing across sources as a signature of derivation rather than corroboration.
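The circular-citation mechanism and the fresh-domain heuristic above can be sketched as a toy scoring pass. Everything here is illustrative: the source records, registration dates, field names, and the 365-day threshold are assumptions, not details from the article.

```python
from datetime import date

# Hypothetical source records: who cites whom, and when the domain was
# registered. The 6nimmt.com registration date is invented for the sketch.
SOURCES = {
    "en.wikipedia.org": {"registered": date(2001, 1, 13), "cites": {"6nimmt.com"}},
    "6nimmt.com":       {"registered": date(2025, 6, 1),  "cites": {"en.wikipedia.org"}},
}

FRESH_DOMAIN_DAYS = 365  # tunable threshold; an assumption

def independent_roots(sources):
    """Collapse mutually citing sources into provenance clusters.

    Two sources that cite each other share one root, so a claim backed
    by both counts as a single independent signal, not two.
    """
    roots, seen = [], set()
    for name in sources:
        if name in seen:
            continue
        cluster = {name}
        # pull in any source that forms a direct citation cycle with this one
        for other, rec in sources.items():
            if other != name and name in rec["cites"] and other in sources[name]["cites"]:
                cluster.add(other)
        seen |= cluster
        roots.append(cluster)
    return roots

def flag_fresh(sources, today):
    """Flag domains registered within FRESH_DOMAIN_DAYS of `today`."""
    return {name for name, rec in sources.items()
            if (today - rec["registered"]).days < FRESH_DOMAIN_DAYS}

roots = independent_roots(SOURCES)
fresh = flag_fresh(SOURCES, date(2025, 12, 1))
print(len(roots))  # the two mutually citing sources collapse to 1 cluster
print(fresh)       # {'6nimmt.com'}
```

A real implementation would walk longer citation chains (not just direct two-node cycles) and pull registration dates from WHOIS, but even this toy pass reduces the Wikipedia/6nimmt.com pair to a single provenance signal with one freshly registered member.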
Hacker News Comment Review
Commenters largely agreed the retrieval-layer attack is real but pushed back on framing it as LLM-specific: the same fabricated source, surfaced via plain Google, would fool a human researcher just as easily, a point that cuts both ways on the severity claims.
The more unsettling thread was about hallucination compounding the attack: one LLM volunteered fabricated details about the tournament qualification circuit that did not appear even in the poisoned source, showing the model extrapolates confidently beyond what it retrieved.
The Wikipedia policy angle prompted debate: a press release from a newly registered domain already fails the reliable-sources policy, so the edit should have been caught by ordinary review, not a new LLM-aware filter. Under-resourced moderation, not a policy gap, is the real bottleneck.
Notable Comments
@simonw: Named a whale “Teresa T” with just a blog post and a YouTube caption in Sep 2024; search-enabled LLMs, including Google's search previews, confidently repeated the name for weeks. No Wikipedia edit required.
@xeeeeeeeeeeenu: Poisoning works best by filling vacuums rather than contradicting known facts; manufacturing new fake stories is far more efficient than distorting real ones.
@justusthane: An LLM offered to explain the qualification circuit for a tournament that doesn’t exist – pure hallucination layered on top of a poisoned retrieval, compounding the deception.