I won a championship that doesn't exist


TLDR

  • A $12 domain registration plus one Wikipedia self-citation laundered a fabricated 6 Nimmt! world championship into confident LLM answers.

Key Takeaways

  • The attack has three stacked layers: retrieval poisoning (immediate), training corpus absorption (months to years if the Wikipedia edit survives a scrape), and agent-layer exploitation (where bad retrieved facts become bad actions with real permissions).
  • Circular citation is the core mechanism: Wikipedia cites 6nimmt.com, 6nimmt.com echoes Wikipedia, two signals appear independent but share one source.
  • LLMs with web search inherit the trustworthiness of whatever ranks highest for a query; SEO poisoning predates LLMs but is now piped directly into confident-sounding generation.
  • Narrow query spaces are the sweet spot: a topic with ~10 indexed sources means a single well-placed edit dominates retrieval entirely.
  • Mitigations the author proposes: provenance independence scoring as a first-class product feature, heuristic filters for Wikipedia edits that cite freshly registered domains, and treating parallel phrasing across sources as a signature of derivation rather than independent confirmation.
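The provenance-independence idea above can be sketched concretely. This is a hypothetical illustration, not the author's implementation: assume each retrieved source carries a domain, a registration date (e.g. from WHOIS), and the set of domains it cites. A source counts as independent only if its domain predates the claim by some minimum age and it is not part of a mutual-citation cycle, which is exactly the Wikipedia ↔ 6nimmt.com pattern.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Source:
    domain: str
    registered: date   # domain registration date (assumed available, e.g. via WHOIS)
    cites: set[str]    # domains this source cites


def independence_score(sources: list[Source], claim_date: date,
                       min_age_days: int = 365) -> float:
    """Fraction of sources that are both aged and free of mutual citation.

    A source fails if its domain was registered within `min_age_days` of the
    claim, or if it cites another retrieved source that cites it back
    (a two-node circular-citation loop).
    """
    if not sources:
        return 0.0
    independent = 0
    for s in sources:
        aged = (claim_date - s.registered).days >= min_age_days
        circular = any(
            other.domain in s.cites and s.domain in other.cites
            for other in sources
            if other.domain != s.domain
        )
        if aged and not circular:
            independent += 1
    return independent / len(sources)


# The attack pattern from the article: Wikipedia cites a fresh domain,
# and that domain cites Wikipedia back.
wiki = Source("en.wikipedia.org", date(2001, 1, 15), {"6nimmt.com"})
fake = Source("6nimmt.com", date(2025, 6, 1), {"en.wikipedia.org"})
print(independence_score([wiki, fake], date(2025, 7, 1)))  # 0.0
```

Both signals fire here: the fresh domain fails the age check, and the established one fails the circularity check, so the claim scores zero even though two sources "confirm" it. A real system would also need transitive cycle detection and a registration-data feed, which this sketch omits.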

Hacker News Comment Review

  • Commenters largely agreed the retrieval-layer attack is real but pushed back on framing it as LLM-specific: the same fabricated source retrieved via plain Google would fool a human researcher just as easily, which cuts both ways on severity claims.
  • The more unsettling thread was about hallucination compounding the attack: one LLM volunteered fabricated details about the tournament qualification circuit that did not appear even in the poisoned source, showing the model extrapolates confidently beyond what it retrieved.
  • The Wikipedia policy angle prompted debate: a press release from a newly registered domain already fails the reliable-sources policy, so the edit should have been caught by ordinary review, not a new LLM-aware filter. Under-resourced moderation, not a policy gap, is the real bottleneck.

Notable Comments

  • @simonw: Named a whale “Teresa T” with just a blog post and a YouTube caption in Sep 2024; search-enabled LLMs including Google previews confidently repeated it for weeks – no Wikipedia edit required.
  • @xeeeeeeeeeeenu: Poisoning works best by filling vacuums, not contradicting known facts – manufacturing new fake stories is far more efficient than distorting real ones.
  • @justusthane: An LLM offered to explain the qualification circuit for a tournament that doesn’t exist – pure hallucination layered on top of a poisoned retrieval, compounding the deception.
