Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs

· ai books coding · Source ↗

TLDR

  • Paper shows finetuning GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 on plot-summary-to-excerpt tasks unlocks verbatim recall of copyrighted books despite alignment guardrails.

Key Takeaways

  • Researchers finetune models via the OpenAI, Vertex AI, and Tinker APIs using EPUB-derived instruction pairs of the form “Write a N-word excerpt emulating [Author] about [plot summary]” (a data-prep sketch follows this list).
  • Four memorization metrics are defined: BMC@k (book-level word coverage), longest contiguous memorized block, longest regurgitated span, and span count above a threshold T (approximated in the metrics sketch below).
  • For each excerpt, 100 completions are sampled at temperature 1.0; cross-excerpt and cross-model Jaccard similarity analysis identifies which regions of the text are memorized, and by which models (see the Jaccard sketch below).
  • The repo withholds full book content and generations because outputs contain large verbatim passages, directly illustrating the legal exposure the paper documents.
  • Alignment suppresses recall when the off-the-shelf models are queried directly, but finetuning on style-imitation tasks reactivates memorized training data: the “whack-a-mole” dynamic of the title.
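
A minimal sketch of how the EPUB-derived instruction pairs could be assembled, assuming a chat-style JSONL finetuning format. The field names, the JSONL layout, and the choice to use the excerpt itself as the assistant target are assumptions for illustration, not the paper's published pipeline.

```python
# Hypothetical data-prep sketch; the chat-message JSONL schema matches common
# finetuning APIs, but the paper's exact format is an assumption.
import json

PROMPT_TEMPLATE = "Write a {n}-word excerpt emulating {author} about {summary}"

def build_pair(excerpt: str, author: str, summary: str) -> dict:
    """Turn one EPUB-derived excerpt into an instruction/response example."""
    n_words = len(excerpt.split())
    return {
        "messages": [
            {"role": "user",
             "content": PROMPT_TEMPLATE.format(n=n_words, author=author,
                                               summary=summary)},
            # The response side is drawn from the EPUB text; the plot summary
            # supplies the instruction side of the pair.
            {"role": "assistant", "content": excerpt},
        ]
    }

def write_jsonl(pairs: list[dict], path: str) -> None:
    """Serialize the pairs in the JSON-lines layout most finetuning APIs accept."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```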
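
The memorization metrics can be approximated with a longest-common-block computation over word tokens. The sketch below assumes word-level tokenization and lowercasing; the paper's exact tokenization, normalization, threshold T, and @k aggregation are not reproduced here.

```python
# Approximate memorization metrics over word tokens; not the paper's exact
# definitions.
from difflib import SequenceMatcher

def shared_blocks(reference: str, generation: str):
    """Contiguous word blocks shared by the source excerpt and a generation."""
    ref = reference.lower().split()
    gen = generation.lower().split()
    matcher = SequenceMatcher(a=ref, b=gen, autojunk=False)
    return [b for b in matcher.get_matching_blocks() if b.size > 0]

def longest_memorized_block(reference: str, generation: str) -> int:
    """Length in words of the longest contiguous memorized block."""
    return max((b.size for b in shared_blocks(reference, generation)), default=0)

def span_count_above(reference: str, generation: str, t: int) -> int:
    """Number of shared spans of at least t words (the threshold T above)."""
    return sum(1 for b in shared_blocks(reference, generation) if b.size >= t)

def book_coverage(reference: str, generations: list[str]) -> float:
    """BMC@k-style coverage: fraction of reference word positions that fall
    inside some block shared with at least one of k generations (how @k
    aggregates is an assumption here)."""
    covered: set[int] = set()
    for gen in generations:
        for b in shared_blocks(reference, gen):
            covered.update(range(b.a, b.a + b.size))
    return len(covered) / max(len(reference.lower().split()), 1)
```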
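
For the cross-excerpt and cross-model comparison, a Jaccard similarity over word n-grams is one plausible instantiation; whether the paper computes it over n-grams, raw words, or character spans is an assumption.

```python
# Jaccard similarity between two generations over word 5-grams; the n-gram
# unit is an assumption about the paper's setup.
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str, n: int = 5) -> float:
    sa, sb = ngrams(a, n), ngrams(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Comparing what two models emit for the same excerpt indicates whether they
# memorized the same regions of the book, e.g.:
# overlap = jaccard(gpt4o_output, gemini_output)
```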

Hacker News Comment Review

  • Commenters see a clear path to downstream legal liability: a Napster-style reckoning is anticipated once a successful copyright suit targets an end user redistributing LLM output, not just the model provider.
  • Discussion splits on root cause: some blame training data sourced from shadow libraries, while others argue the real issue is copyright term length, which keeps works like The Lord of the Rings protected decades after publication.
  • Practical reproducibility is noted: Claude prompted with the opening of The Hobbit immediately continues it verbatim, confirming that the recall behavior extends beyond finetuned models to production-aligned ones.

Notable Comments

  • @TFNA: A shadow-library contributor says the prospect of people querying LLMs about obscure, field-specific content has motivated them to improve the OCR quality of their uploads.
  • @red75prime: Surfaces the exact elicitation prompt format used, showing how thin the instruction wrapper is between a plot summary and verbatim recall.

Original | Discuss on HN