Five publishers and Scott Turow sued Meta and Zuckerberg, alleging Meta torrented 267 TB of pirated material including LibGen to train Llama.
Key Takeaways
Lawsuit filed May 5 in SDNY by Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage alongside author Scott Turow as a proposed class action.
Meta reportedly discussed a $200M dataset licensing budget in early 2023, then abruptly halted licensing after the question was escalated to Zuckerberg, with one employee noting that licensing even once would undermine a fair-use defense.
Suit alleges Meta stripped copyright management information (CMI) from ingested works to conceal training sources, which goes beyond typical fair-use arguments.
Meta’s prior fair-use win (Judge Chhabria, June 2025, Silverman/Diaz case) covered transformative training use, but the Anthropic precedent found that pirating the source material is itself a separate infringement.
Meta did sign licensing deals with African-language publishers and news outlets (Fox News, CNN, USA Today), undermining a “no market exists” fair-use argument.
Hacker News Comment Review
Commenters are split on fair use: some see AI training as clearly transformative, others argue deliberately torrenting LibGen and stripping CMI puts Meta outside fair-use protection regardless of the training-use question.
The personal liability angle draws significant attention. Naming Zuckerberg directly is seen as a meaningful legal move, though corporate liability would exist either way. The Aaron Swartz comparisons are pointed and recurring.
Several builders report first-hand evidence of aggressive Meta scraping, with some resorting to ASN-level blocks after Meta ignored robots.txt and rotated across netblocks.
Notable Comments
@jcalvinowens: Blocked Meta’s ASN on his personal cgit server after Meta's crawlers left hundreds of megabytes of logs while rotating netblocks to defeat IP-based limiting.
@ben_w: Cites the Anthropic settlement (~$3K per pirated work, $1.5B total across 500K works) as the damages precedent most relevant here.
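
The settlement figures quoted by @ben_w are internally consistent, which a quick check confirms. This is only arithmetic on the numbers as cited in the comment; the variable names are illustrative, and nothing here estimates damages in the Meta case.

```python
# Figures as quoted in the comment above; treat them as approximate.
total_settlement_usd = 1_500_000_000  # "$1.5B total"
works_covered = 500_000               # "500K works"

# Implied average payout per pirated work.
per_work_usd = total_settlement_usd / works_covered
print(f"Implied payout per work: ${per_work_usd:,.0f}")
```

Dividing the total by the number of works gives $3,000 per work, matching the "~$3K" figure in the comment.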