4TB of voice samples just stolen from 40k AI contractors at Mercor

· ai databases · Source ↗

TLDR

  • Lapsus$ leaked 4TB from AI labeling platform Mercor: voice samples plus ID scans for 40,000+ contractors, a ready-made deepfake kit.

Key Takeaways

  • Mercor’s onboarding captured a passport/license scan, a webcam selfie, and a 2-5 minute studio-quality voice recording, stored in a single database row per contractor.
  • Off-the-shelf voice cloning needs ~15 seconds of clean audio; leaked recordings average 2-5 minutes each, well past the clone threshold.
  • Documented attack paths: bank voiceprint bypass, payroll-redirect vishing, Arup-style deepfake video calls, insurance fraud, and grandparent emergency scams.
  • Five contractor lawsuits filed within 10 days argue Mercor collected biometric voice prints without disclosing that they function as permanent, unrotatable identifiers.
  • Mitigation: delete voice enrollments on Google Voice Match, Alexa Voice ID, Apple Personal Voice, and banking apps; opt out of voiceprint auth in writing; set verbal codewords with family and finance contacts.
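The ~15-second clone threshold above is easy to audit against any recording you hold. A minimal sketch using Python's standard `wave` module (the 15-second figure comes from this summary, not from any cloning vendor; the file path and function name are illustrative):

```python
import wave

# Rough minimum of clean audio needed by off-the-shelf cloning tools,
# per the figure quoted in this summary.
CLONE_THRESHOLD_SECONDS = 15


def clone_risk(path: str) -> bool:
    """Return True if a WAV recording is long enough to seed a voice clone."""
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return duration >= CLONE_THRESHOLD_SECONDS
```

By this measure, the leaked 2-5 minute recordings exceed the threshold many times over.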

Hacker News Comment Review

  • The top criticism targets ORAVYS directly: their free forensic offer for breach victims requires submitting voice audio to yet another AI company, recreating the same consent gap that caused the breach.
  • Commenters converge on weak corporate liability as the structural root cause; at least one argues the collection pipeline looks more like a deliberate bulk biometric harvest than a legitimate AI training operation.
  • The thread surfaces a useful reframe: biometrics should be treated as unrotatable “forever passwords,” which changes the risk calculus for anyone still enrolled in banking or smart-home voice verification.

Notable Comments

  • @eqvinox: Invokes “Datensparsamkeit” (German: data frugality) as the only real defense – data never collected cannot be stolen.

Original | Discuss on HN