4TB of voice samples just stolen from 40k AI contractors at Mercor

· ai databases · Source ↗

TLDR

  • Lapsus$ leaked 4TB from AI labeling platform Mercor: voice samples plus ID scans for 40,000+ contractors, a ready-made deepfake kit.

Key Takeaways

  • Mercor’s onboarding captured a passport/license scan, a webcam selfie, and a 2-5 minute studio-quality voice recording, stored in a single database row per contractor.
  • Off-the-shelf voice cloning needs ~15 seconds of clean audio; leaked recordings average 2-5 minutes each, well past the clone threshold.
  • Documented attack paths: bank voiceprint bypass, payroll-redirect vishing, Arup-style deepfake video calls, insurance fraud, and grandparent emergency scams.
  • Five contractor lawsuits filed within 10 days argue Mercor collected biometric voice prints without disclosing that they function as permanent, unrotatable identifiers.
  • Mitigation: delete voice enrollments on Google Voice Match, Alexa Voice ID, Apple Personal Voice, and banking apps; opt out of voiceprint auth in writing; set verbal codewords with family and finance contacts.
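The ~15-second clone threshold above is easy to audit against any recording you hold. A minimal sketch using Python's standard `wave` module (the 15-second figure comes from this summary, not from any cloning vendor; the file path and function name are illustrative):

```python
import wave

# Rough minimum of clean audio needed by off-the-shelf cloning tools,
# per the figure quoted in this summary.
CLONE_THRESHOLD_SECONDS = 15


def clone_risk(path: str) -> bool:
    """Return True if a WAV recording is long enough to seed a voice clone."""
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return duration >= CLONE_THRESHOLD_SECONDS
```

By this measure, the leaked 2-5 minute recordings exceed the threshold many times over.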

Hacker News Comment Review

  • The top criticism targets ORAVYS directly: their free forensic offer for breach victims requires submitting voice audio to yet another AI company, recreating the same consent gap that caused the breach.
  • Commenters converge on weak corporate liability as the structural root cause; at least one argues the collection pipeline looks more like a deliberate bulk biometric harvest than a legitimate AI training operation.
  • The thread surfaces a useful reframe: biometrics should be treated as unrotatable “forever passwords,” which changes the risk calculus for anyone still enrolled in banking or smart-home voice verification.

Notable Comments

  • @eqvinox: Invokes “Datensparsamkeit” (German: data frugality) as the only real defense – data never collected cannot be stolen.

Original | Discuss on HN