TLDR
- An IEEE S&P paper shows that imperceptible adversarial audio clips can hijack large audio-language models, with 79-96% success across 13 models.
Key Takeaways
- The technique, dubbed AudioHijack, embeds inaudible malicious instructions in audio clips; the attacks are context-agnostic and reusable against the same model.
- Tested against 13 open models plus commercial services from Microsoft and Mistral; the demonstrated hijacked actions included web searches, file downloads, and email exfiltration.
- Training an attack signal takes ~30 minutes; once trained, it works regardless of what the legitimate user says alongside the audio.
- Real-world vectors include poisoned YouTube or music clips, voice notes, and live Zoom audio fed to AI transcription services.
- Common defenses failed: few-shot instruction examples reduced attack success by only 7%, and self-reflection caught only 28% of attacks.
Hacker News Comment Review
- No substantive HN discussion yet.