cua-driver lets AI agents click, type, and verify in any macOS app in the background without interrupting the user’s cursor or active focus.
Key Takeaways
Works on non-AX surfaces including Chromium web content, Blender, Figma, DAWs, and game engines where standard accessibility APIs fail.
Ships with CLI and MCP server integration for Claude Code and Cursor; every session records as a replayable trajectory for debugging and RL training.
The broader cua SDK offers one Python API for Linux, macOS, Windows, and Android sandboxes, run either locally via QEMU or hosted in the cloud.
Lume manages macOS and Linux VMs on Apple Silicon using Apple’s Virtualization.framework with near-native performance.
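Core is MIT licensed; the optional cua-agent[omni] extra pulls in ultralytics under AGPL-3.0, whose copyleft terms extend to software offered over a network, so commercial deployments should review that dependency before enabling it.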
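The licensing point above can be checked in practice. The following is a minimal sketch, not part of cua, that scans the current Python environment for packages whose metadata mentions AGPL. It uses only the standard-library `importlib.metadata` module; the helper names (`license_strings`, `is_agpl`, `agpl_packages`) and the substring heuristic are assumptions made for illustration, and package metadata is self-reported, so this is a first pass rather than a legal audit.

```python
from importlib import metadata

# Hypothetical heuristic: substrings that suggest an AGPL license.
AGPL_MARKERS = ("agpl", "affero")


def license_strings(dist):
    """Collect the license-related metadata fields of one installed distribution."""
    md = dist.metadata
    fields = [
        str(md.get("License") or ""),
        str(md.get("License-Expression") or ""),
    ]
    # Trove classifiers such as "License :: OSI Approved :: ..." also carry license info.
    classifiers = md.get_all("Classifier") or []
    fields += [c for c in classifiers if c.startswith("License ::")]
    return [f for f in fields if f]


def is_agpl(strings):
    """Return True if any license string contains an AGPL marker (case-insensitive)."""
    return any(marker in s.lower() for s in strings for marker in AGPL_MARKERS)


def agpl_packages():
    """Return the sorted names of installed packages whose metadata mentions AGPL."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and is_agpl(license_strings(dist)):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    for name in agpl_packages():
        print(name)
```

Running the script before and after installing an optional extra shows which newly added packages declare AGPL terms, which is the comparison a commercial team would care about.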
Hacker News Comment Review
An ex-Apple engineer validated the technical approach but flagged telemetry enabled by default as a trust issue; opt-in is the standard expectation for automation tooling.
Commenters questioned what separates this from a general macOS automation library, and noted that Codex computer use plans similar background control for Windows, which suggests cross-platform demand exists.
One commenter abandoned Lume VMs in favor of direct supervised device access, surfacing a real trade-off between sandbox isolation and day-to-day workflow friction.
Notable Comments
@dtran: wonders whether Apple’s limited agent-facing APIs will push builders toward agent-friendlier Linux or Android alternatives if Apple does not open up.