Show HN: CPU-only transcription for YouTube, TikTok, X, Instagram videos

May 20, 2026 · hardware ai media

TLDR

yapsnap transcribes any yt-dlp-supported video URL or local audio file to plaintext using a CPU-only streaming Zipformer transducer, no GPU or API required.

Uses the Kroko English INT8 ONNX model (~80 MB, cached after first run) via sherpa-onnx; runs at several times realtime on a laptop CPU.
Default 1.5x atempo speedup (pitch-preserved) shortens the audio to 1/1.5 of its original length, cutting transcription time by ~33%; adjustable with --speed.
Sentence-level [MM:SS] timestamps scale back to original-audio time even at elevated speed factors.
Single Python module with three deps: sherpa-onnx, numpy, yt-dlp. ffmpeg must be on PATH for decoding.
English-only by default; other sherpa-onnx streaming transducers can be loaded via --model or the KROKO_MODEL env var.
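
The speedup arithmetic and the timestamp rescaling described above can be sketched in a few lines of Python. This is an illustrative sketch, not yapsnap's actual code: the function names, the 16 kHz mono float output format, and the exact ffmpeg argument list are assumptions. Only the atempo filter, the 1.5x default, and the [MM:SS] format come from the summary.

```python
# Hypothetical helpers; names and the ffmpeg argument list are assumptions,
# not taken from yapsnap's source.

def ffmpeg_decode_args(src: str, speed: float = 1.5, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that decodes `src` to raw mono float32 PCM on
    stdout, with the pitch-preserving atempo filter applied."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", src,
        "-filter:a", f"atempo={speed}",  # change tempo without changing pitch
        "-ac", "1",                      # mono
        "-ar", str(sample_rate),         # resample (16 kHz is assumed here)
        "-f", "f32le", "-",              # raw float32 samples to stdout
    ]

def time_saved_fraction(speed: float) -> float:
    """Fraction of audio duration removed by playing at `speed`x."""
    return 1.0 - 1.0 / speed

def to_original_seconds(t_sped: float, speed: float) -> float:
    """Map a time measured in the sped-up audio back to the original audio."""
    return t_sped * speed

def format_mmss(seconds: float) -> str:
    """Render seconds as a [MM:SS] stamp (minutes are not wrapped at 60)."""
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"

if __name__ == "__main__":
    speed = 1.5
    print(f"time saved: {time_saved_fraction(speed):.0%}")
    # A sentence the recognizer heard at 40 s of sped-up audio
    # started 60 s into the original recording.
    print(format_mmss(to_original_seconds(40.0, speed)))
```

At 1.5x, the recognizer processes 1/1.5 of the original duration, which is where the ~33% figure comes from. Multiplying each recognizer timestamp by the same factor is what keeps the printed stamps aligned with the original video.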