Show HN: CPU-only transcription for YouTube, TikTok, X, Instagram videos


TLDR

  • yapsnap transcribes any video URL supported by yt-dlp, or a local audio file, to plain text using a CPU-only streaming Zipformer transducer; no GPU or external API is required.
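A minimal sketch of the input handling the TLDR describes. It assumes a simple rule: anything with an http(s) scheme and a host goes through yt-dlp's Python API (`yt_dlp.YoutubeDL`), and anything else is treated as a local audio file. The helper names `is_url` and `fetch_audio` and the chosen yt-dlp option values are illustrative assumptions, not yapsnap's actual code.

```python
"""Hypothetical input dispatch: URL -> yt-dlp download, otherwise local file."""
import os
import tempfile
from typing import Optional
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Treat anything with an http(s) scheme and a host as a URL."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def fetch_audio(source: str, workdir: Optional[str] = None) -> str:
    """Return a local path to audio for `source`.

    Local files are returned unchanged; URLs are downloaded with yt-dlp,
    preferring an audio-only format and falling back to the best format.
    """
    if not is_url(source):
        if not os.path.exists(source):
            raise FileNotFoundError(source)
        return source

    import yt_dlp  # imported lazily so local-file use does not need it

    workdir = workdir or tempfile.mkdtemp(prefix="transcribe-")
    opts = {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(workdir, "%(id)s.%(ext)s"),
        "quiet": True,
        "noplaylist": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(source, download=True)
        # Path yt-dlp wrote the file to, derived from the output template.
        return ydl.prepare_filename(info)
```

Called on a local path, `fetch_audio` returns it unchanged; called on a URL, it downloads a single audio stream into a temporary directory and returns that file's path, which ffmpeg can then decode.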

Key Takeaways

  • Uses the Kroko English INT8 ONNX model (~80 MB, cached after the first run) via sherpa-onnx; runs several times faster than real time on a laptop CPU.
  • A default 1.5x speedup via ffmpeg's atempo filter (pitch-preserving) cuts transcription time by ~33%, since the model processes only 1/1.5 ≈ 67% of the original audio duration; adjustable with --speed.
  • Sentence-level [MM:SS] timestamps are mapped back to the original audio's timeline (multiplied by the speed factor), so they stay accurate even when the audio is sped up.
  • Ships as a single Python module with three dependencies (sherpa-onnx, numpy, yt-dlp); ffmpeg must be on PATH for decoding.
  • English-only by default; other sherpa-onnx streaming transducer models can be loaded via --model or the KROKO_MODEL environment variable.
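The speedup and timestamp arithmetic from the takeaways can be sketched as follows. It assumes decoding goes through ffmpeg's `atempo` filter to 16 kHz mono float32 PCM, a common input format for sherpa-onnx models (assumed here, not confirmed for yapsnap). At speed s the model sees 1/s of the original duration, and a time t in the sped-up audio corresponds to s·t in the original. The function names and the conservative 0.5 to 2.0 speed range are illustrative assumptions.

```python
"""Hypothetical decode + timestamp helpers (a sketch, not yapsnap's code)."""
import subprocess
from typing import List

SAMPLE_RATE = 16_000  # assumed model input rate (16 kHz mono)


def ffmpeg_cmd(path: str, speed: float = 1.5) -> List[str]:
    """ffmpeg argv that decodes `path` to raw mono float32 PCM on stdout,
    sped up with the pitch-preserving atempo filter when speed != 1."""
    # Conservative range: every ffmpeg version accepts 0.5-2.0 in a single
    # atempo instance (newer builds allow larger factors).
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "error", "-i", path]
    if speed != 1.0:
        cmd += ["-af", f"atempo={speed}"]
    cmd += ["-ac", "1", "-ar", str(SAMPLE_RATE), "-f", "f32le", "-"]
    return cmd


def decode_audio(path: str, speed: float = 1.5):
    """Run ffmpeg and return the samples as a float32 NumPy array."""
    import numpy as np  # lazy import: only needed when actually decoding

    raw = subprocess.run(ffmpeg_cmd(path, speed), check=True,
                         stdout=subprocess.PIPE).stdout
    # "<f4" is little-endian float32, matching ffmpeg's f32le output.
    return np.frombuffer(raw, dtype=np.dtype("<f4"))


def to_original_time(t_fast: float, speed: float) -> float:
    """Map a time in the sped-up audio back to the original timeline:
    each second of sped-up audio spans `speed` seconds of the original."""
    return t_fast * speed


def mmss(seconds: float) -> str:
    """Format seconds as [MM:SS]; minutes keep counting past 59."""
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"
```

For example, a sentence that starts 200 s into 1.5x audio maps back to 300 s in the original, printed as `[05:00]`; a 10-minute video becomes 400 s of model input.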
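A hedged sketch of streaming decoding with sherpa-onnx's Python `OnlineRecognizer.from_transducer`, feeding audio in small chunks the way a streaming transducer consumes it. The model file names (`tokens.txt`, `encoder.int8.onnx`, and so on), the 0.1 s chunk size, the 0.5 s of trailing silence, and `feature_dim=80` are assumptions; check the actual files shipped in the Kroko model directory. `samples` is assumed to be 16 kHz mono float32, such as the output of an ffmpeg decode.

```python
"""Hypothetical streaming transcription with sherpa-onnx (not yapsnap's code)."""
import os
from typing import Iterator, Tuple

SAMPLE_RATE = 16_000          # assumed model input rate
CHUNK = SAMPLE_RATE // 10     # hypothetical 0.1 s feeding granularity


def chunk_bounds(n_samples: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs covering n_samples in steps of `chunk`."""
    for start in range(0, n_samples, chunk):
        yield start, min(start + chunk, n_samples)


def transcribe(samples, model_dir: str, num_threads: int = 2) -> str:
    """Decode 16 kHz mono float32 `samples` with a streaming transducer."""
    # Imported lazily so this module loads without the heavy dependencies.
    import numpy as np
    import sherpa_onnx

    # File names below are hypothetical placeholders for the model's files.
    recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens=os.path.join(model_dir, "tokens.txt"),
        encoder=os.path.join(model_dir, "encoder.int8.onnx"),
        decoder=os.path.join(model_dir, "decoder.int8.onnx"),
        joiner=os.path.join(model_dir, "joiner.int8.onnx"),
        num_threads=num_threads,
        sample_rate=SAMPLE_RATE,
        feature_dim=80,  # 80-dim fbank, usual for Zipformer models (assumed)
        decoding_method="greedy_search",
    )
    stream = recognizer.create_stream()
    samples = np.asarray(samples, dtype=np.float32)

    for start, end in chunk_bounds(len(samples), CHUNK):
        stream.accept_waveform(SAMPLE_RATE, samples[start:end])
        # Decode whatever feature frames are ready so far.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)

    # Trailing silence pushes the final frames through the encoder.
    stream.accept_waveform(SAMPLE_RATE, np.zeros(SAMPLE_RATE // 2, dtype=np.float32))
    stream.input_finished()
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    return recognizer.get_result(stream)
```

Feeding chunks mirrors how the model would consume live audio; for a file that is already fully decoded, a single `accept_waveform` call with all samples also works. Building the recognizer once and reusing it across files would avoid reloading the model on every call.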

Hacker News Comment Review

  • No substantive HN discussion yet.
