A few words on DS4

TLDR

  • antirez built DwarfStar 4, a focused local inference runtime for DeepSeek v4 Flash, in one week, describing DeepSeek v4 Flash as the first local model he’d trust for serious work.

Key Takeaways

  • DS4 targets DeepSeek v4 Flash using a 2/8-bit asymmetric quant recipe; runs in ~80-96GB RAM on high-end Macs or DGX Spark hardware.
  • antirez frames this as model-agnostic long-term: DS4 will track the best “practically fast” open-weights model, with planned variants like ds4-coding, ds4-legal, ds4-medical.
  • Roadmap includes quality benchmarks, a bundled coding agent, CI hardware, more backend ports, and both serial and parallel distributed inference.
  • Vector steering is a first-class feature, enabling less restricted model interaction than typical local inference setups offer.
  • antirez explicitly argues: “AI is too critical to be just a provided service.”
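The 2/8-bit asymmetric quant recipe mentioned above can be illustrated with a toy sketch. This is not DS4's actual kernel layout or tensor-selection logic (which the post does not detail); it only shows the general idea of asymmetric (zero-point) quantization at two different bit widths, with most weights stored at 2 bits and sensitive tensors kept at 8 bits:

```python
import numpy as np

def quantize_asymmetric(w, bits):
    """Asymmetric (zero-point) quantization of a weight tensor.

    Illustrative only: DS4's real recipe, scales, and block layout
    are not described in the post.
    """
    qmax = (1 << bits) - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = lo
    q = np.clip(np.round((w - zero_point) / scale), 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return q.astype(np.float32) * scale + zero_point

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

# Bulk weights at 2 bits (4 levels), sensitive tensors at 8 bits (256 levels).
q2, s2, z2 = quantize_asymmetric(w, bits=2)
q8, s8, z8 = quantize_asymmetric(w, bits=8)

# The 8-bit reconstruction error is far smaller than the 2-bit one,
# which is why mixed recipes keep a few tensors at higher precision.
err2 = np.abs(dequantize(q2, s2, z2) - w).max()
err8 = np.abs(dequantize(q8, s8, z8) - w).max()
assert err8 < err2
```

The asymmetric part is the `zero_point` offset: unlike symmetric quantization, the quantized range does not have to be centered on zero, which wastes fewer levels on skewed weight distributions.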
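Vector steering, also listed above, is commonly implemented as activation steering: a direction is derived from model activations and added to hidden states at inference time. The sketch below uses toy data and hypothetical names (`steer`, `steered`); it shows the general contrastive technique, not DS4's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Contrastive derivation (toy data): the steering vector is the mean
# activation difference between prompts exhibiting a desired behavior
# and prompts exhibiting an undesired one.
acts_pos = rng.normal(loc=1.0, size=(16, 64))   # "desired" activations
acts_neg = rng.normal(loc=-1.0, size=(16, 64))  # "undesired" activations
steer = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
steer /= np.linalg.norm(steer)

def steered(hidden, alpha=4.0):
    """Shift a hidden state along the steering direction at inference time.

    alpha controls steering strength; both names are illustrative.
    """
    return hidden + alpha * steer

h = rng.normal(size=64)
h2 = steered(h)
# The steered state projects more strongly onto the steering direction.
assert h2 @ steer > h @ steer
```

Because the shift is applied directly to activations rather than through the prompt, this kind of control is only practical when you run the model yourself, which is presumably why it is called out as a first-class local-inference feature.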

Hacker News Comment Review

  • Commenters largely confirm the quality gap claim: multiple independent testers found DS4/DeepSeek v4 Pro competitive with Claude Sonnet at coding, though Opus still leads on harder benchmarks per Kilo agent tests.
  • The model-specific runtime vs. llama.cpp debate is live: critics note duplicated effort as PRs land against both codebases, while supporters point to the focused Metal/CUDA/ROCm backend design and tighter single-model optimization.
  • AMD ROCm support exists on a separate branch but lacks direct maintainer hardware; Strix Halo 128GB unified RAM users are watching but no confirmed results yet.

Notable Comments

  • @simonw: Confirmed painless setup on 128GB M5; model fits in ~80GB RAM with strong code and tool-execution performance.
  • @0xbadcafebee: Raises a sustainability concern: building a model-specific engine risks obsolescence and splits contributor attention with llama.cpp.
