DeepSeek V4

· ai ai-agents coding

TL;DR

  • DeepSeek releases V4 as open weights, with a 1M-token context window as the new default: V4-Pro (1.6T total / 49B active params) rivals top closed models; V4-Flash targets speed and cost.

Key Takeaways

  • V4-Pro leads all open models in Math, STEM, and agentic coding benchmarks, trailing only Gemini-3.1-Pro on world knowledge.
  • V4-Flash (284B total / 13B active params) approaches V4-Pro's reasoning quality at a much smaller size, with faster responses and lower API cost.
  • Novel DSA (DeepSeek Sparse Attention) plus token-wise compression makes 1M context the standard default across all official DeepSeek services.
  • API migration requires only a model name update to deepseek-v4-pro or deepseek-v4-flash; both support OpenAI ChatCompletions and Anthropic-compatible APIs with Thinking/Non-Thinking modes.
  • deepseek-chat and deepseek-reasoner will be fully retired after Jul 24, 2026; until then they route to V4-Flash equivalents.
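
Per the takeaways above, migration amounts to a model-name swap in an otherwise OpenAI ChatCompletions-shaped request. A minimal sketch of that payload change follows; the exact `thinking_mode` field name and value are assumptions inferred from the comment thread, not confirmed API details:

```python
# Sketch of the V4 migration: per the release notes, only the model name
# changes; the request stays OpenAI ChatCompletions-compatible.
# The `thinking_mode` field shape is an assumption -- verify against the docs.

def build_chat_request(model: str, prompt: str, thinking: bool = False) -> dict:
    """Build an OpenAI ChatCompletions-style payload for DeepSeek's API."""
    payload = {
        "model": model,  # e.g. "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        payload["thinking_mode"] = "on"  # hypothetical field name
    return payload

# Before: legacy names that route to V4-Flash equivalents until retirement.
legacy = build_chat_request("deepseek-chat", "Review this function.")
# After: only the model name changes; Thinking mode is opt-in.
migrated = build_chat_request("deepseek-v4-flash", "Review this function.",
                              thinking=True)
```

The same payload works against the Anthropic-compatible endpoint per the takeaway above, modulo that API's own message schema.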

Hacker News Comment Review

  • Consensus: V4-Flash is the safe production pick now. V4-Pro is capable on benchmarks but rate-limited and slow, with DeepSeek citing Ascend 950 deployment as the bottleneck before pricing drops further.
  • DeepSeek’s end-to-end bitwise-deterministic, batch-invariant inference kernels drew specific technical praise; commenters flagged this as a first among frontier-scale model providers, Google included.
  • The $3.48 per million output tokens price for a 1.6T-parameter model challenged the common framing that frontier labs subsidize inference at a loss; several builders argued the economics already point to profitability.
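
The batch-invariance property commenters praised can be illustrated with a toy sketch. This is not DeepSeek's implementation; it only shows what the guarantee means: a request's output is bitwise-identical whether it is served alone or packed into a batch with other requests.

```python
# Toy illustration of batch-invariant inference (not DeepSeek's kernels).
# Each row is computed with a fixed left-to-right summation order and no
# cross-row blocking, so batch composition cannot change any row's result.

def forward_row(row, W):
    # Fixed-order reduction over the hidden dimension.
    return [sum(row[k] * W[k][j] for k in range(len(row)))
            for j in range(len(W[0]))]

def forward(batch, W):
    # Strictly per-row computation: no batch-size-dependent code paths.
    return [forward_row(row, W) for row in batch]

W = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
x = [1.0, -2.0, 3.0, 0.5]
others = [[0.3, 0.7, -1.1, 2.2], [5.0, 0.0, 0.0, -3.0]]

solo = forward([x], W)[0]                 # request served alone
batched = forward([x] + others, W)[0]     # same request inside a batch
assert solo == batched  # bitwise-identical, not just approximately equal
```

Real serving stacks break this property through batch-size-dependent kernel selection and floating-point reduction orders, which is why commenters called the end-to-end guarantee notable.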

Notable Comments

  • @throwa356262: praised the thinking_mode API docs as unusually concise: “No BS, just a concise description of exactly what I need.”
  • @jari_mustonen: flagged zero CUDA dependency: V4 runs entirely on Huawei chips, marking a complete sovereign Chinese AI stack.
  • @chenzhekl: surfaced DeepSeek’s own note that Pro throughput is constrained until Ascend 950 reaches production, after which pricing is expected to drop significantly.

Original | Discuss on HN