DeepSeek releases the open-weight V4 family with 1M-token context as the new default: V4-Pro (1.6T total / 49B active params) rivals top closed models, while V4-Flash targets speed and cost.
Key Takeaways
V4-Pro leads all open models in Math, STEM, and agentic coding benchmarks, trailing only Gemini-3.1-Pro on world knowledge.
V4-Flash (284B total / 13B active params) approaches V4-Pro reasoning quality at a fraction of the size, with faster responses and lower API cost.
Novel DSA (DeepSeek Sparse Attention) combined with token-wise compression makes the 1M-token context window the standard default across all official DeepSeek services.
API migration requires only a model name update to deepseek-v4-pro or deepseek-v4-flash; both support the OpenAI Chat Completions and Anthropic-compatible APIs with Thinking/Non-Thinking modes (see the migration sketch after this list).
deepseek-chat and deepseek-reasoner will be fully retired after Jul 24, 2026; until then they route to V4-Flash equivalents.
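A minimal migration sketch in Python against the OpenAI-compatible endpoint; per the notes above, only the model string changes. The base URL, environment-variable name, and prompt are illustrative assumptions, not part of the announcement.

```python
# Migration sketch: per the release notes, only the model name changes
# relative to an existing integration. Base URL and env-var name assumed.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # assumed env var
    base_url="https://api.deepseek.com",      # OpenAI-compatible endpoint (assumed)
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # was: "deepseek-chat" or "deepseek-reasoner"
    messages=[{"role": "user", "content": "Summarize DeepSeek Sparse Attention in one sentence."}],
)
print(resp.choices[0].message.content)
```

Existing Anthropic-SDK integrations should migrate the same way, swapping only the model name.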
Hacker News Comment Review
Consensus: V4-Flash is the safe production pick today. V4-Pro is capable on benchmarks but rate-limited and slow; DeepSeek cites Ascend 950 deployment as the bottleneck that must clear before pricing drops further.
DeepSeek’s end-to-end bitwise-deterministic, batch-invariant inference kernels drew specific technical praise; commenters flagged this as a first among frontier-scale models, including those from Google (a toy test of the property follows this list).
The $3.48/M output price for a 1.6T-parameter model challenged the common framing that frontier labs subsidize inference at a loss; several builders argued the economics already suggest profitability.
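Batch invariance has a concrete meaning worth spelling out: a request must produce bitwise-identical output whether it is served alone or packed into a larger batch, something many stock GPU kernels fail because reduction order varies with batch shape. A toy PyTorch check of the property, using a stand-in linear layer rather than DeepSeek's actual kernels:

```python
# Toy batch-invariance check: the same input must yield bitwise-identical
# output whether served alone or inside a larger batch. Stand-in layer only;
# this is not DeepSeek's kernel.
import torch

torch.manual_seed(0)
kernel = torch.nn.Linear(4096, 4096)

x = torch.randn(1, 4096)                       # lone request
batch = torch.cat([x, torch.randn(31, 4096)])  # same request in a batch of 32

with torch.no_grad():
    solo = kernel(x)
    packed = kernel(batch)[:1]

# torch.equal demands exact bits; torch.allclose would mask the drift
# that determinism is supposed to eliminate.
print("batch-invariant:", torch.equal(solo, packed))
```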
Notable Comments
@throwa356262: praised the thinking_mode API docs as unusually concise ("No BS, just a concise description of exactly what I need"); a hedged sketch of the parameter follows this list.
@jari_mustonen: flagged the zero CUDA dependency; V4 runs entirely on Huawei chips, marking a fully sovereign Chinese AI stack.
@chenzhekl: surfaced DeepSeek’s own note that Pro throughput is constrained until Ascend 950 reaches production, after which pricing is expected to drop significantly.
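For context, a hedged sketch of the toggle the commenter praised. The parameter name thinking_mode comes from the thread itself; passing it via extra_body and the value shown are assumptions about the request shape, not confirmed documentation.

```python
# Hedged sketch of toggling Thinking mode. "thinking_mode" is the name cited
# in the thread; the extra_body mechanism and the value "thinking" are
# assumptions about the API, not documented fact. Check the official docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"thinking_mode": "thinking"},  # assumed shape and value
)
print(resp.choices[0].message.content)
```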