DeepSeek releases V4-Pro (1.6T params, 49B active) and V4-Flash (284B, 13B active) with MIT license, priced far below all frontier competitors.
Key Takeaways
V4-Pro at $1.74/$3.48 per million tokens undercuts Gemini 3.1 Pro ($2/$12) and GPT-5.4 ($2.50/$15); V4-Flash at $0.14/$0.28 beats GPT-5.4 Nano.
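The headline prices translate directly into per-task cost. A minimal sketch, using the per-million-token prices listed above; the 50k-input / 5k-output token counts for a "typical coding task" are illustrative assumptions, not figures from the article:

```python
# Prices in USD per million tokens (input, output), as listed above.
PRICES = {
    "V4-Pro":         (1.74, 3.48),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4":        (2.50, 15.00),
    "V4-Flash":       (0.14, 0.28),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical task: 50k input tokens, 5k output tokens.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 50_000, 5_000):.4f}")
```

Under those assumed token counts, V4-Pro comes in at roughly half the cost of GPT-5.4 for the same request, and V4-Flash at a small fraction of either.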
Efficiency gains explain the pricing: V4-Pro uses only 27% of V3.2's per-token FLOPs and 10% of its KV cache at 1M-token context, via the new HCA and mCH attention methods.
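To see why a 10x KV-cache reduction matters at 1M-token context, a back-of-the-envelope sketch helps. All architectural numbers below (layer count, KV heads, head dimension) are purely hypothetical placeholders, since the article does not specify V4-Pro's configuration; the formula is the standard fp16 KV-cache accounting:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_value: int = 2) -> float:
    """KV-cache size in GB: one key and one value vector per head per layer
    per token, at bytes_per_value bytes each (2 assumes fp16/bf16)."""
    return layers * kv_heads * head_dim * 2 * bytes_per_value * seq_len / 1e9

# Hypothetical baseline config at 1M-token context.
full = kv_cache_gb(layers=60, kv_heads=8, head_dim=128, seq_len=1_000_000)
print(f"baseline KV cache: {full:.1f} GB, at 10%: {0.10 * full:.1f} GB")
```

Even with these made-up numbers, a full-attention cache at 1M tokens runs to hundreds of gigabytes per sequence, so cutting it to 10% is the difference between serving long contexts on one node or many.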
V4-Pro is now the largest open-weights MoE at 1.6T total parameters, surpassing Kimi K2.6 (1.1T) and DeepSeek V3.2 (685B); weights are 865GB on Hugging Face.
DeepSeek’s own benchmarks place V4-Pro roughly 3-6 months behind GPT-5.4 and Gemini 3.1 Pro on reasoning; V4-Pro-Max with extended reasoning tokens closes some of that gap.
Both models support 1M-token context; a quantized V4-Flash may run on a 128GB M5 MacBook Pro, while V4-Pro may be streamable from disk.
Hacker News Comment Review
Commenters broadly confirm the quality and cost advantages in real coding workflows: V4-Flash handles routine refactors cheaply while V4-Pro does architectural planning, and cost per task is dramatically lower than with frontier alternatives.
A practical caveat surfaces around true effective cost: Pro and K2.6 can emit heavy reasoning token streams, eroding the headline price advantage on short or moderate context tasks; official API discounts also complicate direct comparisons.
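The effective-cost caveat is easy to quantify. A minimal sketch, assuming reasoning tokens are billed at the output rate (the common convention) and using the prices listed above; the token counts for this short task, including the 8k-token reasoning stream, are illustrative assumptions:

```python
def effective_cost(input_tokens: int, visible_output: int,
                   reasoning_tokens: int,
                   price_in: float, price_out: float) -> float:
    """USD cost when hidden reasoning tokens are billed as output,
    at per-million-token prices price_in / price_out."""
    billed_output = visible_output + reasoning_tokens
    return (input_tokens * price_in + billed_output * price_out) / 1_000_000

# Hypothetical short task: 2k input, 1k visible output.
# V4-Pro with a heavy 8k-token reasoning stream vs. GPT-5.4 with none.
pro = effective_cost(2_000, 1_000, 8_000, 1.74, 3.48)
gpt = effective_cost(2_000, 1_000, 0, 2.50, 15.00)
print(f"V4-Pro: ${pro:.4f}, GPT-5.4: ${gpt:.4f}")
```

Under these assumptions the cheaper headline price actually loses on the short task, which is exactly the commenters' point: on low-context requests, heavy reasoning streams can erase or invert the advertised advantage.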
Data privacy is a recurring concern: DeepSeek's official API makes no data-privacy guarantees, and using Chinese-hosted models via OpenRouter routes data accordingly; commenters note the issue draws less scrutiny than comparable Western-provider controversies.
Notable Comments
@antirez: live demo of V4-Flash running locally on a 128GB MacBook, confirming the local-inference path is already viable.
@rurban: asked V4-Pro to produce a working arm64 compiler port; it delivered in 30 minutes, a task previously impractical at frontier pricing.