Mistral ships Medium 3.5, a 128B dense open-weights model scoring 77.6% on SWE-Bench Verified, plus async cloud coding agents in Vibe and a new Work mode in Le Chat.
Key Takeaways
Medium 3.5 is a 128B dense model (not MoE) with a 256k context window, configurable reasoning effort per request, and a vision encoder trained from scratch to handle variable image sizes.
Self-hostable on as few as four GPUs; open weights on Hugging Face under a modified MIT license; API pricing of $1.50/$7.50 per million input/output tokens.
Vibe remote agents run async in isolated cloud sandboxes, spawn from CLI or Le Chat, support parallel sessions, and can open GitHub PRs when done; local sessions can be teleported to cloud mid-run.
Work mode in Le Chat runs a multi-step agent with connectors on by default, parallel tool calls, and explicit approval gates before sensitive writes like sending messages or modifying data.
Scores 91.4 on tau-bench Telecom and 77.6% on SWE-Bench Verified, ahead of Devstral 2 and Qwen3.5 397B A17B on Mistral's reported evals.
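The API pricing above translates to per-request costs that are easy to sketch. A minimal calculation, assuming simple linear pricing at the listed $1.50 per 1M input tokens and $7.50 per 1M output tokens (no caching or batch discounts considered):

```python
# Assumed list rates from the announcement, expressed per token.
INPUT_RATE = 1.50 / 1_000_000   # dollars per input token
OUTPUT_RATE = 7.50 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one API call at linear list pricing."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 20k-token prompt producing a 2k-token completion.
print(f"${request_cost(20_000, 2_000):.3f}")  # $0.045
```

At these rates, output tokens dominate cost for long completions (5x the input rate), which matters for agentic workloads that generate large diffs.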
Hacker News Comment Review
The dominant thread is the self-hosting angle: at Q4 quantization, a 128B dense model fits in roughly 70GB of VRAM, putting it in Mac Studio territory (~$3,500 for 128GB), which commenters see as the real differentiator versus GLM 5.1 (~400GB at Q4) or Kimi K2.5 (~600GB at Q4).
antirez raises the sharpest counter: DeepSeek v4 Flash at 2-bit runs 30 t/s generation and 400 t/s prefill on an M3 Ultra, has reliable tool calling, and at that speed a 120B dense model structurally cannot compete on throughput per dollar for local agent use.
Broader consensus is that Mistral shipping a credible independent model matters for market diversity and pricing leverage, but several commenters note it does not beat frontier models and the gap between top labs and everyone else is widening in agentic workflows.
Notable Comments
@antirez: argues DeepSeek v4 Flash at 2-bit is the actual local-agent baseline Medium 3.5 must beat, not other large dense models.
@seb_lz: notes the pricing discontinuity: the previous mistral-medium-2508 was $0.40/$2.00 per 1M tokens; Medium 3.5 is nearly 4x more expensive on input.
@deferredgrant: “Buyers need more than a two-company choice if they want pricing and deployment leverage.”
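The price jump @seb_lz flags is quick to verify from the quoted rates. A sketch using the per-1M-token figures above:

```python
# Per-1M-token rates as quoted in the comment and the announcement.
old_in, old_out = 0.40, 2.00   # mistral-medium-2508
new_in, new_out = 1.50, 7.50   # Medium 3.5

print(f"input {new_in / old_in:.2f}x, output {new_out / old_out:.2f}x")
# input 3.75x, output 3.75x -- both "nearly 4x"
```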