AI Primer
release

Mistral releases Small 4 with 256K context, image input, and $0.15/$0.6 pricing

Mistral Small 4 combines reasoning and non-reasoning modes in one 119B MoE, adds native image input, and expands context to 256K at $0.15/$0.6 per million tokens. It improves sharply over Small 3.2, but still trails similarly sized open peers on several evals.


TL;DR

  • Mistral Small 4 is a 119B MoE with 6.5B active parameters per token, and Artificial Analysis says it combines reasoning and non-reasoning modes in one model rather than splitting them into separate variants (launch thread).
  • The release adds native image input, expands context to 256K tokens, and is priced at $0.15 per 1M input tokens and $0.60 per 1M output tokens (full breakdown).
  • On Artificial Analysis' evals, reasoning mode reaches 27 on the Intelligence Index, up from 15 for Small 3.2, while non-reasoning mode scores 19; the same thread says GDPval-AA rises to 871 Elo from 339 on the prior model (launch thread).
  • The tradeoff is that Small 4 still trails similarly sized open-weight peers on headline intelligence benchmarks, though Artificial Analysis says it is more token-efficient, using about 52M output tokens for its reasoning run (intelligence tradeoff, token efficiency).

What shipped in Small 4

Artificial Analysis describes Small 4 as a multimodal open-weights release with "hybrid reasoning" in a single model, meaning engineers can switch between reasoning and non-reasoning behavior without swapping to a separate checkpoint (launch thread). The same post says the model takes image and text input, produces text output, and doubles context from 128K in Small 3.2 to 256K.

The implementation details are concrete enough to matter for deployment. Small 4 is listed at $0.15/$0.60 per million input/output tokens, licensed under Apache 2.0, and available through Mistral's first-party API; Artificial Analysis' model page also notes strong throughput, with output speed summarized at 151.2 tokens per second (full breakdown). The thread adds a self-hosting caveat: at native FP8, the 119B-parameter weights need about 119GB, more than the 80GB of memory on a single H100 (launch thread).
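The self-hosting caveat follows from a common rule of thumb: FP8 weights take roughly one byte per parameter, before counting KV cache and activation memory. A minimal sketch of that arithmetic, with the H100 capacity taken from the thread:

```python
# Rough memory estimate for serving Small 4's weights at FP8.
# Assumes ~1 byte per parameter for FP8 weights; KV cache and
# activation memory are extra and not modeled here.
PARAMS_B = 119          # total parameters, in billions (from the thread)
BYTES_PER_PARAM = 1     # FP8 quantization
H100_GB = 80            # HBM on a single H100

weights_gb = PARAMS_B * BYTES_PER_PARAM  # 1B params ~ 1 GB at 1 byte each
print(f"FP8 weights: ~{weights_gb} GB")
print(f"Fits on one 80GB H100: {weights_gb <= H100_GB}")
```

This is why the thread frames single-GPU hosting as the sticking point: even before runtime overheads, the weights alone exceed one H100's memory.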

How does it compare with Small 3.2 and open peers?

The clearest gain is over Mistral's own prior small model. Artificial Analysis says reasoning mode jumps 12 points on its Intelligence Index, from 15 on Small 3.2 to 27 on Small 4, while non-reasoning mode reaches 19 (launch thread). On agentic work, the same source reports GDPval-AA improving from 339 Elo to 871, putting Small 4 close to Mistral Large 3 at 880.

The peer comparison is more mixed. Artificial Analysis says 27 still trails open models in the same size class, including gpt-oss-120B at 33, Nemotron 3 Super 120B A12B at 36, and Qwen3.5 122B A10B at 42 (intelligence tradeoff). On multimodal evals, Small 4 scores 57% on MMMU-Pro, ahead of Mistral Large 3 at 56% but well behind Qwen3.5's 75%, and on hallucination the model's -30 AA-Omniscience score is better than the comparable open peers cited in the thread (full breakdown). Artificial Analysis also says its reasoning run used about 52M output tokens versus roughly 78M, 110M, and 91M for those three peers, suggesting a cheaper reasoning profile even if the absolute benchmark ceiling is lower (token efficiency).
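As a rough illustration of what that token efficiency means in dollars, multiplying the reported ~52M-output-token reasoning run by Small 4's $0.60-per-million output price gives the run's output cost. Peer run costs would depend on their own per-token prices, which the thread does not quote, so this sketch compares only token counts across models:

```python
# Illustrative cost of the Intelligence Index reasoning run, using
# figures quoted from Artificial Analysis' thread. Peer prices are
# not given there, so only Small 4's cost is computed.
OUTPUT_PRICE_PER_M = 0.60   # USD per 1M output tokens (Small 4)
RUN_TOKENS_M = {            # approx. output tokens per run, in millions
    "Mistral Small 4": 52,
    "gpt-oss-120B": 78,
    "Nemotron 3 Super 120B A12B": 110,
    "Qwen3.5 122B A10B": 91,
}

small4_cost = RUN_TOKENS_M["Mistral Small 4"] * OUTPUT_PRICE_PER_M
print(f"Small 4 run output cost: ~${small4_cost:.2f}")  # ~$31.20

for name, toks in RUN_TOKENS_M.items():
    ratio = toks / RUN_TOKENS_M["Mistral Small 4"]
    print(f"{name}: {toks}M tokens ({ratio:.1f}x Small 4)")
```

The peers burn roughly 1.5x to 2.1x as many output tokens on the same benchmark run, which is the basis for the "cheaper reasoning profile" framing even where their Intelligence Index scores are higher.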
