MiniMax M3 launches with 1M context and 59.0 SWE-Bench Pro
MiniMax shipped M3 with a 1M-token context window, native multimodal input, and frontier coding claims across SWE-Bench Pro, Terminal Bench, and MCP Atlas. It also appeared on OpenRouter, Ollama Cloud, Venice, Hermes, Cline, Together, and Arena on day one.

TL;DR
- MiniMax shipped M3 with three headline claims in one model: frontier coding and agentic scores, a 1M-token context window via MSA, and native multimodal image and video input, according to MiniMax_AI's launch post and the official launch post.
- The benchmark sheet is unusually broad for a day-one open-weight launch: MiniMax_AI's launch post listed 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas.
- Distribution was immediate and wide, with OpenRouter, ollama, AskVenice, cline, and Teknium all posting day-one availability.
- MiniMax is pitching M3 as cheap long-context infrastructure: bridgemindai highlighted $0.30 input and $1.20 output per million tokens during the first-week discount, while the official pricing docs add that pricing doubles above 512K input tokens and that public >512K access is still limited.
- The model is live now, but the open-weight part is still partially deferred, because MiniMax_AI's launch post said weights and the technical report are coming in about 10 days, a timeline MiniMax_AI's reply repeated later.
You can read the full launch post, check the model page where MiniMax says 1M context has a guaranteed minimum of 512K, and inspect the pay-as-you-go pricing, which quietly shows the first-week discount and the higher price tier above 512K. The launch post also claims M3 can operate a desktop computer, and says one internal Hopper kernel optimization run lasted about 24 hours, made 147 benchmark submissions, and used 1,959 tool calls.
MSA and 1M context
The architectural pitch is MSA, short for MiniMax Sparse Attention. In the official launch post, MiniMax says MSA gives M3 a 1M-token window, cuts per-token compute at 1M context to 1/20 of the previous generation, and speeds up prefilling by more than 9x and decoding by more than 15x.
The company is also drawing a line between headline context and practical access. The official model page says the API supports up to 1M tokens with a guaranteed minimum of 512K, while the pricing docs say input above 512K is available only in limited quantity for a limited time.
Coding and agent benchmarks
The main benchmark card is coding and terminal work. MiniMax_AI's launch post puts M3 at:
- SWE-Bench Pro: 59.0%
- Terminal-Bench 2.1: 66.0%
- SWE-fficiency: 34.8%
- KernelBench Hard: 28.8%
- MCP Atlas: 74.2%
The official launch post adds a few more comparison claims that were not in the tweet: M3 surpasses GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro, surpasses Opus 4.7 on SVG-Bench, scores above Gemini 3.1 Pro on OmniDocBench, and tops Claw-Eval.
Some early reaction focused on price-performance rather than taking every benchmark at face value. bridgemindai's SWE-Bench post noted that SWE-Bench Pro is contaminated, but still framed the combination of score and price as the eye-catching part.
Day-one rollout
MiniMax did the thing model vendors keep promising and rarely deliver: broad day-one distribution. Within the first hour, AskVenice said M3 was live anonymously on Venice, ollama put it on Ollama Cloud with Claude Code and Codex launch commands, and OpenRouter listed it on OpenRouter.
The rollout kept spreading across agent and eval surfaces. Teknium said it appeared automatically in Hermes Agent, cline said it was free to try in Cline, arena put it into Text, Vision, Document, and Code Arena, and togethercompute said Together was handling inference.
- Venice: anonymous access, per AskVenice
- Ollama Cloud: ollama launch claude --model minimax-m3:cloud, per ollama's command list
- OpenRouter: day-one routing and a first-week discount, per OpenRouter and MiniMax_AI's OpenRouter post
- Hermes Agent on Nous Portal: auto-added in the model picker, per Teknium
- Cline: free trial access, per cline
- Arena: live across multiple arenas, per arena
The official web surfaces expanded too. M3 has a live OpenRouter model page and a Vercel AI Gateway listing with reasoning, tool use, vision, file input, and implicit caching flags.
MiniMax Code
MiniMax paired the model launch with a product update to MiniMax Code. The official launch post says MiniMax Code was designed specifically for M3 and trained together with it.
The product claims are squarely in agent harness territory:
- Agent Team breaks large tasks into multi-stage, concurrent, dynamically adjustable workflows, per the official launch post
- A Producer + Verifier loop continuously produces, reflects, and corrects during execution, per the official launch post
- Persistent Memory remembers what users have shared, according to testingcatalog's post
- Evolving Skills turn repeated collaboration into reusable skills, according to testingcatalog's post
- Unified Billing ties the product to Token Plan, according to testingcatalog's post
The launch post also slips in two internal long-horizon demos. In one, M3 reproduced the core experiments from an ICLR 2025 Outstanding Paper award winner over nearly 12 hours, producing 18 commits and 23 figures. In the other, it optimized a Hopper FP8 GEMM kernel over roughly 24 hours, with 147 benchmark submissions and a 9.4x speedup over the original version.
Pricing and weight timing
The cheapest number circulating on launch day was real, but it came with two footnotes. The official pay-as-you-go page lists MiniMax-M3 at $0.30 per million input tokens, $1.20 per million output tokens, and $0.06 per million cached reads during a 7-day 50% off window for prompts up to 512K tokens.
Above 512K input tokens, the same pricing page jumps to $1.20 input, $4.80 output, and $0.24 cached reads per million tokens, and says public availability of that tier is expected in the next few days. Separately, the Token Plan docs map M3 usage to about 1.633B monthly tokens on Plus, 5.053B on Max, and 9.796B on Ultra.
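To make the two price tiers concrete, here is a minimal Python sketch of a per-request cost estimate using the launch-week numbers quoted above. Several details are assumptions rather than anything the pricing page is quoted as saying: that "512K" means 512,000 tokens, that the whole request is billed at the higher tier once input exceeds that boundary, and that cached reads are a subset of input tokens billed at the cached rate. The names `Rates`, `estimate_cost`, and `TIER_BOUNDARY_TOKENS` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "512K" is read as 512,000 tokens (it could also mean 524,288).
TIER_BOUNDARY_TOKENS = 512_000


@dataclass(frozen=True)
class Rates:
    """USD per million tokens."""
    input_per_m: float
    output_per_m: float
    cached_read_per_m: float


# Launch-week rates quoted in the article for prompts up to 512K tokens.
STANDARD = Rates(input_per_m=0.30, output_per_m=1.20, cached_read_per_m=0.06)
# Rates quoted in the article for prompts above 512K input tokens.
LONG_CONTEXT = Rates(input_per_m=1.20, output_per_m=4.80, cached_read_per_m=0.24)


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: the tier is chosen by total input tokens and applies to the
    whole request; cached tokens are part of the input and billed separately.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    rates = LONG_CONTEXT if input_tokens > TIER_BOUNDARY_TOKENS else STANDARD
    fresh_input = input_tokens - cached_tokens
    total = (
        fresh_input * rates.input_per_m
        + cached_tokens * rates.cached_read_per_m
        + output_tokens * rates.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    print(f"200K in / 10K out:  ${estimate_cost(200_000, 10_000):.3f}")
    print(f"600K in / 100K out: ${estimate_cost(600_000, 100_000):.3f}")
```

Under these assumptions, a 200K-token prompt with 10K output tokens costs about $0.07, while a 600K-token prompt with 100K output tokens costs $1.20, which shows how sharply the bill changes once a request crosses the 512K boundary.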
That leaves the biggest caveat on the launch framing. MiniMax is calling M3 the first open-weight model with this combination of capabilities, but the same launch post says the weights and technical report will arrive in about 10 days. Day one is a hosted-model launch with unusually wide distribution, while the actual weight release is still on the clock.