releaseJune 12, 2026

MiniMax opens M3 weights: 428B total, 23B active, 1M context

MiniMax published M3 weights on Hugging Face with 428B total parameters, 23B active parameters, 1M context, and multimodal support. Unsloth quickly added local GGUF builds, so teams can try 2-bit runs at 138GB RAM or VRAM and 3-bit at 165GB.

3 min read

MiniMax opens M3 weights: 428B total, 23B active, 1M context

TL;DR

MiniMax published the M3 weights on Hugging Face, and MiniMax_AI's release post sized the model at roughly 428B total parameters with about 23B active parameters.
According to the M3 model card, the model is natively multimodal across text, image, and video, and stretches to a 1M-token context; MiniMax_AI's capability summary framed that as one model for coding, long-horizon agents, and multimodal work.
UnslothAI's local-run post quickly turned the release into a local inference story, with Dynamic 2-bit GGUF builds targeting about 138GB RAM or VRAM and 3-bit builds targeting about 165GB.
The Unsloth guide adds an important caveat: the smallest practical local setups start above 130GB total memory, while better-quality quants climb well past that.

You can browse the official weights, skim the MiniMax Sparse Attention paper, and jump straight to Unsloth's GGUF release. AMD also posted day-0 support notes for Instinct GPUs, which is a fast sign that the ecosystem treated this as more than a model card upload.

Model card

The Hugging Face model card makes three concrete claims:

~428B total parameters
~23B activated parameters
1,000,000 token context

MiniMax also says M3 was trained as a native multimodal model from the start, not a text model with vision bolted on later. In the same model card, the company ties the 1M context to MiniMax Sparse Attention and claims 9x prefill and 15x decode speedups versus M2 at that length.

That combination, frontier coding, long-horizon agentic tasks, and text-image-video input, is the real pitch. Open-weight models usually force a tradeoff somewhere in that list.

Local GGUF builds

Unsloth moved first on packaging M3 for local runs. Its tweet gave the headline numbers, 138GB for Dynamic 2-bit and 165GB for 3-bit, but the full guide is more specific about the tradeoffs:

bf16 weights are about 855GB
the smallest 1-bit GGUF drops to 128GB on disk, but Unsloth says to budget at least 133GB total memory after KV cache and context allocation
2-bit lands around 148GB total memory
3-bit lands around 164GB to 200GB
4-bit starts above 213GB

The GGUF repo adds two early-adopter catches. MiniMax Sparse Attention is not supported yet in this path, so inference falls back to dense attention, and llama.cpp support was still sitting behind PR #24523 rather than a released build.

Free API window

MiniMax also spent the release window leaning on price. In MiniMax_AI's pricing post, the company said M3 would be free on PBDTokenRouter through June 17, with the rest of the MiniMax lineup 50 percent off.

That matters mostly because a 1M-context open-weight release usually arrives with a hardware bottleneck. MiniMax shipped the weights, Unsloth made local runs possible, and the company still offered a zero-cost hosted path for anyone who did not have 138GB to spare.