MiniMax M2.7 ships GGUF guidance for 128 GB local runs and day-0 cloud hosting
MiniMax M2.7 moved from announcement to deployment, with GGUF guidance for 128 GB local systems and same-day availability on Together, Fireworks, Hugging Face, and ModelScope. Use the local and managed serving options now, but check the non-commercial license before adopting the 230B model.


TL;DR
- MiniMax's launch thread says M2.7 is now open on Hugging Face, and the follow-up ModelScope post shows the company pushed the same release across multiple public distribution channels on day one.
- Unsloth's local-run guide puts the practical local target at a 108 GB 4-bit GGUF on a 128 GB machine, while the same guide says the full bf16 model needs 457 GB.
- MiniMax's Together announcement confirms day-0 hosted availability on Together AI, and Merve Noyan's post adds that Fireworks and Novita were also in the launch mix.
- According to Unsloth, M2.7 is a 230B-parameter MoE with 10B active parameters and roughly a 200K context window, while MiniMax highlighted 56.22% on SWE-Pro and 57.0% on Terminal Bench 2.
- Community reaction turned quickly to licensing, with Cedric Chee and a widely shared repost both flagging that the weights are not under a plain permissive open-source license.
You can read MiniMax's model post, grab the weights on Hugging Face, check Unsloth's local instructions, and hit Together's hosted endpoint. MiniMax also published an M2.7 coding-tools guide that routes the model into Claude Code through its own API.
Open release
MiniMax treated M2.7 like a full distribution push, not a single repo drop. The main announcement pointed to Hugging Face, the company blog, and MiniMax's own API in one shot, and a second post added ModelScope a few hours later.
The company blog says M2.7 is the first MiniMax model to participate deeply in its own improvement loop, and the GitHub README repeats the same framing around agent harnesses, complex skills, and dynamic tool search. The release date on the public model card and repo lines up with the April 12 rollout across those channels.
128 GB local runs
The local story is the part engineers will bookmark. Unsloth's launch post packaged M2.7 into a Dynamic 4-bit GGUF and spelled out the memory math instead of leaving users to guess.
The numbers from Unsloth's documentation are straightforward:
- 230B total parameters, 10B active per token
- about 200K context, listed as 196,608 max in the guide
- 457 GB for bf16
- 108 GB for the 4-bit GGUF
- 243 GB for the Q8_0 build
- a warning to avoid CUDA 13.2 because it can produce bad outputs
Unsloth says the 108 GB build fits a 128 GB unified-memory Mac at 15+ tokens per second, or a box with a 16 GB GPU plus 96 GB of system RAM at 25+ tokens per second.
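The memory math follows directly from the parameter count and the bit width of each quantization. A quick back-of-envelope check (raw weights only; real GGUF files add metadata and keep some tensors at higher precision, which is why Unsloth's published sizes deviate slightly):

```python
# Back-of-envelope weight-memory estimates for a 230B-parameter model.
# Unsloth's published figures: 457 GB bf16, 243 GB Q8_0, 108 GB dynamic 4-bit.
PARAMS = 230e9
GB = 1e9  # decimal gigabytes, matching the guide's units

def weight_gb(bits_per_param: float) -> float:
    """Raw weight footprint in GB for a given quantization width."""
    return PARAMS * bits_per_param / 8 / GB

bf16 = weight_gb(16)  # ~460 GB raw vs. 457 GB listed
q8 = weight_gb(8)     # ~230 GB raw; Q8_0 stores ~8.5 bits/param, hence 243 GB
q4 = weight_gb(4)     # ~115 GB raw; the dynamic quant mixes lower bits, hence 108 GB
```

The small gaps between the raw estimates and the published sizes come from GGUF's per-block scales and the mixed-precision layout of dynamic quants, not from the arithmetic.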
Managed hosting
Hosted inference landed just as quickly. MiniMax's Together post says M2.7 was live on Together the same day on both serverless and dedicated infrastructure.
Together's model page leans hard into long-horizon software engineering and repeats several of MiniMax's benchmark claims, including 56.22% on SWE-Pro, 55.6% on VIBE-Pro, and a 66.6% medal rate on MLE Bench Lite. Merve Noyan's post also pointed developers to Fireworks, Together, and Novita, which matches the broader "ecosystem together" rollout language in MiniMax's Fireworks partnership post.
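Together's serverless endpoints speak the OpenAI chat-completions format, so wiring M2.7 into existing tooling is a standard JSON POST. A minimal sketch that only builds the request without sending it; the model slug `MiniMaxAI/MiniMax-M2.7` is a placeholder, so confirm the exact identifier on Together's model page:

```python
import json
import urllib.request

# Assumed values; check Together's docs and model page for the real slug.
ENDPOINT = "https://api.together.xyz/v1/chat/completions"
MODEL = "MiniMaxAI/MiniMax-M2.7"  # placeholder model identifier

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Summarize this diff.", api_key="YOUR_TOGETHER_KEY")
# Dispatch with urllib.request.urlopen(req) once a real key is in place.
```

Because the shape matches the OpenAI API, the same payload works against any of the day-0 hosts that expose compatible endpoints, with only the base URL and model slug swapped.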
Self-evolution and agent teams
The official pitch is bigger than raw benchmark bars. In MiniMax's blog post, the company frames M2.7 around four ideas:
- model self-evolution
- native Agent Teams
- complex Skills
- dynamic tool search
Together's page adds two sharper numbers that are easy to miss in the social rollout: MiniMax says the model went through 100+ internal optimization rounds and improved by about 30% on internal benchmarks, and Together says the model hits 97% skill compliance across 40+ production skills. That is Christmas-come-early language for agent framework people, because MiniMax is selling a workflow stack, not just a base model.
License
The catch surfaced almost immediately in community replies. Cedric Chee said people would complain about the license, and the reposted licensing thread focused on the fact that M2.7 uses a non-commercial scheme rather than a standard permissive release.
The public GitHub repo shows a custom LICENSE file and an "Other" license tag instead of an SPDX-style open-source label. That does not erase the practical significance of the weight release, but it does put M2.7 in the now-familiar bucket of public weights with tighter downstream terms than the word "open source" usually implies.