Together introduced Mamba-3 and open-sourced kernels for a new MIMO state-space variant that targets decode efficiency and beats Mamba-2, GDN, and Llama 3.2 1B at 1.5B scale. Test it when deployment speed matters more than chasing another generic Transformer baseline.

Mamba-3 is the next Mamba release, but this one is explicitly tuned for deployment-time inference rather than training speed. Together's launch thread frames the problem as decode becoming memory-bound in agentic workloads and inference-heavy RL rollouts, while the linked blog post says Mamba-2 had focused more on training efficiency.
The main architectural change is MIMO, short for multi-input, multi-output. According to Together's paper and repo post, the model swaps the recurrence from a vector outer-product to a matrix multiply, aiming for a "stronger model at the same decode speed." The same post says kernels are open-sourced, with implementations using Triton, TileLang, and CuTe DSL in the public Mamba repository.
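The SISO-to-MIMO change can be sketched numerically. This is a minimal illustration, not Together's implementation: it assumes a simplified decayed-state recurrence of the form H_t = a·H_{t-1} + (input term), where the Mamba-2-style update adds a rank-1 outer product and the MIMO-style update adds a rank-r matrix product. All shapes and variable names are hypothetical.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the paper):
# N = state dimension, P = head dimension, r = MIMO rank.
N, P, r = 16, 8, 4
rng = np.random.default_rng(0)

def siso_step(H, a, B, x):
    # Mamba-2-style update: a rank-1 outer product of B (N,) and x (P,)
    # is added to the decayed state.
    return a * H + np.outer(B, x)  # (N, P)

def mimo_step(H, a, B, X):
    # MIMO-style update: a rank-r matrix product B (N, r) @ X (r, P)
    # replaces the outer product, injecting more information per step
    # while the state H keeps the same size.
    return a * H + B @ X  # (N, P)

H0 = np.zeros((N, P))
a = 0.9
H_siso = siso_step(H0, a, rng.standard_normal(N), rng.standard_normal(P))
H_mimo = mimo_step(H0, a, rng.standard_normal((N, r)), rng.standard_normal((r, P)))
assert H_siso.shape == H_mimo.shape == (N, P)
```

The point of the sketch is the cost/quality trade: both updates touch the same (N, P) state per decode step, so memory traffic is comparable, but the MIMO update writes a rank-r increment instead of rank-1, which is the mechanism behind the "stronger model at the same decode speed" framing.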
The release claims are strongest at the 1.5B scale. Together's launch thread says Mamba-3 has the fastest combined prefill and decode there and outperforms Mamba-2, GDN, and Llama-3.2-1B. The linked paper summary adds a concrete delta: versus Gated DeltaNet at 1.5B, Mamba-3 gains 0.6 points in downstream accuracy, and the MIMO variant adds another 1.2 points.
The shared table in the results screenshot breaks that out: Mamba-3-SISO-1.5B posts 56.4 average accuracy, while Mamba-3-MIMO-1.5B reaches 57.6, alongside stronger scores on LAMBADA accuracy, HellaSwag, PIQA, and ARC-C. That supports the release's core pitch: not just a faster linear model, but a higher-quality one that keeps decode speed intact.
A practitioner reaction from Cedric Chee's post sums up the engineering angle: the story looks less like replacing Transformers everywhere and more like trying to "win the deployment bottleneck."
Introducing Mamba-3. Inference speeds are more important than ever, driven by the rise in agents and inference-heavy RL.
The newest model in the Mamba series is finally here. Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities.