Nemotron 3 Ultra
Open, efficient Mixture-of-Experts hybrid Mamba-Attention model for agentic reasoning.
Frontier-scale text-only large language model released by NVIDIA with 550B total parameters and 55B active parameters, using a hybrid LatentMoE Mamba-2 + MoE + Attention architecture and supporting up to 1M-token context.
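To put the 550B total parameter count in perspective, here is a rough, weights-only memory estimate at a few common precisions. It is a back-of-the-envelope sketch, not a measurement: the helper name `weight_memory_gb` and the precision list are illustrative assumptions, and the figures ignore KV cache, activations, and any quantization metadata.

```python
# Rough weights-only memory estimate for a 550B-parameter model.
# Assumptions of this sketch: decimal gigabytes, uniform precision for
# every parameter, no KV cache, activations, or quantization overhead.
TOTAL_PARAMS = 550e9   # total parameters, from the model description
ACTIVE_PARAMS = 55e9   # parameters active per token, from the description


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return the storage needed for the weights alone, in decimal GB."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
    # Share of parameters used per token (compute, not memory).
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

In a Mixture-of-Experts model, the active parameter count mainly lowers compute per token; all expert weights typically still need to be resident or quickly reachable, so the memory estimate uses the total count rather than the 55B active figure.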
Recent stories
NVIDIA shipped Nemotron 3 Ultra, a 550B/55B-active hybrid Mamba-Transformer MoE with open weights, data, and recipe, plus broad runtime and host support. It matters because the model pairs frontier-level results on open benchmarks with immediate agent-serving options, though running it locally still requires heavy quantization or large-memory hardware.
Arena shipped Agent Mode, a benchmark that lets models use web search, bash, file writing, image generation, and follow-up questions, then ranks them on five live-session signals. It matters because agent evals move from static task sets to real user workflows, with GPT-5.5 High currently leading the leaderboard.
NVIDIA teased Nemotron 3 Ultra as a 550B open-weight model due later this week, with early messaging centered on 5x faster and 30% cheaper inference plus a hybrid SSM-MoE design. The rollout matters because early benchmark posts already place it near the top of open-weight leaderboards, widening NVIDIA’s open-model push beyond Cosmos.