Breaking · June 1, 2026

NVIDIA claims Nemotron 3 Ultra 550B runs 5x faster and 30% cheaper

NVIDIA teased Nemotron 3 Ultra as a 550B open-weight model due later this week, with early messaging centered on 5x faster and 30% cheaper inference plus a hybrid SSM-MoE design. The rollout matters because early benchmark posts already place it near the top of open-weight leaderboards, widening NVIDIA’s open-model push beyond Cosmos.

TL;DR

In ArtificialAnlys' launch post, NVIDIA positioned Nemotron 3 Ultra as a 550B open-weight model with 55B active parameters, while NVIDIAAI's teaser via scaling01 said it ships later this week.
ctnzr's keynote summary surfaced NVIDIA's main commercial pitch early, namely "5X faster" and "30% cheaper," and Artificial Analysis' writeup tied that speed claim to a pre-release endpoint serving more than 300 output tokens per second.
According to rohanpaul_ai's keynote clip, Nemotron 3 Ultra uses a hybrid state-space-model plus mixture-of-experts design, a setup NVIDIA is pitching for longer reasoning and tool runs.
ArtificialAnlys scored the model at 48 on its Intelligence Index, and kilocode's PinchBench post said it averaged 89.9% across 147 OpenClaw agent tasks, enough to top PinchBench's open-weights view.

You can watch the Computex keynote replay, read Artificial Analysis' launch note, and inspect the live PinchBench open-weights leaderboard. NVIDIA's nightly Nemotron docs already expose a Nemotron 3 Ultra Base deployment guide, and the teaser clip behind Jensen Huang also flashed a separate "Nemotron 4 WIP" line.

Nemotron 3 Ultra

NVIDIA is trying to plant a flag in US open weights. In ArtificialAnlys' launch post, the model is described as the largest Nemotron 3 release so far, at roughly 550B total parameters with 90% sparsity, which works out to 55B active parameters.
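The active-parameter figure is simple arithmetic on the two numbers quoted in the post. A minimal sketch, assuming "sparsity" here means the fraction of parameters left inactive for any given token; the function name is illustrative, not taken from NVIDIA or Artificial Analysis:

```python
def active_params_b(total_b: float, sparsity: float) -> float:
    """Active parameters (in billions) for a given total size and sparsity.

    Assumption: sparsity is the share of parameters NOT used per token.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    return total_b * (1.0 - sparsity)


# Figures quoted in the launch post: ~550B total, 90% sparsity.
ultra_active = active_params_b(550, 0.90)
print(f"~{ultra_active:.0f}B active parameters per token")
```

Because the 550B total is itself described as approximate, the 55B result should be read as a rounded figure rather than an exact count.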

The messaging is not subtle. ctnzr's keynote summary compressed the pitch to three lines: frontier-smart, 5X faster, and 30% cheaper. Meanwhile, testingcatalog's post framed it as NVIDIA's most intelligent open-weight model so far.

Hybrid SSM-MoE stack

The interesting technical reveal is the architecture. rohanpaul_ai's keynote clip says Nemotron 3 Ultra combines state-space models with mixture-of-experts, with the SSM side meant to keep long sequences and tool loops from hitting the usual attention cost wall.
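To make the "attention cost wall" concrete, here is a back-of-envelope sketch of how work grows with context length. It is not a model of NVIDIA's architecture. It assumes full self-attention work scales with the square of sequence length and an SSM-style scan scales linearly, and it ignores constants, hidden sizes, and kernel-level optimizations. The two context lengths are illustrative.

```python
def growth_factor(base_len: int, new_len: int, exponent: int) -> float:
    """How much work grows when context goes from base_len to new_len,
    for a cost that scales as seq_len ** exponent."""
    return (new_len / base_len) ** exponent


BASE_LEN = 8_000       # illustrative short context
LONG_LEN = 1_000_000   # illustrative long context

ssm_growth = growth_factor(BASE_LEN, LONG_LEN, exponent=1)        # linear scan
attention_growth = growth_factor(BASE_LEN, LONG_LEN, exponent=2)  # pairwise attention

print(f"SSM-style scan work grows ~{ssm_growth:,.0f}x")
print(f"Full attention work grows ~{attention_growth:,.0f}x")
```

A hybrid stack that keeps some attention layers would land somewhere between the two curves, depending on how many layers of each kind it uses.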

That matches how NVIDIA has already described the Nemotron 3 family elsewhere. The public Nemotron 3 Super model card calls that model a hybrid Mamba-Transformer MoE with 1M context, and eliebakouch's note adds that Ultra appears less sparse than peers such as Kimi K2 and DeepSeek V4, with about 10% of parameters active instead of roughly 3%.

Early scorecards

The early scorecards split into two buckets: general intelligence and agent tasks. ArtificialAnlys' launch post put Nemotron 3 Ultra at 48 on the Artificial Analysis Intelligence Index, ahead of Gemma 4 31B at 39, Nemotron 3 Super at 36, and gpt-oss-120b at 33, but still behind Kimi K2.6 at 54.

On agent work, kilocode's PinchBench post said Nemotron 3 Ultra averaged 89.9% across 147 OpenClaw tasks and was free to run. The live PinchBench page says scores come from automated checks plus an LLM judge, and kilocode's keynote-slide photo shows NVIDIA pulling PinchBench directly into its Computex comparison slide, where Nemotron tied for the lead on agent productivity at 91%.

Rollout hints

The rollout is still half teaser, half launch. NVIDIAAI's teaser via scaling01 promised availability this week, and Artificial Analysis' note says the public weights should include BF16 plus NVFP4 quantization for faster inference.
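For a rough sense of what those two formats mean for serving hardware, the sketch below estimates weight storage only. The assumptions are mine, not from NVIDIA or Artificial Analysis: every one of the roughly 550B parameters is stored in a single format, BF16 costs 16 bits per weight, and NVFP4 costs 4 bits per weight plus one 8-bit scale shared by each 16-weight block (about 4.5 bits per weight). Real checkpoints often keep some layers in higher precision, and KV cache and activations are not counted.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes)."""
    return params_billion * bits_per_weight / 8


TOTAL_PARAMS_B = 550  # approximate total from the launch post

# Assumed storage costs per weight (see the caveats in the text above).
BF16_BITS = 16
NVFP4_BITS = 4 + 8 / 16  # 4-bit value + one 8-bit scale per 16 weights

bf16_gb = weight_gb(TOTAL_PARAMS_B, BF16_BITS)
nvfp4_gb = weight_gb(TOTAL_PARAMS_B, NVFP4_BITS)

print(f"BF16 weights:  ~{bf16_gb:,.0f} GB")
print(f"NVFP4 weights: ~{nvfp4_gb:,.0f} GB")
print(f"Reduction:     ~{bf16_gb / nvfp4_gb:.1f}x")
```

Under these assumptions the quantized weights take roughly a quarter to a third of the BF16 footprint, which is one reason a 4-bit release can matter for how many GPUs a deployment needs.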

There are already signs NVIDIA has the packaging lined up. The nightly Nemotron docs expose a deployment-guide path for "Nemotron 3 Ultra Base," and ai_for_success's keynote clip is one of several posts noting that the teaser slide shown behind Jensen Huang also included a "Nemotron 4 WIP" line.
