
NVIDIA Nemotron 3 Ultra


A specific NVIDIA Nemotron 3 model release: a hybrid Mamba-Attention Mixture-of-Experts (MoE) language model with 550B total and 55B active parameters, designed for reasoning and agentic workflows. It was released with pre-trained, post-trained, and quantized checkpoints, along with training datasets.
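The 550B-total / 55B-active split means only about a tenth of the weights take part in each token's forward pass. The sketch below makes that concrete under one loudly stated assumption: the common rule of thumb of roughly 2 FLOPs per active parameter per generated token. It ignores attention cost over long contexts, Mamba state updates, and expert-router overhead, so the figures are order-of-magnitude only. The function names are illustrative, not part of any NVIDIA API.

```python
# Rough per-token compute for a sparse MoE model.
# ASSUMPTION: ~2 FLOPs (one multiply + one add) per *active* parameter per token;
# attention over long contexts, Mamba state updates and routing are ignored.
TOTAL_PARAMS = 550e9   # total parameters (from the model description)
ACTIVE_PARAMS = 55e9   # parameters active per token (from the model description)

def active_fraction(total: float, active: float) -> float:
    """Share of all weights that participate in one token's forward pass."""
    return active / total

def forward_flops_per_token(active: float) -> float:
    """Approximate forward-pass FLOPs per token under the 2-FLOPs-per-parameter rule."""
    return 2.0 * active

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
sparse = forward_flops_per_token(ACTIVE_PARAMS)
dense = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of the same size

print(f"active fraction: {frac:.0%}")  # 10%
print(f"~{sparse / 1e9:.0f} GFLOPs/token vs ~{dense / 1e9:.0f} for a dense 550B model")
```

Under this approximation the MoE model does about one tenth of the per-token arithmetic of an equally large dense model, which is the usual argument for MoE serving efficiency; memory footprint, by contrast, still scales with the full 550B.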

Pricing

Model profile · Current snapshot

Input: $0.675 per 1M tokens
Output: $2.68 per 1M tokens
Blended: $1.18 per 1M tokens
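The page does not say how the blended figure is weighted, but it is consistent with a 3:1 input-to-output token mix: (3 × 0.675 + 2.68) / 4 ≈ 1.176, which rounds to $1.18. The sketch below assumes that 3:1 weighting (an assumption, not a stated methodology) and also shows how the listed per-1M-token prices translate into the cost of a single request. `blended_price` and `request_cost` are hypothetical helper names.

```python
# List prices from the snapshot above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.675
OUTPUT_PRICE_PER_M = 2.68

def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens. ASSUMPTION: default 3:1 input:output mix."""
    total = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-1M-token prices."""
    return ((input_tokens / 1e6) * INPUT_PRICE_PER_M
            + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M)

print(f"blended (3:1): ${blended_price(INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M):.2f}")  # $1.18
# A prompt that fills the 1M-token context window, plus a 10k-token answer:
print(f"full-context request: ${request_cost(1_000_000, 10_000):.4f}")  # $0.7018
```

Changing `input_weight` and `output_weight` lets you re-blend for your own traffic; output-heavy agent loops will land closer to the $2.68 output price than to the blended figure.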

Output speed: 198 tokens/s
Time to first token (TTFT): 0.92 s
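TTFT and output speed combine into a rough end-to-end latency estimate: wait for the first token, then generate the rest at the measured rate. This is a minimal sketch assuming a constant decode rate equal to the snapshot average; real latency varies with load, prompt length, and provider. If a deployment streams reasoning tokens before the final answer, they would add to `output_tokens` here. `estimated_latency` is a hypothetical helper.

```python
# Snapshot values from the profile above (ASSUMED to be steady-state averages).
TTFT_S = 0.92       # seconds until the first output token arrives
OUTPUT_TPS = 198.0  # output tokens generated per second after that

def estimated_latency(output_tokens: int, ttft: float = TTFT_S,
                      tps: float = OUTPUT_TPS) -> float:
    """Approximate seconds until the last token: TTFT plus decode time at a constant rate."""
    return ttft + output_tokens / tps

for n in (500, 1_000, 4_000):
    print(f"{n:>5} output tokens: ~{estimated_latency(n):.2f} s")
# 1,000 output tokens -> ~5.97 s
```

For short answers TTFT dominates; past a few hundred tokens, the decode rate does.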

Model Intelligence

Context window: 1,000,000 tokens
Arena ranking: not listed
Benchmarkable: Yes
Model level: release

Intelligence Index: 37.8
Coding Index: 49.3
GPQA: 0.87
HLE (Humanity's Last Exam): 0.27
SciCode: 0.40
IFBench: 0.81
LCR (long-context reasoning): 0.67
TerminalBench Hard: 0.36
TAU2 (τ²-Bench): 0.83

Recent stories


Release · Primary · 2026-06-04

NVIDIA releases Nemotron 3 Ultra: 550B MoE, 1M context

NVIDIA shipped Nemotron 3 Ultra, a hybrid Mamba-Transformer MoE with 550B total and 55B active parameters, released with open weights, data, and training recipe, plus broad runtime and hosting support. It matters because the model pairs frontier-level open-model benchmark scores with immediate agent-serving options, though running it locally still requires heavy quantization or large-memory hardware.
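The local-hardware caveat follows from simple arithmetic: in a typical MoE deployment every expert must be resident in memory even though only 55B parameters are active per token, so weight storage scales with all 550B. The sketch below estimates weight memory at a few illustrative precisions. The precisions are assumptions chosen for illustration, not a list of the quantized formats NVIDIA actually released, and the estimate excludes quantization scales, KV cache and Mamba state, and activations.

```python
# Weight memory for a 550B-parameter model at illustrative precisions.
# ASSUMPTION: all experts are resident; scales, KV/SSM state and activations excluded.
PARAMS = 550e9  # total parameters, not just the 55B active per token

def weight_gb(params: float, bits_per_param: float) -> float:
    """Weight storage in decimal GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: ~{weight_gb(PARAMS, bits):,.0f} GB")
```

Even at 4 bits per weight the estimate is roughly 275 GB for weights alone, which is why single-GPU or consumer-machine use needs aggressive quantization or large unified-memory hardware.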

News · Secondary · 2026-06-04

Arena launches Agent Mode rankings with GPT-5.5 High leading

Arena shipped Agent Mode, a benchmark that lets models use web search, bash, file writing, image generation, and follow-up questions, then ranks them on five live-session signals. It matters because agent evaluation moves from static task sets to real user workflows; GPT-5.5 High currently leads the leaderboard.

News · Primary · 2026-06-01

NVIDIA claims Nemotron 3 Ultra 550B runs 5x faster and 30% cheaper

NVIDIA teased Nemotron 3 Ultra as a 550B open-weight model due later this week, with early messaging centered on claims of 5x faster and 30% cheaper inference and a hybrid SSM-MoE design. The rollout matters because early benchmark posts already place it near the top of open-weight leaderboards, widening NVIDIA’s open-model push beyond Cosmos.