AI Primer

Nemotron 3 Ultra

Open, efficient Mixture-of-Experts hybrid Mamba-Attention model for agentic reasoning.

Nemotron 3 Ultra is a frontier-scale, text-only large language model released by NVIDIA. It has 550B total parameters, of which 55B are active, uses a hybrid LatentMoE architecture (Mamba-2 + MoE + Attention), and supports a context window of up to 1M tokens.

Pricing

Model profile · Current snapshot
Input / 1M tokens: $0.60
Output / 1M tokens: $2.60
Blended / 1M tokens: $1.10
Output speed (tokens/s): 174
Time to first token (s): 0.73
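
The figures above can be combined into rough cost and latency estimates. The sketch below is illustrative only: the function names are hypothetical, and the 3:1 input-to-output weighting is an assumption (the page does not state how the blended price is derived, although that ratio does reproduce the listed $1.10). The latency estimate also assumes that output streams at the listed rate once the first token arrives, which real deployments may not match.

```python
# Figures copied from the pricing snapshot above.
INPUT_USD_PER_M = 0.60    # USD per 1M input tokens
OUTPUT_USD_PER_M = 2.60   # USD per 1M output tokens
OUTPUT_TPS = 174.0        # output tokens per second
TTFT_S = 0.73             # time to first token, in seconds


def blended_price(input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The 3:1 default weighting is an assumption, not something the page states.
    """
    total_parts = input_parts + output_parts
    weighted = input_parts * INPUT_USD_PER_M + output_parts * OUTPUT_USD_PER_M
    return weighted / total_parts


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    cost = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return cost / 1_000_000


def response_time(output_tokens: int) -> float:
    """Rough wall-clock estimate: TTFT plus streaming time at the listed rate."""
    return TTFT_S + output_tokens / OUTPUT_TPS


if __name__ == "__main__":
    print(f"Blended price per 1M tokens: ${blended_price():.2f}")
    print(f"Cost of 100k in / 2k out: ${request_cost(100_000, 2_000):.4f}")
    print(f"Time for 1,000 output tokens: {response_time(1_000):.2f} s")
```

With the default weighting, `blended_price()` comes to about $1.10, matching the listed blended figure. Workloads with a different input-to-output mix can pass their own weights.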

Model Intelligence

Context window: 1,000,000 tokens
Arena ranking: 48
Benchmarkable: Yes
Model level: release
Intelligence Index: 47.7
Coding Index: 37.6
GPQA: 0.87
HLE: 0.27
SciCode: 0.4
IFBench: 0.81
LCR: 0.67
TerminalBench Hard: 0.36
TAU2: 0.83


© 2026 AI Primer. All rights reserved.