NVIDIA releases Nemotron 3 Super on OpenRouter with 1M context and free access
NVIDIA released Nemotron 3 Super, a 120B open model with 12B active parameters and a 1M-token window, on OpenRouter with free access. Evaluate it for low-cost agent backends, especially if you need local or self-hosted deployment options.

TL;DR
- OpenRouter's free listing now exposes NVIDIA Nemotron 3 Super at a no-cost endpoint, and Teknium's Hermes setup post shows that Hermes Agent users can already select it as a custom OpenRouter model.
- The OpenRouter model page describes Nemotron 3 Super as a 120B open model with 12B active parameters, a hybrid Mamba-Transformer design, and a 1M-token context window for long-horizon agent tasks.
- Early adopters are plugging it into agent tooling fast: OpenHands says it got early access, that the model "works well," and calls it "a great new locally deployable LLM."
- Benchmark positioning is the main pitch: the AA chart shared by Wes Roth places Nemotron 3 Super above gpt-oss-120B on Artificial Analysis' open-weight index, while OpenRouter boosted a report that it leads PinchBench on average for openclaw.
What shipped on OpenRouter?
The practical news is simple: Nemotron 3 Super is already callable through OpenRouter, and Teknium's Hermes setup post shows one immediate path into agent workflows: paste nvidia/nemotron-3-super-120b-a12b:free into Hermes Agent's custom model field. That makes this less of a research release and more of a drop engineers can test today.
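For anyone testing it outside Hermes, the same model slug works against OpenRouter's OpenAI-compatible chat-completions endpoint. A minimal sketch using only the standard library (the helper name and prompt are illustrative, and you supply your own OpenRouter API key):

```python
# Minimal sketch: build a chat-completions request for Nemotron 3 Super
# against OpenRouter's OpenAI-compatible API. The model slug is the one
# from the Hermes setup above; the endpoint path is OpenRouter's standard one.
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "nvidia/nemotron-3-super-120b-a12b:free"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble the HTTP request without sending it."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To actually send it, pass the request to `urllib.request.urlopen` and read `choices[0]["message"]["content"]` from the JSON response; any OpenAI-compatible client pointed at https://openrouter.ai/api/v1 with the same model slug works equally well.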
According to the OpenRouter page, the model is a 120B open hybrid MoE system with only 12B parameters active at inference, a 1M-token context window, and multi-token prediction aimed at long-context reasoning and multi-step planning. The same listing says it is released with weights, datasets, and recipes under the NVIDIA Open License, and its benchmark summary reports roughly 28 tokens/sec average throughput alongside strong scores on AIME 2025, TerminalBench, and SWE-Bench.
Where is it landing in agent stacks?
The first concrete implementation signal is from OpenHands: in its early-access note, the team says the model "works well" and that they are "excited to have a great new locally deployable LLM." That lines up with the release's strongest engineering angle: a big-context open model positioned for agent backends that teams may want to run outside closed hosted APIs.
The performance case is still mostly benchmark-driven, but it is specific enough to watch. Wes Roth's AA chart cites an Artificial Analysis score of 36 for Nemotron 3 Super versus 33 for gpt-oss-120B, and claims it is "roughly 10% faster per GPU," while OpenRouter's PinchBench post amplified a separate report that it is the best model on average on PinchBench for openclaw. Nathan Lambert's interview post also framed this release as "a LONG time coming," pointing to NVIDIA's broader open-model push rather than a one-off model drop.