March 16, 2026

Artificial Analysis ranks Nemotron 3 VoiceChat at 77.8% conversational dynamics

Artificial Analysis published results for NVIDIA's Nemotron 3 VoiceChat, placing the roughly 12B-parameter model on the open-weight Pareto frontier for conversational dynamics and speech reasoning. It is worth considering for open voice agents, but compare it against proprietary systems, which still lead the category by a wide margin.


TL;DR

Artificial Analysis says NVIDIA's Nemotron 3 VoiceChat is a ~12B open-weight speech-to-speech model that now sits on the open-model Pareto frontier for both conversational dynamics and speech reasoning.
According to the Artificial Analysis benchmark thread, Nemotron 3 VoiceChat scores 77.8% on conversational dynamics and 29.2% on Big Bench Audio speech reasoning, making it the only open model in its comparison set that lands near the top on both axes.
The same comparison post says open speech-to-speech models still trail proprietary systems by a wide margin on Big Bench Audio, with Step-Audio R1.1 at 96% and both Grok Voice Agent and Gemini 2.5 Flash (Thinking) at 92%.
NVIDIA appears to be positioning the model as more than a quiet research drop: Artificial Analysis linked to NVIDIA's early access page and a live demo, and a separate stage photo shows Nemotron VoiceChat benchmark results on an NVIDIA keynote slide.

What actually shipped in the open-weight field

Artificial Analysis frames Nemotron 3 VoiceChat as a full-duplex speech-to-speech model tuned for both "raw intelligence" and the "natural rhythms of human conversation," such as turn-taking and interruptions. In its open-weight comparison set, that makes the model unusual: PersonaPlex leads conversational dynamics at 91.0% and Freeze-Omni leads speech reasoning at 33.9%, but Nemotron places second on both, at 77.8% and 29.2% respectively.

That matters for engineers evaluating open voice agents, because open models usually force a stark tradeoff between responsiveness and reasoning. Artificial Analysis' benchmark chart places Nemotron in the "most attractive quadrant" on those two axes, and the post notes that at roughly 12B parameters it is also larger than most open speech-to-speech peers. NVIDIA also offers an early access page and a live conversation demo, both linked from the analysis post.
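To make the "Pareto frontier" claim concrete: a model sits on the frontier when no other model in the set is at least as good on both axes and strictly better on at least one. Because different models lead each axis here, placing second on both keeps Nemotron on the frontier unless a competitor ties it. The sketch below applies that standard dominance rule; Artificial Analysis' exact methodology may differ, and the model names and scores are hypothetical placeholders, not Artificial Analysis data.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    dynamics: float   # conversational dynamics score (higher is better)
    reasoning: float  # speech reasoning score (higher is better)


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    at_least_as_good = a.dynamics >= b.dynamics and a.reasoning >= b.reasoning
    strictly_better = a.dynamics > b.dynamics or a.reasoning > b.reasoning
    return at_least_as_good and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep only the models that no other model dominates."""
    # A model never dominates itself, so it is safe to compare against the full list.
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical placeholder scores, NOT Artificial Analysis results.
models = [
    Model("model_a", 90.0, 10.0),  # best on dynamics only
    Model("model_b", 40.0, 35.0),  # best on reasoning only
    Model("model_c", 75.0, 28.0),  # second on both axes
    Model("model_d", 60.0, 20.0),  # dominated by model_c
]

for m in pareto_frontier(models):
    print(m.name)  # prints model_a, model_b, model_c on separate lines
```

In this toy set, model_c mirrors Nemotron's position: it leads neither axis, yet nothing beats it on both, so it stays on the frontier while model_d drops off.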

How far it still is from production-leading voice models

Artificial Analysis is explicit that this is an open-model advance, not category leadership. Its comparison post says proprietary systems are still far ahead on Big Bench Audio, citing Step-Audio R1.1 at 96%, Grok Voice Agent at 92%, Gemini 2.5 Flash (Thinking) at 92%, and Nova 2.0 Sonic at 87%.

The practical read is that Nemotron improves the open-weight option set more than it resets the voice stack leaderboard. Even the NVIDIA keynote photo circulating from the event shows Nemotron VoiceChat as one benchmark tile inside a broader model lineup, which fits the current state of the market: open duplex voice is getting more credible, but proprietary systems still define the top end on reasoning quality and conversational polish.
