AI Primer

Artificial Analysis ranks Nemotron 3 VoiceChat at 77.8% conversational dynamics

Artificial Analysis published results for NVIDIA's Nemotron 3 VoiceChat, placing the roughly 12B model on the open-weight Pareto frontier across conversational dynamics and speech reasoning. Consider it for open voice agents, but compare it against proprietary systems, which still lead the category by a wide margin.


TL;DR

  • Artificial Analysis says NVIDIA's Nemotron 3 VoiceChat is a ~12B open-weight speech-to-speech model that now sits on the open-model Pareto frontier for both conversational dynamics and speech reasoning.
  • The benchmark thread puts Nemotron 3 VoiceChat at 77.8% on Artificial Analysis' conversational dynamics benchmark and 29.2% on Big Bench Audio speech reasoning, making it the only open model in its comparison set that lands near the top on both axes.
  • The same comparison post says open speech-to-speech models still trail proprietary systems badly, with Step-Audio R1.1 at 96% on Big Bench Audio and Grok Voice Agent and Gemini 2.5 Flash (Thinking) both at 92%.
  • NVIDIA appears to be positioning the model as more than a quiet research drop: Artificial Analysis linked NVIDIA's early access and a live demo, and a separate stage photo shows Nemotron VoiceChat benchmarking on an NVIDIA keynote slide.

What actually shipped in the open-weight field

Artificial Analysis frames Nemotron 3 VoiceChat as a full-duplex speech-to-speech model tuned for both "raw intelligence" and the "natural rhythms of human conversation" such as turn-taking and interruptions (benchmark thread). In its open-weight comparison set, that makes the model unusual: PersonaPlex leads conversational dynamics at 91.0% and Freeze-Omni leads speech reasoning at 33.9%, while Nemotron lands second on both, at 77.8% and 29.2% respectively (benchmark thread).

That matters for engineers evaluating open voice agents, because open models usually force a stark tradeoff between conversational responsiveness and reasoning quality. The benchmark chart shows Nemotron in the "most attractive quadrant" between responsiveness and reasoning, while Artificial Analysis notes it is also larger than most open speech-to-speech peers at roughly 12B parameters. NVIDIA also has an early access page and a live conversation demo linked from the analysis post.
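To make the "Pareto frontier" claim concrete, here is a minimal Python sketch of how a two-axis frontier can be computed. It is an illustration, not Artificial Analysis' methodology. Only Nemotron's two scores, PersonaPlex's 91.0% and Freeze-Omni's 33.9% come from the thread; the article does not report the other axis for those two models, so the values marked PLACEHOLDER are invented purely so the code runs, as is the "Hypothetical peer" row.

```python
from typing import NamedTuple


class Score(NamedTuple):
    name: str
    dynamics: float   # conversational dynamics, percent
    reasoning: float  # Big Bench Audio speech reasoning, percent


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (
        a.dynamics >= b.dynamics
        and a.reasoning >= b.reasoning
        and (a.dynamics > b.dynamics or a.reasoning > b.reasoning)
    )


def pareto_frontier(scores: list[Score]) -> list[Score]:
    """Keep every model that no other model dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


models = [
    Score("Nemotron 3 VoiceChat", 77.8, 29.2),  # both values from the article
    Score("PersonaPlex", 91.0, 10.0),           # 91.0 from article; 10.0 is a PLACEHOLDER
    Score("Freeze-Omni", 40.0, 33.9),           # 33.9 from article; 40.0 is a PLACEHOLDER
    Score("Hypothetical peer", 50.0, 15.0),     # entirely made up, for contrast
]

frontier_names = [s.name for s in pareto_frontier(models)]
print(frontier_names)
```

With these inputs the three named models stay on the frontier and the made-up peer drops off, because Nemotron beats it on both axes. Being on the frontier does not mean leading either axis; it means no single open model is better on both at once.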

How far it still is from production-leading voice models

Artificial Analysis is explicit that this is an open-model advance, not category leadership. Its comparison post says proprietary systems are still far ahead on Big Bench Audio, citing Step-Audio R1.1 at 96%, Grok Voice Agent at 92%, Gemini 2.5 Flash (Thinking) at 92%, and Nova 2.0 Sonic at 87%.
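The size of that gap is simple arithmetic on the figures quoted above. The short Python sketch below uses only the Big Bench Audio scores reported in the article and expresses each gap in percentage points; the variable names are illustrative.

```python
# Big Bench Audio scores (percent) as quoted in the article.
nemotron = 29.2
proprietary = {
    "Step-Audio R1.1": 96.0,
    "Grok Voice Agent": 92.0,
    "Gemini 2.5 Flash (Thinking)": 92.0,
    "Nova 2.0 Sonic": 87.0,
}

# Gap to Nemotron 3 VoiceChat in percentage points, rounded to one decimal.
gaps = {name: round(score - nemotron, 1) for name, score in proprietary.items()}

for name, gap in gaps.items():
    print(f"{name}: +{gap} points over Nemotron 3 VoiceChat")
```

On these numbers the smallest gap is 57.8 points (Nova 2.0 Sonic) and the largest is 66.8 points (Step-Audio R1.1).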

The practical read is that Nemotron improves the open-weight option set more than it resets the voice stack leaderboard. Even the NVIDIA keynote photo circulating from the event shows Nemotron VoiceChat as one benchmark tile inside a broader model lineup, which fits the current state of the market: open duplex voice is getting more credible, but proprietary systems still define the top end on reasoning quality and conversational polish.
