Release, June 23, 2026

AssemblyAI launches Universal-3.5 Pro Realtime with Context Carryover

AssemblyAI’s Universal-3.5 Pro Realtime now carries forward the agent side of a conversation to improve live transcription. The release also ships multilingual realtime ASR features, and one early deployment said critical-utterance errors fell from 26% to 9%.

TL;DR

AssemblyAI's launch post introduces what the company says is the first realtime speech-to-text model that uses the voice agent's side of the conversation as live context, a feature it calls Context Carryover.
In AssemblyAI's feature thread, the company says one deployment cut errors on critical utterances from 26% to 9% when the model could anticipate fields like emails, order numbers, account IDs, medications, and spelled names.
According to AssemblyAI's feature thread, Universal-3.5 Pro Realtime also ships automatic language detection across 18 languages, mid-sentence code-switching, domain prompting, voice focus, and speaker diarization that self-corrects at the end of the stream.
AssemblyAI's demo thread says the model is available now in AssemblyAI's API, Playground, and partner integrations, while AssemblyAI's LiveKit quote confirms day-one availability on LiveKit Inference.

You can read AssemblyAI's launch post, skim AssemblyAI's demo thread, and see partners use the same release to emphasize slightly different pain points: the Fireflies quote focuses on latency and language switching, the Retell quote stresses accuracy for regulated phone calls, and the LiveKit quote frames Context Carryover as a way to avoid predefining key terms.

Context Carryover

AssemblyAI's core claim is simple: the transcriber now hears both sides of the interaction. In AssemblyAI's launch post, the company says the model can use the agent's prompt, such as asking for an email or account ID, to bias transcription toward the answer that follows.
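To make the idea concrete, here is a toy Python sketch of context-conditioned rescoring: the agent's last prompt hints at the shape of the expected answer, and candidate transcripts that fit that shape get a score bonus. This is an illustration only, not AssemblyAI's implementation or API. The cue table, the function names, the scores, and the bonus value are all hypothetical.

```python
import re

# Hypothetical mapping from cues in the agent's prompt to the expected
# shape of the caller's answer. Not taken from AssemblyAI's documentation.
EXPECTED_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "order number": re.compile(r"^\d{6,}$"),
}


def expected_pattern(agent_prompt):
    """Return the answer pattern suggested by the agent's prompt, if any."""
    prompt = agent_prompt.lower()
    for cue, pattern in EXPECTED_PATTERNS.items():
        if cue in prompt:
            return pattern
    return None


def rescore(agent_prompt, hypotheses, bonus=0.2):
    """Pick the best transcript from (text, score) candidates.

    Candidates that match the shape implied by the agent's prompt
    receive a fixed bonus (an assumed value, chosen for illustration).
    """
    pattern = expected_pattern(agent_prompt)

    def score(hypothesis):
        text, base = hypothesis
        if pattern is not None and pattern.match(text.strip()):
            return base + bonus
        return base

    return max(hypotheses, key=score)[0]


# Two made-up candidates for the same audio, with made-up scores.
candidates = [
    ("jane at example dot com", 0.55),
    ("jane@example.com", 0.50),
]

with_context = rescore("What is your email address?", candidates)
without_context = rescore("How can I help you today?", candidates)
```

With the email prompt as context, the email-shaped candidate wins despite its lower base score. With an unrelated prompt, the ranking falls back to the base scores alone.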

That targets one of the nastiest voice-agent failure modes, the fast critical utterance. AssemblyAI's feature thread says one team reduced errors on those fields from 26% to 9%, a 17-point drop, or roughly a 65% relative reduction.

Realtime ASR feature set

The rest of the release is a bundled realtime ASR upgrade, not just a context trick. In AssemblyAI's feature thread, the company lists:

Leading accuracy across 18 languages with automatic language detection.
Code-switching within the same sentence.
Domain and context prompting.
Voice focus on the primary speaker.
Realtime speaker diarization with end-of-stream self-correction.

The company is also leaning on live demos over benchmark charts. AssemblyAI's demo thread explicitly says that live demos beat generic benchmarks.

Where it ships

AssemblyAI says the model is live today across its own surfaces and partner channels. AssemblyAI's demo thread lists the API, Playground, and partner integrations, and AssemblyAI's LiveKit quote says Universal-3.5 Pro is already available on LiveKit Inference.

Partner quotes also sketch the early adoption map:

The Fireflies quote says Fireflies tested the model in its voice-agent pipeline and found it strongest on accuracy, latency, and language switching.
The Retell quote ties the release to a high-accuracy mode for phone agents in healthcare and finance.
The Pipecat quote calls low-latency STT with more context a missing next-generation capability.

Context management in the voice stack

Kwindla Hultman Kramer, CEO of Daily and creator of Pipecat, uses the launch to make a broader point in his thread: production voice agents still spend a lot of engineering effort on context plumbing outside the models themselves. He lists state machines, multi-agent systems, subagents, and non-blocking compaction and summarization as the glue teams build to keep speech recognition, language, and speech synthesis models aligned across a call.

That makes AssemblyAI's move notable for a narrow reason. Instead of asking application teams to bolt more context logic around STT, it pushes one part of that orchestration down into the transcription model itself.
