Together AI launches unified voice stack with co-located STT, LLM, and TTS
Together AI launched a single-cloud stack for realtime voice agents that hosts Deepgram, Cartesia, MiniMax, and other voice components on one platform. Use it to cut latency and deployment overhead if you want one billing surface for production voice apps.

TL;DR
- Together AI launched a voice-agent stack that keeps STT, LLM, and TTS on one cloud, with Together's launch thread saying every handoff stays "inside one cluster" and the product post framing it as a replacement for multi-vendor voice pipelines.
- The initial stack includes native hosting for Deepgram and Cartesia, while Together's thread says builders can swap models across the stack without rebuilding integrations and Cartesia's announcement positions Cartesia as a dedicated model partner on the platform.
- Together's product post says the co-located architecture brings end-to-end latency under 700 ms, with a single API, billing surface, and deployment path for production voice apps.
- Model coverage is already expanding: MiniMax's post says MiniMax Speech 2.6 Turbo is now part of the voice stack, and Deepgram's availability note confirms Deepgram STT is natively available on Together.
What shipped
Together's launch is a unified runtime for real-time voice agents: speech-to-text, LLM inference, and text-to-speech run on one cloud instead of hopping across separate vendors. In the announcement thread, the company says the practical change for builders is co-location, model swapping across the stack, and one surface for billing, deployment, and access.
The first-party and partner lineup is broader than a single STT/TTS pair. Together's voice stack diagram shows Cartesia, MiniMax, Rime, Deepgram, Whisper, Voxtral, Kokoro, and Orpheus connected to the same "AI native cloud for voice," while Cartesia's post says Cartesia is now a dedicated model partner and the Deepgram note confirms Deepgram STT is hosted natively on Together infrastructure.
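The "swap models without rebuilding integrations" claim can be sketched as a pipeline where each stage is just a model ID behind a stable interface. This is an illustrative sketch only: the class, stage stubs, and model ID strings below are assumptions for demonstration, not Together's actual voice API.

```python
# Illustrative sketch: model IDs and stage stubs are hypothetical,
# not Together's real API surface.
from dataclasses import dataclass


@dataclass
class VoicePipeline:
    """A co-located STT -> LLM -> TTS turn where each stage is a
    swappable model ID rather than a separate vendor integration."""
    stt_model: str   # e.g. a Deepgram or Whisper STT model
    llm_model: str   # any chat model hosted on the same cloud
    tts_model: str   # e.g. a Cartesia or MiniMax TTS voice

    def run_turn(self, audio_chunk: bytes) -> bytes:
        transcript = self._transcribe(audio_chunk)
        reply = self._generate(transcript)
        return self._synthesize(reply)

    # Stage stubs: in a real deployment each would call the hosted
    # model; here they just tag the payload so the flow is visible.
    def _transcribe(self, audio: bytes) -> str:
        return f"[{self.stt_model}] transcript of {len(audio)} bytes"

    def _generate(self, text: str) -> str:
        return f"[{self.llm_model}] reply to: {text}"

    def _synthesize(self, text: str) -> bytes:
        return f"[{self.tts_model}] audio for: {text}".encode()


pipeline = VoicePipeline("deepgram-stt", "llama-3-70b", "cartesia-sonic")
# Swapping a stage is a config change, not a rebuild:
pipeline.tts_model = "minimax-speech-2.6-turbo"
```

The point of the sketch is the shape, not the names: if STT, LLM, and TTS share one platform and one calling convention, changing a voice model touches configuration rather than integration code.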
Why it matters for latency and operations
The engineering pitch is fewer network boundaries. Together's blog post says most current voice systems are "stitched together across vendors," which adds latency and operational overhead as audio and tokens move over the internet between STT, LLM, and TTS services. Together's replacement is a modular but co-located stack, and the company says that gets end-to-end latency below 700 ms for live conversations.
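The fewer-network-boundaries argument is ultimately arithmetic: each vendor handoff adds a hop, and hop cost dominates when the hops cross the public internet. All numbers below are illustrative assumptions, not measurements from Together's post; only the sub-700 ms target comes from the announcement.

```python
# Back-of-the-envelope latency budget for one conversational turn.
# Stage compute times and hop costs are invented for illustration.
STAGES_MS = {"stt": 150, "llm": 300, "tts": 150}  # per-stage compute


def end_to_end_ms(stage_ms: dict, hop_ms: float) -> float:
    """Turn latency: stage compute plus a network hop between stages
    and one hop each for client ingress and egress."""
    hops = len(stage_ms) + 1  # client->stt, stt->llm, llm->tts, tts->client
    return sum(stage_ms.values()) + hops * hop_ms


multi_vendor = end_to_end_ms(STAGES_MS, hop_ms=50.0)  # public-internet hops
co_located = end_to_end_ms(STAGES_MS, hop_ms=2.0)     # intra-cluster hops

print(multi_vendor)  # 800.0 -> blows a ~700 ms conversational budget
print(co_located)    # 608.0 -> under the figure Together cites
```

Under these assumed numbers, compute time is identical in both layouts; co-location wins purely by shrinking the four handoffs, which is the shape of the claim Together is making.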
That matters operationally as much as interactively. The same product post says the platform exposes unified API access, security controls including zero data retention and SOC 2 Type II support, and deployment options aimed at enterprise voice workloads. Meanwhile, MiniMax's update shows Together is treating the stack as a multi-model platform rather than a fixed pipeline: MiniMax Speech 2.6 Turbo has already been added alongside Deepgram and Cartesia, which makes the "swap models" claim more concrete.