Cartesia releases Sonic-3.5 and Ink-2 for streaming TTS and STT
Cartesia launched Sonic-3.5 for text-to-speech and Ink-2 for speech-to-text, calling them its new top streaming voice models. The release pairs low-latency voice-agent claims with 42-language support for Sonic-3.5 and immediate partner availability.

TL;DR
- Cartesia's launch post introduced Sonic-3.5 for streaming text-to-speech and Ink-2 for streaming speech-to-text, with the company calling them its new top voice-agent models.
- According to Together AI's availability post, Sonic-3.5 ships with sub-90ms latency, native support for 42 languages, and improved handling for codes, IDs, and other structured speech.
- Cartesia's promo thread framed the release as a single-provider voice stack, pairing TTS and STT under one API for real-time agents.
- Day-one distribution was already broader than Cartesia's own surface, with Vapi's integration post and Together AI's launch post both announcing availability.
To dig in, start with Cartesia's main launch thread, then Together AI's post linking the model page, or try a live demo through Cartesia's call-link thread. Two useful details surfaced in replies: Cartesia said the models are not open source, and in a separate reply said they can plug into any LLM.
Sonic-3.5
Sonic-3.5 is the more concrete half of the release so far. Cartesia's launch framing in its main thread focuses on speed and quality, while Together AI's feature list adds the engineering details that matter in production.
Together's summary broke the model down into four claims:
- Sub-90ms latency.
- Native support for 42 languages.
- Better transcript following for codes, IDs, and other structured speech.
- Context-aware pronunciation for heteronyms like "read," "bass," and "bow."
Cartesia also used replies to fill in rollout details. One reply on language support said Sonic-3.5 includes 9 Indic languages, and another reiterated the 42-language footprint, with support for different accents and locales.
Ink-2
Ink-2 arrived with less public detail than Sonic-3.5, but the positioning is clear. Cartesia's product account called it the new top streaming speech-to-text model, and a thread from Cartesia cofounder Albert Gu said both launches landed within a week and were built from the ground up.
Gu tied the release to a broader architecture bet in his post: speech models need to fuse text and audio, and the work behind Sonic-3.5 and Ink-2 is meant to scale toward general real-time multimodal systems. That framing makes the launch more ambitious than a simple model refresh.
Where it shows up
The ecosystem rollout was immediate.
- Vapi's post said Cartesia models are available on Vapi for existing builds.
- Together AI's post said Sonic-3.5 is available on Together AI.
- The same Together thread said developers can browse more than 150 Sonic-3.5 voices in its voice finder before deploying.
- A Cartesia reply said the stack can plug into any LLM.
That last point matters because Cartesia is selling the release as infrastructure, not a closed app surface.
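The "plug into any LLM" claim describes a pipeline in which speech-to-text and text-to-speech sit on either side of a swappable language model. The sketch that follows illustrates that shape under stated assumptions: `SpeechToText`, `TextToSpeech`, `agent_turn`, `FakeSTT`, `FakeTTS`, `echo_llm`, and `upper_llm` are hypothetical names invented for illustration, not Cartesia's SDK or API, and the turn is non-streaming for simplicity even though Sonic-3.5 and Ink-2 are streaming models.

```python
from typing import Callable, Protocol


# Hypothetical interfaces; these are NOT Cartesia's actual API.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Any LLM is modeled as a plain function from prompt text to reply text.
LLM = Callable[[str], str]


def agent_turn(audio_in: bytes, stt: SpeechToText, llm: LLM, tts: TextToSpeech) -> bytes:
    """Run one non-streaming turn: audio -> transcript -> LLM reply -> audio."""
    transcript = stt.transcribe(audio_in)
    reply = llm(transcript)
    return tts.synthesize(reply)


# Hypothetical stand-ins so the sketch runs without any network calls.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


def echo_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def upper_llm(prompt: str) -> str:
    return prompt.upper()


if __name__ == "__main__":
    audio = b"order 4417"
    print(agent_turn(audio, FakeSTT(), echo_llm, FakeTTS()))   # b'You said: order 4417'
    print(agent_turn(audio, FakeSTT(), upper_llm, FakeTTS()))  # b'ORDER 4417'
```

Swapping `echo_llm` for `upper_llm` changes only the reply text; the speech stages stay untouched, which is the property the "any LLM" reply points to.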
Rollout details
A few practical details only showed up outside the main launch tweet.
- Cartesia's follow-up thread linked a live call demo and a signup promo for trying a real voice agent.
- Another Cartesia post offered three free months for teams switching from an existing provider.
- A direct reply from Cartesia said the models are not open source.
- A data residency reply said Cartesia already works with customers that have India data residency requirements.
- A multilingual roadmap reply said multilingual Ink-2 is still "coming soon," even though Sonic-3.5 already supports 42 languages.
That last reply is the clearest caveat in the launch material. The TTS side looks broadly multilingual today, while STT multilingual support still appears to be rolling out.