
Cartesia releases Sonic-3.5 and Ink-2 for streaming TTS and STT

Cartesia launched Sonic-3.5 for text-to-speech and Ink-2 for speech-to-text, calling them its new top streaming voice models. The release pairs low-latency voice-agent claims with 42-language support and immediate partner availability.


TL;DR

  • Cartesia's launch post introduced Sonic-3.5 for streaming text to speech and Ink-2 for streaming speech to text, with the company calling them its new top voice-agent models.
  • According to Together AI's availability post, Sonic-3.5 ships with sub-90ms latency, native support for 42 languages, and improved handling for codes, IDs, and other structured speech.
  • Cartesia's promo thread framed the release as a single-provider voice stack, pairing TTS and STT under one API for real-time agents.
  • Day-one distribution was already broader than Cartesia's own surface, with Vapi's integration post and Together AI's launch post both announcing availability.

You can start with Cartesia's main launch thread, jump to Together AI's model page post, and even test a live demo through Cartesia's call link thread. One useful detail surfaced in replies: a Cartesia reply on open source said the models are not open source, while another Cartesia reply said they can plug into any LLM.

Sonic-3.5

Sonic-3.5 is the more concrete half of the release so far. Cartesia's launch framing in its main thread focuses on speed and quality, while Together AI's feature list adds the engineering details that matter in production.

Together's summary broke the model into four claims:

  • Sub-90ms latency.
  • Native support for 42 languages.
  • Better transcript following for codes, IDs, and other structured speech.
  • Context-aware pronunciation for heteronyms like "read," "bass," and "bow," according to Together AI's post.

Cartesia also used replies to fill in rollout details. One language-support reply said Sonic-3.5 includes 9 Indic languages, and another reply reiterated the 42-language footprint, with support for different accents and locales.
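For readers who want to check a latency claim like "sub-90ms" against their own setup, the usual measurement is time to first audio chunk on a streaming response. The sketch below is a minimal, provider-agnostic illustration: `time_to_first_chunk` and `fake_tts_stream` are hypothetical names invented here, and the fake stream only stands in for a real streaming TTS response. It does not use or describe Cartesia's or Together AI's actual API.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_chunk(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    The first element is None if the stream never yields audio.
    """
    start = clock()
    first: Optional[float] = None
    total = 0
    for chunk in stream:
        # Record the elapsed time only once, at the first chunk with data.
        if chunk and first is None:
            first = clock() - start
        total += len(chunk)
    return first, total


def fake_tts_stream(delay_s: float = 0.05, chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption, not a real API)."""
    time.sleep(delay_s)  # simulated network and model delay before first audio
    for _ in range(chunks):
        yield b"\x00" * 320  # 320 bytes of silence per simulated chunk


if __name__ == "__main__":
    latency, size = time_to_first_chunk(fake_tts_stream())
    if latency is not None:
        print(f"first audio after {latency * 1000:.1f} ms, {size} bytes total")
```

In practice the iterable would be whatever chunk iterator a provider's SDK or an HTTP client returns, and the number depends on where the clock starts: measuring from the client includes network round-trip time, which a vendor's headline figure may not.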

Ink-2

Ink-2 arrived with less public detail than Sonic-3.5, but the positioning is clear. Cartesia's product account called it the new top streaming speech-to-text model, and Albert Gu's thread said both launches landed within a week and were built from the ground up.

Gu, a Cartesia cofounder, tied the release to a broader architecture bet in his post: speech models need to fuse text and audio, and the work behind Sonic-3.5 and Ink-2 is meant to scale toward general real-time multimodal systems. That gives the launch a more ambitious read than a simple model refresh.

Where it shows up

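The ecosystem rollout was immediate. Vapi's integration post and Together AI's launch post both announced day-one availability, and a Cartesia reply said the models can plug into any LLM.

That last point matters because Cartesia is selling the release as infrastructure, not a closed app surface.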

Rollout details

A few practical details only showed up outside the main launch tweet.

The clearest caveat in the launch material concerns language coverage. The TTS side looks broadly multilingual today, while STT multilingual support still appears to be rolling out.
