ElevenLabs launches Speech Engine at 8¢ per minute for chat-to-voice agents
ElevenLabs launched Speech Engine, a layer that adds transcription, speech synthesis, turn-taking, and interruption handling on top of an existing chat agent. The release pairs SDKs, one-command setup, and 8¢-per-minute pricing for production voice agents.

TL;DR
- ElevenLabs' launch post says Speech Engine adds voice to an existing chat agent without rebuilding the LLM layer, bundling transcription, speech synthesis, and orchestration into one pipeline.
- According to ElevenLabsDevs' stack diagram post, the integration point is a WebSocket endpoint, while ElevenLabsDevs' conversation flow post says ElevenLabs handles turn-taking and interruption detection around each conversation.
- ElevenLabsDevs' setup post and ElevenLabs' skill install post both pitch a one-command path, npx skills add elevenlabs/skills --skill speech-engine, to scaffold the server, client setup, and token endpoint.
- ElevenLabsDevs' SDK post says the client side ships React and JavaScript SDKs with conversation tokens and useConversation, while ElevenLabsDevs' server SDK post says Python and JavaScript server SDKs can stream responses directly from OpenAI, Anthropic, or Gemini.
- Speech Engine is live in ElevenAPI, and ElevenLabs' pricing post prices it from 8 cents per minute, while ElevenLabs' security post adds SOC 2, HIPAA, GDPR, EU data residency, and zero retention mode claims.
You can browse the product page, jump straight to the Speech Engine cookbook, and watch the ai.engineer London walkthrough. The interesting bit is how little of the existing agent stack ElevenLabs wants touched: ElevenLabs' integration post says the text agent stays untouched, while ElevenLabsDevs' server SDK post says sendResponse() already accepts streamed output from three major model vendors.
Speech Engine
ElevenLabs is selling this as an overlay, not a new agent framework.
The core pitch is consistent across both launch threads:
- keep the existing chat agent and LLM logic in place
- let ElevenLabs handle speech-to-text and text-to-speech
- add turn-taking and interruption handling in the middle
- stream replies back over a WebSocket connection
That makes this a fairly opinionated voice runtime for teams that already have a text agent and do not want to rewire the stack around a different orchestration system.
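The overlay idea in the list above can be sketched in a few lines. This is a toy simulation rather than ElevenLabs' SDK: chat_agent, VoiceTurn, and barge_in_after are hypothetical names invented here, and speech-to-text and text-to-speech are reduced to plain strings. It only shows the shape of the flow: a transcript goes to the untouched text agent, the reply streams back chunk by chunk, and an interruption stops playback mid-reply.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def chat_agent(transcript: str) -> AsyncIterator[str]:
    """Stand-in for the existing text agent: streams a reply word by word."""
    for word in f"You said: {transcript}".split():
        await asyncio.sleep(0)  # yield control, as a real token stream would
        yield word


class VoiceTurn:
    """Hypothetical voice layer for one user turn (not the real SDK)."""

    def __init__(self) -> None:
        self.spoken: List[str] = []
        self.interrupted = False

    async def run(self, transcript: str, barge_in_after: Optional[int] = None) -> List[str]:
        async for chunk in chat_agent(transcript):
            if self.interrupted:
                break  # the user started talking, so stop speaking the reply
            self.spoken.append(chunk)  # a real layer would synthesize audio here
            # Simulate the user talking over the agent after N spoken chunks.
            if barge_in_after is not None and len(self.spoken) >= barge_in_after:
                self.interrupted = True
        return self.spoken


def demo():
    full = asyncio.run(VoiceTurn().run("book a table for two"))
    cut = asyncio.run(VoiceTurn().run("book a table for two", barge_in_after=2))
    return full, cut


if __name__ == "__main__":
    full, cut = demo()
    print(full)  # the whole reply, spoken uninterrupted
    print(cut)   # only the chunks spoken before the barge-in
```

The agent function never learns that voice is involved, which is the point of the overlay pitch: only the layer around it changes.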
SDKs and scaffolding
The implementation details are more concrete than the headline.
The launch materials break the developer path into a few pieces:
- Scaffolding: npx skills add elevenlabs/skills --skill speech-engine sets up the server, client SDK wiring, and token endpoint, per ElevenLabs' skill install post and ElevenLabsDevs' setup post.
- Client side: React and JavaScript SDKs ship with conversation tokens so the API key stays out of the browser, plus hooks such as useConversation, per ElevenLabsDevs' client SDK post.
- Server side: Python and JavaScript SDKs manage the WebSocket and transcript flow, and sendResponse() can consume streams from OpenAI, Anthropic, or Gemini, per ElevenLabsDevs' server SDK post.
- Runtime model: each WebSocket connection maps to one conversation, according to ElevenLabsDevs' conversation flow post.
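The conversation-token detail is a familiar pattern: the server holds the long-lived secret and hands the browser only a short-lived credential. The sketch below illustrates that general pattern with Python's standard library. It is not how ElevenLabs mints or validates its tokens, which are issued by its own service. The token format and the issue_token and verify_token helpers are assumptions made up for this example.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def issue_token(secret: bytes, ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Server side: mint a short-lived token without exposing the secret."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    nonce = secrets.token_hex(8)  # makes every token unique
    payload = f"{nonce}.{expires}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> bool:
    """Server side: accept only unexpired tokens signed with our secret."""
    try:
        nonce, expires, sig = token.split(".")
        expires_at = int(expires)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, f"{nonce}.{expires}".encode(), hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, expected) and current < expires_at


if __name__ == "__main__":
    secret = b"hypothetical-server-side-key"  # would come from the environment
    token = issue_token(secret, ttl_seconds=60)
    print(verify_token(secret, token))  # valid while the token is fresh
```

The browser only ever sees the token string, so a leaked token expires on its own, while a leaked API key would not.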
Languages, pricing, and migration
The rest of the launch fills in the production story.
ElevenLabs says Speech Engine supports expressive voices in 70-plus languages, with transcription tuned for conversational latency and messy real-world audio, according to ElevenLabs' languages post and ElevenLabs' transcription post. Pricing starts at 8 cents per minute through ElevenAPI, with lower rates at scale, per ElevenLabs' pricing post.
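For a rough sense of what the starting rate means in practice, here is a back-of-the-envelope estimate. It assumes billing is a flat 8 cents for every conversation minute with no per-call rounding and no volume discount, which the launch posts do not spell out, so treat the result as an upper bound at list price. The call volume is invented for illustration.

```python
from decimal import Decimal

# Starting rate quoted in the pricing post; lower rates at scale are not modeled.
LIST_RATE_USD_PER_MIN = Decimal("0.08")


def estimate_monthly_cost(calls_per_day: int, avg_call_minutes: str, days: int = 30) -> Decimal:
    """Estimate spend in USD, assuming flat per-minute billing (an assumption)."""
    total_minutes = Decimal(calls_per_day) * Decimal(avg_call_minutes) * days
    return (total_minutes * LIST_RATE_USD_PER_MIN).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical workload: 500 calls a day averaging 3.5 minutes each.
    print(estimate_monthly_cost(500, "3.5"))  # 4200.00
```

That workload comes to 52,500 minutes a month, so the volume pricing matters quickly for any team running more than a pilot.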
The enterprise checklist is also explicit. ElevenLabs' security post lists SOC 2, HIPAA, GDPR, EU data residency, and zero retention mode. For teams that want more than the API layer, ElevenLabs' migration post says Speech Engine projects can move into ElevenAgents later for deployment channels, monitoring, analytics, and the broader agent toolset.