ElevenLabs added on-prem and on-device deployment options alongside its existing VPC and cloud paths for the voice stack. The rollout gives government, automotive, and edge teams more data-boundary choices, with VPC available now and the new modes in early access.

You can read the official announcement, skim the new on-prem deployments page, and cross-check the old cloud controls in ElevenLabs' docs for data residency and Zero Retention Mode. The interesting bit is how explicitly the company now slices the stack by trust boundary: air-gapped infrastructure, embedded hardware, customer-owned cloud, and vendor-managed API.
The recap tweet is the cleanest version of the product packaging. ElevenLabs now splits its enterprise offer into four lanes: on-premise, on-device, VPC, and the managed API.
That is a much clearer segmentation than the usual "private deployment" label. The product line now maps directly to where inference happens and who owns the boundary.
The official launch post says on-premise is built for organizations that cannot use public cloud in the required region. The linked deployment page adds two concrete details the tweet thread only hints at: the target is standard GPU-enabled servers on Confidential Computing infrastructure, and the local deployment supports 30-plus languages.
That page also frames the main value as locality, not model novelty. Inference and audio processing stay inside the customer's environment, with optional external connectivity rather than a hard dependency on ElevenLabs infrastructure.
ElevenLabs describes on-device as a lighter deployment tier for reliable inference on constrained hardware, with the tweet calling out vehicles and wearables as the obvious fits. The on-prem deployments page names the target more precisely as edge and embedded devices, and says the on-device models also cover 30-plus languages.
The technical trade-off is straightforward: smaller models in exchange for no network hop. ElevenLabs pitches that mainly as lower latency for real-time systems where milliseconds change the user experience.
The older private options did not disappear. VPC remains the in-between tier for teams that want ElevenLabs models inside their own cloud account, with AWS SageMaker and GCP Vertex called out explicitly, while the managed API still carries automatic scaling and the full hosted product surface.
The recap tweet also slips in one notable product detail: VPC includes ElevenAgents, not just base voice models. That makes the packaging less about a single text-to-speech endpoint and more about where the broader voice agent stack is allowed to run.
Availability is staggered. On-premise and on-device are only in early access today, with initial releases expected in the first half of 2026, while VPC is available now.
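The four lanes and their staggered availability can be summarized as a small lookup table. The sketch below is only an illustration of the segmentation described in this article, not an ElevenLabs API: the `Lane` class, its field names, the `lanes_for` helper, and the "customer" versus "ElevenLabs" boundary labels are all hypothetical, and the managed API is assumed to be generally available because it is described as an existing path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    """One deployment lane, as read from the announcement (hypothetical model)."""
    name: str
    inference_runs_in: str
    boundary_owner: str  # who controls the environment where inference happens
    status: str          # "available" or "early access"


# Values paraphrase the article; they are not pulled from any official schema.
LANES = [
    Lane("on-premise", "customer-run GPU servers, optionally air-gapped",
         "customer", "early access"),
    Lane("on-device", "edge and embedded hardware",
         "customer", "early access"),
    Lane("VPC", "customer cloud account (AWS SageMaker, GCP Vertex)",
         "customer", "available"),
    Lane("managed API", "ElevenLabs-hosted infrastructure",
         "ElevenLabs", "available"),
]


def lanes_for(boundary_owner: str, available_only: bool = False) -> list[str]:
    """Return lane names whose boundary is owned by the given party."""
    return [
        lane.name
        for lane in LANES
        if lane.boundary_owner == boundary_owner
        and (not available_only or lane.status == "available")
    ]


if __name__ == "__main__":
    # Customer-controlled options a team could adopt today.
    print(lanes_for("customer", available_only=True))
    # Every customer-controlled option, including early access.
    print(lanes_for("customer"))
```

Filtering on `available_only=True` makes the current gap visible: of the three customer-controlled lanes, only VPC can be adopted today.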
The control surface is also uneven across tiers. ElevenLabs' data residency docs say enterprise customers can choose isolated storage environments in the US, EU, and India, but the company notes that some processing may still occur outside the selected location for support and related purposes. Its Zero Retention Mode docs separately say the feature can restrict logging for TTS, STT, Voice Changer, and agent inputs and outputs. Both are controls layered on the hosted service, which helps explain why the new fully local options are being sold as a distinct class of deployment rather than just another privacy checkbox.