KittenTTS releases 15M-to-80M ONNX voice models for CPU deployment

KittenTTS released nano, micro, and mini ONNX TTS models sized for CPU-first deployment instead of GPU-heavy stacks. Voice-agent builders should benchmark both dependency weight and real-time latency before treating tiny size as enough.

TL;DR

  • KittenTTS released a v0.8 ONNX-based text-to-speech stack with three models — nano, micro, and mini — spanning 15M to 80M parameters and roughly 25MB to 80MB, with the project positioned around CPU inference rather than a GPU-first serving path, according to the repo summary.
  • The engineering pitch is deployability: the HN writeup frames these as tiny models for edge hardware and offline inference, while the package is still labeled a "developer preview" with a basic Python API.
  • Early practitioner feedback says model size is only part of the story: in the discussion recap, one commenter calls dependency chains that pull in "torch + cuda" a "non-starter" for edge installs, even when the core model is small.
  • Reported performance is promising but not settled. The thread discussion cites one test on an Intel 9700 CPU at about "1.5x realtime" for the 80M model, with the same commenter saying it was not faster on a 3080 GPU.

What shipped

Hacker News: KittenML/KittenTTS (555 upvotes · 179 comments)

KittenTTS is shipping as an Apache 2.0 open-source library built on ONNX, with three published model sizes in the current v0.8 release: nano at 15M parameters, micro at 40M, and mini at 80M, per the repo summary. The project description says those models land in a roughly 25MB-to-80MB footprint range and are meant for "CPU-based voice synthesis without GPU," which puts them closer to embedded or local-agent deployments than to conventional GPU-backed speech stacks (GitHub repo).
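
As a rough illustration of the serving path the project is aiming at, the sketch below runs an ONNX checkpoint through onnxruntime pinned to the CPU execution provider. The model filename, input encoding, and token IDs are hypothetical placeholders, not the actual KittenTTS API.

```python
# Minimal CPU-only ONNX inference sketch. The model file and the token
# input below are illustrative placeholders, not the real KittenTTS
# interface.
import numpy as np
import onnxruntime as ort

# Pin inference to the CPU execution provider: no CUDA runtime required.
session = ort.InferenceSession(
    "kitten_tts_nano.onnx",  # hypothetical local model file
    providers=["CPUExecutionProvider"],
)

# Inspect the graph for its expected input names instead of guessing.
print([inp.name for inp in session.get_inputs()])

# Assume a single integer-token input; a real pipeline would normalize
# and phonemize the text first.
token_ids = np.array([[12, 7, 43, 5, 99]], dtype=np.int64)  # placeholder tokens
outputs = session.run(None, {session.get_inputs()[0].name: token_ids})
waveform = outputs[0]  # e.g. float32 PCM samples
```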

The release also includes text preprocessing, a basic Python API, Hugging Face-hosted models, and a demo surface, but the repo labels the package a developer preview rather than a finished production runtime, again per the repo summary. That matters because the main novelty here is not just another TTS checkpoint; it is a small-footprint ONNX packaging choice aimed at teams that need voice output where GPU access is expensive, unavailable, or operationally awkward.
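
Since the checkpoints are hosted on Hugging Face, a plausible first step is pulling one file into the local cache with huggingface_hub; the repo_id and filename here are assumptions, so check the project's actual Hugging Face page for the real paths.

```python
# Fetch an ONNX checkpoint from the Hugging Face Hub into the local cache.
# The repo_id and filename are hypothetical placeholders.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="KittenML/kitten-tts-nano",  # placeholder repo id
    filename="model.onnx",               # placeholder filename
)
print(model_path)  # cached local path, ready to hand to onnxruntime
```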

Where the deployment caveats are

Hacker News: "Show HN: Three new Kitten TTS models – smallest less than 25MB" (555 upvotes · 179 comments)

The Hacker News thread immediately focused on the real bottleneck for voice agents: deployment ergonomics rather than raw model size. In the HN summary, the core concerns were Python dependency size, Torch/CUDA leakage, latency, streaming support, and API shape — the parts that decide whether a small model actually stays small inside a shipping application.
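
One cheap way to enforce that discipline is a dependency audit in the deployment environment: build the edge install, then fail the build if any GPU-stack package is importable. This is a generic sketch, not something KittenTTS ships; the blocklist is an example a voice-agent team might choose.

```python
# Fail fast if heavyweight GPU-oriented packages leaked into an edge
# install. Run inside the deployment virtualenv, e.g. as a CI step.
import importlib.util
import sys

BLOCKLIST = ["torch", "tensorflow", "triton", "nvidia"]  # example blocklist

leaked = [name for name in BLOCKLIST if importlib.util.find_spec(name)]
if leaked:
    sys.exit(f"edge install pulled in blocklisted packages: {leaked}")
print("no blocklisted GPU-stack packages found")
```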

The most concrete practitioner quote in the discussion recap says "anything that pulls torch + cuda makes the whole thing a non-starter," while another commenter said 25MB is "genuinely exciting" for edge use. That split captures the practical test for this release: a tiny ONNX checkpoint only changes deployment economics if the surrounding install and runtime stay equally lean.

Performance data is still anecdotal. The same discussion recap cites one report of the 80M model running at about 1.5x realtime on an Intel 9700 CPU and "wasn't any faster" on a 3080 GPU. For engineers building offline assistants or embedded voice agents, that makes KittenTTS interesting less as a benchmark winner than as a CPU-first packaging experiment with enough early signal to justify local latency and dependency testing.
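
For that local testing, the relevant number is the real-time factor: synthesized audio seconds divided by wall-clock synthesis seconds, where anything above 1.0 is faster than realtime. A minimal harness, assuming a `synthesize` callable that returns samples plus a sample rate:

```python
# Measure a real-time factor (RTF) for any TTS call. `synthesize` is a
# stand-in for whatever engine is under test; it is assumed to return
# (samples, sample_rate).
import time

def realtime_factor(synthesize, text: str) -> float:
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    wall = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return audio_seconds / wall  # ~1.5 would match the CPU figure above

# Example: rtf = realtime_factor(my_tts, "The quick brown fox.")
```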
