KittenTTS released nano, micro, and mini ONNX TTS models sized for CPU-first deployment instead of GPU-heavy stacks. Voice-agent builders should benchmark both dependency weight and real-time latency before treating small model size alone as a deployment win.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX with models from 15M to 80M parameters (25-80 MB), enabling high-quality CPU-based voice synthesis without GPU. Latest v0.8 release includes nano (15M/25-56MB), micro (40M/41MB), and mini (80M/80MB) models on Hugging Face. Features text preprocessing, basic Python API (pip install from GitHub release), demo on HF Spaces, and commercial support. Apache 2.0 licensed, developer preview.
KittenTTS is shipping as an Apache 2.0 open-source library built on ONNX, with three published model sizes in the current v0.8 release: nano at 15M parameters, micro at 40M, and mini at 80M (repo summary). The project description says those models land in a roughly 25MB-to-80MB footprint range and are meant for "CPU-based voice synthesis without GPU," which puts them closer to embedded or local-agent deployments than to conventional GPU-backed speech stacks (GitHub repo).
The release also includes text preprocessing, a basic Python API, Hugging Face-hosted models, and a demo surface, but the repo labels the package a developer preview rather than a finished production runtime (repo summary). That matters because the main novelty here is not just another TTS checkpoint; it is a small-footprint ONNX packaging choice aimed at teams that need voice output where GPU access is expensive, unavailable, or operationally awkward.
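The parameter counts and file sizes roughly track quantization choices: 15M fp32 weights alone would serialize to about 60MB, so the smaller published footprints imply quantized exports. A back-of-envelope sketch (illustrative only; it ignores ONNX graph and metadata overhead and whatever mixed-precision layout the actual exports use):

```python
# Rough on-disk size estimate for model weights, by per-weight dtype.
# Illustrative arithmetic only: ignores ONNX graph structure, metadata,
# and any mixed-precision layout the real KittenTTS exports may use.

def approx_size_mb(n_params: int, bytes_per_param: float) -> float:
    """Approximate serialized weight size in megabytes."""
    return n_params * bytes_per_param / 1e6

for name, params in [("nano", 15_000_000), ("micro", 40_000_000), ("mini", 80_000_000)]:
    fp32 = approx_size_mb(params, 4)   # full precision
    fp16 = approx_size_mb(params, 2)   # half precision
    int8 = approx_size_mb(params, 1)   # 8-bit quantized
    print(f"{name}: fp32 ~{fp32:.0f}MB, fp16 ~{fp16:.0f}MB, int8 ~{int8:.0f}MB")
```

By this arithmetic, mini at 80M parameters in roughly 80MB works out to about one byte per weight, consistent with an int8-style quantized export.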
Posted by rohan_joshi
The main engineering angle is deployability: tiny ONNX-based TTS models that can run CPU-only on edge hardware, but with real-world concerns around Python dependency size, Torch/CUDA leakage, latency, streaming, and API ergonomics. The thread is useful if you build voice agents or offline inference stacks.
The Hacker News thread immediately focused on the real bottleneck for voice agents: deployment ergonomics rather than raw model weights. In the HN summary, the core concerns were Python dependency size, Torch/CUDA leakage, latency, streaming support, and API shape — the parts that decide whether a small model actually stays small inside a shipping application.
Posted by rohan_joshi
Thread discussion highlights:

- dawdler-purge on dependency bloat and CPU-only installs: "anything that pulls torch + cuda makes the whole thing a non-starter."
- baibai008989 on edge deployment: "the dependency chain issue is a real barrier for edge deployment... 25MB is genuinely exciting for that use case."
- bobokaytop on latency and performance: "Running on an intel 9700 CPU, it's about 1.5x realtime using the 80M model. It wasn't any faster running on a 3080 GPU though."
The most concrete practitioner quote in the discussion recap says "anything that pulls torch + cuda makes the whole thing a non-starter," while another commenter said 25MB is "genuinely exciting" for edge use. That split captures the practical test for this release: a tiny ONNX checkpoint only changes deployment economics if the surrounding install and runtime stay equally lean.
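One way to make that test concrete: after a clean `pip install`, scan the resulting environment for heavyweight GPU-stack packages. A minimal sketch using only the standard library (the prefix blocklist names common offenders; whether KittenTTS actually pulls any of them is exactly what you would be checking):

```python
# Check a freshly built virtualenv for GPU-stack dependency leakage.
# The prefixes below cover common heavyweight packages (torch, CUDA
# wheels, triton); this is a generic check, not specific to KittenTTS.
from importlib import metadata

HEAVY_PREFIXES = ("torch", "nvidia-", "cuda", "triton")

def find_heavy_deps(installed_names):
    """Return installed package names matching known heavyweight prefixes."""
    return sorted(
        name for name in installed_names
        if name.lower().startswith(HEAVY_PREFIXES)
    )

if __name__ == "__main__":
    installed = [d.metadata["Name"] or "" for d in metadata.distributions()]
    leaked = find_heavy_deps(installed)
    if leaked:
        print("GPU-stack packages found:")
        for name in leaked:
            print(" -", name)
    else:
        print("Environment looks CPU-lean.")
```

Running this inside a throwaway virtualenv right after installing the library gives a yes/no answer to the "torch + cuda non-starter" concern before any latency testing begins.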
Performance data is still anecdotal. The same discussion recap cites one report of the 80M model running at about 1.5x realtime on an Intel 9700 CPU and "wasn't any faster" on a 3080 GPU. For engineers building offline assistants or embedded voice agents, that makes KittenTTS interesting less as a benchmark winner than as a CPU-first packaging experiment with enough early signal to justify local latency and dependency testing.
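To reproduce that kind of number locally, the metric to track is the realtime factor: seconds of audio produced per second of wall-clock synthesis. A sketch of the harness (the `synthesize` callable is a stand-in for whatever API the library exposes; only the timing arithmetic is load-bearing):

```python
# Measure the realtime factor (RTF) of a TTS synthesis call.
# Convention used here: RTF = audio_duration / wall_time, so
# "1.5x realtime" means 1.5 seconds of audio per second of compute.
import time

def realtime_factor(wall_seconds: float, n_samples: int, sample_rate: int) -> float:
    """Audio seconds produced per wall-clock second (higher is faster)."""
    audio_seconds = n_samples / sample_rate
    return audio_seconds / wall_seconds

def benchmark(synthesize, text: str, sample_rate: int) -> float:
    """Time one synthesis call and return its realtime factor.

    `synthesize` is a placeholder for the library's API: assumed to
    take text and return a 1-D sequence of audio samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    wall = time.perf_counter() - start
    return realtime_factor(wall, len(samples), sample_rate)
```

When benchmarking ONNX models this way, do a throwaway warm-up call first: ONNX Runtime's first inference includes graph optimization, so cold-start numbers overstate steady-state latency.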