KittenTTS releases 25MB nano voice model with CPU-only ONNX runtime
KittenTTS 0.8 ships new 15M-, 40M-, and 80M-parameter models, including an int8 nano build that comes in around 25MB and runs on CPU with no GPU required. That makes it a fit for narration, character voices, and lightweight assistants that need offline or edge-friendly speech.

TL;DR
- KittenTTS 0.8 adds three new open-weight voice models (mini at 80M parameters, micro at 40M, and nano at 15M), and the smallest int8 build lands at about 25MB, according to the repo page.
- The release is built around ONNX and CPU inference rather than GPU requirements, which makes it more relevant for offline narration tools, lightweight assistants, and edge-style voice apps, as described in the launch thread.
- The creative angle is expressive speech rather than just a tiny tech demo: the discussion around the HN post centers on prosody, number pronunciation, and how much control users get over delivery.
- Early practitioner feedback in the discussion summary says deployment size is promising, but low-power latency and streaming architecture may still matter more than model footprint alone.

What shipped
Repo: KittenML/KittenTTS
KittenTTS 0.8 ships as an Apache 2.0, ONNX-based text-to-speech library with a Python API, text preprocessing, and a Hugging Face demo, per the project page. For creators, the key update is the model spread: an 80M mini, 40M micro, and 15M nano, with the smallest int8 build coming in around 25MB. That makes the release unusually compact for voice workflows that need local synthesis instead of cloud calls.
The project positioning in the launch thread also leans toward usable expressive speech, not just bare intelligibility. The stated focus on prosody and pronunciation is what makes this more interesting for narration, character voices, and embedded voice agents than a generic “small TTS model” drop.
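For a sense of the developer surface, here is a minimal local-synthesis sketch in the style of the project's README. Treat the model ID and voice name as assumptions carried over from the earlier releases; check the repo for the exact 0.8 identifiers.

```python
# Minimal offline synthesis sketch. The model ID and voice name are assumptions
# based on the project's earlier README examples; confirm the 0.8 IDs in the repo.
from kittentts import KittenTTS
import soundfile as sf

tts = KittenTTS("KittenML/kitten-tts-nano-0.1")  # downloads ONNX weights once, then runs CPU-only

audio = tts.generate(
    "KittenTTS runs entirely on the CPU, with no GPU in sight.",
    voice="expr-voice-2-f",  # assumed voice name from earlier docs
)

sf.write("narration.wav", audio, 24000)  # 24 kHz output per earlier project docs; confirm for 0.8
```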

What the early caveats look like
Discussion: "Show HN: Three new Kitten TTS models – smallest less than 25MB" · 560 upvotes · 183 comments
The first practical read from the Hacker News discussion is that package size solves only part of the problem. In the thread summary, one commenter calls a 25MB model genuinely exciting for edge deployment because it avoids the usual Torch-and-CUDA dependency chain, while another says inference latency on low-power hardware and audio streaming design are still the real bottlenecks.
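To make the streaming concern concrete, the usual workaround looks something like the sketch below: split text into sentences and pipeline synthesis against playback so audio starts before the full passage is rendered. It reuses the assumed `KittenTTS.generate` API and 24 kHz rate from the example above, plus the third-party `sounddevice` package; none of this is prescribed by the project.

```python
# Chunked synthesis sketch: pipeline generation against playback so speech starts
# after the first sentence. API, model ID, and sample rate are assumptions carried
# over from the earlier example; sounddevice is a third-party playback library.
import re
import sounddevice as sd
from kittentts import KittenTTS

tts = KittenTTS("KittenML/kitten-tts-nano-0.1")  # assumed model ID
SAMPLE_RATE = 24000  # assumed output rate

def speak_streaming(text: str) -> None:
    # Naive sentence splitter; real apps want smarter chunking to preserve prosody.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    playing = False
    for sentence in sentences:
        audio = tts.generate(sentence, voice="expr-voice-2-f")  # synthesize while prior chunk plays
        if playing:
            sd.wait()  # let the previous chunk finish before starting the next
        sd.play(audio, SAMPLE_RATE)
        playing = True
    sd.wait()

speak_streaming("Chunked playback hides synthesis latency. Later sentences render while earlier ones play.")
```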
There is at least one concrete integration datapoint: a commenter cited in the main thread says they wired the repo into Discord voice messages within minutes and saw about 1.5x realtime on an Intel 9700 CPU using the 80M model. The same discussion also raises open questions about expressive tags and fine-grained delivery control, which are still the make-or-break details for creative voice work.
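That 1.5x figure is straightforward to sanity-check on your own hardware: a realtime factor above 1.0 means audio is synthesized faster than it plays back. The sketch below times a single call under the same assumed API and sample rate as the examples above.

```python
# Realtime-factor check: wall-clock synthesis time vs. duration of the audio produced.
# Same assumed API, model ID, and 24 kHz rate as the earlier sketches.
import time
from kittentts import KittenTTS

tts = KittenTTS("KittenML/kitten-tts-nano-0.1")  # swap in the 80M "mini" ID to mirror the commenter's setup
SAMPLE_RATE = 24000

text = "A benchmark sentence long enough to amortize per-call overhead across the run."

start = time.perf_counter()
audio = tts.generate(text, voice="expr-voice-2-f")
wall = time.perf_counter() - start

audio_seconds = len(audio) / SAMPLE_RATE
print(f"{audio_seconds:.2f}s of audio in {wall:.2f}s -> {audio_seconds / wall:.2f}x realtime")
```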