KittenTTS releases 25MB nano model for CPU text-to-speech
KittenTTS now offers nano, micro and mini text-to-speech models, with the smallest int8 build coming in under 25MB and designed for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.

TL;DR
- KittenTTS has released three local text-to-speech models — nano, micro, and mini — with the smallest int8 nano build coming in under 25MB for ONNX-based CPU inference, according to the GitHub page.
- The stack is aimed at fully local voice generation: the project page says it runs without a GPU, ships with eight built-in voices, supports speed control, and outputs 24 kHz audio.
- For creators, the practical hook is lightweight offline voice work for prototypes, tools, and embedded experiences; the HN launch post frames the release around compact multi-voice, expressive speech synthesis.
- The main caveat is that small model size does not automatically mean friction-free deployment, as the discussion roundup highlights questions about dependency bloat, latency, streaming, and expressive control.
What shipped
KittenML/KittenTTS: State-of-the-art TTS model under 25MB
560 upvotes · 182 comments
KittenTTS v0.8.1 packages three model sizes: nano at 15M parameters, micro at 40M, and mini at 80M, with the nano model quantized to roughly 25MB in int8 form, according to the project page. The library is open source, built on ONNX, installable as a Python package, and positioned for CPU-first use rather than a cloud API round trip.
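The headline numbers follow from simple arithmetic: int8 quantization stores one byte per parameter, a quarter of fp32's four bytes. A back-of-envelope sketch (the margin between the raw 15MB and the reported ~25MB presumably covers non-quantized tensors and other assets):

```python
# Rough model-size estimate: bytes per parameter times parameter count.
# int8 = 1 byte/param, fp32 = 4 bytes/param.
def model_size_mb(params: int, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1_000_000

nano_int8 = model_size_mb(15_000_000, 1)  # 15.0 MB — under the 25MB headline
nano_fp32 = model_size_mb(15_000_000, 4)  # 60.0 MB — the unquantized baseline
mini_int8 = model_size_mb(80_000_000, 1)  # 80.0 MB — why only nano fits the claim
```

This is why the sub-25MB claim applies specifically to the int8 nano build and not to the larger micro or mini variants.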
For creative workflows, the concrete features are simple but useful: eight built-in voices, adjustable speed, text preprocessing, and 24 kHz output, per the launch thread. That makes it more relevant for local narration, character placeholders, interactive installs, and quick voice mockups than for fully directed studio voice performance.
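For anyone wiring this into a local pipeline, the downstream step is the same regardless of the synthesizer: take 24 kHz PCM samples and write them to disk. A minimal stdlib-only sketch, using a generated tone as a stand-in for model output (the KittenTTS API itself is not shown in the source, so nothing here assumes its call signature):

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # matches KittenTTS's stated 24 kHz output

def write_wav(path: str, seconds: float = 0.5, freq: float = 440.0) -> None:
    """Write mono 16-bit PCM at 24 kHz; the sine tone stands in for TTS samples."""
    n = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(frames)
```

The only model-specific detail a pipeline needs to get right is the sample rate; mismatching it (e.g. playing 24 kHz audio as 44.1 kHz) pitches the voice up noticeably.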
Where the creative limits are
Discussion around Show HN: Three new Kitten TTS models – smallest less than 25MB
The early discussion is less about whether 25MB is impressive and more about what happens after install. In the thread summary, commenters say dependency chains can pull in far larger packages than the headline model size suggests, which undercuts the appeal for edge setups.
The other open questions are real-time behavior and control. Commenters ask about first-chunk latency, streaming output, Raspberry Pi performance, and whether creators get finer expressive controls such as pitch, volume, or explicit style tags.
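The latency question the thread raises is concrete enough to measure. A hedged sketch of how one would benchmark time-to-first-audio against a streaming synthesizer, with a hypothetical generator standing in for the real model (neither `fake_tts_stream` nor any streaming API here is part of KittenTTS as described in the source):

```python
import time
from typing import Iterator, Tuple

def fake_tts_stream(text: str, chunk_ms: int = 200) -> Iterator[bytes]:
    """Placeholder synthesizer: yields silent 16-bit PCM chunks at 24 kHz."""
    samples_per_chunk = 24_000 * chunk_ms // 1000  # 4800 samples per 200 ms
    for _ in range(max(1, len(text) // 20)):
        yield b"\x00\x00" * samples_per_chunk

def first_chunk_latency(stream: Iterator[bytes]) -> Tuple[float, bytes]:
    """Time-to-first-audio: how long until the first chunk is available."""
    start = time.perf_counter()
    chunk = next(stream)
    return time.perf_counter() - start, chunk
```

For interactive uses, time-to-first-chunk matters more than total synthesis time: a model that streams its first 200 ms of audio quickly feels responsive even if the full utterance takes longer, which is why commenters single it out alongside Raspberry Pi throughput.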