AI Primer

KittenTTS releases 25MB nano model for CPU text-to-speech

KittenTTS now offers nano, micro and mini text-to-speech models, with the smallest int8 build under 25MB and built for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.

TL;DR

  • KittenTTS has released three local text-to-speech models — nano, micro, and mini — with the smallest int8 nano build coming in under 25MB for ONNX-based CPU inference, according to the GitHub page.
  • The stack is aimed at fully local voice generation: the project page says it runs without a GPU, ships with eight built-in voices, supports speed control, and outputs 24 kHz audio.
  • For creators, the practical hook is lightweight offline voice work for prototypes, tools, and embedded experiences; the HN launch post frames the release around compact multi-voice, expressive speech synthesis.
  • The main caveat is that a small model does not automatically mean friction-free deployment: the discussion roundup highlights open questions about dependency bloat, latency, streaming, and expressive control.

What shipped

KittenML/KittenTTS: State-of-the-art TTS model under 25MB

Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX with models from 15M to 80M parameters (25-80 MB). It supports CPU inference without GPU, features 8 built-in voices, adjustable speed, text preprocessing, and 24 kHz output. Latest release v0.8.1 (Feb 2026) includes nano (15M/25MB int8), micro (40M), and mini (80M) models. Python pip install available, with basic API for generation. 13k+ stars, Apache 2.0 license.

KittenTTS v0.8.1 packages three model sizes: nano at 15M parameters, micro at 40M, and mini at 80M, with the nano model quantized to roughly 25MB in int8 form, according to the project page. The library is open source, built on ONNX, installable with pip, and positioned for CPU-first use rather than a cloud API round trip, per the project details.
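
To put those parameter counts in context, here is a back-of-the-envelope sketch of how large the raw weights would be at common precisions. It is arithmetic only, not a description of how KittenTTS packages its files, and it assumes "M" means one million parameters and "MB" means one million bytes.

```python
# Rough size arithmetic from the parameter counts quoted in the article.
# Assumption: "M" = 10**6 parameters, "MB" = 10**6 bytes.

BYTES_PER_PARAM = {"int8": 1, "fp16": 2, "fp32": 4}


def raw_weight_mb(params_millions: float, dtype: str) -> float:
    """Size of the weights alone in MB, ignoring graph and metadata overhead."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for name, params in [("nano", 15), ("micro", 40), ("mini", 80)]:
        sizes = {dtype: raw_weight_mb(params, dtype) for dtype in BYTES_PER_PARAM}
        print(name, sizes)
```

By this estimate, 15M parameters stored purely as int8 would take about 15 MB, so the quoted figure of roughly 25MB covers more than raw int8 weights. The source does not break down what makes up the difference.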

For creative workflows, the concrete features are simple but useful: eight built-in voices, adjustable speed, text preprocessing, and 24 kHz output, as listed in the launch thread. That makes it more relevant for local narration, character placeholders, interactive installs, and quick voice mockups than for fully directed studio voice performance.
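
For anyone wiring 24 kHz output into a local tool, the sketch below shows one way to save mono float samples as a 16-bit WAV file using only the Python standard library. It does not call KittenTTS: `placeholder_tone` is a hypothetical stand-in that generates a sine wave where synthesized speech would go, and the assumption that the samples arrive as floats in the range -1.0 to 1.0 is ours, not something the source states.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # output rate quoted for KittenTTS


def write_wav_24k(path, samples):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM at 24 kHz.

    Returns the duration of the written audio in seconds.
    """
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # guard against out-of-range values
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return len(samples) / SAMPLE_RATE


def placeholder_tone(seconds=0.5, freq=440.0):
    """Hypothetical stand-in for TTS output: a quiet sine tone."""
    n = int(seconds * SAMPLE_RATE)
    return [0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]
```

Calling `write_wav_24k("mockup.wav", placeholder_tone())` would produce a half-second file that any audio editor should open at the right pitch and speed, which is a quick way to confirm the sample rate is set correctly before swapping in real model output.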

Where the creative limits are

Discussion around Show HN: Three new Kitten TTS models – smallest less than 25MB

Thread discussion highlights:

  • tredre3 on dependency bloat: The package pulls a chain of dependencies including spaCy and, via uv, torch/CUDA packages that are several GB, which the commenter says undermines the appeal of a tiny edge model.
  • baibai008989 on edge deployment and latency: A Raspberry Pi/home automation use case is cited as exactly where a sub-25MB model matters, but the commenter asks about first-chunk latency and whether the system supports streaming output for interactive use.
  • bobokaytop on quality vs latency on low-power hardware: The commenter says the real bottleneck for edge deployments is often inference latency and audio streaming architecture, not just model size, and asks how it performs on a Raspberry Pi 4 in real time.

The early discussion is less about whether 25MB is impressive and more about what happens after install. In the thread summary, commenters say dependency chains can pull in far larger packages than the headline model size suggests, which undercuts the appeal for edge setups.
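
One way to check that concern on your own machine is to measure what is actually installed in the active Python environment. The sketch below uses the standard library's `importlib.metadata` to total the on-disk size of each installed distribution and list the largest ones. It is a generic audit, not a KittenTTS tool, and the helper names are our own.

```python
import os
from importlib import metadata


def installed_size_bytes(dist):
    """Sum the on-disk size of the files a distribution recorded at install time."""
    total = 0
    for f in dist.files or []:  # files is None when no file record is available
        try:
            total += os.path.getsize(str(dist.locate_file(f)))
        except OSError:
            pass  # listed in the record but missing on disk
    return total


def largest_distributions(top=10):
    """Return (name, size_in_bytes) for the biggest installed distributions."""
    sizes = []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = (meta.get("Name") if meta else None) or "unknown"
        sizes.append((name, installed_size_bytes(dist)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


def human_mb(n_bytes):
    """Format a byte count in MB (10**6 bytes), matching the article's units."""
    return f"{n_bytes / 1e6:.1f} MB"


if __name__ == "__main__":
    for name, size in largest_distributions():
        print(f"{name:30s} {human_mb(size)}")
```

Run in a fresh virtual environment after installing the package, this gives a quick read on how the dependency footprint compares with the headline model size.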

The other open questions are real-time behavior and control. Commenters ask about first-chunk latency, streaming output, Raspberry Pi performance, and whether creators get finer expressive controls such as pitch, volume, or explicit style tags, as raised in the thread's latency and expressive-control comments.
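
The latency questions come down to two numbers: time to first chunk, and the real-time factor (synthesis time divided by audio duration, where below 1.0 means faster than real time). The sketch below shows how those could be measured for any synthesizer that yields audio in chunks. Whether KittenTTS streams at all is one of the open questions, so `fake_streaming_tts` is a hypothetical stand-in, and the 24 kHz rate is the only figure taken from the source.

```python
import time

SAMPLE_RATE = 24_000  # output rate quoted for KittenTTS


def fake_streaming_tts(text, chunk_seconds=0.25, work_seconds=0.01):
    """Hypothetical stand-in: yields one silent chunk per word after a short delay."""
    for _ in text.split():
        time.sleep(work_seconds)  # simulated inference time
        yield [0.0] * int(chunk_seconds * SAMPLE_RATE)


def measure_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of sample chunks and report latency figures."""
    start = clock()
    first_chunk_s = None
    total_samples = 0
    for chunk in chunks:
        if first_chunk_s is None:
            first_chunk_s = clock() - start  # time until audio could start playing
        total_samples += len(chunk)
    total_s = clock() - start
    audio_s = total_samples / SAMPLE_RATE
    rtf = total_s / audio_s if audio_s else float("inf")
    return {
        "first_chunk_s": first_chunk_s,
        "total_s": total_s,
        "audio_s": audio_s,
        "rtf": rtf,
    }


if __name__ == "__main__":
    print(measure_stream(fake_streaming_tts("a short line of placeholder text")))
```

For a non-streaming synthesizer, the whole utterance arrives as a single chunk, so first-chunk latency equals total synthesis time. That is why commenters treat streaming support as a separate question from raw speed.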
