KittenML's latest open-source TTS release spans models from 15M to 80M parameters, with the smallest coming in under 25MB and the larger one reportedly running faster than realtime on CPU. Audio creators should test pronunciation quality and install overhead before betting on it for edge or local voice tools.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX. The latest v0.8 release (Feb 2026) offers models from 15M parameters (25MB as int8) to 80M parameters (80MB), running high-quality synthesis on CPU without a GPU. Features include text preprocessing, a Python API (installable via pip wheel), Hugging Face model checkpoints (e.g., kitten-tts-nano-0.8), and a browser demo on HF Spaces. It is Apache-2.0 licensed and currently a developer preview, with commercial support available. Planned next: multilingual TTS and KittenASR.
According to the GitHub page, KittenTTS v0.8 is an open-source ONNX text-to-speech library with model sizes from 15M to 80M parameters. The smallest int8 model is listed at 25MB, while the larger 80M model is framed as high-quality synthesis that can run on CPU without a GPU. For creative tooling, the practical package is the Python API, downloadable Hugging Face checkpoints, and a browser demo linked from the same project page.
Thread discussion highlights:
- deathanatos on dependency bloat / torch CUDA: "It pulls in NVIDIA libs... I literally run out of disk trying to install this on Linux."
- baibai008989 on edge deployment: "the dependency chain issue is a real barrier for edge deployment... 25MB is genuinely exciting for that use case."
- bobokaytop on latency / realtime performance: "running on an intel 9700 CPU, it's about 1.5x realtime using the 80M model. It wasn't any faster running on a 3080 GPU though."
The strongest creative angle is local voice generation where size and runtime matter more than studio-grade polish. In the discussion roundup, one user reports about 1.5x realtime on an Intel 9700 CPU with the 80M model, while another calls a 25MB model genuinely exciting for edge deployment because dependency chains often block small-device shipping.
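To make the "1.5x realtime" figure concrete: realtime factor is simply seconds of audio produced per second of compute, where anything above 1.0 is fast enough for live playback. A minimal sketch (the function name and the sample numbers are illustrative, not from the project; the ~1.5x figure itself comes from the thread):

```python
def realtime_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return seconds of audio produced per second of wall-clock compute.

    RTF > 1.0 means synthesis is faster than realtime, i.e. usable for
    streaming or live playback without the audio buffer running dry.
    """
    return audio_seconds / synthesis_seconds

# Illustrative numbers matching the ~1.5x CPU report from the thread:
# 30 seconds of speech synthesized in 20 seconds of compute.
print(realtime_factor(30.0, 20.0))  # → 1.5
```

The GPU comment in the thread (no speedup on a 3080) suggests the 80M model is small enough that CPU inference is not the bottleneck, which is consistent with the edge-deployment pitch.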
The same thread also shows why audio teams should test before committing. A commenter in the main HN thread says Linux installation pulled in enough NVIDIA libraries to become a disk problem, and another reports that number pronunciation degraded into noise. That makes v0.8 more compelling as an experimental local voice layer than a drop-in production narrator.
Relevant for creatives working with voice and audio production: the thread is about expressive text-to-speech, voice quality, prosody, pronunciation, and whether very small models can still produce usable spoken output for apps and media workflows.