Hacker News discussion around KittenTTS has shifted to edge deployment, streaming latency, and expressive control and prosody rather than new model releases. The 25 MB ONNX footprint keeps it attractive for CPU and on-device use, but voice quality remains the production boundary.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX, with models from 15M to 80M parameters (25-80 MB). It runs CPU inference without a GPU and offers 8 built-in voices, adjustable speed, text preprocessing, and 24 kHz output. The latest release, v0.8.1 (Feb 2026), includes nano (15M, int8, 25 MB), micro (40M), and mini (80M) models. It installs via pip and exposes a basic Python API for generation and file output. The repo has 13k stars under the Apache 2.0 license.
KittenTTS looks useful because the packaging is unusually light, not because it has solved voice quality. According to the project page, v0.8.1 ships nano, micro, and mini models and supports eight built-in voices, adjustable speed, text preprocessing, 24 kHz output, and a simple pip-based Python install.
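A minimal sketch of what the generate-then-write-a-file path amounts to. The KittenTTS class and method names in the comments are assumptions, not taken from the repo; the runnable part substitutes a stand-in tone for model output and shows the 24 kHz file-output step using only the standard library:

```python
import math
import struct
import wave

# Hypothetical KittenTTS call (names are assumptions, shown for orientation):
#   from kittentts import KittenTTS
#   tts = KittenTTS("kitten-tts-nano")        # 15M int8 model, ~25 MB
#   samples = tts.generate("Hello world")     # floats in [-1, 1] at 24 kHz

SAMPLE_RATE = 24_000  # KittenTTS outputs 24 kHz audio

# Stand-in for model output: 0.5 s of a 440 Hz tone as floats in [-1, 1]
samples = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE // 2)]

# File-output step: write 16-bit mono PCM, as a TTS pipeline typically would
with wave.open("out.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 16-bit samples
    wf.setframerate(SAMPLE_RATE)
    wf.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                            for s in samples))
```

The same write path works unchanged whatever model produced the samples, which is part of why a dependency-light package is attractive: nothing here needs torch or CUDA.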
Posted by rohan_joshi
Thread discussion highlights:
- baibai008989 on edge deployment / dependency bloat: the dependency chain issue is a real barrier for edge deployment... anything that pulls torch + cuda makes the whole thing a non-starter. 25MB is genuinely exciting for that use case.
- bobokaytop on latency and real-time use: the practical bottleneck for most edge deployments isn't model size -- it's the inference latency on low-power hardware and the audio streaming architecture around it.
- altruios on expressive control: One of the core features I look for is expressive control... How does it handle expressive tags?
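bobokaytop's latency point can be made concrete with a small budget calculation: chunked streaming stays real-time only if the real-time factor (seconds of synthesis per second of audio) is below 1, and time-to-first-audio is the synthesis time of the first chunk. The numbers below are illustrative assumptions, not KittenTTS benchmarks:

```python
def streaming_budget(rtf: float, chunk_s: float) -> tuple[float, bool]:
    """Given a real-time factor (synthesis seconds per audio second) and a
    chunk length in seconds, return (time_to_first_audio_s, keeps_up)."""
    ttfa = rtf * chunk_s   # the first chunk must be fully synthesized
    return ttfa, rtf < 1.0  # later chunks arrive in time only if RTF < 1

# Illustrative numbers (assumptions): RTF 0.4 on a low-power CPU,
# 0.5 s audio chunks -> first audio after 0.2 s, and streaming keeps up.
ttfa, ok = streaming_budget(rtf=0.4, chunk_s=0.5)
```

This is why the commenters separate model size from latency: a 25 MB model with RTF above 1 on the target CPU still cannot stream, while a larger model with RTF well under 1 can.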
What the community is stress-testing is the part creatives actually feel in finished work. In the Hacker News discussion, one commenter calls 25MB exciting because dependency bloat can kill edge deployment, while others push on the harder questions: whether inference stays responsive on low-power hardware, whether audio streaming is smooth, and whether expressive control and prosody hold up for real narration rather than demo clips.
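altruios's question about expressive tags is partly a text-preprocessing question: the pipeline has to split tagged text into (style, text) spans before synthesis. A hypothetical sketch of that split; the inline `[tag]...[/tag]` syntax is an assumption for illustration, not something the KittenTTS docs confirm:

```python
import re

# Hypothetical inline markup, e.g. "[excited]hi[/excited]" (an assumption).
TAG = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)

def parse_expressive(text: str) -> list[tuple[str, str]]:
    """Split text into (style, text) spans; untagged text is 'neutral'."""
    spans, pos = [], 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            spans.append(("neutral", text[pos:m.start()]))
        spans.append((m.group(1), m.group(2)))  # (tag name, tagged text)
        pos = m.end()
    if pos < len(text):
        spans.append(("neutral", text[pos:]))
    return spans

parse_expressive("Hello [excited]world[/excited]!")
# -> [("neutral", "Hello "), ("excited", "world"), ("neutral", "!")]
```

Each span would then be synthesized with its style and the audio concatenated; whether KittenTTS exposes any per-span style control is exactly what the thread is asking.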
Posted by rohan_joshi
For creatives and voice-tool users, the draw is the promise of small, expressive TTS models with multiple voices that run on-device. The discussion centers on whether the voices sound good, how well prosody works, and whether expressive control is strong enough for production narration or voice apps.