AI Primer
TOPIC · 14 stories

Local Inference

Stories, products, and related signals connected to this tag in Explore.

RELEASE · 12th May
LTX 2.3 launches video-to-video mode with Depth control

LTX 2.3 added a video-to-video restyling mode, and creators are using frame-derived reference images plus Depth mode to flip clips into new looks. Reddit and ComfyUI users also report INT8 runs on Ampere GPUs dropping from 118.77s to 66.45s, along with easier batch assembly in agent pipelines.

RELEASE · 3w ago
DeepSeek V4 Preview opens 1M context with Flash and Pro variants

DeepSeek V4 Preview surfaced as an open-source 1M-context model family, with early docs and community testing pointing to Flash and Pro variants. The release matters for creators and vibe coders looking at self-hosted options, but most performance claims are still coming from first-wave community benchmarks.

RELEASE · 3w ago
Modly releases local image-to-3D mesh generation

Posts introduced Modly as a fully local image-to-3D tool that turns one image into a mesh with drag-and-drop input and no cloud API. The release matters because 3D asset generation stays on-device, with current reporting concentrated in a single launch thread.

RELEASE · 3w ago
LTX 2.3 adds distilled LoRA v1.1 for better motion-audio sync

Stable Diffusion and VFX creators say LTX 2.3's distilled LoRA v1.1 improves motion and custom-audio sync. Posts show local short-film and flight-shot workflows running through ComfyUI and Resolve on consumer GPUs.

RELEASE · 4w ago
VoxCPM releases 2B voice model with 3-second cloning and 30-language support

OpenBMB released VoxCPM on GitHub with text-described voice design, 3-second cloning, 48kHz audio, and 30-language support. The Apache 2.0 release makes multilingual voice work and local self-hosting cheaper.

RELEASE · 1mo ago
Google DeepMind releases Gemma 4 under Apache 2.0 with 31B Dense, 26B MoE, and 256K context

Google DeepMind shipped four Gemma 4 models with multimodal input, including 31B Dense, 26B MoE, and two edge variants available through AI Studio, Hugging Face, Kaggle, and Ollama. Early community tests say local performance and usable context windows still vary by runtime, quantization, and GPU memory.

RELEASE · 1mo ago
Cohere opens Transcribe 2B weights with a browser demo

Browser demo posts and a Hugging Face release surfaced Cohere Transcribe 2B as part of a wider open-audio week that also featured Voxtral 4B TTS. The model gives creators a multilingual ASR option that fits local or in-browser workflows.

NEWS · 1mo ago
KittenTTS supports 25MB ONNX voice models as HN debates prosody

Hacker News discussion around KittenTTS has shifted to edge deployment, streaming latency, expressive control, and prosody rather than new model changes. The 25MB ONNX footprint keeps it attractive for CPU and on-device use, but voice quality is still the production boundary.

RELEASE · 1mo ago
KittenTTS releases 25MB nano model for CPU text-to-speech

KittenTTS now offers nano, micro, and mini text-to-speech models, with the smallest int8 build under 25MB and built for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.

RELEASE · 1mo ago
ComfyUI adds App mode for simpler local image generation with Z-Image

A new creator tutorial says ComfyUI now has a simpler App-style mode and pairs it with Z-Image for fast local image generation. Local workflows are getting easier to start, so try it if you want to avoid node-heavy graph building on day one.

RELEASE · 1mo ago
KittenTTS releases 25MB nano voice model with CPU-only ONNX runtime

KittenTTS 0.8 ships new 15M, 40M, and 80M models, including an int8 nano model around 25MB that runs on CPU without a GPU. It is a fit for narration, character voices, and lightweight assistants that need offline or edge-friendly speech.

RELEASE · 1mo ago
KittenTTS releases v0.8 with a 25MB int8 model and CPU-only speech synthesis

KittenML's latest open-source TTS release spans 15M to 80M models, with the smallest coming in under 25MB and the largest reportedly running faster than realtime on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.

WORKFLOW · 2mo ago
Claude Code supports local Ollama backends with qwen3-coder 30b and qwen2.5-coder 7b

A tutorial thread showed how to route Claude Code through Ollama, choose a local coding model, and point Claude at a local base URL for private work. Use it if you want agent-style coding on your own machine without cloud API spend.
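Based on the thread's description, the setup can be sketched roughly as below. This is a hedged sketch, not the tutorial's exact commands: the `ANTHROPIC_BASE_URL` / `ANTHROPIC_MODEL` variable names and the assumption that your Ollama build exposes a Claude-compatible endpoint should be verified against the tutorial and your Ollama version; if the endpoint is missing, a translation proxy (e.g. LiteLLM) sits between the two.

```shell
# Pull a local coding model (assumes Ollama is installed and running)
ollama pull qwen3-coder:30b

# Point Claude Code at the local server instead of the cloud API.
# Variable names here are assumptions to check against the tutorial.
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_MODEL=qwen3-coder:30b
export ANTHROPIC_API_KEY=local-placeholder  # local servers typically ignore the key

# Start Claude Code as usual; requests now stay on-machine
claude
```

Smaller models like qwen2.5-coder 7b follow the same pattern with a different `ollama pull` tag, trading capability for lower VRAM use.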

RELEASE · 2mo ago
Black Forest Labs claims FLUX.2 [klein] 9B adds 2x faster multi-reference editing

Black Forest Labs says FLUX.2 [klein] 9B is now up to 2x faster for multi-reference editing at the same price, with new FP8 weights for leaner local runs. Retest reference-heavy edit pipelines if speed or local deployment was a blocker.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.