Local Inference
Stories, products, and related signals connected to this tag in Explore.
Stories
LTX 2.3 added video-to-video restyling, and creators are using frame-derived reference images plus Depth mode to flip clips into new looks. Reddit and ComfyUI users also report Ampere INT8 runs dropping from 118.77s to 66.45s and easier batch assembly in agent pipelines.
DeepSeek V4 Preview surfaced as an open-source 1M-context model family, with early docs and community testing pointing to Flash and Pro variants. The release matters for creators and vibe coders looking at self-hosted options, but most performance claims are still coming from first-wave community benchmarks.
Posts introduced Modly as a fully local image-to-3D tool that turns one image into a mesh with drag-and-drop input and no cloud API. The release matters because 3D asset generation stays on-device, with current reporting concentrated in a single launch thread.
Stable Diffusion and VFX creators say LTX 2.3's distilled LoRA v1.1 improves motion and custom-audio sync. Posts show local short-film and flight-shot workflows running through ComfyUI and Resolve on consumer GPUs.
OpenBMB released VoxCPM on GitHub with text-described voice design, 3-second cloning, 48kHz audio, and 30-language support. The Apache 2.0 release makes multilingual voice work and local self-hosting cheaper.
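For context, a minimal local synthesis sketch following the Python API style shown in the project's README; the repo ID, generate() arguments, and output sample rate are assumptions to verify against the current release.

```python
# A minimal VoxCPM sketch, assuming the README-style Python API;
# the repo ID and sample rate are assumptions -- check the GitHub release.
import soundfile as sf
from voxcpm import VoxCPM

model = VoxCPM.from_pretrained("openbmb/VoxCPM")  # assumed repo ID

# Zero-shot cloning: a ~3s reference clip plus its transcript steers the voice.
wav = model.generate(
    text="Welcome to the local inference roundup.",
    prompt_wav_path="reference_3s.wav",  # short clip of the target voice
    prompt_text="Transcript of the reference clip.",
)
sf.write("out.wav", wav, 48000)  # this release advertises 48kHz output
```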
Google DeepMind shipped four Gemma 4 models with multimodal input, including a 31B dense model, a 26B MoE, and two edge variants available through AI Studio, Hugging Face, Kaggle, and Ollama. Early community tests say local performance and usable context windows still vary by runtime, quantization, and GPU memory.
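A quick way to sanity-check one of these locally is the Ollama Python client; the model tag below is a placeholder, not a confirmed name, so substitute whatever `ollama list` shows after you pull a Gemma 4 build.

```python
# Smoke-test a local Gemma 4 pull via the Ollama Python client.
# "gemma4" is a hypothetical tag; use the published model name.
import ollama

resp = ollama.chat(
    model="gemma4",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(resp["message"]["content"])
```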
Browser demo posts and a Hugging Face release surfaced Cohere Transcribe 2B as part of a wider open-audio week that also featured Voxtral 4B TTS. The model gives creators a multilingual ASR option that can live closer to local or browser workflows.
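If the checkpoint turns out to be transformers-compatible, a local transcription harness could be as small as the sketch below; the model ID is a placeholder, not a confirmed Hugging Face repo.

```python
# Generic local ASR sketch via the transformers pipeline API.
# The model ID is hypothetical -- look up the actual repo on Hugging Face.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="CohereLabs/transcribe-2b",  # placeholder ID
)
print(asr("clip.wav")["text"])
```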
Hacker News discussion around KittenTTS has shifted to edge deployment, streaming latency, expressive control, and prosody rather than new model changes. The 25MB ONNX footprint keeps it attractive for CPU and on-device use, but voice quality is still the limiting factor for production work.
KittenTTS now offers nano, micro and mini text-to-speech models, with the smallest int8 build under 25MB and built for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.
A new creator tutorial says ComfyUI now has a simpler App-style mode and pairs it with Z-Image for fast local image generation. Local workflows are getting easier to start, so try it if you want to avoid node-heavy graph building on day one.
KittenTTS 0.8 ships new 15M, 40M, and 80M models, including an int8 nano build around 25MB that runs on CPU without a GPU. It is a fit for narration, character voices, and lightweight assistants that need offline or edge-friendly speech.
KittenML's latest open-source TTS release spans 15M to 80M models, with the smallest coming in under 25MB and the largest reportedly running faster than real time on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.
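For the nano builds above, usage follows the quickstart pattern from earlier KittenTTS releases; the 0.8 model ID below is an assumption, so check the KittenML repo for the exact name.

```python
# CPU-only KittenTTS synthesis, patterned on the project's quickstart.
# The 0.8 nano model ID is an assumption; earlier releases output 24kHz.
import soundfile as sf
from kittentts import KittenTTS

m = KittenTTS("KittenML/kitten-tts-nano-0.8")  # assumed tag
audio = m.generate(
    "This runs on CPU through ONNX, with no GPU in the loop.",
    voice="expr-voice-2-f",  # one of the bundled expressive voices
)
sf.write("narration.wav", audio, 24000)
```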
A tutorial thread showed how to route Claude Code through Ollama, choose a local coding model, and point Claude at a local base URL for private work. Use it if you want agent-style coding on your own machine without cloud API spend.
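In practice the routing step boils down to environment overrides; here is a hypothetical sketch. ANTHROPIC_BASE_URL is Claude Code's documented base-URL override, but whether your Ollama version serves the Anthropic wire format directly or needs a proxy (e.g. LiteLLM) is something to confirm first, and the model tag is just an example.

```python
# Hypothetical routing sketch: launch Claude Code against a local endpoint.
# Assumes Ollama (or a proxy in front of it) speaks the Anthropic API.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "http://localhost:11434"  # local Ollama
env["ANTHROPIC_AUTH_TOKEN"] = "local"  # placeholder; no cloud key needed

# Pick any coding model you have pulled locally; this tag is an example.
subprocess.run(["claude", "--model", "qwen2.5-coder:14b"], env=env)
```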
Black Forest Labs says FLUX.2 [klein] 9B is now up to 2x faster for multi-reference editing at the same price, with new FP8 weights for leaner local runs. Retest reference-heavy edit pipelines if speed or local deployment was a blocker.