AI Primer
TOPIC · 25 stories

Local Inference

Stories, products, and related signals connected to this tag in Explore.

WORKFLOW · 24th June
Krea 2 Turbo: community releases GGUF ports as RTX 3090 tests report 1.9x INT8 speedups

Builders published GGUF conversions, loader nodes, and local benchmarks for Krea 2 Turbo after yesterday’s open-weights release, alongside new multi-style and watercolor tests. The follow-up matters because creators now have clearer ways to run, tune, and push Krea's styles locally on smaller VRAM budgets.

RELEASE · 23rd June
Krea 2 Turbo releases open weights with ComfyUI workflows and 8 GB community ports

Krea 2 Turbo arrived with open weights, commercial rights, ComfyUI workflows, and community GGUF and FP8 ports that users say can run locally on modest hardware. Early benchmarks praise speed and style range, while some testers flag drift toward recognizable IP.

NEWS · 1w ago
GLM 5.2 touted for 1M-token local coding as builders compare it with Opus 4.8

Builders across X described GLM 5.2 as a surprisingly capable local coding model, citing MIT licensing, a 1M-token context window, and experiments on desktop or distributed GPU setups. The shift matters because it reopens local-first website and code workflows for vibe coders, though hardware cost and throughput still compare poorly with cloud subscriptions.

NEWS · 2w ago
Fable 5 restricts access during jailbreak dispute as creators post last-build demos

Creators reported that Fable 5 access was pulled or restricted during a jailbreak dispute, then shared games, sites, and videos made before the cutoff. The restriction pushes users back to Opus 4.8 or local setups for one-shot creative coding.

RELEASE · 2w ago
Topaz Labs releases 2x faster Mac image enhancement for Wonder, Denoise Max, and Super Focus 3

Topaz Labs says Mac users can now run Wonder, Denoise Max, Super Focus 3, and Face Recovery 3 locally at roughly twice the previous speed, with speedups of up to 4x depending on hardware and image size. Faster local processing cuts waiting time on large still-image enhancement workflows.

RELEASE · 3w ago
Gemma 4 12B releases with 256K context and unified audio-vision input

Google’s new Gemma 4 12B ships as an encoder-free open model for text, image, audio, and video tasks with a 256K context window. Early GGUF ports and local benchmarks make it a plausible on-device multimodal option for creator tooling and experimentation.

RELEASE · 3w ago
RTX Spark launches with 128GB unified memory and 1 petaflop AI compute

NVIDIA launched RTX Spark as a 128GB unified-memory, 1-petaflop AI PC platform, with 30 laptops and 10 desktops due this fall. Watch for local Photoshop, Premiere, Substance 3D, and upscaling workflows to move onto the box.

RELEASE · 4w ago
VibeMotion-1 releases pre-alpha Figma import and prompt-to-MP4 renders

VibeMotion-1 released a pre-alpha local editor that imports Figma frames and layers, animates them from prompts, previews with LTX 2.3, and renders MP4s. The repo targets motion work without After Effects or DaVinci Resolve, but the launch is explicitly early and the project is asking for breakage reports.

RELEASE · 1mo ago
BenchLocal releases v0.2.6 with offline-skip runs for tight-VRAM tests

BenchLocal v0.2.6 adds reachability checks so offline local models are skipped and resumed instead of breaking side-by-side tests. The update is aimed at tight-VRAM setups where creators and tinkerers load providers one after another on the same machine.

RELEASE · 1mo ago
Supertone opens Supertonic with ONNX on-device TTS

Supertone open-sourced Supertonic, a local TTS engine that runs faster than real time on phone CPUs with ONNX models and cross-language runtimes. Voice apps and audiobook workflows can use it to avoid per-character API billing and keep audio generation private.

RELEASE · 1mo ago
Harbor releases v0.4.18 with Open Design and Voicebox

Harbor 0.4.18 added one-command access to Open Design and Voicebox, bundling a local-first design app and a voice cloning and TTS studio inside one homelab layer. The release cuts setup friction by putting both tools behind a single local install path.

RELEASE · 1mo ago
LTX 2.3 launches video-to-video mode with Depth control

LTX 2.3 added video-to-video restyling, and creators are using frame-derived reference images plus Depth mode to flip clips into new looks. Reddit and ComfyUI users also report Ampere INT8 runs dropping from 118.77s to 66.45s and easier batch assembly in agent pipelines.

RELEASE · 2mo ago
DeepSeek V4 Preview arrives with 1M context and Flash and Pro variants

DeepSeek V4 Preview surfaced as an open-source 1M-context model family, with early docs and community testing pointing to Flash and Pro variants. The release matters for creators and vibe coders looking at self-hosted options, but most performance claims are still coming from first-wave community benchmarks.

RELEASE · 2mo ago
Modly releases local image-to-3D mesh generation

Posts introduced Modly as a fully local image-to-3D tool that turns one image into a mesh with drag-and-drop input and no cloud API. The release matters because 3D asset generation stays on-device, though current reporting comes from a single launch thread.

RELEASE · 2mo ago
LTX 2.3 adds distilled LoRA v1.1 for better motion-audio sync

Stable Diffusion and VFX creators say LTX 2.3's distilled LoRA v1.1 improves motion and custom-audio sync. Posts show local short-film and flight-shot workflows running through ComfyUI and Resolve on consumer GPUs.

RELEASE · 2mo ago
VoxCPM releases 2B voice model with 3-second cloning and 30-language support

OpenBMB released VoxCPM on GitHub with text-described voice design, 3-second cloning, 48kHz audio, and 30-language support. The Apache 2.0 release makes multilingual voice work and local self-hosting cheaper.

RELEASE · 2mo ago
Google DeepMind releases Gemma 4 under Apache 2.0 with 31B Dense, 26B MoE, and 256K context

Google DeepMind shipped four Gemma 4 models with multimodal input (31B Dense, 26B MoE, and two edge variants), available through AI Studio, Hugging Face, Kaggle, and Ollama. Early community tests say local performance and usable context windows still vary by runtime, quantization, and GPU memory.

RELEASE · 3mo ago
Cohere opens Transcribe 2B weights with a browser demo

Browser demo posts and a Hugging Face release surfaced Cohere Transcribe 2B as part of a wider open-audio week that also featured Voxtral 4B TTS. The model gives creators a multilingual ASR option that can live closer to local or browser workflows.

NEWS · 3mo ago
KittenTTS supports 25MB ONNX voice models as HN debates prosody

Hacker News discussion around KittenTTS has shifted to edge deployment, streaming latency, expressive control, and prosody rather than new model changes. The 25MB ONNX footprint keeps it attractive for CPU and on-device use, but voice quality is still what limits production use.

RELEASE · 3mo ago
KittenTTS releases 25MB nano model for CPU text-to-speech

KittenTTS now offers nano, micro and mini text-to-speech models, with the smallest int8 build under 25MB and built for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.

RELEASE · 3mo ago
KittenTTS releases 25MB nano voice model with CPU-only ONNX runtime

KittenTTS 0.8 ships new 15M, 40M and 80M models, including an int8 nano model of around 25MB that runs on CPU without a GPU. It is a fit for narration, character voices and lightweight assistants that need offline or edge-friendly speech.

RELEASE · 3mo ago
ComfyUI adds App mode for simpler local image generation with Z-Image

A new creator tutorial says ComfyUI now has a simpler App-style mode and pairs it with Z-Image for fast local image generation. Local workflows are getting easier to start, so try it if you want to avoid node-heavy graph building on day one.

RELEASE · 3mo ago
KittenTTS releases v0.8 with a 25MB int8 model and CPU-only speech synthesis

KittenML's latest open-source TTS release spans 15M to 80M models, with the smallest coming in under 25MB and the largest reportedly running faster than realtime on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.

WORKFLOW · 3mo ago
Claude Code supports local Ollama backends with qwen3-coder 30b and qwen2.5-coder 7b

A tutorial thread showed how to route Claude Code through Ollama, choose a local coding model, and point Claude at a local base URL for private work. Use it if you want agent-style coding on your own machine without cloud API spend.

RELEASE · 3mo ago
Black Forest Labs claims FLUX.2 [klein] 9B adds 2x faster multi-reference editing

Black Forest Labs says FLUX.2 [klein] 9B is now up to 2x faster for multi-reference editing at the same price, with new FP8 weights for leaner local runs. Retest reference-heavy edit pipelines if speed or local deployment was a blocker.


Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.