AI Primer
TOPIC · 28 stories

Voice

Stories, products, and related signals connected to this tag in Explore.

RELEASE · 9th May
Grok Voice Think Fast 1.0 adds concierge and reservation tool templates

xAI rolled out Grok Voice Think Fast 1.0 with ready-made tool schemas for medical offices, restaurants, help desks, real estate, appointments, and hotel concierge tasks. The release lowers setup work because common service actions arrive pre-wired as callable tools.

WORKFLOW · 5th May
Seedance 2.0 supports multi-speaker lip sync in live-action and animation

Curious Refuge posted tests showing Seedance 2.0 syncing multiple speakers from a reference image plus blacked-out video or audio, using shot-by-shot dialogue prompts. The workflow moves Seedance closer to directed dialogue scenes, but prompt wording and voice guidance still affect stability.

RELEASE · 1w ago
Runway launches Characters with 24fps HD video agents

Runway launched Characters, a real-time system that turns one image into a conversational HD video agent. The company says replies start in 1.75 seconds and stream above 24 fps, so live avatar workflows are moving closer to production use.

RELEASE · 1w ago
Apocalypse Drone adds 128 AI players and ElevenLabs radio voices

Apocalypse Drone added 128 AI players, squad leader reassignment, and ElevenLabs radio chatter with location callouts in weekend dev updates. It matters for solo game builders because the project is simulating large-team coordination and voice comms on a lightweight stack instead of a bigger live-ops setup.

RELEASE · 1w ago
Pika launches Claude MCP with podcast, explainer and UGC ad skills

Pika launched a Claude connector that turns prompts, URLs and repos into explainers, podcast clips and UGC ads. The update keeps face, voice and identity controls inside Claude workflows, so creators can build video assets without switching apps.

RELEASE · 2w ago
OpenClaw adds voice personas with 43ms first output benchmarks

OpenClaw contributors posted a voice-persona feature and fresh performance numbers that cut first output from 1s to 43ms. Separate posts describe 300-user sandboxed deployments and stronger PR, CI, and testing workflows, pointing to team-scale use beyond hobby demos.

RELEASE · 2w ago
Grok Imagine adds lip sync and multi-speaker audio to video clips

Creator posts say Grok Imagine's video update can make one-shot clips with spoken audio, stronger lip sync and support for multiple speakers, pets and varied face angles. The demos also show selfie-to-scene transforms and timeline prompting, but the rollout is documented mainly through independent testing.

RELEASE · 2w ago
Cappy launches video editing in iMessage and RCS

Cappy launched as a text-message video editor that plans, cuts, captions, voices, and revises clips inside iMessage or RCS threads. Creators can start from raw footage, photos, audio, or URLs without opening a conventional timeline.

RELEASE · 3w ago
BytePlus launches Seedance 2.0 API with multimodal inputs and scene extension

BytePlus launched the Seedance 2.0 API, and creator tests showed image, video, audio, and text inputs, scene extension, voice-synced delivery, and steadier physics. The move brings Seedance from app-only access into repeatable production pipelines and custom workflows.

RELEASE · 3w ago
Gemini 3.1 Flash TTS adds Audio Tags, 70-language support, and SynthID

Gemini 3.1 Flash TTS added Audio Tags, 70-plus language support, and SynthID watermarking for generated speech. The preview spans Gemini API, AI Studio, Vertex AI, and Google Vids, so teams can test delivery control before adopting it.

RELEASE · 4w ago
Runway adds Character video-call links for Zoom, Meet, and Teams

Runway now lets a Character join video meetings from a pasted Zoom, Google Meet, or Teams link. The feature extends Runway Characters from rendered clips into live meeting stand-ins; check the launch demo and early user reactions for signs of reliability.

RELEASE · 4w ago
HeyGen launches CLI for one-command avatar video and batch translation

HeyGen released a Mac and Linux CLI that creates avatar videos, lip-synced translations, voice matches, and photo avatars from terminal commands. The binary returns structured JSON and wait flags, which makes video generation scriptable for localization and agent workflows.
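The scripting pattern the blurb describes — invoke a CLI with a wait flag, then parse structured JSON from stdout — can be sketched in a few lines. Note this is a generic illustration: the command name, flags, and JSON fields below are hypothetical stand-ins, not HeyGen's documented interface.

```python
import json
import subprocess

def parse_job_result(raw: str) -> dict:
    """Parse structured JSON emitted by a video-generation CLI.

    The field names ("status", "video_url", "duration") are a
    hypothetical schema for illustration only.
    """
    data = json.loads(raw)
    if data.get("status") != "completed":
        raise RuntimeError(f"job not finished: {data.get('status')}")
    return {"video_url": data["video_url"], "duration": data.get("duration")}

def run_avatar_job(cli: str, script_text: str, wait: bool = True) -> dict:
    """Run one avatar-video job and return the parsed result.

    Hypothetical invocation: subcommand and flag names are illustrative,
    not the actual HeyGen CLI surface.
    """
    cmd = [cli, "video", "create", "--text", script_text]
    if wait:
        cmd.append("--wait")  # block until the render finishes
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_job_result(out.stdout)
```

Because the CLI emits machine-readable JSON rather than human-formatted logs, a localization pipeline can fan this out over a list of target languages and collect the returned URLs without scraping text.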

RELEASE · 4w ago
Runway Characters adds custom voices from text prompts with API access

Runway added prompt-generated custom voices for Characters in the web app and API. Creators can now define tone and persona from text instead of recording or cloning a source voice first, which should speed up voice setup.

WORKFLOW · 1mo ago
Suno users report v5.5 misses duet tags and instrument cues despite stronger vocals

Reddit posts said v5.5 improved voice tone but still ignores gender-labeled sections, switches singers mid-part, and struggles with detailed instrument instructions. Creators are iterating on renders until the emotion fits, then generating lip-sync video to work around the gaps.

RELEASE · 1mo ago
Pika launches PikaStream 1.0 video chat skill for Google Meet and any agent

Pika released a beta skill that lets Pika AI Selfs and third-party agents join Google Meet with real-time face and voice, and published the integration on GitHub. Pika says memory and personality persist across calls, while beta notes and user posts report glitches as the feature expands beyond Pika’s own agents.

RELEASE · 1mo ago
Cohere opens Transcribe 2B weights with a browser demo

Browser demo posts and a Hugging Face release surfaced Cohere Transcribe 2B as part of a wider open-audio week that also featured Voxtral 4B TTS. The model gives creators a multilingual ASR option that can live closer to local or browser workflows.

NEWS · 1mo ago
KittenTTS supports 25MB ONNX voice models as HN debates prosody

Hacker News discussion around KittenTTS has shifted to edge deployment, streaming latency, expressive control, and prosody rather than new model changes. The 25MB ONNX footprint keeps it attractive for CPU and on-device use, but voice quality is still the production boundary.

RELEASE · 1mo ago
KittenTTS releases 25MB nano model for CPU text-to-speech

KittenTTS now offers nano, micro and mini text-to-speech models, with the smallest int8 build under 25MB and built for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.

RELEASE · 1mo ago
Gemini 3.1 Flash Live launches with 90.8% ComplexFuncBench audio score

Google says its new realtime voice model improves noisy-environment understanding, long conversations, and function calling, and it is rolling out to Gemini Live, Search Live, and AI Studio. Voice creators can test it for lower-latency spoken interactions.

RELEASE · 1mo ago
Lightning V3.1 releases 10-second voice cloning with 44.1kHz output and sub-100ms latency

Smallest says Lightning V3.1 can clone a voice from about 10 seconds of audio with 44.1kHz output, sub-100ms latency and 50-plus languages on Waves. Test it for multilingual narration and dubbing, but get explicit permission before cloning any voice.

RELEASE · 1mo ago
KittenTTS releases 25MB nano voice model with CPU-only ONNX runtime

KittenTTS 0.8 ships new 15M, 40M and 80M models, including an int8 nano model around 25MB that runs on CPU without GPU. It is a fit for narration, character voices and lightweight assistants that need offline or edge-friendly speech.
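The quoted footprint checks out with back-of-envelope arithmetic: int8 quantization stores roughly one byte per parameter, so a 15M-parameter model fits comfortably under 25MB, where fp32 would not. A minimal sketch (the bytes-per-parameter figures are standard quantization sizes, not KittenTTS specifics):

```python
def model_size_mb(params: int, bytes_per_param: float) -> float:
    """Approximate on-disk weight size in MiB, ignoring graph and metadata overhead."""
    return params * bytes_per_param / (1024 ** 2)

# 15M-parameter nano model: int8 (1 byte/param) vs. fp32 (4 bytes/param)
int8_mb = model_size_mb(15_000_000, 1)  # roughly 14.3 MiB
fp32_mb = model_size_mb(15_000_000, 4)  # roughly 57.2 MiB
```

The same arithmetic explains why the 40M and 80M siblings grow but stay edge-friendly at int8, while a full-precision export of even the nano model would already blow past the 25MB budget.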

RELEASE · 1mo ago
KittenTTS releases v0.8 with a 25MB int8 model and CPU-only speech synthesis

KittenML's latest open-source TTS release spans 15M to 80M models, with the smallest coming in under 25MB and the larger one reportedly running faster than realtime on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.

NEWS · 1mo ago
Variety reports As Deep as the Grave used generative AI for Val Kilmer's performance

Variety reports that As Deep as the Grave used generative AI to create Val Kilmer's performance, with source material supplied by his family, who backed the release. For filmmakers, it is an early consent-based case study in digital resurrection where rights and audience expectations matter.

RELEASE · 1mo ago
Fun-CineForge opens multi-speaker dubbing with temporal modality and a dataset pipeline

Tongyi Lab opened Fun-CineForge with multi-speaker dubbing, temporal modality for off-screen or blocked faces, and a full dataset-building pipeline. It matters for dialogue and localization workflows that break on hard cuts, overlapping speech, or missing lip cues.

RELEASE · 1mo ago
Grok launches Text-to-Speech API with expressive controls and LiveKit support

xAI released Grok's Text-to-Speech API with natural voices, expressive controls, and LiveKit support; creators are also using Grok Imagine in reference-image and cartoon animation workflows. Try it if you want Grok in a broader voice-and-motion stack instead of chat alone.

WORKFLOW · 2mo ago
Seedance 2.0 supports wildlife-documentary narration and character SFX, creators report

Creators report Seedance 2.0 is being used for wildlife-documentary scenes with built-in narration prompts and character clips with sound effects. Test it if you want a faster path from prompt to finished short without a separate voice pass.

RELEASE · 2mo ago
Freepik launches Speak: lip-synced videos in 30+ languages, up to 5 minutes

Freepik launched Speak, which turns an image plus text or audio into a lip-synced talking video with 30+ languages and a 5-minute cap. Use it for UGC ads, localized product demos, and fast talking-head tests without reshoots.

RELEASE · 2mo ago
Runway launches Characters API: real-time avatars with custom voices and knowledge banks

Runway opened Characters on its developer platform with API access, custom voices, embedded knowledge, and a free starter allowance. Use it to build interactive hosts, guides, and assistants that can talk through tasks instead of relying on passive video.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.