Voice
Stories, products, and related signals connected to this tag in Explore.
Stories
xAI rolled out Grok Voice Think Fast 1.0 with ready-made tool schemas for medical offices, restaurants, help desks, real estate, appointments, and hotel concierge tasks. The release lowers setup work because common service actions arrive pre-wired as callable tools.
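For orientation, pre-wired callable tools in voice agents are usually described with function-calling schemas. The sketch below shows what an appointment-booking tool for a medical office could look like; the name, fields, and layout follow the common JSON function-calling convention and are illustrative assumptions, not xAI's published schema.

    # Illustrative tool schema in the common function-calling shape; every
    # field here is an assumption, not a confirmed Grok Voice schema.
    book_appointment = {
        "name": "book_appointment",
        "description": "Book a patient appointment at a medical office.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_name": {"type": "string"},
                "requested_time": {"type": "string", "format": "date-time"},
                "reason": {"type": "string", "description": "e.g. 'annual checkup'"},
            },
            "required": ["patient_name", "requested_time"],
        },
    }

The point of shipping these pre-wired is that a voice agent can route "book me in Tuesday at 3" straight to a structured call instead of leaving schema design to each integrator.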
Curious Refuge posted tests showing Seedance 2.0 syncing multiple speakers from a reference image plus blacked-out video or audio, using shot-by-shot dialogue prompts. The workflow moves Seedance closer to directed dialogue scenes, but prompt wording and voice guidance still affect stability.
Runway launched Characters, a real-time system that turns one image into a conversational HD video agent. The company says replies start in 1.75 seconds and stream above 24 fps, so live avatar workflows are moving closer to production use.
In weekend dev updates, Apocalypse Drone added 128 AI players, squad-leader reassignment, and ElevenLabs radio chatter with location callouts. It matters for solo game builders because the project is simulating large-team coordination and voice comms on a lightweight stack instead of a bigger live-ops setup.
Pika launched a Claude connector that turns prompts, URLs and repos into explainers, podcast clips and UGC ads. The update keeps face, voice and identity controls inside Claude workflows, so creators can build video assets without switching apps.
OpenClaw contributors posted a voice-persona feature and fresh performance numbers showing time to first output cut from 1s to 43ms. Separate posts describe 300-user sandboxed deployments and stronger PR, CI, and testing workflows, pointing to team-scale use beyond hobby demos.
Creator posts say Grok Imagine's video update can make one-shot clips with spoken audio, stronger lip sync and support for multiple speakers, pets and varied face angles. The demos also show selfie-to-scene transforms and timeline prompting, but the rollout is documented mainly through independent testing.
Cappy launched as a text-message video editor that plans, cuts, captions, voices, and revises clips inside iMessage or RCS threads. Creators can start from raw footage, photos, audio, or URLs without opening a conventional timeline.
BytePlus launched the Seedance 2.0 API, and creator tests showed image, video, audio, and text inputs, scene extension, voice-synced delivery, and steadier physics. The move brings Seedance from app-only access into repeatable production pipelines and custom workflows.
Gemini 3.1 Flash TTS added Audio Tags, 70-plus language support, and SynthID watermarking for generated speech. The preview spans Gemini API, AI Studio, Vertex AI, and Google Vids, so teams can test delivery control before adopting it.
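If the preview follows the shape of Google's earlier TTS endpoints in the google-genai Python SDK, a first test could look like the sketch below. The model id and the inline bracketed tag syntax are assumptions taken from the announcement, so check the API reference before relying on them; the config shape and audio extraction mirror the existing TTS previews.

    import wave
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment

    # Model id and bracketed Audio Tags are assumptions, not confirmed syntax.
    response = client.models.generate_content(
        model="gemini-3.1-flash-tts",
        contents="[whispering] The results are in. [excited] We did it.",
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                voice_config=types.VoiceConfig(
                    prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                )
            ),
        ),
    )

    pcm = response.candidates[0].content.parts[0].inline_data.data  # raw PCM bytes
    with wave.open("out.wav", "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)       # 16-bit samples
        f.setframerate(24000)   # sample rate used by prior Gemini TTS previews
        f.writeframes(pcm)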
Runway now lets a Character join video meetings from a pasted Zoom, Google Meet, or Teams link. The feature extends Runway Characters from rendered clips into live meeting stand-ins, so check the launch demo and early reactions for a read on reliability.
HeyGen released a Mac and Linux CLI that creates avatar videos, lip-synced translations, voice matches, and photo avatars from terminal commands. The binary returns structured JSON and wait flags, which makes video generation scriptable for localization and agent workflows.
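Structured JSON plus a wait flag is what makes the CLI scriptable. A minimal sketch, assuming hypothetical subcommand and flag names (the real options live in the CLI's own help output):

    import json
    import subprocess

    # Subcommand and flag names here are illustrative, not confirmed CLI syntax.
    result = subprocess.run(
        ["heygen", "video", "create",
         "--avatar", "my_photo_avatar",
         "--script", "script_es.txt",
         "--wait"],              # block until the render finishes
        capture_output=True, text=True, check=True,
    )

    job = json.loads(result.stdout)  # the CLI returns structured JSON
    print(job.get("video_url"))

Because the output is machine-readable and the process blocks until done, the same pattern slots into localization batch jobs or agent loops without polling logic.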
Runway added prompt-generated custom voices for Characters in the web app and API. Creators can now define tone and persona from text instead of recording or cloning a source voice first, which should speed up voice setup.
Reddit posts say v5.5 improved voice tone but that it still ignores gender-labeled sections, switches singers mid-part, and struggles with detailed instrument instructions. Creators are iterating on renders until the emotion fits, then generating lipsync video to work around the gaps.
Pika released a beta skill that lets Pika AI Selfs and third-party agents join Google Meet with real-time face and voice, and published the integration on GitHub. Pika says memory and personality persist across calls, while beta notes and user posts report glitches as the feature expands beyond Pika’s own agents.
Browser demo posts and a Hugging Face release surfaced Cohere Transcribe 2B as part of a wider open-audio week that also featured Voxtral 4B TTS. The model gives creators a multilingual ASR option that can live closer to local or browser workflows.
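If the release loads through the standard transformers ASR pipeline (an assumption; the model card may specify a different loader, and the model id below is a guess), local transcription would be a few lines:

    from transformers import pipeline

    # Hypothetical model id; check the actual Hugging Face card for the real one.
    asr = pipeline(
        "automatic-speech-recognition",
        model="CohereLabs/transcribe-2b",
        device=-1,  # CPU, to match local and browser-adjacent workflows
    )

    print(asr("interview.wav")["text"])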
Hacker News discussion around KittenTTS has shifted to edge deployment, streaming latency, expressive control, and prosody rather than new model changes. The 25MB ONNX footprint keeps it attractive for CPU and on-device use, but voice quality is still the production boundary.
KittenTTS now offers nano, micro, and mini text-to-speech models, with the smallest int8 variant under 25MB and designed for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.
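Since the builds target ONNX CPU inference, a reasonable first step is loading the exported graph with onnxruntime and inspecting its inputs; the file name below is a placeholder, and the actual input tensors (token ids, voice embeddings, and so on) depend on the release artifacts.

    import onnxruntime as ort

    # Placeholder file name; download the real artifact from the release.
    sess = ort.InferenceSession(
        "kitten_tts_nano_int8.onnx",
        providers=["CPUExecutionProvider"],  # no GPU required
    )

    for inp in sess.get_inputs():
        print(inp.name, inp.shape, inp.type)  # see what the graph expects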
Google says its new realtime voice model improves noisy-environment understanding, long conversations and function calling, and it's rolling into Gemini Live, Search Live and AI Studio. Voice creators can test it for lower-latency spoken interactions.
Smallest says Lightning V3.1 can clone a voice from about 10 seconds of audio with 44.1kHz output, sub-100ms latency and 50-plus languages on Waves. Test it for multilingual narration and dubbing, but get explicit permission before cloning any voice.
KittenTTS 0.8 ships new 15M, 40M and 80M models, including an int8 nano model around 25MB that runs on CPU without GPU. It is a fit for narration, character voices and lightweight assistants that need offline or edge-friendly speech.
KittenML's latest open-source TTS release spans 15M to 80M models, with the smallest coming in under 25MB and the larger one reportedly running faster than realtime on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.
Variety reports that As Deep as the Grave used generative AI to create Val Kilmer's performance, with material supplied by his family and their backing for the release. For filmmakers, it is an early consent-based case study in digital resurrection where rights and audience expectations matter.
Tongyi Lab opened Fun-CineForge with multi-speaker dubbing, a temporal modality for off-screen or occluded faces, and a full dataset-building pipeline. It matters for dialogue and localization workflows that break on hard cuts, overlapping speech, or missing lip cues.
xAI released Grok's Text-to-Speech API with natural voices, expressive controls, and LiveKit support; creators are also using Grok Imagine in reference-image and cartoon animation workflows. Try it if you want Grok in a broader voice-and-motion stack instead of chat alone.
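A minimal call sketch, assuming the TTS endpoint follows xAI's existing REST conventions; the path, model name, and payload fields below are guesses, so confirm them against the API docs before wiring anything up.

    import os
    import requests

    # Endpoint path, model name, and fields are assumptions, not confirmed API surface.
    resp = requests.post(
        "https://api.x.ai/v1/audio/speech",
        headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
        json={
            "model": "grok-tts",
            "input": "Your table for four is confirmed at seven.",
            "voice": "default",
        },
        timeout=60,
    )
    resp.raise_for_status()
    with open("speech.mp3", "wb") as f:
        f.write(resp.content)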
Creators report Seedance 2.0 is being used for wildlife-documentary scenes with built-in narration prompts and character clips with sound effects. Test it if you want a faster path from prompt to finished short without a separate voice pass.
Freepik launched Speak, which turns an image plus text or audio into a lip-synced talking video with 30+ languages and a 5-minute cap. Use it for UGC ads, localized product demos, and fast talking-head tests without reshoots.
Runway opened Characters on its developer platform with API access, custom voices, embedded knowledge, and a free starter allowance. Use it to build interactive hosts, guides, and assistants that can talk through tasks instead of relying on passive video.
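A creation-call sketch under assumptions: the route, auth header, and fields below are illustrative, not confirmed Runway API surface, but they show how the pieces in the announcement (source image, prompt-defined voice, embedded knowledge) would plausibly fit together.

    import os
    import requests

    # Hypothetical route and payload; check Runway's developer docs for the real contract.
    resp = requests.post(
        "https://api.dev.runwayml.com/v1/characters",
        headers={"Authorization": f"Bearer {os.environ['RUNWAY_API_KEY']}"},
        json={
            "image_url": "https://example.com/host.png",
            "voice": {"prompt": "warm, unhurried product guide"},  # text-defined voice
            "knowledge": ["https://example.com/docs/getting-started"],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())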