Voice Agents — Explore AI Tools & Stories

Fresh stories

OpenClaw releases 2026.5.20 with Discord voice follow and secret warnings

OpenClaw 2026.5.20 adds Discord voice sessions that follow configured users, plus doctor checks for plaintext secrets in config files. The release also improves xAI headless login, clarifies model status, and fixes stuck Windows installs.

Release·OpenClaw·21st May

Breaking

ElevenLabs launches Speech Engine at 8¢ per minute for chat-to-voice agents

ElevenLabs launched Speech Engine, a layer that adds transcription, speech synthesis, turn-taking, and interruption handling on top of an existing chat agent. The release pairs SDKs, one-command setup, and 8¢-per-minute pricing for production voice agents.

New

Voice Agents·20th May·3 min read

New

OpenClaw 2026.5.18 ships Grok OAuth, Android Talk Mode, and dialog-aware browser actions

OpenClaw 2026.5.18 shipped Grok OAuth and sidecar auth fixes, realtime Android Talk Mode, Telegram forum-topic delivery fixes, and better browser dialog handling. The release removes several auth and UI dead-ends that can stall long agent runs.

Release·OpenClaw·18th May

New

Gemini users report Canvas and Fast mode routing to 3.2 variants ahead of I/O

Multiple users posted reproducible steps and videos showing Gemini app UI changes, Thinking Level rollout, and Fast mode or Canvas sessions that look like 3.2 or 3.5-class routing. This matters because Google appears to be testing new model paths and app surfaces in production ahead of I/O, though the exact model names remain unconfirmed.

Gemini·17th May

Breaking

Thinking Machines introduces interaction models with 200 ms full-duplex audio, video, and tool use

Thinking Machines previewed interaction models that process audio, video, and text in 200 ms micro-turns, letting the system listen, speak, and react at the same time. The demos matter because the interaction loop is trained into the model instead of stitched together from separate speech and tool layers.

New

Multimodal·1w ago·6 min read

New

Pi community ships `pi-listens`, `pi-kanban`, and `pi-codex-conversion` in one-day extension burst

Independent Pi builders shipped a voice layer, a kanban and observability dashboard, a Codex-conversion tool with `apply_patch`, and smaller UI extensions in the same window. The burst matters because it turns Pi from a single coding agent into a real local-first extension ecosystem with voice, review, and workflow primitives.

Coding Agents·2w ago

New

ElevenLabs cuts Flash TTS 55%, Scribe 45%, and Agents 20% with pay-as-you-go billing

ElevenLabs lowered self-serve pricing for ElevenAPI and ElevenAgents and added pay-as-you-go billing. The biggest listed drops are to $0.05 per 1,000 tokens for Flash TTS, $0.22 for Scribe v2 speech-to-text, and $0.08 per minute for agent calls.

Voice Agents·2w ago
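The percentages and new prices in the ElevenLabs pricing story can be cross-checked with a little arithmetic. The sketch below assumes each headline percentage (55%, 45%, 20%) applies to the matching listed price ($0.05, $0.22, $0.08); the summary does not state the old prices or the billing unit for Scribe, so the outputs are implied figures only, and the helper name `price_before_cut` is hypothetical.

```python
# Implied pre-cut prices, ASSUMING each headline percentage applies to the
# matching listed price. The story summary does not state the old prices.

def price_before_cut(new_price: float, cut_percent: float) -> float:
    """Return the price that, reduced by cut_percent percent, gives new_price."""
    if not 0 <= cut_percent < 100:
        raise ValueError("cut_percent must be in [0, 100)")
    return new_price / (1 - cut_percent / 100)

# (new listed price in USD, headline cut in percent), taken from the summary.
listed = {
    "Flash TTS": (0.05, 55),
    "Scribe v2": (0.22, 45),
    "Agents (per minute)": (0.08, 20),
}

# Round to three decimals; these are estimates, not published prices.
implied = {
    name: round(price_before_cut(price, cut), 3)
    for name, (price, cut) in listed.items()
}

for name, old_price in implied.items():
    print(f"{name}: implied previous price ${old_price}")
```

If the implied figures do not match a published price sheet, the percentages probably refer to a different tier or unit than the ones listed in the summary.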

Breaking

OpenAI adds GPT-Realtime-2, Translate, and Whisper to the Realtime API

OpenAI added GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to the Realtime API. The update gives voice agents live reasoning, translation, and transcription, but it remains API-only rather than part of ChatGPT voice mode.

New

Voice Agents·2w ago·6 min read

New

Realtime TTS-2 releases with sub-200 ms TTFA and 100+ languages

Realtime TTS-2 ships as a low-latency speech model that conditions on prior audio turns, not just text, and claims sub-200 ms time-to-first-audio across 100+ languages. The release matters for voice-agent stacks because Replicate and LiveKit are already exposing it for real-time integration work.

Release·Voice Agents·2w ago

New

ElevenLabs releases Agent Templates with 50+ support, SDR, and training workflows

ElevenLabs launched Agent Templates, a library of pre-configured conversational agents for support, education, sales, and internal enablement. That shortens the setup path for teams that want to deploy voice or chat agents without starting from a blank flow.

Voice Agents·3w ago

Briefs for May 21

Daily AI Digest

Skills Spotlight · top by stars

🎨 Design

New

baoyu-comic

Knowledge comics (知识漫画): educational, biography, tutorial.

by NousResearch · 5 days ago · 165.1k

🤖 ML/AI

comfyui

Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution.

by NousResearch · 24 days ago · 165.1k

🤖 ML/AI

hyperframes

Create HTML-based video compositions, animated title cards, social overlays, captioned talking-head videos, audio-reactive visuals, and shader transitions using HyperFrames. HTML is the source of truth for video. Use when the user wants a rendered MP4/WebM from an HTML composition, wants to animate text/logos/charts over media, needs captions synced to audio, wants TTS narration, or wants to convert a website into a video.

by NousResearch · 19 days ago · 165.1k
