Fresh stories
ElevenLabs launches Speech Engine at 8¢ per minute for chat-to-voice agents
ElevenLabs launched Speech Engine, a layer that adds transcription, speech synthesis, turn-taking, and interruption handling on top of an existing chat agent. The release pairs SDKs, one-command setup, and 8¢-per-minute pricing for production voice agents.

Thinking Machines introduces interaction models with 200 ms full-duplex audio, video, and tool use
Thinking Machines previewed interaction models that process audio, video, and text in 200 ms micro-turns, letting the system listen, speak, and react at the same time. The demos matter because the interaction loop is trained into the model instead of stitched together from separate speech and tool layers.

OpenAI adds GPT-Realtime-2, Translate, and Whisper to the Realtime API
OpenAI added GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to the Realtime API. The update gives voice agents live reasoning, translation, and transcription, but it remains API-only rather than part of ChatGPT voice mode.


OpenClaw releases 2026.5.20 with Discord voice follow and secret warnings
OpenClaw 2026.5.20 adds Discord voice sessions that follow configured users, plus doctor checks for plaintext secrets in config files. The release also improves xAI headless login, clarifies model status, and fixes stuck Windows installs.

OpenClaw 2026.5.18 ships Grok OAuth, Android Talk Mode, and dialog-aware browser actions
OpenClaw 2026.5.18 shipped Grok OAuth and sidecar auth fixes, realtime Android Talk Mode, Telegram forum-topic delivery fixes, and better browser dialog handling. The release removes several auth and UI dead-ends that can stall long agent runs.

Gemini users report Canvas and Fast mode routing to 3.2 variants ahead of I/O
Multiple users posted reproducible steps and videos showing Gemini app UI changes, Thinking Level rollout, and Fast mode or Canvas sessions that look like 3.2 or 3.5-class routing. This matters because Google appears to be testing new model paths and app surfaces in production ahead of I/O, though the exact model names remain unconfirmed.
Pi community ships `pi-listens`, `pi-kanban`, and `pi-codex-conversion` in one-day extension burst
ElevenLabs cuts Flash TTS 55%, Scribe 45%, and Agents 20% with pay-as-you-go billing
Briefs for May 21

Skills Spotlight: top by stars
baoyu-comic
Knowledge comics (知识漫画): educational, biography, tutorial.
comfyui
Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution.
hyperframes
Create HTML-based video compositions, animated title cards, social overlays, captioned talking-head videos, audio-reactive visuals, and shader transitions using HyperFrames. HTML is the source of truth for video. Use when the user wants a rendered MP4/WebM from an HTML composition, wants to animate text/logos/charts over media, needs captions synced to audio, wants TTS narration, or wants to convert a website into a video.



