Fresh stories
Local users report DeepSeek V4 Flash, Qwen 3.6, and Gemma 4 at 40-200 tok/s on Macs and 3090s
Developers posted new local-model measurements for DS4, Qwen 3.6, and Gemma 4: about 40 tok/s on an M3 Ultra, 70+ tok/s on MacBooks with MPS, and 120-200 tok/s for Qwen3.6-27B on a single RTX 3090. The numbers suggest coding-capable local runs are moving from demos toward regular use.

ElevenLabs cuts Flash TTS 55%, Scribe 45%, and Agents 20% with pay-as-you-go billing
ElevenLabs lowered self-serve pricing for ElevenAPI and ElevenAgents and added pay-as-you-go billing. The biggest listed drops are to $0.05 per 1,000 tokens for Flash TTS, $0.22 for Scribe v2 speech-to-text, and $0.08 per minute for agent calls.


Anthropic doubles Claude Code 5-hour limits after SpaceX Colossus 1 compute deal
Anthropic said a SpaceX compute deal will add 300+ MW and 220,000+ NVIDIA GPUs, and it doubled Claude Code 5-hour limits across paid plans. It also raised Opus API ceilings; users should still watch the unchanged weekly caps.

GPT-5.5 users report 3.3M cached tokens and 2.5x /fast credits
Engineers shared fresh measurements on GPT-5.5 cache reuse, /fast pricing, and bug-finding budgets after comparison posts for GPT-5.5 and Opus 4.7 led the coding round-up. The reports suggest Codex cost and quality now swing on cache behavior and effort settings as much as on list prices.

Local users report DeepSeek V4 Flash, Qwen 3.6, and Gemma 4 at 40-200 tok/s on Macs and 3090s
Developers posted new local-model measurements for DS4, Qwen 3.6, and Gemma 4: about 40 tok/s on an M3 Ultra, 70+ tok/s on MacBooks with MPS, and 120-200 tok/s for Qwen3.6-27B on a single RTX 3090. The numbers suggest coding-capable local runs are moving from demos toward regular use.

OpenRouter launches Pareto Code with min_coding_score tiers and Nitro routing
OpenRouter released Pareto Code, which routes requests to the cheapest coding model above a chosen score threshold and can re-rank for speed with Nitro. Use the API to trade cost against latency with benchmark-based routing controls.

Firecrawl adds Highlights to /scrape with 100x fewer tokens
Firecrawl added a Highlights mode to /scrape that returns matching text, code, or tables for a query instead of full-page payloads. The release matters because the company benchmarked the feature on 10,000 URLs against Exa Highlights and aims it at lower-token agent retrieval.
ElevenLabs cuts Flash TTS 55%, Scribe 45%, and Agents 20% with pay-as-you-go billing
Google releases Gemini 3.1 Flash Lite GA with 1M context and $0.25 input pricing
Ramp Sheets launches Fast Ask RL subagent with +4% exact-match gain over Opus at Haiku latency
Anthropic doubles Claude Code 5-hour limits after SpaceX Colossus 1 compute deal
Briefs forMay 11

Daily AI Digest
Get the best stories delivered
to your inbox
Skills Spotlighttop by stars
comfyui
Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution.
hyperframes
Create HTML-based video compositions, animated title cards, social overlays, captioned talking-head videos, audio-reactive visuals, and shader transitions using HyperFrames. HTML is the source of truth for video. Use when the user wants a rendered MP4/WebM from an HTML composition, wants to animate text/logos/charts over media, needs captions synced to audio, wants TTS narration, or wants to convert a website into a video.
kanban-orchestrator
Decomposition playbook + specialist-roster conventions + anti-temptation rules for an orchestrator profile routing work through Kanban. The "don't do the work yourself" rule and the basic lifecycle are auto-injected into every kanban worker's system prompt; this skill is the deeper playbook when you're specifically playing the orchestrator role.



