Fresh stories
Briefs for May 11
Top stories this week
DFlash adds Qwen3-8B speculator with 82.2% first-token acceptance
Posts said Qwen3-8B now has a DFlash speculator with 82.2% first-token acceptance and 3.74 accepted tokens per step, alongside broader DFlash claims of over 6x lossless acceleration. It matters because the release turns a decoding paper into a concrete speculative-inference artifact engineers can test against existing Qwen stacks.


ERNIE 5.1 Preview ranks No. 4 on Search Arena and claims 6% pretraining cost
Baidu pushed ERNIE 5.1 Preview with new leaderboard claims, including No. 4 on Search Arena and No. 13 on LMArena Text. Treat the 6% pretraining cost claim cautiously until an independent technical report confirms it.

Gemma 4 MTP benchmarks 138 tok/s in llama.cpp on M5 Max
Community ports brought Gemma 4 multi-token prediction into llama.cpp and MLX Swift, with one M5 Max report moving from 97 to 138 tok/s and another showing 30-40% faster decoding. The gains extend MTP into local runtimes used for on-device coding and long-context work.

GPT-5.5 vs Opus 4.7: users compare plan mode, frontend output, and 120K-context use
User posts and HN threads compared GPT-5.5 and Opus 4.7 across plan mode, frontend work, and 120K-context sessions. The split results mean token burn and instruction discipline matter as much as raw benchmark scores.

Anthropic reports 'Teaching Claude why' cuts agentic misalignment by 3x
Anthropic said training Claude on principled responses and aligned fictional stories removed previously observed blackmail behavior in Claude 4 lab tests. The post matters because Anthropic says the broader interventions generalized better than narrow eval-matching examples and survived RL fine-tuning.

Skills Spotlight: top by stars
comfyui
Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution.
hyperframes
Create HTML-based video compositions, animated title cards, social overlays, captioned talking-head videos, audio-reactive visuals, and shader transitions using HyperFrames. HTML is the source of truth for video. Use when the user wants a rendered MP4/WebM from an HTML composition, wants to animate text/logos/charts over media, needs captions synced to audio, wants TTS narration, or wants to convert a website into a video.
kanban-orchestrator
Decomposition playbook + specialist-roster conventions + anti-temptation rules for an orchestrator profile routing work through Kanban. The "don't do the work yourself" rule and the basic lifecycle are auto-injected into every kanban worker's system prompt; this skill is the deeper playbook when you're specifically playing the orchestrator role.