Inference & Infrastructure — Explore AI Tools & Stories

Fresh stories

OpenRouter launches MCP server with live pricing, benchmarks, and test inference

OpenRouter released an MCP server that lets agents query live model pricing, benchmark scores, provider data, docs, and run test inference from the CLI. That replaces stale model knowledge with current routing data inside long-running agent workflows.

Release · MCP · 25th June

Briefs for June 25

Top stories this week

Breaking

OpenRouter launches Image API with typed capabilities and exact USD cost

OpenRouter released a dedicated Image API that normalizes request shapes across 30-plus models from eight providers. Agents can inspect limits, passthrough options, streaming, and exact per-call cost without hardcoding vendor quirks.

Multimodal · 24th June · 3 min read

Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s

Vercel and Wafer launched a serverless GLM-5.2 endpoint on AI Gateway with 1M context and published pricing. Teams get a high-throughput open-model option inside an existing gateway instead of managing GLM inference directly.

Release · GLM · 24th June

GLM-5.2 adds Perplexity Agent API and Droid support on Baseten at >280 TPS

GLM-5.2 added Perplexity Agent API, Droid, and more hosting options, while Baseten reported over 280 TPS and sub-0.8s TTFT. Builders should watch the cost and benchmark data as it moves into production agent stacks.

GLM · 22nd June
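
For a rough sense of what those reported figures mean per request, response time can be approximated as time to first token plus output length divided by throughput. The sketch below is a simplification, not a benchmark: the function name and the 1,000-token reply are illustrative assumptions, and the 0.8 s and 280 tokens/s inputs are the bounds Baseten reported, so the result is closer to a ceiling than a measurement.

```python
def estimate_response_seconds(
    output_tokens: int,
    ttft_seconds: float,
    tokens_per_second: float,
) -> float:
    """Approximate wall-clock time for one streamed reply.

    Simplified model: time to first token, then steady decoding at a
    constant rate. Queueing and network overhead are ignored.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


# Hypothetical 1,000-token reply at the reported bounds
# (sub-0.8 s TTFT, over 280 tokens/s).
seconds = estimate_response_seconds(1_000, ttft_seconds=0.8, tokens_per_second=280.0)
print(f"{seconds:.2f} s")
```

Real latency depends on prompt length, batching, and load, so treat this only as a way to compare reported numbers across providers.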

Morph supports Qwen, GLM-5.2, MiniMax M3, DeepSeek v4 with 20-35% higher code acceptance

Morph said its code-serving stack now exposes Qwen, GLM-5.2, MiniMax M3, and DeepSeek v4 with code-tuned speculative decoding. It claims 20-35% higher acceptance than Eagle 3.1 or DFlash, plus kernels for cheaper hardware.

Release · Model Routing · 21st June

GLM-5.2 ships to BrowserCode, Hyper, OpenCode, and Together in 3 days

BrowserCode, Hyper, OpenCode, Together, and other vendors added GLM-5.2 soon after release. That turns the open model into a deployable option across coding, browser automation, and hosted chat.

GLM · 20th June

Ollama raises GLM-5.2 cloud capacity on NVIDIA B300s

Ollama said it doubled GPU capacity for GLM-5.2 cloud usage and that the model is currently hosted only in the US. The rollout adds capacity as open-model demand climbs, so users should check hosting and privacy details before deploying.

GLM · 20th June

Wafer claims GLM-5.2 hits 222 tok/s and 12.6s end-to-end

Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 input and $4.10 output per million tokens. Compare serverless and dedicated endpoints if you need speed at scale.

GLM · 20th June
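
To see what the quoted rates mean per request, the arithmetic is tokens times the per-million price on each side. The sketch below uses the $1.20 input and $4.10 output figures from the brief; the function name and the 20,000/2,000 token split are illustrative assumptions, not Wafer's billing logic.

```python
from decimal import Decimal

# Prices quoted in the brief, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("1.20")
OUTPUT_USD_PER_MTOK = Decimal("4.10")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request at the quoted rates."""
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    return input_cost + output_cost


# Hypothetical request: 20,000 prompt tokens and 2,000 completion tokens.
example_cost = estimate_cost_usd(20_000, 2_000)
print(f"${example_cost:.4f}")  # prints "$0.0322"
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift across many calls.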

ComputeSDK releases 2026 100k Scale Invitational results across 6 sandbox providers

ComputeSDK published results from its 2026 100k Scale Invitational after weeks of reruns and infra tuning across Modal, Tensorlake, Northflank, Declaw AI, E2B, and Isorun. It matters because sandbox and agent infra claims now have a shared public concurrency target instead of vendor-specific load demos.

Agent Infrastructure · 19th June

Skills Spotlight: top by stars

✍️ Writing

creative-ideation

Generate ideas via named methods from creative practice.

by NousResearch · 2 days ago · 203.5k

🎨 Design

baoyu-comic

Knowledge comics (知识漫画): educational, biography, tutorial.

by NousResearch · 1 month ago · 203.5k

🤖 ML/AI

comfyui

Generate images, video, and audio with ComfyUI: install, launch, manage nodes and models, and run workflows with parameter injection. Uses the official comfy-cli for lifecycle management and the direct REST/WebSocket API for execution.

by NousResearch · 1 month ago · 203.5k
