SGLang
High-performance serving framework for LLM and VLM workloads.
Stories
LMSYS and Modal shipped DFlash plus Spec V2 in SGLang, claiming 4.3x baseline throughput and 1.5x native MTP on Qwen3.5-397B-A17B. It cuts latency and serving cost for very large open models.
SGLang v0.5.12 added native DeepSeek V4 support with ShadowRadix prefix caching, HiSparse CPU-extended KV, MegaMoE kernels, and Blackwell MLA work. The release broadens hardware targets and improves long-context serving efficiency for open runtimes.
OpenBMB released MiniCPM-V 4.6 1.3B, claiming 55.8 percent lower vision-encoding FLOPs, 75.7 ms TTFT on a 4090, and about 1.5x token throughput over Qwen3.5 0.8B. It targets edge deployment across mobile platforms and common inference stacks.
OpenClaw beta added live control of a real Chrome session through Chrome DevTools MCP; the project also added native SGLang provider support and work on parallel tool calling. Try it if you need self-hosted agents to handle authenticated browser flows with local inference backends.