Inference Optimization · Qwen · Benchmarks · SGLang · LLM Serving

DFlash

DFlash is a software product associated with Z Lab. Its exact functionality could not be verified from accessible sources.

Recent stories

2 linked stories

release · SECONDARY · 2026-06-15

SGLang adds DFlash and Spec V2 with 4.3x Qwen3.5-397B-A17B throughput

LMSYS and Modal shipped DFlash and Spec V2 in SGLang, claiming 4.3x the baseline throughput and 1.5x the throughput of native multi-token prediction (MTP) on Qwen3.5-397B-A17B. If the claims hold, the release cuts latency and serving cost for very large open models.
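The two ratios in this story can be combined with simple arithmetic. The sketch below assumes both figures were measured on the same baseline, hardware, and workload, which the story does not confirm. The variable names are illustrative, and the cost estimate further assumes that hardware cost per hour is unchanged.

```python
# Claimed figures from the story (not independently verified).
dflash_vs_baseline = 4.3  # DFlash + Spec V2 throughput relative to baseline
dflash_vs_mtp = 1.5       # DFlash + Spec V2 throughput relative to native MTP

# If both ratios share one baseline, native MTP's implied speedup follows by division.
mtp_vs_baseline = dflash_vs_baseline / dflash_vs_mtp

# Assumption: same hardware cost per hour, so cost per token scales as 1 / throughput.
relative_cost_per_token = 1.0 / dflash_vs_baseline

print(f"implied native MTP speedup over baseline: {mtp_vs_baseline:.2f}x")
print(f"relative cost per token vs baseline: {relative_cost_per_token:.1%}")
```

Under these assumptions, native MTP would sit at roughly 2.87x the baseline, and the claimed 4.3x figure would correspond to roughly 23% of the baseline cost per token.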

release · PRIMARY · 2026-05-10

DFlash adds Qwen3-8B speculator with 82.2% first-token acceptance

According to posts, Qwen3-8B now has a DFlash speculator with 82.2% first-token acceptance and 3.74 accepted tokens per step, alongside broader DFlash claims of more than 6x lossless acceleration. The release matters because it turns a decoding paper into a concrete speculative-inference artifact that engineers can test against existing Qwen stacks.
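For engineers who want to check figures like these against their own runs, the following is a minimal sketch of how the two metrics could be computed from a log of per-step acceptance counts. The definitions are assumptions, not taken from the posts: first-token acceptance is treated as the share of steps in which at least the first drafted token was accepted, and accepted tokens per step as the mean count of accepted draft tokens. Whether the reported 3.74 also counts the target model's bonus token is not stated. The function name and the sample log are hypothetical.

```python
from typing import Sequence


def acceptance_stats(accepted_per_step: Sequence[int]) -> tuple[float, float]:
    """Return (first_token_acceptance_rate, mean_accepted_tokens_per_step).

    accepted_per_step[i] is the number of draft tokens the target model
    accepted at decoding step i (assumed definition, see lead-in).
    """
    if not accepted_per_step:
        raise ValueError("need at least one decoding step")
    steps = len(accepted_per_step)
    # A step counts toward first-token acceptance if at least one draft token survived.
    first_token_rate = sum(1 for n in accepted_per_step if n >= 1) / steps
    mean_accepted = sum(accepted_per_step) / steps
    return first_token_rate, mean_accepted


# Hypothetical log of ten decoding steps; not data from the DFlash release.
example_log = [4, 0, 5, 3, 1, 6, 0, 4, 2, 5]
rate, mean_accepted = acceptance_stats(example_log)
print(f"first-token acceptance: {rate:.1%}")
print(f"accepted tokens per step: {mean_accepted:.2f}")
```

Running the same function over a real verification log from a Qwen3-8B deployment would give numbers that can be compared with the reported 82.2% and 3.74, provided the definitions match those used in the posts.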