Skip to content
AI Primer

Alibaba Cloud's Qwen family of language models.

Pricing

Model profile · Current snapshot
Input / 1M
$0.70
Output / 1M
$2.80
Blended / 1M
$1.23
Output TPS
69.24
TTFT (s)
1.15

Model Intelligence

Arena ranking
11
Benchmarkable
No
Model level
family
Intelligence Index
5.1
Math Index
24.3
MMLU Pro
0.64
GPQA
0.45
HLE
0.03
LiveCodeBench
0.2
SciCode
0.17
MATH-500
0.83
AIME
0.24
AIME 2025
0.24
IFBench
0.29
LCR
0
TerminalBench Hard
0.02
TAU2
0.25

Recent stories

26 linked stories
releaseSECONDARY2026-06-15
SGLang adds DFlash and Spec V2 with 4.3x Qwen3.5-397B-A17B throughput

LMSYS and Modal shipped DFlash plus Spec V2 in SGLang, claiming 4.3x baseline throughput and 1.5x native MTP on Qwen3.5-397B-A17B. It cuts latency and serving cost for very large open models.

newsPRIMARY2026-06-07
Framework Max+ 395 benchmarks close to M5 Max on Qwen3-TTS with GGML Vulkan

A local benchmark on a 128GB Framework system reported Qwen3-TTS performance close to an M5 Max using a GGML Vulkan backend. The result suggests AMD Strix hardware can approach Apple-class local TTS speed without MLX or Metal.

newsPRIMARY2026-06-03
Hyper, OpenCode, Kilo, and Vals add Qwen 3.7 Plus support within 72 hours

Two days after Qwen 3.7 Plus launched, Hyper, OpenCode, Kilo, and Vals shipped support or rankings around the 1M-context multimodal model. The rapid pickup shows Alibaba’s new model landing quickly in coding-agent tools and public eval stacks outside its own platform.

releasePRIMARY2026-06-01
Qwen releases Qwen 3.7 Plus with multimodal agent mode and browser demos

Alibaba released Qwen 3.7 Plus as a multimodal agent model for GUI, CLI, coding, and browser tasks. It ships with browser demos and immediate Cline support, giving teams another frontier-style agent model to compare against M3 and closed-source tools.

newsSECONDARY2026-05-27
Hermes Agent integrates MCP Catalog, Qwen3.7 Max, Venice, and Krea 2 in one window

Hermes Agent added a built-in MCP Catalog while separate builders shipped Qwen3.7 Max support, Venice private-model workflows, and Krea 2 image generation. The cluster shows Hermes moving beyond a single-model assistant toward a broader agent shell with tool, model, and media providers.

releasePRIMARY2026-05-26
Qwen3.7 Max ships implicit caching for no-setup context reuse

Alibaba rolled out implicit caching for Qwen3.7 Max, automatically reusing repeated context without user setup. The update also lands with fresh benchmark results and broader coding-agent support across OpenCode and Hermes Agent.

newsPRIMARY2026-05-22
Qwen 3.7 Max users report 5-minute cache creation, $43 vibe-coding bills, and uneven task quality

A day after Qwen 3.7 Max launched, users posted both standout benchmark wins and rough real-work reports, including 5-minute cache creation and $43 in 15 minutes of vibe coding. That matters because teams evaluating coding agents are seeing a gap between leaderboard strength and per-task reliability.

releasePRIMARY2026-05-21
Qwen3.7 Max launches with 1M context, 35-hour autonomy, and 56.6 AA Index

Alibaba launched Qwen3.7 Max as its new flagship agent model with 1M context, stronger coding and reasoning scores, and cross-harness benchmarks. OpenRouter, Together, AI Gateway, and Kilo support it on day one, making it ready for immediate deployment.

newsPRIMARY2026-05-18
Qwen opens 3.7 Max Preview and Plus Preview on Arena with a #10 coding rank

Alibaba put Qwen3.7 Max Preview and Qwen3.7 Plus Preview live on Arena and the Qwen site, with Arena placing Max Preview #13 overall and #10 for coding. That gives engineers an early read on the next Qwen generation before any broader API or open-weight release.

releaseSECONDARY2026-05-15
Unsloth updates Qwen3.5 MTP GGUFs with draft-mtp flags for 1.8x speed

Unsloth said its updated Qwen3.5 MTP GGUFs now run about 1.8x faster after llama.cpp added spec-draft-p-min 0.75 and renamed the mode to draft-mtp. The update also raises draft-token settings and expands the small-model MTP set for local runners.

newsSECONDARY2026-05-12
Perplexity benchmarks Qwen3 235B on GB200 NVL72: NVLS latency drops from 586 µs to 313 µs

Perplexity published serving results for post-trained Qwen3 235B on NVIDIA GB200 NVL72 and argues Blackwell materially outperforms Hopper for large MoE inference. The deltas show up in NVLS all-reduce latency, MoE prefill combine time, and high-speed decode throughput.

releaseSECONDARY2026-05-10
DFlash adds Qwen3-8B speculator with 82.2% first-token acceptance

Posts said Qwen3-8B now has a DFlash speculator with 82.2% first-token acceptance and 3.74 accepted tokens per step, alongside broader DFlash claims of over 6x lossless acceleration. It matters because the release turns a decoding paper into a concrete speculative-inference artifact engineers can test against existing Qwen stacks.

newsPRIMARY2026-05-10
Local users report DeepSeek V4 Flash, Qwen 3.6, and Gemma 4 at 40-200 tok/s on Macs and 3090s

Developers posted new local-model measurements for DS4, Qwen 3.6, and Gemma 4: about 40 tok/s on an M3 Ultra, 70+ tok/s on MacBooks with MPS, and 120-200 tok/s for Qwen3.6-27B on a single RTX 3090. The numbers suggest coding-capable local runs are moving from demos toward regular use.

releaseSECONDARY2026-04-30
Qwen-Scope releases SAE toolkit for Qwen3.5-27B steering

Alibaba’s Qwen team released Qwen-Scope, an open sparse-autoencoder suite for Qwen3.5-27B that can steer outputs, surface repetition features, and compare benchmark feature overlap. The toolkit turns interpretability artifacts into debugging, data-generation, and evaluation workflows.

releaseSECONDARY2026-04-29
FlashQLA releases TileLang linear-attention kernels with 2–3x forward speedups

Alibaba Qwen introduced FlashQLA, a TileLang-based linear-attention kernel stack that reports 2–3x faster forward passes and 2x faster backward passes. The release gives edge and long-context deployments a new optimization lever below the model layer itself.

newsPRIMARY2026-04-26
Qwen3.6 community ships MLX and 3-bit quants with 40-56 tok/s local agent runs

Builders published new MLX and 3-bit Qwen3.6 quants and shared reproducible local benchmarks from M3 Ultra, RTX 5070, and Radeon AI Pro setups. That gives local-agent teams concrete deployment options beyond launch-day claims, though memory budgets and long-context tool use still limit larger workflows.

releasePRIMARY2026-04-25
Qwen-Image-2.0-Pro launches at #9 on Arena with multilingual text rendering

Alibaba launched Qwen-Image-2.0-Pro on ModelScope and API with better prompt adherence, multilingual typography, and steadier style quality. The model is aimed at text-heavy jobs like UI mockups and posters, so test it for layout-heavy generation.

releasePRIMARY2026-04-22
Qwen3.6-27B releases with 77.2 SWE-Bench Verified and Apache 2.0

Alibaba released Qwen3.6-27B, a dense open model with multimodal input and thinking or non-thinking modes that beats Qwen3.5-397B-A17B across major coding benchmarks. Day-one support across vLLM, SGLang, Ollama, llama.cpp, GGUF, and MLX makes it ready for local and hosted coding agents.

releasePRIMARY2026-04-20
Qwen launches Qwen3.6-Max-Preview on Qwen Chat with AA Index 52

Qwen put Qwen3.6-Max-Preview live on Qwen Chat as an early flagship preview with stronger agentic coding and world-knowledge claims. Early testers report strong first-pass results, but the Max line remains closed rather than open-sourced.

newsPRIMARY2026-04-19
Qwen3.6-35B-A3B benchmarks 40 tok/s on M3 Ultra with Strix Halo follow-ups

Fresh local reports put Qwen3.6-35B-A3B around 40 tok/s on M3 Ultra, extended testing to Strix Halo, and wired it into OpenClaw and Pi-style harnesses. The update matters because Qwen3.6 is moving from quant benchmarks into real local coding-agent loops with clearer hardware limits.

workflowPRIMARY2026-04-17
Unsloth benchmarks Qwen3.6-35B-A3B GGUF quants at 20-40 tok/s on local rigs

Unsloth published GGUF quant benchmarks for Qwen3.6-35B-A3B while practitioners shared local setup guides and long-context agent runs on Apple silicon and high-RAM desktops. The sparse 35B model is becoming a credible local coding-agent option, but speed and reasoning quality still vary by quant and offload strategy.

releasePRIMARY2026-04-16
Qwen3.6-35B-A3B releases Apache 2.0 sparse MoE with 3B active params

Alibaba open-sourced Qwen3.6-35B-A3B, a 35B multimodal sparse MoE with only 3B active parameters under Apache 2.0. Same-day support from vLLM, Ollama, SGLang, and GGUF builders makes it immediately usable for local and production coding workloads.

releasePRIMARY2026-04-10
Qwen Code updates v0.14.2 with Channels, Cron Jobs, and Qwen3.6-Plus

Qwen Code added phone-based control via Telegram, DingTalk, and WeChat, scheduled agent loops, per-subagent model selection, and a planning mode before execution. The release also centers Qwen3.6-Plus, which Alibaba says offers 1M context and 1,000 free daily requests, while Vals ranked the model #17 overall and #11 multimodal.

newsSECONDARY2026-04-03
OpenRouter says Qwen3.6-Plus hits 1.4T tokens in a day

OpenRouter said Qwen3.6-Plus became its first model to exceed about 1.4 trillion tokens in a day, and Qwen said the model also moved to No. 1 on the service. The milestone adds a concrete deployment signal beyond benchmark scores and preview availability, so track usage data alongside evals.

releasePRIMARY2026-04-02
Qwen3.6-Plus launches with 1M context and Code Arena #8 ranking

Alibaba launched Qwen3.6-Plus with a 1M default context window, stronger coding and multimodal performance, and rollout across chat, API, and routing partners. Benchmarks and partner availability make it a new high-end option for agentic coding and web tasks.

releasePRIMARY2026-03-30
Qwen releases Qwen3.5-Omni with 10-hour audio and 400s video support

Alibaba launched Qwen3.5-Omni across Lite, Flash, Plus, and Plus-Realtime variants for native text, image, audio, and video understanding, plus realtime voice controls and script-level captioning. The family targets long multimodal sessions and live interaction, so watch the understanding-focused limits if you need media generation.

AI PrimerAI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.