AI Primer

Alibaba Cloud's family of language models under the Qwen brand.

Pricing

Model profile · Current snapshot
Input / 1M: $0.60
Output / 1M: $3.60
Blended / 1M: $1.35
Output TPS: 63.2
TTFT (s): 1.44
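The blended figure above is consistent with a 3:1 input-to-output token mix, and TTFT plus output TPS give a quick latency estimate. A minimal sketch; the 3:1 weighting and the 500-token response length are assumptions, not stated on this page:

```python
# Sanity-check the blended price and estimate end-to-end latency
# from the profile numbers above. The 3:1 input:output mix is an
# assumed convention; the page does not state the weighting.
input_per_m = 0.60   # $ per 1M input tokens
output_per_m = 3.60  # $ per 1M output tokens
ttft_s = 1.44        # time to first token, seconds
output_tps = 63.2    # output tokens per second

blended = (3 * input_per_m + 1 * output_per_m) / 4
print(f"blended: ${blended:.2f} / 1M")  # matches the $1.35 figure above

# Wall-clock time for a hypothetical 500-token response:
n_tokens = 500
total_s = ttft_s + n_tokens / output_tps
print(f"500-token response: ~{total_s:.1f} s")
```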

Model Intelligence

Arena ranking: 37
Benchmarkable: No
Model level: family
Intelligence Index: 20.8
Coding Index: 16.5
Math Index: 70.7
MMLU Pro: 0.82
GPQA: 0.71
HLE: 0.06
LiveCodeBench: 0.59
SciCode: 0.36
AIME 2025: 0.71
IFBench: 0.43
LCR: 0.32
TerminalBench Hard: 0.07
TAU2: 0.35

Recent stories

16 linked stories
news · SECONDARY · 2026-05-12
Perplexity benchmarks Qwen3 235B on GB200 NVL72: NVLS latency drops from 586 µs to 313 µs

Perplexity published serving results for post-trained Qwen3 235B on NVIDIA GB200 NVL72 and argues Blackwell materially outperforms Hopper for large MoE inference. The deltas show up in NVLS all-reduce latency, MoE prefill combine time, and high-speed decode throughput.

release · SECONDARY · 2026-05-10
DFlash adds Qwen3-8B speculator with 82.2% first-token acceptance

Posts said Qwen3-8B now has a DFlash speculator with 82.2% first-token acceptance and 3.74 accepted tokens per step, alongside broader DFlash claims of over 6x lossless acceleration. It matters because the release turns a decoding paper into a concrete speculative-inference artifact engineers can test against existing Qwen stacks.
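As a rough model of what those acceptance numbers buy, a back-of-envelope sketch; the relative draft cost is an assumed parameter, and real speedups depend on batch shape and kernel overheads:

```python
# Back-of-envelope speculative-decoding speedup from the DFlash numbers.
# If the target model verifies `accepted` tokens per forward pass, it
# runs once per `accepted` emitted tokens instead of once per token.
# `draft_cost` is the speculator's cost relative to one target pass
# (assumed value, not from the post).
accepted = 3.74    # accepted tokens per step (from the post)
draft_cost = 0.15  # assumed relative draft overhead

cost_per_token = (1 + draft_cost) / accepted  # vanilla decoding costs 1.0
speedup = 1 / cost_per_token
print(f"estimated speedup: ~{speedup:.2f}x")
```

With a near-free draft the estimate approaches the accepted-tokens-per-step count, so headline figures above that imply optimizations beyond this simple per-step model.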

news · PRIMARY · 2026-05-10
Local users report DeepSeek V4 Flash, Qwen 3.6, and Gemma 4 at 40-200 tok/s on Macs and 3090s

Developers posted new local-model measurements for DS4, Qwen 3.6, and Gemma 4: about 40 tok/s on an M3 Ultra, 70+ tok/s on MacBooks with MPS, and 120-200 tok/s for Qwen3.6-27B on a single RTX 3090. The numbers suggest coding-capable local runs are moving from demos toward regular use.

release · SECONDARY · 2026-04-30
Qwen-Scope releases SAE toolkit for Qwen3.5-27B steering

Alibaba’s Qwen team released Qwen-Scope, an open sparse-autoencoder suite for Qwen3.5-27B that can steer outputs, surface repetition features, and compare benchmark feature overlap. The toolkit turns interpretability artifacts into debugging, data-generation, and evaluation workflows.

release · SECONDARY · 2026-04-29
FlashQLA releases TileLang linear-attention kernels with 2–3x forward speedups

Alibaba Qwen introduced FlashQLA, a TileLang-based linear-attention kernel stack that reports 2–3x faster forward passes and 2x faster backward passes. The release gives edge and long-context deployments a new optimization lever below the model layer itself.

news · PRIMARY · 2026-04-26
Qwen3.6 community ships MLX and 3-bit quants with 40-56 tok/s local agent runs

Builders published new MLX and 3-bit Qwen3.6 quants and shared reproducible local benchmarks from M3 Ultra, RTX 5070, and Radeon AI Pro setups. That gives local-agent teams concrete deployment options beyond launch-day claims, though memory budgets and long-context tool use still limit larger workflows.

release · PRIMARY · 2026-04-25
Qwen-Image-2.0-Pro launches at #9 on Arena with multilingual text rendering

Alibaba launched Qwen-Image-2.0-Pro on ModelScope and API with better prompt adherence, multilingual typography, and steadier style quality. The model is aimed at text-heavy jobs like UI mockups and posters, so test it for layout-heavy generation.

release · PRIMARY · 2026-04-22
Qwen3.6-27B releases with 77.2 SWE-Bench Verified and Apache 2.0

Alibaba released Qwen3.6-27B, a dense open model with multimodal input and switchable thinking/non-thinking modes; it beats Qwen3.5-397B-A17B across major coding benchmarks. Day-one support across vLLM, SGLang, Ollama, llama.cpp, GGUF, and MLX makes it ready for local and hosted coding agents.

release · PRIMARY · 2026-04-20
Qwen launches Qwen3.6-Max-Preview on Qwen Chat with AA Index 52

Qwen put Qwen3.6-Max-Preview live on Qwen Chat as an early flagship preview with stronger agentic coding and world-knowledge claims. Early testers report strong first-pass results, but the Max line remains closed rather than open-sourced.

news · PRIMARY · 2026-04-19
Qwen3.6-35B-A3B benchmarks 40 tok/s on M3 Ultra with Strix Halo follow-ups

Fresh local reports put Qwen3.6-35B-A3B around 40 tok/s on M3 Ultra, extended testing to Strix Halo, and wired it into OpenClaw and Pi-style harnesses. The update matters because Qwen3.6 is moving from quant benchmarks into real local coding-agent loops with clearer hardware limits.

workflow · PRIMARY · 2026-04-17
Unsloth benchmarks Qwen3.6-35B-A3B GGUF quants at 20-40 tok/s on local rigs

Unsloth published GGUF quant benchmarks for Qwen3.6-35B-A3B while practitioners shared local setup guides and long-context agent runs on Apple silicon and high-RAM desktops. The sparse 35B model is becoming a credible local coding-agent option, but speed and reasoning quality still vary by quant and offload strategy.

release · PRIMARY · 2026-04-16
Qwen3.6-35B-A3B releases Apache 2.0 sparse MoE with 3B active params

Alibaba open-sourced Qwen3.6-35B-A3B, a 35B multimodal sparse MoE with only 3B active parameters under Apache 2.0. Same-day support from vLLM, Ollama, SGLang, and GGUF builders makes it immediately usable for local and production coding workloads.
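The total-vs-active split drives a simple resource trade-off: weight memory scales with all 35B parameters, while per-token compute scales with the ~3B routed ones. A minimal sketch; bf16 weights and the 2-FLOPs-per-active-parameter rule of thumb are assumptions, not figures from the release:

```python
# Rough footprint for a sparse MoE with 35B total / 3B active params.
# Assumes bf16 weights (2 bytes/param) and ~2 FLOPs per active
# parameter per decoded token (rules of thumb, not from the release).
total_params = 35e9
active_params = 3e9

weight_gb = total_params * 2 / 1e9          # every expert must stay resident
gflops_per_token = active_params * 2 / 1e9  # only routed experts compute

print(f"weights: ~{weight_gb:.0f} GB (bf16)")
print(f"compute: ~{gflops_per_token:.0f} GFLOPs per decoded token")
```

Quantized builds like the GGUF and MLX releases above shrink the memory side, which is why the model fits on high-RAM desktops while decoding at roughly dense-3B speeds.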

release · PRIMARY · 2026-04-10
Qwen Code updates v0.14.2 with Channels, Cron Jobs, and Qwen3.6-Plus

Qwen Code added phone-based control via Telegram, DingTalk, and WeChat, scheduled agent loops, per-subagent model selection, and a planning mode before execution. The release also centers Qwen3.6-Plus, which Alibaba says offers 1M context and 1,000 free daily requests, while Vals ranked the model #17 overall and #11 multimodal.

news · SECONDARY · 2026-04-03
OpenRouter says Qwen3.6-Plus hits 1.4T tokens in a day

OpenRouter said Qwen3.6-Plus became the first model on its platform to process roughly 1.4 trillion tokens in a single day, and Qwen said the model also moved to No. 1 on the service. The milestone adds a concrete deployment signal beyond benchmark scores and preview availability, so track usage data alongside evals.
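For scale, the daily total translates to an average aggregate throughput by simple division; this assumes the ~1.4T figure covers a single 24-hour window:

```python
# Average aggregate throughput implied by ~1.4T tokens in one day.
tokens_per_day = 1.4e12
seconds_per_day = 24 * 60 * 60

avg_tokens_per_s = tokens_per_day / seconds_per_day
print(f"~{avg_tokens_per_s / 1e6:.1f}M tokens/s on average")
```

Peak load would be higher than this average, since traffic is rarely uniform across a day.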

release · PRIMARY · 2026-04-02
Qwen3.6-Plus launches with 1M context and Code Arena #8 ranking

Alibaba launched Qwen3.6-Plus with a 1M default context window, stronger coding and multimodal performance, and rollout across chat, API, and routing partners. Benchmarks and partner availability make it a new high-end option for agentic coding and web tasks.

release · PRIMARY · 2026-03-30
Qwen releases Qwen3.5-Omni with 10-hour audio and 400s video support

Alibaba launched Qwen3.5-Omni across Lite, Flash, Plus, and Plus-Realtime variants for native text, image, audio, and video understanding, plus realtime voice controls and script-level captioning. The family targets long multimodal sessions and live interaction, so watch the understanding-focused limits if you need media generation.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.