AI Primer
MODEL 27 stories

Qwen

Stories, products, and related signals connected to this tag in Explore.

RELEASE 28th June
DeepSeek releases DSpark checkpoints for Qwen3 and Gemma-4

DeepSeek extended DSpark beyond V4 by publishing draft-model checkpoints for Qwen3 and Gemma-4 families and clarifying that DSpark targets higher-throughput serving by controlling verification cost. The release matters because speculative decoding is moving from papers into reusable open checkpoints.

RELEASE 2w ago
SGLang adds DFlash and Spec V2 with 4.3x Qwen3.5-397B-A17B throughput

LMSYS and Modal shipped DFlash plus Spec V2 in SGLang, claiming 4.3x baseline throughput and 1.5x the throughput of native MTP on Qwen3.5-397B-A17B. The speedup is aimed at cutting latency and serving cost for very large open models.

NEWS 3w ago
Framework Max+ 395 benchmarks close to M5 Max on Qwen3-TTS with GGML Vulkan

A local benchmark on a 128GB Framework system reported Qwen3-TTS performance close to an M5 Max using a GGML Vulkan backend. The result suggests AMD Strix hardware can approach Apple-class local TTS speed without MLX or Metal.

NEWS 4w ago
Hyper, OpenCode, Kilo, and Vals add Qwen 3.7 Plus support within 72 hours

Two days after Qwen 3.7 Plus launched, Hyper, OpenCode, Kilo, and Vals shipped support or rankings around the 1M-context multimodal model. The rapid pickup shows Alibaba’s new model landing quickly in coding-agent tools and public eval stacks outside its own platform.

RELEASE 4w ago
Qwen releases Qwen 3.7 Plus with multimodal agent mode and browser demos

Alibaba released Qwen 3.7 Plus as a multimodal agent model for GUI, CLI, coding, and browser tasks. It ships with browser demos and immediate Cline support, giving teams another frontier-style agent model to compare against M3 and closed-source tools.

RELEASE 1mo ago
Qwen3.7 Max ships implicit caching for no-setup context reuse

Alibaba rolled out implicit caching for Qwen3.7 Max, automatically reusing repeated context without user setup. The update also lands with fresh benchmark results and broader coding-agent support across OpenCode and Hermes Agent.

NEWS 1mo ago
Qwen 3.7 Max users report 5-minute cache creation, $43 vibe-coding bills, and uneven task quality

A day after Qwen 3.7 Max launched, users posted both standout benchmark wins and rough real-work reports, including 5-minute cache creation and $43 in 15 minutes of vibe coding. That matters because teams evaluating coding agents are seeing a gap between leaderboard strength and per-task reliability.

RELEASE 1mo ago
Qwen3.7 Max launches with 1M context, 35-hour autonomy, and 56.6 AA Index

Alibaba launched Qwen3.7 Max as its new flagship agent model with 1M context, stronger coding and reasoning scores, and cross-harness benchmarks. OpenRouter, Together, AI Gateway, and Kilo support it on day one, making it ready for immediate deployment.

NEWS 1mo ago
Qwen opens 3.7 Max Preview and Plus Preview on Arena with a #10 coding rank

Alibaba put Qwen3.7 Max Preview and Qwen3.7 Plus Preview live on Arena and the Qwen site, with Arena placing Max Preview #13 overall and #10 for coding. That gives engineers an early read on the next Qwen generation before any broader API or open-weight release.

RELEASE 1mo ago
Unsloth updates Qwen3.5 MTP GGUFs with draft-mtp flags for 1.8x speed

Unsloth said its updated Qwen3.5 MTP GGUFs now run about 1.8x faster after llama.cpp added spec-draft-p-min 0.75 and renamed the mode to draft-mtp. The update also raises draft-token settings and expands the small-model MTP set for local runners.

NEWS 1mo ago
Perplexity benchmarks Qwen3 235B on GB200 NVL72: NVLS latency drops from 586 µs to 313 µs

Perplexity published serving results for post-trained Qwen3 235B on NVIDIA GB200 NVL72 and argued that Blackwell materially outperforms Hopper for large MoE inference. The deltas show up in NVLS all-reduce latency, MoE prefill combine time, and high-speed decode throughput.

NEWS 1mo ago
Local users report DeepSeek V4 Flash, Qwen 3.6, and Gemma 4 at 40-200 tok/s on Macs and 3090s

Developers posted new local-model measurements for DeepSeek V4 Flash (DS4), Qwen 3.6, and Gemma 4: about 40 tok/s on an M3 Ultra, 70+ tok/s on MacBooks with MPS, and 120-200 tok/s for Qwen3.6-27B on a single RTX 3090. The numbers suggest coding-capable local runs are moving from demos toward regular use.

RELEASE 1mo ago
DFlash adds Qwen3-8B speculator with 82.2% first-token acceptance

Posts said Qwen3-8B now has a DFlash speculator with 82.2% first-token acceptance and 3.74 accepted tokens per step, alongside broader DFlash claims of over 6x lossless acceleration. It matters because the release turns a decoding paper into a concrete speculative-inference artifact engineers can test against existing Qwen stacks.
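
As a rough illustration of what "accepted tokens per step" and "lossless" mean in this context, the Python sketch below runs greedy speculative decoding with two toy stand-in models. Everything in it (`target_next`, `draft_next`, the arithmetic inside them, and the draft length `k`) is hypothetical; it is not DFlash's speculator or any SGLang API, only the generic accept-then-correct loop under those assumptions.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the large target model's greedy next token."""
    return (sum(ctx) * 31 + len(ctx)) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Toy stand-in for the draft model: agrees with the target
    except when the context length is a multiple of 4."""
    tok = target_next(ctx)
    return tok if len(ctx) % 4 else (tok + 1) % VOCAB


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(prompt: List[int], n_new: int, k: int) -> Tuple[List[int], float]:
    """Greedy speculative decoding.

    Returns the generated sequence and the mean number of drafted
    tokens accepted per verification step.
    """
    out = list(prompt)
    steps = 0
    accepted_total = 0
    while len(out) - len(prompt) < n_new:
        # 1) The draft model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target verifies the proposal and keeps the longest
        #    prefix that matches its own greedy choices.
        accepted = 0
        for tok in proposal:
            if target_next(out) != tok:
                break
            out.append(tok)
            accepted += 1
        # 3) The target contributes one token itself: a correction
        #    after a mismatch, or a bonus token if all k were accepted.
        out.append(target_next(out))
        steps += 1
        accepted_total += accepted
    out = out[: len(prompt) + n_new]  # trim any overshoot
    mean_accepted = accepted_total / steps if steps else 0.0
    return out, mean_accepted


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec_out, mean_acc = speculative_decode(prompt, n_new=20, k=4)
    # "Lossless" here means the output matches plain greedy decoding.
    print(spec_out == greedy_decode(prompt, 20), round(mean_acc, 2))
```

In a real serving stack the target model scores all drafted positions in one batched forward pass, which is where the speedup comes from; the loop above verifies them one at a time only for readability.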

RELEASE 2mo ago
Qwen-Scope releases SAE toolkit for Qwen3.5-27B steering

Alibaba’s Qwen team released Qwen-Scope, an open sparse-autoencoder suite for Qwen3.5-27B that can steer outputs, surface repetition features, and compare benchmark feature overlap. The toolkit turns interpretability artifacts into debugging, data-generation, and evaluation workflows.

RELEASE 2mo ago
FlashQLA releases TileLang linear-attention kernels with 2–3x forward speedups

Alibaba Qwen introduced FlashQLA, a TileLang-based linear-attention kernel stack that reports 2–3x faster forward passes and 2x faster backward passes. The release gives edge and long-context deployments a new optimization lever below the model layer itself.

NEWS 2mo ago
Qwen3.6 community ships MLX and 3-bit quants with 40-56 tok/s local agent runs

Builders published new MLX and 3-bit Qwen3.6 quants and shared reproducible local benchmarks from M3 Ultra, RTX 5070, and Radeon AI Pro setups. That gives local-agent teams concrete deployment options beyond launch-day claims, though memory budgets and long-context tool use still limit larger workflows.

RELEASE 2mo ago
Qwen-Image-2.0-Pro launches at #9 on Arena with multilingual text rendering

Alibaba launched Qwen-Image-2.0-Pro on ModelScope and API with better prompt adherence, multilingual typography, and steadier style quality. The model is aimed at text-heavy jobs like UI mockups and posters, so test it for layout-heavy generation.

RELEASE 2mo ago
Qwen3.6-27B releases with 77.2 SWE-Bench Verified and Apache 2.0

Alibaba released Qwen3.6-27B, a dense open model with multimodal input and thinking or non-thinking modes that beats Qwen3.5-397B-A17B across major coding benchmarks. Day-one support across vLLM, SGLang, Ollama, llama.cpp, GGUF, and MLX makes it ready for local and hosted coding agents.

RELEASE 2mo ago
Qwen launches Qwen3.6-Max-Preview on Qwen Chat with AA Index 52

Qwen put Qwen3.6-Max-Preview live on Qwen Chat as an early flagship preview with stronger agentic coding and world-knowledge claims. Early testers report strong first-pass results, but the Max line remains closed rather than open-sourced.

NEWS 2mo ago
Qwen3.6-35B-A3B benchmarks 40 tok/s on M3 Ultra with Strix Halo follow-ups

Fresh local reports put Qwen3.6-35B-A3B at around 40 tok/s on M3 Ultra, extend testing to Strix Halo, and show it wired into OpenClaw and Pi-style harnesses. The update matters because Qwen3.6 is moving from quant benchmarks into real local coding-agent loops with clearer hardware limits.

WORKFLOW 2mo ago
Unsloth benchmarks Qwen3.6-35B-A3B GGUF quants at 20-40 tok/s on local rigs

Unsloth published GGUF quant benchmarks for Qwen3.6-35B-A3B while practitioners shared local setup guides and long-context agent runs on Apple silicon and high-RAM desktops. The sparse 35B model is becoming a credible local coding-agent option, but speed and reasoning quality still vary by quant and offload strategy.

RELEASE 2mo ago
Qwen3.6-35B-A3B releases Apache 2.0 sparse MoE with 3B active params

Alibaba open-sourced Qwen3.6-35B-A3B, a 35B multimodal sparse MoE with only 3B active parameters under Apache 2.0. Same-day support from vLLM, Ollama, SGLang, and GGUF builders makes it immediately usable for local and production coding workloads.

RELEASE 2mo ago
Qwen Code updates v0.14.2 with Channels, Cron Jobs, and Qwen3.6-Plus

Qwen Code added phone-based control via Telegram, DingTalk, and WeChat, scheduled agent loops, per-subagent model selection, and a planning mode before execution. The release also centers Qwen3.6-Plus, which Alibaba says offers 1M context and 1,000 free daily requests, while Vals ranked the model #17 overall and #11 multimodal.

NEWS 3mo ago
OpenRouter says Qwen3.6-Plus hits 1.4T tokens in a day

OpenRouter said Qwen3.6-Plus became its first model to pass roughly 1.4 trillion tokens in a day, and Qwen said the model also moved to No. 1 on the service. The milestone adds a concrete deployment signal beyond benchmark scores and preview availability, so track usage data alongside evals.

RELEASE 3mo ago
Qwen3.6-Plus launches with 1M context and Code Arena #8 ranking

Alibaba launched Qwen3.6-Plus with a 1M default context window, stronger coding and multimodal performance, and rollout across chat, API, and routing partners. Benchmarks and partner availability make it a new high-end option for agentic coding and web tasks.

RELEASE 3mo ago
Qwen releases Qwen3.5-Omni with 10-hour audio and 400s video support

Alibaba launched Qwen3.5-Omni across Lite, Flash, Plus, and Plus-Realtime variants for native text, image, audio, and video understanding, plus realtime voice controls and script-level captioning. The family targets long multimodal sessions and live interaction and is focused on understanding, so check its limits if you need media generation.

NEWS 3mo ago
ATLAS benchmarks Qwen3-14B at 74.6% LiveCodeBench on one RTX 5060 Ti

The ATLAS harness says a frozen Qwen3-14B Q4 model on one RTX 5060 Ti reached 74.6% pass@1-v(k=3) on LiveCodeBench v5 through multi-pass repair and selection. The result shifts comparison toward harness design, though HN commenters note it is not a one-shot head-to-head with hosted frontier models.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.