
Qwen launches Qwen3.6-Max-Preview on Qwen Chat with AA Index 52

Qwen put Qwen3.6-Max-Preview live on Qwen Chat as an early flagship preview with stronger agentic coding and world-knowledge claims. Early testers report strong first-pass results, but the Max line remains closed rather than open-sourced.


TL;DR

  • Alibaba_Qwen's launch post introduced Qwen3.6-Max-Preview as an early preview of Qwen's next flagship, with claims of better agentic coding, stronger world knowledge, and higher real-world agent reliability.
  • According to AiBattle_'s Qwen Chat screenshot, the preview is already live on Qwen Chat with a 262,144 token context window, 65,536 token summary limit, and no Search or Code Interpreter support yet.
  • In Alibaba_Qwen's benchmark chart, Qwen3.6-Max-Preview led Qwen's posted numbers on NL2Repo, Terminal-Bench 2.0, SkillsBench, ToolcallFormatIFBench, QwenClawBench, and QwenChineseBench, while bridgemindai's close-up highlighted a narrow 57.3 versus 57.1 edge over Claude Opus 4.5 on SWE-Bench Pro.
  • Early hands-on posts from teortaxesTex's first-impressions thread and teortaxesTex's AIME test described the model as stronger than DeepSeek-Expert on first pass and more stable across longer reasoning runs.
  • AiBattle_ also noted that the Max line is still closed, not open-sourced, which keeps Qwen's flagship tier separate from the company's open model releases.

You can jump straight to the official benchmark card, inspect the live Qwen Chat model page, and compare that polished launch framing with an AIME-2026 solve after roughly 30 minutes of thinking. There is also a small but useful caveat in the Qwen Chat screenshot: this preview build ships without Search or Code Interpreter, even though the launch post leans hard on agentic coding.

What shipped

Qwen framed this as an early preview of its next flagship, not a finished general release. The launch post says the model improves over Qwen3.6-Plus on agentic coding, instruction following, world knowledge, and real-world reliability, while the live Qwen Chat card calls it the most advanced text model in the Qwen3.6 family.

The live product page adds the concrete rollout details missing from the announcement: a 262,144-token context window, a 65,536-token summary limit, and no Search or Code Interpreter support yet.
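Those posted limits are the usual power-of-two token budgets (262,144 = 2^18, 65,536 = 2^16). As a rough illustration only, a pre-flight check against the context window might look like the sketch below; the function names are hypothetical, and the ~4-characters-per-token heuristic is an assumption, since real counts depend on Qwen's tokenizer.

```python
# Posted limits from the Qwen Chat screenshot.
CONTEXT_WINDOW = 262_144  # 2**18 tokens
SUMMARY_LIMIT = 65_536    # 2**16 tokens

def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 chars/token); a real check would use the
    # model's own tokenizer.
    return max(1, len(text) // 4)

def fits_context(prompt: str, reserved_output: int = 4_096) -> bool:
    # Leave headroom for the model's reply when budgeting the window.
    return rough_token_count(prompt) + reserved_output <= CONTEXT_WINDOW
```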

Benchmark spread

Qwen's launch image is doing most of the work here. It shows the preview model ahead of Qwen3.6-Plus on every chart in the graphic, but the cross-vendor picture is mixed rather than clean-sweep.

From Alibaba_Qwen's chart, Qwen3.6-Max-Preview posted these top-line results:

  • SuperGPQA: 73.9
  • AA-Omniscience Index: 10.0, behind Claude 4.5 Opus at 13.0
  • QwenChineseBench: 84.0
  • QwenClawBench: 59.0
  • SkillsBench: 55.6
  • ToolcallFormatIFBench: 86.1
  • NL2Repo: 42.9
  • Terminal-Bench 2.0: 65.4

The same chart also shows where competitors still led Qwen's own comparison set:

  • GDPval-AA: GLM 5.1 at 52.0, versus Qwen3.6-Max-Preview at 47.0
  • QwenWebBench: GLM 5.1 at 1558 Elo, versus Qwen3.6-Max-Preview at 1528 (chart values read via OCR)
  • SciCode: Claude 4.5 Opus at 49.5, versus Qwen3.6-Max-Preview at 47.8 (chart values read via OCR)
  • SWE-Bench Pro: GLM 5.1 at 58.4, while bridgemindai's crop showed Qwen3.6-Max-Preview at 57.3 and Claude Opus 4.5 at 57.1

That makes the launch look less like a universal benchmark takeover and more like a coding-and-agents push with a few carefully chosen wins.

Early hands-on

The first useful outside signal was not a benchmark spreadsheet but a hands-on report: teortaxesTex's early test called the model about on par with, or slightly stronger than, DeepSeek-Expert, and a later reply in the same thread called it "a damn strong model" despite the author usually disliking Qwen releases.

teortaxesTex's AIME post added a more specific datapoint: Qwen3.6-Max-Preview reportedly solved AIME-2026 problem 15 on its first try after about 30 minutes of thinking. The same post said other tests pointed in the same direction and described the preview as "more baked than DeepSeek-Expert."

There is also a more speculative size read in an earlier teortaxesTex post, which guessed the model "smells like" a 1 to 2 trillion parameter system from knowledge-benchmark behavior and Alibaba's training capacity. That is an attributed hunch, not a disclosed model spec.

Preview limits

The strongest caveat is sitting on the model card itself. According to the Qwen Chat screenshot, this preview does not currently support Search or Code Interpreter tools.

That matters because Qwen's headline claims lean on agentic coding and real-world tool reliability, while the publicly exposed chat surface is still missing two of the most obvious tool hooks. The launch tweet also says more Qwen3.6 models are coming, which leaves room for the current chat build to be an incomplete slice of the full family, per Alibaba_Qwen's announcement.

Closed Max line

One of the more revealing reactions came from AiBattle_, who pointed out that the preview was live immediately and also complained that Qwen still does not open-source its Max models. That fits the product split visible here: Qwen's flagship chat tier is advancing quickly, but the Max branch remains proprietary even as the broader Qwen brand is strongly associated with open releases.

A separate reaction in teortaxesTex's screenshot of Victor Taelin's post captured the mood from the other side of that divide. After testing Qwen 3.6 on a B200, Taelin joked that local models were starting to feel hopeless. Hyperbole aside, it is a reminder that the excitement around this preview is tied to hosted frontier performance, not to a downloadable model card or open weights.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X · 2 threads
  • Early hands-on: 2 posts
  • Closed Max line: 1 post