AI Primer

Meituan releases LongCat 2.0: 1.6T MoE on domestic chips

Meituan disclosed LongCat 2.0, a 1.6T-parameter MoE with about 48B active parameters per token, a 1M-token context window, and more than 35T training tokens, trained on domestic hardware. The release ties a near-frontier open model to a Chinese domestic compute stack and a custom sparse-attention design.

TL;DR

  • Meituan has open-sourced LongCat-2.0, a 1.6T-parameter MoE with about 48B active parameters per token, 1M context, and more than 35T training tokens, according to eliebakouch's launch post and the official announcement.
  • The release also appears to unmask OpenRouter's stealth model Owl Alpha, which Rohan Paul's report described as a fast-growing anonymous agent model with 10.1T monthly tokens and top-three usage across Hermes Agent, Claude Code, and OpenClaw.
  • Meituan's most concrete technical claim is the training stack: the official post says the model was trained and deployed entirely on AI ASIC superpods, while teortaxesTex's hardware thread pulled out details like sub-80 GB device memory and built-in 200 Gbps networking.
  • LongCat-2.0's long-context story centers on a custom sparse-attention scheme that eliebakouch's breakdown describes as block-level indexing plus token-level refinement, layered with sliding-window and sink-token ideas borrowed from earlier sparse designs.
  • One launch-day caveat stayed unresolved: after clicking Try It, teortaxesTex's hands-on check said the served model looked more like a small Qwen variant than the new flagship, and no official clarification appears in the launch materials.

You can read the official launch post, skim the LocalLLaMA thread, and inspect the benchmark table, superpod notes, and sparse-attention diagrams directly in the launch screenshots. The oddest reveal is that a model this large may have spent nearly two months in the wild under a fake name, with Rohan Paul's post and the Hermes usage screenshot tying Owl Alpha to very real agent traffic before Meituan put its name on it.

Owl Alpha

From the r/LocalLLaMA thread: "Introducing LongCat-2.0, a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token. This was the stealth model that was on Openrouter under the name 'owl-alpha'."

According to Rohan Paul's report, Owl Alpha was LongCat-2.0-Preview all along: a 1.6T MoE with a dynamic active range of roughly 33B to 56B, native 1M context, and unusually high agent traffic for an unnamed model.
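To put the reported active range in proportion, the short calculation below divides each active-parameter figure by the 1.6T total. It is plain arithmetic on the numbers quoted above; the function name and the labels are illustrative only.

```python
TOTAL_PARAMS = 1.6e12  # total parameters reported for LongCat-2.0


def active_fraction(active_params: float, total_params: float = TOTAL_PARAMS) -> float:
    """Share of all parameters that take part in one token's forward pass."""
    return active_params / total_params


# Low and high ends come from the reported 33B-56B range; 48B is the headline figure.
for label, active in [("low", 33e9), ("headline", 48e9), ("high", 56e9)]:
    print(f"{label:>8}: {active / 1e9:.0f}B active = {active_fraction(active):.2%} of total")
```

On these figures, each token touches roughly 2% to 3.5% of the model's weights.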

The usage numbers are the part engineers will remember. Paul's post cites 10.1T monthly tokens, 559B daily tokens, and 242% monthly growth, while the Hermes chart screenshot shows Owl Alpha at 7.11T monthly tokens and first place in Hermes Agent usage.

That would make Meituan's quiet preview one of the biggest stealth rollouts of the year. The official announcement never mentions Owl Alpha by name, so the identity link is still coming from community reporting rather than the vendor.

Benchmarks and harnesses

The benchmark framing is agent-first. In the launch screenshots attached to eliebakouch's post, Meituan compares LongCat-2.0 with Gemini 3.1 Pro, GPT-5.5, and Claude Opus variants across Terminal-Bench 2.1, SWE-bench Pro, SWE-bench Multilingual, FORTE, RWSearch, BrowseComp, IFEval, Writing Bench, IMO-AnswerBench, and GPQA-diamond.

Four details matter more than the bars:

  • The code-agent evals were run through Claude Code, according to the benchmark notes visible in the evaluation screenshot.
  • FORTE is defined there as an office-task agent benchmark that supports OpenClaw, Hermes, and Claude Code.
  • RWSearch is described there as an in-house search-agent benchmark using only basic Search and Browse tools.
  • The launch post says LongCat-2.0 is already integrated with Claude Code, OpenClaw, and Hermes, which lines up with the usage evidence in Rohan Paul's report.

This is a model announcement wrapped around harness performance, not just base-model scores. Even Meituan's foundational eval table sits below the agent benchmarks in the launch materials.

LongCat Sparse Attention

Meituan's custom long-context mechanism, LongCat Sparse Attention, is pitched as the reason the model can train on hundreds of billions of 1M-context tokens, per the official announcement.

The architecture diagram names three pieces:

  1. Streaming-aware Indexing: reshapes token selection so memory access becomes more sequential.
  2. Cross-Layer Indexing: reuses one indexing pass across several adjacent layers.
  3. Hierarchical Indexing: does a block-level recall step before token-level selection.

The same diagram also shows the serving budget split, roughly half non-contiguous KV and half contiguous KV.
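Cross-Layer Indexing, the second piece in the list above, can be pictured as a cache that is refreshed once per group of layers. The sketch below is a toy illustration under that reading, not Meituan's implementation: `build_index`, `group_size`, and the function name are hypothetical, and the launch materials do not state how many layers share one pass.

```python
from typing import Callable, List, Tuple


def run_layers_with_shared_index(
    num_layers: int,
    group_size: int,
    build_index: Callable[[int], List[int]],
) -> Tuple[List[List[int]], int]:
    """Return the KV indices each layer would attend to, plus the number of indexing passes."""
    per_layer: List[List[int]] = []
    passes = 0
    cached: List[int] = []
    for layer in range(num_layers):
        # Only the first layer of each group pays for an indexing pass (assumption).
        if layer % group_size == 0:
            cached = build_index(layer)
            passes += 1
        # Every other layer in the group reuses the cached selection.
        per_layer.append(cached)
    return per_layer, passes


# Toy indexer: pretends the selected KV positions depend on the layer that built them.
selected, passes = run_layers_with_shared_index(8, 4, lambda layer: [0, 1, layer + 10])
print(passes, selected[0], selected[4])
```

With 8 layers and a group size of 4, the indexer runs twice instead of eight times, which is the cost saving the diagram appears to be pointing at.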

The best outside summary came from eliebakouch's thread, which maps LSA back to earlier sparse-attention families:

  • Top-k indexing from DeepSeek Sparse Attention.
  • Shared indices across layers, similar to GLM-style index sharing.
  • Block-level top-k followed by token-level top-k.
  • A sliding window and sink-token path layered on top.

That makes LSA look less like a brand-new primitive and more like a very aggressive composition of existing ones.
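The composition described in that list can be sketched in a few lines of pure Python. This is a toy under stated assumptions, not the released kernel: `scores` stands in for whatever relevance score the real indexer produces, scoring a block by its maximum token score is an assumption, and every parameter name here is hypothetical.

```python
import heapq
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, returned in ascending index order."""
    k = min(k, len(scores))
    return sorted(heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i]))


def select_kv(scores: Sequence[float], block_size: int, top_blocks: int,
              top_tokens: int, num_sink: int, window: int) -> List[int]:
    """Pick KV positions for the newest query, which sits at position len(scores) - 1."""
    n = len(scores)
    # Contiguous part: sink tokens at the start plus a sliding window at the end.
    contiguous = set(range(min(num_sink, n))) | set(range(max(0, n - window), n))

    # Stage 1, block-level recall: score each block by its best token (assumption).
    starts = list(range(0, n, block_size))
    block_scores = [max(scores[s:s + block_size]) for s in starts]
    kept_blocks = top_k_indices(block_scores, top_blocks)

    # Stage 2, token-level refinement: only tokens inside the kept blocks compete.
    candidates = [i for b in kept_blocks
                  for i in range(starts[b], min(starts[b] + block_size, n))
                  if i not in contiguous]
    picked = top_k_indices([scores[i] for i in candidates], top_tokens)
    return sorted(contiguous | {candidates[j] for j in picked})


scores = [0.1, 0.2, 0.1, 0.0, 0.9, 0.3, 0.8, 0.1,
          0.2, 0.1, 0.0, 0.1, 0.7, 0.6, 0.5, 0.4]
print(select_kv(scores, block_size=4, top_blocks=2, top_tokens=2, num_sink=1, window=2))
# prints [0, 4, 6, 14, 15]
```

In the example, position 0 is the sink, positions 14 and 15 are the sliding window, and positions 4 and 6 survive both top-k stages. The token-level stage only ever looks at the two kept blocks, which is the point of doing block-level recall first.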

AI ASIC superpods

The loudest claim in this launch is not the parameter count. It is that Meituan says both training and deployment ran entirely on AI ASIC superpods, with more than 50,000 chips and no Nvidia GPUs in the loop, according to the launch post screenshot and the official announcement.

The infrastructure notes surfaced in teortaxesTex's thread and the launch screenshots are unusually specific:

  • Per-device memory is lower than an H800's 80 GB.
  • Each physical superpod scales to 48 machines.
  • The accelerator includes a built-in 200 Gbps network adapter.
  • Meituan says the setup delivered more than 35% training throughput improvement over a naive implementation.
  • The training stack adds 6D parallelism, including EMBP for N-gram embeddings.
  • Memory work includes ZeRO-1, selective recomputation, allocator-level offloading, and routing padding tokens to a zero-expert.
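The last item, routing padding tokens to a zero-expert, can be illustrated with a small sketch. The launch notes do not spell out what the zero-expert computes, so this assumes it contributes a zero update and skips expert compute entirely; `ZERO_EXPERT`, `route`, and `moe_forward` are hypothetical names, and hidden states are reduced to single floats to keep the example short.

```python
from typing import Callable, List, Sequence

ZERO_EXPERT = -1  # hypothetical id for an expert that performs no computation


def route(token_ids: Sequence[int], router_choice: Sequence[int], pad_id: int) -> List[int]:
    """Send padding tokens to the zero-expert; keep the router's choice for real tokens."""
    return [ZERO_EXPERT if t == pad_id else e for t, e in zip(token_ids, router_choice)]


def moe_forward(hidden: Sequence[float], assignment: Sequence[int],
                experts: Sequence[Callable[[float], float]]) -> List[float]:
    """Apply each token's expert; zero-expert tokens get a zero update (assumption)."""
    return [0.0 if e == ZERO_EXPERT else experts[e](h) for h, e in zip(hidden, assignment)]


experts = [lambda h: 2 * h, lambda h: h + 1]  # two toy experts
token_ids = [5, 0, 7, 0]                      # 0 is the padding id in this toy
assignment = route(token_ids, router_choice=[1, 0, 0, 1], pad_id=0)
print(assignment)
print(moe_forward([1.0, 2.0, 3.0, 4.0], assignment, experts))
```

Under this assumption, only the two real tokens reach a real expert, so padding no longer consumes expert capacity or skews load balancing in a padded batch.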

teortaxesTex's follow-up speculated that the hardware profile may match Huawei's 910C-era systems, but that remains community inference, not a disclosed bill of materials.

Preview quality and endpoint confusion

Launch-day hands-on feedback cut against the benchmark story. After clicking the public Try It endpoint, teortaxesTex's test said the model behaved more like Qwen-4B than a near-frontier 1.6T MoE and failed simple Russian prompts.

That criticism matters because the Owl Alpha reporting and the launch-day demo point in opposite directions on product readiness. Rohan Paul's report describes a preview model already handling massive agent traffic, while the public endpoint check suggests the visible demo may not have been serving that model at all.

No line in the official announcement resolves the mismatch between the benchmarked model, the preview that allegedly ran on OpenRouter, and the model exposed through the launch-day chat interface.
