AI Primer

Tencent launches Hy3 preview with 295B/21B, 256K context, and day-one OpenRouter, vLLM, and SGLang support

Tencent open-sourced Hy3 preview, a 295B MoE with 21B active parameters and 256K context, then pushed it into OpenRouter, OpenCode, OpenClaw, vLLM, and SGLang immediately. That matters because engineers can test and deploy a new reasoning-agent model on day one instead of waiting for the runtime ecosystem to catch up.


TL;DR

You can jump straight to the Hugging Face model page, the OpenRouter listing, the vLLM deployment page, and OpenClaw's Tencent provider docs. The interesting bit is how much of the stack was ready on arrival: lmsysorg's SGLang post exposed parser flags and speculative decoding settings, while vllm_project's documentation screenshot showed a 3.8B MTP layer and a BF16 deployment footprint of 708 GB.

What shipped

Tencent is framing Hy3 preview as a reasoning and agent model first, not just another open weight checkpoint. The launch post calls it a 295B A21B model, and the attached chart splits claims across reasoning, long-context retrieval, and agent benchmarks.

The package described across the launch materials spans the open weights, a benchmark chart, and the day-one runtime integrations covered below.

Benchmarks

Tencent's own chart makes two things clear. Hy3 preview is being sold as a size-efficient model that stays competitive on agent work, and the company is comfortable showing it below top frontier models rather than pretending it wins every column.

The headline scores visible in the launch chart are:

  • Tsinghua Math PhD Qual: 86.4
  • FrontierScience Olympiad: 70.0
  • IMO Answer Bench: 84.3
  • CL-bench: 22.8
  • AA-LCR: 66.3
  • SWE-Bench Verified: 74.4
  • Terminal-Bench 2.0: 54.4
  • Hy-Backend: 54.7
  • WideSearch: 70.2
  • WildClawBench, text-only: 45.3

vllm_project added a more useful read on the same table: coding and agents are the biggest jumps relative to prior Hy releases. That lines up with Tencent's benchmark mix, which spends more real estate on agent and coding tasks than on generic chatbot evals.

Day-one runtimes

The runtime story is the real launch. Hy3 preview showed up on the two serving stacks most open model teams actually reach for, and both announcements included Hy-specific plumbing instead of a vague compatibility claim.

In SGLang, lmsysorg's command screenshot shows:

  • --reasoning-parser hunyuan
  • --tool-call-parser hunyuan
  • --speculative-algorithm EAGLE
  • --speculative-num-steps 3
  • --speculative-num-draft-tokens 4
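Taken together, those flags sketch a complete launch command. A minimal sketch follows, where the Hugging Face model ID (`tencent/Hy3-preview`) and the tensor-parallel degree are assumptions not visible in the screenshot:

```shell
# Hypothetical SGLang launch for Hy3 preview.
# Model path and --tp value are assumptions; the parser and
# speculative flags are the ones shown in lmsysorg's screenshot.
python -m sglang.launch_server \
  --model-path tencent/Hy3-preview \
  --tp 8 \
  --reasoning-parser hunyuan \
  --tool-call-parser hunyuan \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-num-draft-tokens 4
```

The EAGLE settings (3 steps, 4 draft tokens) trade a small drafting overhead for higher accepted-token throughput, which is the usual reason a launch post leads with speculative decoding numbers.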

In vLLM, vllm_project's documentation screenshot shows a different serving profile:

  • hy_v3 tool and reasoning parsers
  • MTP speculative decoding instead of EAGLE
  • vLLM 0.20.1+
  • BF16 deployment footprint of 708 GB on the selected 8×141 GB H200 setup
  • Hardware matrix entries for H100, H200, B200, GB200, AMD MI300X, MI325X, and MI355X
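The vLLM profile can be sketched the same way. In the sketch below, the model ID, the tensor-parallel degree, and the exact speculative-config values are assumptions; the screenshot names MTP and the `hy_v3` parsers but not their parameters:

```shell
# Hypothetical vLLM serve command for Hy3 preview.
# Model ID, TP size, and speculative-config JSON are assumptions;
# the hy_v3 parsers and MTP method come from the documentation screenshot.
vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --reasoning-parser hy_v3 \
  --tool-call-parser hy_v3 \
  --enable-auto-tool-choice \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
```

Tensor parallelism of 8 is chosen here to match the 8×141 GB H200 setup that the 708 GB BF16 footprint is quoted against.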

Those screenshots matter more than the launch adjectives. They show Hy3 arriving with framework-specific parser support, speculative decoding hooks, and concrete hardware assumptions on day one.

Where it shows up

Hy3 preview spread across inference and agent surfaces within hours of the open-source release.

The day-one rollout visible in the evidence spans OpenRouter, OpenCode, OpenClaw, vLLM, and SGLang.

OpenClaw's Tencent provider docs make the rollout slightly more interesting than a logo parade. The integration is not just model availability, it is provider-level packaging through TokenHub with onboarding and pricing metadata already attached.

Early rough edges

Not all of the first-run feedback was flattering. In an early OpenRouter test, teortaxesTex said the model was "very fast for 21B" at roughly 160 tokens per second, but also posted a Russian output example with punctuation and phrasing errors severe enough to call it "braindead" for that case.

That leaves a more specific first impression than the benchmark chart does: Hy3 preview landed with strong deployment coverage and aggressive free access, but at least one public multilingual test immediately found brittle behavior that the headline launch posts did not mention.

