AI Primer

Tencent launches Hy3 preview with 295B/21B, 256K context, and day-one OpenRouter, vLLM, and SGLang support

Tencent open-sourced Hy3 preview, a 295B MoE with 21B active parameters and 256K context, then pushed it into OpenRouter, OpenCode, OpenClaw, vLLM, and SGLang immediately. That matters because engineers can test and deploy a new reasoning-agent model on day one instead of waiting for the runtime ecosystem to catch up.


TL;DR

You can jump straight to the Hugging Face model page, the OpenRouter listing, the vLLM deployment page, and OpenClaw's Tencent provider docs. The interesting bit is how much of the stack was ready on arrival: lmsysorg's SGLang post exposed parser flags and speculative decoding settings, while vllm_project's documentation screenshot showed a 3.8B MTP layer and a BF16 deployment footprint of 708 GB.

What shipped

Tencent is framing Hy3 preview as a reasoning and agent model first, not just another open weight checkpoint. The launch post calls it a 295B A21B model, and the attached chart splits claims across reasoning, long-context retrieval, and agent benchmarks.

The package described across the launch materials spans the open weights, a benchmark chart, and the day-one runtime integrations covered below.

Benchmarks

Tencent's own chart makes two things clear. Hy3 preview is being sold as a size-efficient model that stays competitive on agent work, and the company is comfortable showing it below top frontier models rather than pretending it wins every column.

The headline scores visible in the launch chart are:

  • Tsinghua Math PhD Qual: 86.4
  • FrontierScience Olympiad: 70.0
  • IMO Answer Bench: 84.3
  • CL-bench: 22.8
  • AA-LCR: 66.3
  • SWE-Bench Verified: 74.4
  • Terminal-Bench 2.0: 54.4
  • Hy-Backend: 54.7
  • WideSearch: 70.2
  • WildClawBench, text-only: 45.3

vllm_project added a more useful read on the same table: coding and agents are the biggest jumps relative to prior Hy releases. That lines up with Tencent's benchmark mix, which spends more real estate on agent and coding tasks than on generic chatbot evals.

Day-one runtimes

The runtime story is the real launch. Hy3 preview showed up on the two serving stacks most open model teams actually reach for, and both announcements included Hy-specific plumbing instead of a vague compatibility claim.

In SGLang, lmsysorg's command screenshot shows:

  • --reasoning-parser hunyuan
  • --tool-call-parser hunyuan
  • --speculative-algorithm EAGLE
  • --speculative-num-steps 3
  • --speculative-num-draft-tokens 4
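Taken together, those flags sketch a complete launch command. A minimal sketch follows, where the Hugging Face model ID (`tencent/Hy3-preview`) and the tensor-parallel degree are assumptions not visible in the screenshot:

```shell
# Hypothetical SGLang launch for Hy3 preview.
# Model path and --tp value are assumptions; the parser and
# speculative flags are the ones shown in lmsysorg's screenshot.
python -m sglang.launch_server \
  --model-path tencent/Hy3-preview \
  --tp 8 \
  --reasoning-parser hunyuan \
  --tool-call-parser hunyuan \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-num-draft-tokens 4
```

The EAGLE settings (3 steps, 4 draft tokens) trade a small drafting overhead for higher accepted-token throughput, which is the usual reason a launch post leads with speculative decoding numbers.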

In vLLM, vllm_project's documentation screenshot shows a different serving profile:

  • hy_v3 tool and reasoning parsers
  • MTP speculative decoding instead of EAGLE
  • vLLM 0.20.1+
  • BF16 deployment footprint of 708 GB on the selected 8×141 GB H200 setup
  • Hardware matrix entries for H100, H200, B200, GB200, AMD MI300X, MI325X, and MI355X
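The vLLM profile can be sketched the same way. In the sketch below, the model ID, the tensor-parallel degree, and the exact speculative-config values are assumptions; the screenshot names MTP and the `hy_v3` parsers but not their parameters:

```shell
# Hypothetical vLLM serve command for Hy3 preview.
# Model ID, TP size, and speculative-config JSON are assumptions;
# the hy_v3 parsers and MTP method come from the documentation screenshot.
vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --reasoning-parser hy_v3 \
  --tool-call-parser hy_v3 \
  --enable-auto-tool-choice \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
```

Tensor parallelism of 8 is chosen here to match the 8×141 GB H200 setup that the 708 GB BF16 footprint is quoted against.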

Those screenshots matter more than the launch adjectives. They show Hy3 arriving with framework-specific parser support, speculative decoding hooks, and concrete hardware assumptions on day one.

Where it shows up

Hy3 preview spread across inference and agent surfaces within hours of the open-source release.

The day-one rollout visible in the evidence spans OpenRouter, OpenCode, OpenClaw, vLLM, and SGLang.

OpenClaw's Tencent provider docs make the rollout slightly more interesting than a logo parade. The integration is not just model availability, it is provider-level packaging through TokenHub with onboarding and pricing metadata already attached.

Early rough edges

Not all of the first-run feedback was flattering. In an early OpenRouter test, teortaxesTex said the model was "very fast for 21B" at roughly 160 tokens per second, but also posted a Russian output example with punctuation and phrasing errors severe enough to call it "braindead" for that case.

That leaves a more specific first impression than the benchmark chart does: Hy3 preview landed with strong deployment coverage and aggressive free access, but at least one public multilingual test immediately found brittle behavior that the headline launch posts did not mention.

