Skip to content
AI Primer
breaking

Step 3.7 Flash launches with day-one support in Kilo, Modal, SGLang, Hermes, and DesignArena

Step 3.7 Flash landed immediately across Kilo, Modal, SGLang, Hermes-linked tooling, and DesignArena as the model’s 198B MoE, 256K-context release spread through the stack. The breadth of day-one support gives engineers multiple ways to serve, benchmark, and wire the new open-weight multimodal model into agents.

4 min read
Step 3.7 Flash launches with day-one support in Kilo, Modal, SGLang, Hermes, and DesignArena
Step 3.7 Flash launches with day-one support in Kilo, Modal, SGLang, Hermes, and DesignArena

TL;DR

You can already deploy it on Modal, try the OpenRouter model page, and skim how infra partners described the same release across vLLM and SGLang. The interesting bit is how consistent the packaging was: everyone repeated the same 198B MoE, 11B-active, 256K-context shape, but each surface emphasized a different use case, from high-throughput serving to coding agents to visual UI work.

Model shape

Across the launch posts, the stable spec is a 198B sparse mixture-of-experts model with about 11B active parameters per token, plus a 256K context window. modal and vllm_project's support post both used that same parameterization, which makes the rollout read less like marketing drift and more like a coordinated infra release.

The model was also pitched as natively multimodal. lmsysorg described native multimodal perception, while OpenRouter labeled it image, video, and text capable, and modal explicitly called out image and video understanding.

One product detail kept showing up in nearly every integration post:

Benchmarks and workload fit

StepFun's launch framing leaned hard on agent efficiency, and the benchmark mix shows what that meant.

  • ClawEval-1.1: 67.1, ranked #1 according to lmsysorg
  • SimpleVQA Search: 79.2, ranked #1 according to lmsysorg
  • V*: 95.3, cited by lmsysorg for visual perception quality
  • SWE-Bench PRO: 56.3, ranked #2 according to lmsysorg

The task framing around those numbers was unusually concrete. lmsysorg tied the vision scores to turning UIs and charts into code, tied ClawEval to long-horizon tool orchestration, and tied SWE-Bench PRO to tracing repositories, isolating bugs, and shipping patches.

That same positioning carried into partner posts. kilocode called it one of the best open-weight models you can run right now, with multimodal agent behavior at 400 tok/s, while OpenRouter emphasized coding, agentic workflows, and structured outputs.

Day-one rollout surfaces

The ecosystem support landed fast enough that availability became part of the story.

For engineers, that breadth matters more than any single benchmark card. The model showed up at the model API layer, the self-hosting stack, agent tooling, and public eval surfaces at once.

Serving stack details

The most useful implementation details came from infra partners rather than from the recycled benchmark lines.

According to vllm_project, the release shipped with FP8 and NVFP4 quantized weights, built-in MTP speculative decoding, native tool calling, and reasoning parsing. That is a fairly deployment-ready bundle for a same-day open-weight launch.

Modal also published a live example endpoint through its StepFun inference example, and OpenRouter exposed a public model page immediately. Together with kilocode's weekly roundup, which grouped Step 3.7 Flash with other aggressive price-to-performance releases that week, the rollout looked designed to make the model easy to benchmark from several directions on day one.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 3 threads
TL;DR2 posts
Day-one rollout surfaces4 posts
Serving stack details2 posts
Share on X