releaseMay 30, 2026

Step 3.7 Flash opens 30-day free access for Hermes users via Nous Portal

A day after launch, Nous made Step 3.7 Flash free for 30 days to Hermes users through Nous Portal. The access window landed alongside fresh vLLM/NIM and MLX-VLM support, making the model easier to test in both local and production stacks.

3 min read

Step 3.7 Flash opens 30-day free access for Hermes users via Nous Portal

TL;DR

NousResearch said Step 3.7 Flash is free for 30 days through Nous Portal, aimed at Hermes Agent users testing an MoE vision-language model for coding, search, and multimodal agent work.
Teknium added that the Portal access comes with a longer-term availability guarantee than the usual short-lived free model drops.
Day-one distribution was already broad: OpenRouter put the model on its router, modal offered hosted support, and vllm_project shipped vLLM serving with quantized weights.
The runtime story is unusually complete for a small launch, because vllm_project tied Step 3.7 Flash to NVIDIA NIM and DGX Station deployment while TheZachMueller surfaced immediate MLX-VLM support for Apple hardware.

You can grab the free window in the Nous Portal, check the hosted SKU on OpenRouter, and see the NVIDIA-side serving path on the DGX Station vLLM page. The interesting bit is how many inference surfaces were ready inside 24 hours: OpenRouter for quick trials, Modal for hosted runs, vLLM and NIM for production stacks, and MLX-VLM for local Apple setups.

Nous Portal access

Nous pitched the offer as a 30-day free run for Hermes Agent users, one day after Step 3.7 Flash launched. The post described the model as a new MoE vision-language model tuned for agent efficiency, coding, search, and multimodal workflows.

Teknium's follow-up matters because it narrows the practical question engineers always ask first: how long will the free tier stick around. He wrote that this one comes with a relatively long-term guarantee, rather than disappearing on short notice.

What shipped on day one

By launch day, Step 3.7 Flash was already spread across several common test and deployment surfaces:

OpenRouter described it as a multimodal MoE for image, video, and text, with 11B active parameters out of roughly 196B total, plus selectable reasoning levels.
modal advertised day-zero support with 198B total parameters, 11B active, a 256K context window, and three reasoning levels.
kilocode said the open-weight model was live in Kilo at roughly 400 tok/s.
vllm_project said vLLM support shipped immediately with FP8 and NVFP4 quantized weights, native tool calling, reasoning parsing, and built-in MTP speculative decoding.

The numbers vary slightly by post, 196B versus 198B total parameters, but the distribution story is clear: this was not a single-endpoint release.

Serving stack

The strongest infrastructure signal came from the vLLM side. Its launch note tied the model to long-context repo and document work, native image plus text input, and ready-to-serve quantized weights.

A day later, vLLM connected that same model to NVIDIA's DGX Station stack and NIM containers. That gives Step 3.7 Flash a cleaner path from quick benchmarking to local workstation runs and production serving than most open-weight releases get in their first 48 hours.

Apple and local runtimes

The local story was not just CUDA. TheZachMueller boosted Ivan Fioravanti's MLX-VLM update adding Step 3.7 Flash conversion, vision support, and text support, which puts the model into the Apple-side inference toolchain immediately.

That lands next to the hosted and datacenter options above: OpenRouter for API access, Modal for managed serving, vLLM and NIM for production infra, and MLX-VLM for local experimentation on Apple hardware. For a low-key open-weight release, Christmas came early for inference stack nerds.

TL;DR

Nous Portal access

What shipped on day one

Serving stack

Apple and local runtimes

Discussion across the web