Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. It matters if you are evaluating moving rollout-heavy RL jobs off NVIDIA hardware and want concrete throughput and step-time numbers before porting.

Miles has added ROCm support for large-scale RL post-training on AMD Instinct systems, with LMSYS's blog post describing it as an end-to-end pipeline for MI300- and MI350-class clusters. The release matters because Miles is not just a trainer: the architecture diagram in LMSYS's thread shows rollout generation and policy optimization split into separate components, coordinated by a scheduler and tied together with Megatron and SGLang.
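To make the split concrete, here is a minimal sketch of the disaggregated loop that architecture implies: a rollout worker (standing in for the SGLang side) generates trajectories, a trainer (standing in for the Megatron side) takes optimizer steps, and a scheduler coordinates the two and syncs weights. All class and method names here are illustrative assumptions, not Miles's actual API.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt: str
    response: str
    reward: float

class RolloutWorker:
    """Stand-in for the SGLang-backed rollout engine."""
    def __init__(self, weights_version: int = 0):
        self.weights_version = weights_version

    def generate(self, prompts):
        # Real system: batched sampling on MI300X/MI350X-class GPUs.
        return [Trajectory(p, f"response-to-{p}", reward=1.0) for p in prompts]

class Trainer:
    """Stand-in for the Megatron-backed policy optimizer."""
    def __init__(self):
        self.step = 0

    def update(self, trajectories):
        self.step += 1          # Real system: a GRPO gradient step.
        return self.step        # New weights version for the rollout side.

class Scheduler:
    """Coordinates the two halves: rollout -> train -> weight sync."""
    def __init__(self, worker, trainer):
        self.worker, self.trainer = worker, trainer

    def run_step(self, prompts):
        trajs = self.worker.generate(prompts)
        new_version = self.trainer.update(trajs)
        self.worker.weights_version = new_version  # weight sync
        return trajs

sched = Scheduler(RolloutWorker(), Trainer())
trajs = sched.run_step(["p0", "p1"])
```

The point of the sketch is the separation of concerns: because rollout generation dominates RL compute, keeping it in its own component lets the serving engine and the trainer scale independently.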
The implementation details are practical. Miles is open-sourced via its GitHub repo, and the blog summary says deployment is packaged as prebuilt Docker containers for MI300X and MI350X/355X, with ROCm validated end to end. That framing fits the project's pitch that rollout generation "dominates RL compute" in these jobs, making AMD's HBM bandwidth the hardware angle behind the port rather than a generic accelerator-expansion announcement.
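For readers unfamiliar with ROCm containers, launching such an image generally follows AMD's standard pattern of passing the KFD and DRI device nodes through to Docker. The image name below is a placeholder, not a real registry path; the actual tags live in the Miles repo.

```shell
# Placeholder image name -- substitute the tag published in the Miles repo.
docker pull <registry>/miles:rocm-mi300x

# Standard ROCm passthrough: expose the GPU device nodes to the container.
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  <registry>/miles:rocm-mi300x
```

The `--device=/dev/kfd --device=/dev/dri` flags are the usual ROCm equivalent of NVIDIA's `--gpus all`, which is typically the main change when porting a containerized training setup between the two stacks.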
The main reported quality gain is on Qwen3-30B-A3B with GRPO, where LMSYS's results thread says AIME rose from 0.665 to 0.729 during training. On the systems side, the same thread reports MI300X rollout throughput of about 1.1–1.3k tok/GPU/s and a mean step time of 388.5 seconds on a single 8-GPU node using 32×8 sampling with an 8k response cap.
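The 32×8 sampling setup reflects how GRPO works: each prompt gets a group of sampled responses, and each response's advantage is its reward standardized against its own group rather than against a learned value function. A minimal sketch of that group-relative advantage, with illustrative 0/1 rewards:

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Standardize each reward against its own sampling group."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# One group of 8 sampled responses for a single prompt;
# reward 1.0 = correct answer, 0.0 = incorrect (illustrative values).
rewards = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
advs = grpo_advantages(rewards)
# Correct answers receive positive advantage, incorrect ones negative,
# and the advantages within a group sum to (approximately) zero.
```

Because the baseline comes from the group itself, larger groups give a lower-variance baseline, which is why rollout generation, not the gradient step, tends to dominate the compute budget in this kind of job.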
That makes this more of a reproducible infrastructure datapoint than a vague hardware claim. The blog summary says Miles also validated multi-turn agentic training on ROCm, and LMSYS's Trainium and Inferentia post places the release in a wider push to run the same serving and rollout stack across AMD GPUs and AWS Trainium/Inferentia rather than keeping SGLang tied to one silicon path.
Full blog link: lmsys.org/blog/2026-03-1…
Excited to see inference moving beyond GPUs, with @YottaLabs and @radixark bringing SGLang to AWS Trainium & Inferentia, pushing toward a truly multi-silicon serving future 🔥 Check out the blog here👇
The real shift in AI infra isn’t new models. It’s inference moving beyond GPUs. New deep dive: SGLang on @awscloud Trainium + Inferentia Built with @sgl_project and @radixark yottalabs.ai/post/mini-sgla…