Miles adds ROCm support for AMD Instinct and raises AIME to 0.729
Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. This matters if you are weighing a move of rollout-heavy RL jobs off NVIDIA hardware and want concrete throughput and step-time numbers before porting.

TL;DR
- LMSYS and AMD say Miles now supports ROCm on MI300/350-class Instinct clusters, bringing its end-to-end RL post-training stack to non-NVIDIA hardware; the port ships via the open-source Miles repo and an accompanying ROCm blog post.
- The headline training result is a GRPO run on Qwen3-30B-A3B where AIME improved from 0.665 to 0.729, according to LMSYS's launch thread.
- The rollout-side numbers are concrete enough to matter for cluster planning: LMSYS's performance thread reports roughly 1.1-1.3k tokens per GPU per second on MI300X and a mean step time of 388.5 seconds on one 8-GPU node.
- The stack is positioned as more than a single benchmark run: the blog summary describes separated rollout and training components, prebuilt Docker images for MI300X and MI350X/355X, and validation for multi-turn agentic training, while LMSYS's serving note also points to broader multi-silicon SGLang work.
What shipped for AMD clusters
Miles has added ROCm support for large-scale RL post-training on AMD Instinct systems, with LMSYS describing it as an end-to-end pipeline for MI300- and MI350-class clusters in the blog post. The release matters because Miles is not just a trainer: the architecture diagram in LMSYS's thread shows rollout generation and policy optimization split across separate components, coordinated by a scheduler and tied together with Megatron and SGLang.
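As a way to picture that split, here is a minimal sketch of a disaggregated rollout/training loop. Every name in it (RolloutWorker, Trainer, Scheduler, the method signatures) is a hypothetical stand-in rather than Miles's actual API; in the real stack the rollout half is an SGLang inference engine and the training half runs on Megatron.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of a disaggregated RL post-training loop. All
# names are illustrative stand-ins, not Miles's API; in the real stack
# the rollout side is SGLang and the training side is Megatron.

@dataclass
class Rollout:
    prompt: str
    response: str
    reward: float

class RolloutWorker:
    """Inference-engine half: samples responses for each prompt."""
    def generate(self, prompts, samples_per_prompt):
        return [
            Rollout(p, f"<sample {i} for {p!r}>", reward=random.random())
            for p in prompts
            for i in range(samples_per_prompt)
        ]

    def load_weights(self, weights):
        pass  # refresh the serving engine with updated policy weights

class Trainer:
    """Training half: optimizes the policy on scored rollouts."""
    def step(self, rollouts):
        pass  # compute the RL loss over the batch, update the policy

    def export_weights(self):
        return {}  # updated weights to push back to rollout workers

class Scheduler:
    """Coordinates the two halves: generate, train, sync weights."""
    def __init__(self, worker, trainer):
        self.worker, self.trainer = worker, trainer

    def run(self, prompt_batches, samples_per_prompt=8):
        for prompts in prompt_batches:
            batch = self.worker.generate(prompts, samples_per_prompt)
            self.trainer.step(batch)
            # keep generation on-policy for the next batch
            self.worker.load_weights(self.trainer.export_weights())

Scheduler(RolloutWorker(), Trainer()).run([["prompt A", "prompt B"]])
```

The design point the sketch illustrates is that the rollout side can be scaled and scheduled independently of the trainer, which is exactly why generation throughput on the inference engine becomes the number that dominates step time.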
The implementation details are practical. LMSYS's repo announcement says Miles is open-sourced via the Miles GitHub repo, and the blog summary says deployment is packaged through prebuilt Docker containers for MI300X and MI350X/355X, with ROCm validated end to end. That framing fits the project's pitch that rollout generation "dominates RL compute" on these jobs, making AMD's HBM bandwidth the hardware angle behind the port rather than a generic accelerator expansion, per the launch thread.
What performance and training gains were reported
The main reported quality gain is on Qwen3-30B-A3B with GRPO, where LMSYS's results thread says AIME rose from 0.665 to 0.729 during training. On the systems side, the same thread reports MI300X rollout throughput of about 1.1-1.3k tok/GPU/s and a mean step time of 388.5 seconds on a single 8-GPU node using 32x8 sampling with an 8k response cap.
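Those systems figures hang together on a back-of-envelope check, assuming "32x8 sampling" means 32 prompts with 8 sampled responses each per step (the thread does not spell out its batch notation):

```python
# Back-of-envelope check of the reported MI300X rollout numbers.
# Assumption: "32x8 sampling" = 32 prompts x 8 samples per step.
gpus = 8
tok_per_gpu_s = 1_200          # midpoint of the reported 1.1-1.3k range
responses_per_step = 32 * 8    # 256 sampled responses per step
max_response_tokens = 8_000    # the reported response-length cap

worst_case_tokens = responses_per_step * max_response_tokens  # 2,048,000
aggregate_tok_s = gpus * tok_per_gpu_s                        # 9,600
print(worst_case_tokens / aggregate_tok_s)                    # ~213.3 s

# Even if every response hit the 8k cap, generation alone would take
# about 213 s, comfortably inside the 388.5 s mean step time once
# training and weight sync are added, so the figures are plausible.
```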
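For context on the method behind the quality gain: GRPO replaces a learned value function with a group-relative baseline, scoring each sampled response against the other samples drawn for the same prompt. A minimal sketch of that advantage computation, following the published GRPO formulation rather than anything Miles-specific:

```python
from statistics import mean, stdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: normalize each sampled response's
    reward against the mean/std of all samples for the same prompt."""
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# e.g. 8 samples for one prompt, scored by a binary verifier
# (1.0 = correct final answer, 0.0 = incorrect), AIME-style
print(grpo_advantages([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]))
```

An "8 samples per prompt" setup like this is consistent with the 32x8 sampling configuration reported above, which is why rollout generation ends up dominating compute: every optimizer step first has to sample a full group of responses for every prompt.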
Taken together, those numbers make this more of a reproducible infrastructure datapoint than a vague hardware claim. The blog summary says Miles also validated multi-turn agentic training on ROCm, and LMSYS's Trainium and Inferentia post places the release in a wider push to run the same serving and rollout stack across AMD GPUs and AWS Trainium/Inferentia rather than keeping SGLang tied to one silicon path.