Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. It matters if you are evaluating moving rollout-heavy RL jobs off NVIDIA hardware and want concrete throughput and step-time numbers before porting.

Miles has added ROCm support for large-scale RL post-training on AMD Instinct systems, with LMSYS's blog post describing it as an end-to-end pipeline for MI300- and MI350-class clusters. The release matters because Miles is not just a trainer: the architecture diagram in LMSYS's thread shows rollout generation and policy optimization split into separate components, coordinated by a scheduler and tied together with Megatron and SGLang.
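To make the split concrete, here is a minimal sketch of the disaggregated loop that architecture implies: a rollout worker (standing in for the SGLang side) generates trajectories, a trainer (standing in for the Megatron side) takes optimizer steps, and a scheduler coordinates the two and syncs weights. All class and method names here are illustrative assumptions, not Miles's actual API.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt: str
    response: str
    reward: float

class RolloutWorker:
    """Stand-in for the SGLang-backed rollout engine."""
    def __init__(self, weights_version: int = 0):
        self.weights_version = weights_version

    def generate(self, prompts):
        # Real system: batched sampling on MI300X/MI350X-class GPUs.
        return [Trajectory(p, f"response-to-{p}", reward=1.0) for p in prompts]

class Trainer:
    """Stand-in for the Megatron-backed policy optimizer."""
    def __init__(self):
        self.step = 0

    def update(self, trajectories):
        self.step += 1          # Real system: a GRPO gradient step.
        return self.step        # New weights version for the rollout side.

class Scheduler:
    """Coordinates the two halves: rollout -> train -> weight sync."""
    def __init__(self, worker, trainer):
        self.worker, self.trainer = worker, trainer

    def run_step(self, prompts):
        trajs = self.worker.generate(prompts)
        new_version = self.trainer.update(trajs)
        self.worker.weights_version = new_version  # weight sync
        return trajs

sched = Scheduler(RolloutWorker(), Trainer())
trajs = sched.run_step(["p0", "p1"])
```

The point of the sketch is the separation of concerns: because rollout generation dominates RL compute, keeping it in its own component lets the serving engine and the trainer scale independently.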
The implementation details are practical. Miles is open-sourced via its GitHub repo, and the blog summary says deployment is packaged as prebuilt Docker containers for MI300X and MI350X/355X, with ROCm validated end to end. That framing fits the project's pitch that rollout generation "dominates RL compute" in these jobs, making AMD's HBM bandwidth the hardware angle behind the port rather than a generic accelerator-expansion announcement.
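For readers unfamiliar with ROCm containers, launching such an image generally follows AMD's standard pattern of passing the KFD and DRI device nodes through to Docker. The image name below is a placeholder, not a real registry path; the actual tags live in the Miles repo.

```shell
# Placeholder image name -- substitute the tag published in the Miles repo.
docker pull <registry>/miles:rocm-mi300x

# Standard ROCm passthrough: expose the GPU device nodes to the container.
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  <registry>/miles:rocm-mi300x
```

The `--device=/dev/kfd --device=/dev/dri` flags are the usual ROCm equivalent of NVIDIA's `--gpus all`, which is typically the main change when porting a containerized training setup between the two stacks.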
The main reported quality gain is on Qwen3-30B-A3B with GRPO, where LMSYS's results thread says AIME rose from 0.665 to 0.729 during training. On the systems side, the same thread reports MI300X rollout throughput of about 1.1–1.3k tok/GPU/s and a mean step time of 388.5 seconds on a single 8-GPU node using 32×8 sampling with an 8k response cap.
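The 32×8 sampling setup reflects how GRPO works: each prompt gets a group of sampled responses, and each response's advantage is its reward standardized against its own group rather than against a learned value function. A minimal sketch of that group-relative advantage, with illustrative 0/1 rewards:

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Standardize each reward against its own sampling group."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# One group of 8 sampled responses for a single prompt;
# reward 1.0 = correct answer, 0.0 = incorrect (illustrative values).
rewards = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
advs = grpo_advantages(rewards)
# Correct answers receive positive advantage, incorrect ones negative,
# and the advantages within a group sum to (approximately) zero.
```

Because the baseline comes from the group itself, larger groups give a lower-variance baseline, which is why rollout generation, not the gradient step, tends to dominate the compute budget in this kind of job.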
That makes this more of a reproducible infrastructure datapoint than a vague hardware claim. The blog summary says Miles also validated multi-turn agentic training on ROCm, and LMSYS's Trainium and Inferentia post places the release in a wider push to run the same serving and rollout stack across AMD GPUs and AWS Trainium/Inferentia rather than keeping SGLang tied to one silicon path.
Full blog link: lmsys.org/blog/2026-03-1…
Excited to see inference moving beyond GPUs, with @YottaLabs and @radixark bringing SGLang to AWS Trainium & Inferentia, pushing toward a truly multi-silicon serving future 🔥 Check out the blog here👇
The real shift in AI infra isn’t new models. It’s inference moving beyond GPUs. New deep dive: SGLang on @awscloud Trainium + Inferentia Built with @sgl_project and @radixark yottalabs.ai/post/mini-sgla…