Breaking | June 7, 2026

Framework Max+ 395 benchmarks close to M5 Max on Qwen3-TTS with GGML Vulkan

A local benchmark on a 128GB Framework system running a GGML Vulkan backend reported Qwen3-TTS performance close to an M5 Max. The result suggests AMD Strix Halo hardware can approach Apple-class local TTS speed without MLX or Metal.



TL;DR

In badlogicgames' benchmark post, a Framework Desktop Max+ 395 with 128GB of LPDDR5x-8000 running GGML Vulkan generated 20 seconds of German Qwen3-TTS audio at roughly the same speed as an M5 Max running GGML Metal or MLX.
The interesting part is the stack split. badlogicgames' setup note lists three inference paths (MLX on macOS, GGML Metal on macOS, and GGML Vulkan cross-platform), while the Rust qwen3_tts_rs project officially supports libtorch and MLX rather than Vulkan.
According to the benchmark thread, getting there required patching GGML's 1D convolution kernels for the Qwen3-TTS decoder, which lines up with badlogic's qwen3-tts.cpp repo being a fresh C++ GGML implementation rather than a stock upstream build.
Framework's desktop specs page lists the same Max+ 395, Radeon 8060S, and 128GB LPDDR5x-8000 configuration from the test machine post, so this was not a custom workstation so much as a shipping Strix Halo box.

You can read Qwen's official Qwen3-TTS repo, browse badlogic's C++ GGML port, and check the Pibot build log that explains why the whole exercise matters: the target is a local kid-friendly robot whose server keeps speech, the LLM, and agent logic on-device.

Three inference paths

The thread is basically a mini portability report. badlogicgames tested one model, Qwen3-TTS, through three backends:

MLX through Rust C bindings, macOS only
GGML Metal, macOS only
GGML Vulkan, cross-platform

That matters because Qwen's official release is pitched around multilingual, streaming TTS with voice cloning and voice design, while qwen3_tts_rs currently documents libtorch and MLX as its supported backends. The Vulkan result came from custom work around the official stack, not from flipping on an advertised backend.
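The platform split above can be written down as a small lookup. This is only an illustrative sketch in Python: the backend labels and the `available_backends` helper are hypothetical names chosen for this example, not identifiers from qwen3-tts.cpp, qwen3_tts_rs, or GGML, and the platform sets simply restate what the thread reports.

```python
import sys

# Platform support as described in the thread. The keys are illustrative
# labels, not identifiers from any of the projects mentioned here.
# None means "not restricted to a particular platform".
BACKENDS = {
    "mlx": {"darwin"},         # MLX through Rust C bindings, macOS only
    "ggml-metal": {"darwin"},  # GGML Metal, macOS only
    "ggml-vulkan": None,       # GGML Vulkan, reported as cross-platform
}


def available_backends(platform=None):
    """Return the backend labels usable on the given sys.platform value."""
    platform = sys.platform if platform is None else platform
    return [
        name
        for name, platforms in BACKENDS.items()
        if platforms is None or platform in platforms
    ]


if __name__ == "__main__":
    print(available_backends())
```

On a Linux machine such as the Framework box, only the Vulkan entry remains, which is why that path was the one that needed the work.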

GGML Vulkan kept up after kernel tweaks

According to badlogicgames' follow-up, the Framework box only got to M5 Max territory after "massaging" GGML, because the Metal and Vulkan 1D convolution kernels were not ideal for the Qwen3-TTS decoder architecture.

That fills in the missing detail behind the benchmark table. badlogic's qwen3-tts.cpp repo runs the full pipeline in C++ with GGML, GGUF weights, voice cloning, and runtime backend selection, so the result is partly about AMD hardware and partly about how much hand-tuning the decoder path still wants.
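For readers unfamiliar with the operation being tuned, a 1D convolution slides a small weight window along a sequence and sums the products at each step. The sketch below is a naive pure-Python reference that assumes the usual deep-learning convention (cross-correlation with zero padding). It is not the GGML kernel and not badlogic's patch; it only shows the arithmetic that a GPU kernel has to map efficiently onto hardware.

```python
def conv1d(x, w, bias=None, stride=1, padding=0):
    """Naive 1D convolution (cross-correlation, as used in neural networks).

    x: input as [c_in][length]
    w: weights as [c_out][c_in][kernel_size]
    Returns output as [c_out][out_length].
    """
    c_in = len(x)
    length = len(x[0])
    c_out = len(w)
    k = len(w[0][0])

    # Zero-pad every input channel on both sides.
    padded = [[0.0] * padding + list(ch) + [0.0] * padding for ch in x]
    out_len = (length + 2 * padding - k) // stride + 1

    out = []
    for o in range(c_out):
        row = []
        for t in range(out_len):
            acc = bias[o] if bias is not None else 0.0
            start = t * stride
            # Sum over input channels and kernel taps.
            for c in range(c_in):
                for j in range(k):
                    acc += padded[c][start + j] * w[o][c][j]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    signal = [[1.0, 2.0, 3.0, 4.0]]   # one input channel
    kernel = [[[1.0, 0.0, -1.0]]]     # one output channel, three taps
    print(conv1d(signal, kernel))
```

The nested loops make the cost structure visible: output positions times channels times kernel taps. How those loops are tiled and parallelized is the kind of choice that can suit one model's layer shapes and not another's.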

Framework's Max+ 395 box

Framework's desktop page matches the hardware in the tweet: Ryzen AI Max+ 395, Radeon 8060S graphics, and 128GB of LPDDR5x-8000. The post's "less than half the price" line is the author's framing, but the concrete takeaway is simpler: a shipping Strix Halo system can hang with Apple's M5 Max on this local TTS workload.

The benchmark also used German generation with a reference voice, which is a reasonable stress case because Qwen3-TTS officially supports German alongside nine other major languages.

Pibot and the deployment ceiling

The benchmark was not a lab exercise. In the same thread, badlogicgames' main post says the box is meant to serve "Pipi bots" for neighborhood kids, and the linked Pibot write-up describes a local-first robot server that handles speech-to-text, text-to-speech, the LLM, and agent tools on a laptop.

Two details make the result more concrete. First, badlogicgames' reply says the Framework system was running Ubuntu 26. Second, the same thread puts a real concurrency ceiling on it: local TTS is practical for about two kids at a time, not an unlimited swarm.
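One common way to reason about a ceiling like this is the real-time factor (RTF): generation time divided by the duration of the audio produced. The sketch below is a back-of-the-envelope model, not the author's method. It assumes requests are served one after another with no batching gains, and it ignores the LLM and speech-to-text sharing the same hardware. The timing in the example is a made-up placeholder, not a figure from the benchmark.

```python
import math


def real_time_factor(generation_seconds, audio_seconds):
    """RTF below 1.0 means audio is generated faster than it plays back."""
    return generation_seconds / audio_seconds


def max_concurrent_streams(rtf):
    """Rough upper bound on listeners one serial TTS worker can keep fed."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return math.floor(1.0 / rtf)


if __name__ == "__main__":
    # Placeholder numbers for illustration only: 8 s to generate 20 s of audio.
    rtf = real_time_factor(8.0, 20.0)
    print(rtf, max_concurrent_streams(rtf))
```

Under this simple model, a worker that needs a bit under half of real time per clip tops out at two simultaneous listeners, which is the same order of magnitude as the ceiling reported in the thread.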