Release · June 26, 2026

Hermes Agent introduces Mixture of Agents 2.0 as virtual models across providers

Hermes Agent launched Mixture of Agents 2.0, letting users combine models from different providers into presets that behave like a normal model inside the agent loop. It matters because multi-model orchestration becomes a reusable runtime primitive instead of a custom routing workflow.

4 min read

TL;DR

Hermes Agent shipped Mixture of Agents 2.0 as a reusable "virtual model," so a multi-model preset can be selected inside Hermes the same way you would pick a normal model, according to Teknium's launch post.
The preset can mix models from different providers and is not capped at two models, with Teknium's combinations reply saying users could stack multiple copies of the same model alongside others.
The runtime is parallel, not routed turn by turn, so each turn takes at least as long as the slowest model in the preset, per Teknium's latency reply and Teknium's follow-up on the slowest link.
Nous Research framed the feature as a way to reach beyond individually available frontier models. Its launch thread claimed an Opus plus GPT mixture scored 8% above Opus 4.8 and 11% above GPT 5.5 on the upcoming HermesBench.

You can browse the setup docs, watch Nous Research's product demo, and dig into Teknium's explanation of the aggregator and reference-model split. There is also a nice weird detail in Teknium's reply about thought visibility: Hermes lets users inspect what the mixture models were thinking.

Virtual models

The important shift is not just "multi-model orchestration exists." Hermes already had an MoA tool. Teknium's revamp reply says the change was turning that tool into a virtual model that can sit directly inside the agent loop.

That makes the mixture reusable. Instead of assembling a routing workflow for each task, users define a preset once and call it like a model name. Teknium's virtual-model explanation also distinguishes it from OpenRouter-style fusion by saying Hermes runs the mixture agentically and across turns.
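To make the "define once, call by name" idea concrete, here is a minimal sketch of a registry in which a mixture preset resolves exactly like a single model. It is not Hermes's actual configuration format or API: `ModelRef`, `MixturePreset`, `register`, `resolve`, and every provider and model name are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class ModelRef:
    """A single model at a single provider (names are placeholders)."""
    provider: str
    model: str


@dataclass(frozen=True)
class MixturePreset:
    """Hypothetical preset: one aggregator plus any number of reference models."""
    name: str
    aggregator: ModelRef
    references: Tuple[ModelRef, ...]


Target = Union[ModelRef, MixturePreset]
REGISTRY: Dict[str, Target] = {}


def register(name: str, target: Target) -> None:
    """Make a plain model or a mixture preset selectable under one name."""
    REGISTRY[name] = target


def resolve(name: str) -> Target:
    """Look up a name; callers do not need to know which kind they get."""
    return REGISTRY[name]


# A plain model and a mixture live side by side in the same namespace.
register("model-x", ModelRef("provider-a", "model-x"))
register(
    "my-mixture",
    MixturePreset(
        name="my-mixture",
        aggregator=ModelRef("provider-a", "model-x"),
        # Mixed providers, more than two models, and a repeated model are all allowed.
        references=(
            ModelRef("provider-b", "model-y"),
            ModelRef("provider-b", "model-y"),
            ModelRef("provider-c", "model-z"),
        ),
    ),
)
```

The point of the sketch is the shared namespace: the agent loop asks for `"my-mixture"` the same way it asks for `"model-x"`, and the routing details stay inside the preset.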

Parallel execution

Teknium described the execution pattern pretty plainly: one model acts as the main aggregator, while the reference models attempt the task independently and in parallel, then the main model synthesizes the result and acts.

Because every model in the preset answers on every turn, cost and latency scale with the full mixture. Teknium's full-mixture reply says "every turn the full mixture is called," while Teknium's parallel-turn reply confirms each model has to answer in parallel for the system to work.
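The turn structure described above can be sketched with `asyncio`. This is an illustration under stated assumptions, not Hermes's implementation: `call_model` is a stand-in that sleeps instead of calling a provider, and `mixture_turn`, the model names, and the delays are all hypothetical.

```python
import asyncio
from typing import List, Tuple


async def call_model(name: str, prompt: str, delay_s: float) -> str:
    """Stand-in for a provider call; sleeps to simulate latency."""
    await asyncio.sleep(delay_s)
    return f"{name}: response to {prompt!r}"


async def mixture_turn(
    prompt: str,
    aggregator: Tuple[str, float],
    references: List[Tuple[str, float]],
) -> Tuple[str, List[str]]:
    """Run one turn: all reference models in parallel, then the aggregator."""
    # Every reference model is called on every turn; gather waits for the slowest.
    drafts = await asyncio.gather(
        *(call_model(name, prompt, delay) for name, delay in references)
    )
    # Assumption: the aggregator sees the prompt plus all drafts and synthesizes.
    synthesis_prompt = prompt + "\n\n" + "\n".join(drafts)
    agg_name, agg_delay = aggregator
    final = await call_model(agg_name, synthesis_prompt, agg_delay)
    return final, list(drafts)


if __name__ == "__main__":
    final, drafts = asyncio.run(
        mixture_turn(
            "Summarize the diff",
            aggregator=("aggregator-model", 0.01),
            references=[("ref-a", 0.02), ("ref-b", 0.05), ("ref-a", 0.02)],
        )
    )
    print(len(drafts), "drafts collected before synthesis")
```

Because `asyncio.gather` returns only when every coroutine has finished, the parallel stage takes roughly as long as its slowest reference call, not the sum of all calls.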

Two mechanics stand out:

The slowest model sets completion time, according to Teknium's latency reply and Teknium's slowest-link warning.
Users choose both the aggregator and the reference models, with Teknium's aggregator reply describing the role split and Teknium's preset-selection reply noting Hermes does not auto-pick providers or price points for you.
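A back-of-the-envelope sketch of those two mechanics: the parallel stage finishes with its slowest member, while cost adds up across every model called. The `ModelEstimate` type, the function names, and all latency and price figures are made up for illustration and do not describe any real model or Hermes internals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelEstimate:
    """Illustrative per-turn figures for one model (not real measurements)."""
    name: str
    latency_s: float
    cost_usd: float


def parallel_stage_latency(references: Sequence[ModelEstimate]) -> float:
    """Reference models run in parallel, so the slowest one sets the time."""
    return max(m.latency_s for m in references)


def turn_cost(aggregator: ModelEstimate, references: Sequence[ModelEstimate]) -> float:
    """Every model is called on every turn, so costs add up."""
    return aggregator.cost_usd + sum(m.cost_usd for m in references)


refs = [
    ModelEstimate("ref-a", latency_s=2.0, cost_usd=0.01),
    ModelEstimate("ref-b", latency_s=5.0, cost_usd=0.02),
    ModelEstimate("ref-c", latency_s=3.5, cost_usd=0.03),
]
agg = ModelEstimate("aggregator", latency_s=4.0, cost_usd=0.04)

stage = parallel_stage_latency(refs)  # set by ref-b, the slowest reference
total = turn_cost(agg, refs)          # sum over all four models
```

Under the additional assumption that the aggregator synthesizes only after the drafts arrive, its own latency would be added on top of the parallel stage.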

Benchmarks and scope

Nous Research's headline claim is strong, but still provisional. Nous Research's launch thread said MoA presets scored 8% above Opus 4.8 and 11% above GPT 5.5 on an upcoming benchmark, while Nous Research's leaderboard teaser and Teknium's next-week reply said the full HermesBench leaderboard was still on the way.

The more interesting follow-up is where they want to push it next. Teknium's cheaper-open-models reply and Teknium's capability-for-less reply both point to the same thesis: combine cheaper open models until they approximate frontier-model capability at a lower price.

Docs, rollout, and adjacent agent plumbing

The feature was live immediately. Teknium's live-now reply said users could test it right away, and Teknium's update reply said installing the latest Hermes update was enough to get it.

Teknium's docs link points to a setup guide for building custom mixtures, which is the practical companion to the launch claim. Separate from MoA itself, Hermes also shipped a small but useful agent-runtime change the same week: Teknium's Kanban update post described typed block reasons plus a recurrence counter, so repeated same-cause blocks escalate to a human, while dependency blocks auto-resume when the parent task completes.
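The Kanban behavior can be pictured with a small state sketch. Everything here is hypothetical: the `BlockReason` values, the `ESCALATE_AFTER` threshold, the status strings, and the function names are assumptions made for illustration, not Hermes's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class BlockReason(Enum):
    """Typed block reasons (the specific values are assumptions)."""
    DEPENDENCY = "dependency"
    MISSING_INPUT = "missing_input"
    TOOL_FAILURE = "tool_failure"


ESCALATE_AFTER = 3  # hypothetical threshold for same-cause repeats


@dataclass
class Task:
    task_id: str
    parent_id: Optional[str] = None
    status: str = "ready"  # ready | blocked | needs_human | done
    block_reason: Optional[BlockReason] = None
    recurrence: Dict[BlockReason, int] = field(default_factory=dict)


def block(task: Task, reason: BlockReason) -> None:
    """Record a block; escalate once the same cause has recurred enough times."""
    count = task.recurrence.get(reason, 0) + 1
    task.recurrence[reason] = count
    task.block_reason = reason
    task.status = "needs_human" if count >= ESCALATE_AFTER else "blocked"


def complete(task: Task, board: List[Task]) -> None:
    """Mark a task done and auto-resume children blocked on it as a dependency."""
    task.status = "done"
    for other in board:
        if (
            other.parent_id == task.task_id
            and other.status == "blocked"
            and other.block_reason is BlockReason.DEPENDENCY
        ):
            other.status = "ready"
            other.block_reason = None
```

Keeping the counter per reason means a task that fails for a new cause starts from one, while a task stuck on the same cause eventually reaches a human.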
