Release · June 30, 2026

Hermes Agent updates web extraction with 60x faster reads and 49x lower cost

Nous updated Hermes Agent's web extraction to drop the old in-loop summarizer, pass cleaner content directly to the model, and page large documents on demand. Nous claims the change cuts read latency by up to 60x and cost by 49x. Those are launch figures, so teams should still check output quality on their own workloads.

TL;DR

Hermes Agent says its web extraction path is now up to 60x faster and 49x cheaper, according to NousResearch's launch post and Teknium's rollout thread.
The main architectural change is that Hermes removed the in-loop summarizer LLM, which Teknium's reply on the summarizer and Teknium's later explanation both describe as the source of the speed and cost drop.
Large pages are no longer squeezed through a summary pass first. Instead, Teknium's later explanation says Hermes stores the markdown as a file, truncates the middle for the initial pass, and hands the model a filepath plus line boundaries for follow-up reads.
Nous and Teknium both say the optimization is backend-agnostic. Teknium's backend reply and Teknium's clarification frame it as an automatic upgrade for any web extraction backend inside Hermes.

You can inspect the implementation PR, check the GitHub release notes Teknium pointed people to, and watch NousResearch's demo post for the before-and-after speed claim. The most interesting detail is in Teknium's architecture reply. Instead of an LLM-written summary, Hermes now returns the path to the full output, with line-number boundaries for paging deeper into a document.

Summarizer removal

The old bottleneck appears to have been a second model pass that summarized scraped content before the agent could use it. In Teknium's quality reply, Teknium said Hermes removed that summarizer and saw quality hold or improve.
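
A minimal sketch of the difference follows. It assumes a backend is any function that turns a URL into markdown, and that the old summarizer was one more model call on that markdown. The names `extract_with_summarizer`, `extract_direct`, `Backend`, and `Summarizer` are hypothetical and do not come from the Hermes source.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical signatures for illustration only; not taken from the Hermes codebase.
Backend = Callable[[str], str]      # url -> markdown
Summarizer = Callable[[str], str]   # markdown -> LLM-written summary


def extract_with_summarizer(url: str, backend: Backend, summarize: Summarizer) -> str:
    """Old-style path: the agent only sees what a second model pass wrote."""
    markdown = backend(url)
    return summarize(markdown)  # extra LLM round trip on every scrape


def extract_direct(url: str, backend: Backend) -> str:
    """New-style path: the backend's markdown goes straight to the agent."""
    return backend(url)


if __name__ == "__main__":
    calls: list[str] = []  # one entry per (fake) summarizer model call

    def fake_backend(url: str) -> str:
        return f"# {url}\n\nFull page text."

    def fake_summarizer(text: str) -> str:
        calls.append(text)
        return text.splitlines()[0]  # a lossy "summary": just the heading

    print(extract_with_summarizer("https://example.com", fake_backend, fake_summarizer))
    print(extract_direct("https://example.com", fake_backend))
    print(len(calls))
```

Both paths fetch the page once. Only the old path pays for a second model call, and its output can drop detail the agent later needs.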

That matters because the upgrade is not a new scraper. Teknium's clarification and Teknium's backend clarification both say the change lives in Hermes' extraction flow, not in a replacement backend.

On-demand paging

Teknium's later explanation, the most concrete description of the new path, breaks it into three steps:

Hermes takes the backend's markdown output directly.
It truncates the middle of the document for the model's first pass.
It saves the full markdown locally and returns a filepath plus line-number boundaries so the agent can page back into the source text on demand.
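
The three steps can be sketched in Python as below. This illustrates the idea under stated assumptions and is not the code from the PR. The function name `prepare_extraction`, the default cut of 200 head lines and 50 tail lines, the hash-based filename, and the omission marker format are all hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExtractionResult:
    preview: str                      # text the model sees on its first pass
    path: Path                        # full markdown, saved locally
    total_lines: int                  # line count of the full document
    omitted: tuple[int, int] | None   # 1-indexed inclusive range cut from the preview


def prepare_extraction(markdown: str, out_dir: Path,
                       head: int = 200, tail: int = 50) -> ExtractionResult:
    """Save the backend's markdown as-is, then build a middle-truncated preview."""
    lines = markdown.splitlines()
    total = len(lines)

    # Persist the untouched markdown so the agent can page back into it later.
    name = hashlib.sha256(markdown.encode("utf-8")).hexdigest()[:16] + ".md"
    path = out_dir / name
    path.write_text(markdown, encoding="utf-8")

    # Short documents pass through whole; nothing needs paging.
    if total <= head + tail:
        return ExtractionResult(markdown, path, total, None)

    # Keep the first `head` and last `tail` lines and replace the middle with a
    # marker naming the file and the exact line boundaries that were dropped.
    start, end = head + 1, total - tail
    marker = f"[... lines {start}-{end} of {total} omitted; full text at {path} ...]"
    preview = "\n".join(lines[:head] + [marker] + lines[total - tail:])
    return ExtractionResult(preview, path, total, (start, end))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        doc = "\n".join(f"line {i}" for i in range(1, 1001))
        result = prepare_extraction(doc, Path(d))
        print(result.total_lines, result.omitted)   # 1000 (201, 950)
        print(len(result.preview.splitlines()))     # 251
```

The file is saved before truncating, so the preview can point at a path that already exists. Reporting the omitted range as 1-indexed line numbers gives the agent concrete boundaries to request next.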

That fleshes out NousResearch's launch claim that large pages are saved locally and paged on demand.
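
The paging half of that claim needs a way to read a slice of the saved file. Below is a minimal sketch that assumes the agent asks for an inclusive, 1-indexed line range. `read_lines` is a hypothetical helper name, not a documented Hermes tool.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_lines(path: str | Path, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) of a saved extraction.

    Each line is prefixed with its number so the model can see exactly where
    it is and request the next range.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid line range {start}-{end}")
    out: list[str] = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if lineno > end:
                break  # stop reading once the requested window is done
            if lineno >= start:
                text = line.rstrip("\n")
                out.append(f"{lineno}: {text}")
    return "\n".join(out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        saved = Path(d) / "page.md"
        saved.write_text("\n".join(f"line {i}" for i in range(1, 11)), encoding="utf-8")
        print(read_lines(saved, 4, 6))
```

The file is streamed line by line rather than loaded whole, so a request near the top of a very large page stays cheap.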

Backend compatibility

Teknium repeated the same compatibility point across the thread: Teknium's works-with-any reply, Teknium's any-backend reply, and Teknium's keep-any-backend reply all say the optimization works with any existing web extraction backend and activates automatically after updating.
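
One way a post-processing step can stay backend-agnostic is to depend only on a narrow interface. In the sketch below, the interface (`MarkdownBackend` with a `fetch_markdown` method), both example backends, and the simple head/tail truncation are assumptions made for illustration. The thread does not describe Hermes' actual backend abstraction.

```python
from __future__ import annotations

from typing import Protocol


class MarkdownBackend(Protocol):
    """Hypothetical interface: anything that can turn a URL into markdown."""

    def fetch_markdown(self, url: str) -> str: ...


def shape_for_model(markdown: str, max_lines: int = 4) -> str:
    """Backend-independent stand-in for the truncate-the-middle step."""
    lines = markdown.splitlines()
    if len(lines) <= max_lines:
        return markdown
    half = max_lines // 2
    return "\n".join(lines[:half] + ["[...]"] + lines[len(lines) - half:])


def web_extract(url: str, backend: MarkdownBackend) -> str:
    # The optimization runs after the backend call, so swapping backends only
    # changes where the markdown comes from, not how it is handed to the model.
    return shape_for_model(backend.fetch_markdown(url))


class StaticBackend:
    """Serves canned pages, e.g. for tests."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def fetch_markdown(self, url: str) -> str:
        return self.pages[url]


class EchoBackend:
    """Returns a one-line page naming the URL."""

    def fetch_markdown(self, url: str) -> str:
        return f"# {url}"


if __name__ == "__main__":
    long_page = "\n".join("abcdefghij")  # ten one-letter lines
    print(web_extract("a", StaticBackend({"a": long_page})))
    print(web_extract("https://example.com", EchoBackend()))
```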

A few replies mark the limits of that claim. In Teknium's browser-tool distinction, Teknium said a browser tool is different from Hermes' web scraper tool, while Teknium's Firecrawl note says some backend-specific behavior still depends on the scraper itself.

Rescraping behavior

The new flow does not appear to add a caching layer for previously visited URLs. When asked directly, Teknium's rescrape reply said Hermes rescrapes a URL every time the agent asks to see it.

That makes the launch's cost win more specific: the savings come from removing summarization overhead and changing how long documents are handed to the model, not from reusing prior fetches. Teknium's auto-activation reply and Teknium's yes-it-works-with-any reply also suggest the behavior ships automatically once users update.
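
The sketch below shows what "no caching layer" means in practice, using a hypothetical counting backend. Each request for the same URL reaches the backend again, so fetch volume is unchanged and any savings must come from the steps after the fetch. `CountingBackend` and `web_extract` are illustrative names, not Hermes APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CountingBackend:
    """Hypothetical backend that records how often each URL is fetched."""
    fetches: dict[str, int] = field(default_factory=dict)

    def fetch_markdown(self, url: str) -> str:
        self.fetches[url] = self.fetches.get(url, 0) + 1
        return f"# {url}\n\nfetched {self.fetches[url]} time(s)"


def web_extract(url: str, backend: CountingBackend) -> str:
    # No lookup of prior results: every request goes back to the backend,
    # matching the "rescrape every time" answer. The step that no longer
    # follows this fetch is the summarizer call.
    return backend.fetch_markdown(url)


if __name__ == "__main__":
    backend = CountingBackend()
    for _ in range(3):
        web_extract("https://example.com/docs", backend)
    print(backend.fetches)  # {'https://example.com/docs': 3}
```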
