AI Primer
release

Hermes Agent updates web extraction with 60x faster reads and 49x lower cost

Nous updated Hermes Agent web extraction to skip the old summarizer loop, pass cleaner content directly to the model, and page large documents on demand. The change is claimed to cut read latency by up to 60x and cost by 49x, so teams should compare output quality before adopting it.

TL;DR

You can inspect the implementation PR, check the GitHub release notes Teknium pointed people to, and watch NousResearch's demo post for the before-and-after speed claim. The most interesting detail is in Teknium's architecture reply: instead of an LLM-written summary, Hermes now hands the model a path to the full saved output, along with line-number boundaries for paging deeper into the document.

Summarizer removal

The old bottleneck appears to have been a second model pass that summarized scraped content before the agent could use it. In a reply about quality, Teknium said Hermes removed that summarizer and saw output quality hold steady or improve.

That matters because the upgrade is not a new scraper. In two separate clarifications, Teknium said the change lives in Hermes' extraction flow, not in a replacement backend.
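A toy Python model of the two paths, as the thread describes them, shows where the extra cost came from by counting model calls per page read. `fake_llm`, `old_read`, and `new_read` are hypothetical stand-ins rather than Hermes code; in practice each `fake_llm` call would be a real model request.

```python
from typing import Callable, List


def fake_llm(log: List[str], role: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: records the call, echoes a prefix."""
    def call(text: str) -> str:
        log.append(role)
        return text[:200]
    return call


def old_read(markdown: str, log: List[str]) -> str:
    # Old path (as described): a summarizer model rewrites the page first,
    # so the agent model only ever sees the summary.
    summary = fake_llm(log, "summarizer")(markdown)
    return fake_llm(log, "agent")(summary)


def new_read(markdown: str, log: List[str]) -> str:
    # New path: the backend's markdown goes to the agent model directly.
    return fake_llm(log, "agent")(markdown)


old_log: List[str] = []
new_log: List[str] = []
old_read("# Page\nbody", old_log)
new_read("# Page\nbody", new_log)
print(old_log)  # ['summarizer', 'agent']
print(new_log)  # ['agent']
```

Every page read on the old path paid for one extra model call before the agent saw anything, which is the overhead the update removes.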

On-demand paging

Teknium's most concrete description, given in a later reply, breaks the new path into three steps:

  1. Hermes takes the backend's markdown output directly.
  2. It truncates the middle of that content for the model's first pass.
  3. It saves the full markdown locally and returns a filepath plus line-number boundaries, so the agent can page back into the source text on demand.
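The three steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Hermes' implementation: `prepare_for_model` and `ExtractionResult` are hypothetical names, and the 50-line head, 20-line tail, temp-file location, and marker text are placeholders, since the thread does not give Hermes' actual cutoffs or file layout.

```python
import tempfile
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExtractionResult:
    preview: str                        # what the model sees on its first pass
    path: str                           # where the full markdown was saved
    total_lines: int                    # how far the agent can page
    omitted: Optional[Tuple[int, int]]  # 1-based inclusive lines cut from the preview


def prepare_for_model(markdown: str, head: int = 50, tail: int = 20) -> ExtractionResult:
    """Hypothetical sketch: keep head/tail lines, save everything, report the gap."""
    lines = markdown.splitlines()
    # Save the full markdown (step 3) so truncation loses nothing.
    fd, path = tempfile.mkstemp(suffix=".md")
    with open(fd, "w", encoding="utf-8") as f:
        f.write(markdown)
    if len(lines) <= head + tail:
        return ExtractionResult(markdown, path, len(lines), None)
    # Cut the middle (step 2); start/end are the 1-based lines that get dropped.
    start, end = head + 1, len(lines) - tail
    marker = f"[... lines {start}-{end} omitted; full text saved at {path} ...]"
    preview = "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
    return ExtractionResult(preview, path, len(lines), (start, end))


page = "\n".join(f"line {i}" for i in range(1, 201))  # step 1: backend markdown
result = prepare_for_model(page)
print(result.omitted, result.total_lines)  # (51, 180) 200
```

Slicing the tail as `lines[len(lines) - tail:]` rather than `lines[-tail:]` keeps the sketch correct when `tail` is 0.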

That is a clearer account of the claim in NousResearch's launch post that large pages are saved locally and paged on demand.
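Paging back in only needs a tool that reads a line range from the saved file. The hypothetical `read_lines` below is one way to do that, assuming 1-based, inclusive line boundaries; the thread does not describe how Hermes actually exposes the paging call to the agent.

```python
import tempfile
from itertools import islice


def read_lines(path: str, start: int, end: int) -> str:
    """Hypothetical paging tool: return 1-based inclusive lines start..end of a saved file."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    with open(path, encoding="utf-8") as f:
        # islice streams the file and stops after `end`, so the page is never loaded whole.
        return "\n".join(line.rstrip("\n") for line in islice(f, start - 1, end))


# Usage: save a 200-line page, then page into the middle of it.
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False, encoding="utf-8") as tmp:
    tmp.write("\n".join(f"line {i}" for i in range(1, 201)))
    saved_path = tmp.name

print(read_lines(saved_path, 51, 53))
```

An agent that saw lines 51-180 omitted from its preview could request them in chunks, for example `read_lines(path, 51, 100)` followed by `read_lines(path, 101, 180)`.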

Backend compatibility

Teknium made the same compatibility point in at least three replies across the thread: the optimization works with any existing web extraction backend and activates automatically after updating.

A few replies narrow the scope. Teknium said a browser tool is different from Hermes' web scraper tool, and in a note about Firecrawl said some backend-specific behavior still depends on the scraper itself.
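One way to see why the change can work with any backend: if every backend only has to return markdown for a URL, the new handling can sit after that step unchanged. The Python sketch below assumes such an interface; `ExtractionBackend`, the two toy backends, and `read_page` are hypothetical, and the thread does not describe Hermes' actual backend API.

```python
from typing import Callable, Dict, Protocol


class ExtractionBackend(Protocol):
    """Hypothetical interface: anything that turns a URL into markdown."""
    def extract(self, url: str) -> str: ...


class DictBackend:
    # Toy backend serving canned pages (stands in for one real scraper).
    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def extract(self, url: str) -> str:
        return self.pages[url]


class CallableBackend:
    # Toy backend wrapping any function (stands in for a different scraper).
    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch

    def extract(self, url: str) -> str:
        return self.fetch(url)


def read_page(backend: ExtractionBackend, url: str) -> Dict[str, object]:
    # Post-processing only touches the markdown string, so it is identical
    # whichever backend produced it. The markdown itself still depends on
    # the backend, which matches the Firecrawl caveat above.
    markdown = backend.extract(url)
    return {"url": url, "markdown": markdown, "total_lines": len(markdown.splitlines())}


page = "# Title\n\nBody text."
a = read_page(DictBackend({"https://example.com": page}), "https://example.com")
b = read_page(CallableBackend(lambda url: page), "https://example.com")
print(a == b)  # True
```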

Rescraping behavior

The new flow does not appear to add a caching layer for previously visited URLs. Asked directly, Teknium said Hermes rescrapes a URL every time the agent wants to see it.

That pins down where the cost savings come from: removing the summarization pass and changing how long documents are handed to the model, not reusing prior fetches. Two further replies from Teknium also indicate that the behavior ships automatically once users update, regardless of backend.
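The rescrape answer means repeated reads of the same URL each pay the fetch cost again. A minimal Python sketch with a hypothetical `CountingBackend` makes the contrast with a cache explicit; the cached variant is shown only for comparison and is not something the thread says Hermes does.

```python
from typing import Dict


class CountingBackend:
    """Toy backend that counts how many times it is asked to fetch."""
    def __init__(self) -> None:
        self.fetches = 0

    def extract(self, url: str) -> str:
        self.fetches += 1
        return f"# Page at {url}"


def read_uncached(backend: CountingBackend, url: str) -> str:
    # Behavior described in the thread: every read goes back to the backend.
    return backend.extract(url)


def read_cached(backend: CountingBackend, url: str, cache: Dict[str, str]) -> str:
    # Hypothetical alternative, NOT described for Hermes: reuse a prior fetch.
    if url not in cache:
        cache[url] = backend.extract(url)
    return cache[url]


plain = CountingBackend()
for _ in range(3):
    read_uncached(plain, "https://example.com")

cached = CountingBackend()
store: Dict[str, str] = {}
for _ in range(3):
    read_cached(cached, "https://example.com", store)

print(plain.fetches, cached.fetches)  # 3 1
```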
