AI Primer
release

Hyperbrowser launches AgentRank to test Claude, GPT, and Gemini on real websites

Hyperbrowser launched AgentRank, an open-source tool that runs Claude, GPT, and Gemini agents against a site to show where they get stuck. It matters because teams can turn agent website compatibility into a repeatable eval instead of an anecdotal demo.


TL;DR

  • Hyperbrowser's AgentRank launch post pitches AgentRank as a way to run Claude, GPT, and Gemini agents against a live website and surface where they fail to navigate or complete tasks.
  • The project is open source, and Hyperbrowser's repo link post points developers to the code in Hyperbrowser's hyperbrowser-app-examples repository.
  • According to Hyperbrowser's API post, AgentRank rides on Hyperbrowser's agent APIs for Claude CU, OpenAI CUA, and Gemini CU, with one API key across the stack.
  • ai_for_success's recap frames the launch as website testing for agent behavior, which is the useful shift here: site compatibility becomes a repeatable eval instead of a one-off demo.

You can watch Hyperbrowser's launch video run through the product, open the hyperbrowser-app-examples repository, and trace the underlying agent layer from Hyperbrowser's API post to Hyperbrowser's agent API page. The interesting part is not the dashboard itself. It is that someone is productizing "can agents use my site" as an eval surface across three model families at once.

AgentRank

Hyperbrowser describes AgentRank as a test harness for real websites. The launch post says it runs Claude, GPT, and Gemini agents on your site, then shows where they get stuck during browsing and interaction.

That makes the target unusually concrete for an eval tool. It comes down to three questions:

  • Can the agent navigate the site's structure?
  • Can it interact with page elements successfully?
  • Where does it fail or stall?
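One way to picture those three questions is as a per-agent result record. The sketch below is illustrative only: the `AgentRun` fields, the `summarize` helper, and the sample values are assumptions made for this example. They are not AgentRank's data model, and they are not results from a real run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AgentRun:
    """Hypothetical record of one agent's attempt at one task on a site."""
    agent: str                # label for the model family, e.g. "claude"
    navigated: bool           # did it reach the target page?
    interacted: bool          # did it complete the page interaction?
    stuck_at: Optional[str]   # step where it stalled, or None if it finished


def summarize(runs: List[AgentRun]) -> Dict[str, str]:
    """Map each agent to 'completed' or a note on where it got stuck."""
    report = {}
    for run in runs:
        if run.navigated and run.interacted and run.stuck_at is None:
            report[run.agent] = "completed"
        else:
            report[run.agent] = f"stuck at: {run.stuck_at or 'unknown step'}"
    return report


# Made-up sample data, purely to show the shape of the output.
runs = [
    AgentRun("claude", True, True, None),
    AgentRun("gpt", True, False, "checkout form"),
    AgentRun("gemini", False, False, "cookie banner"),
]
report = summarize(runs)
```

Keeping navigation, interaction, and the stall point as separate fields means a failure can be attributed to a specific stage instead of a single pass/fail bit.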

The commentary in ai_for_success's recap mostly matches the primary launch framing, which suggests the product is straightforward enough that the demo carries the story without much extra explanation.

Open-source pipeline

The second launch post is short but important: Hyperbrowser says teams can build their own agent testing pipeline, and it links directly to the public hyperbrowser-app-examples repository.

That open-source angle changes the shape of the launch. AgentRank is not presented as a closed report card. It is presented as a reproducible setup other teams can inspect, fork, and wire into their own evaluation flow.
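As a rough idea of what wiring agent results into an evaluation flow could look like, here is a minimal pass-rate gate. Everything in it is an assumption for illustration: the `pass_rate` and `gate` helpers, the task names, and the 0.8 threshold are hypothetical and do not come from the hyperbrowser-app-examples repository.

```python
from typing import Dict, List


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of tasks completed, given a {task name: completed?} mapping."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def gate(results_by_agent: Dict[str, Dict[str, bool]],
         threshold: float = 0.8) -> List[str]:
    """Return the agents whose pass rate falls below the threshold."""
    return sorted(
        agent
        for agent, results in results_by_agent.items()
        if pass_rate(results) < threshold
    )


# Made-up results for two hypothetical tasks per agent.
results_by_agent = {
    "claude": {"search": True, "signup": True},
    "gpt": {"search": True, "signup": False},
    "gemini": {"search": True, "signup": True},
}
failing = gate(results_by_agent)
```

A CI job could fail the build when `failing` is non-empty, which is one way to turn a one-off demo into a repeatable check.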

Agent APIs

Hyperbrowser says AgentRank is built on its agent APIs, specifically Claude CU, OpenAI CUA, and Gemini CU. The same post adds that the stack works with one API key, and links to Hyperbrowser's agent API page.

That is the new infrastructure detail in this launch. AgentRank is the visible app, but the underlying product is a unification layer for computer-use-style agents across multiple vendors.
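To make the "one key, several vendors" idea concrete, the sketch below shows a generic dispatch layer. It is not Hyperbrowser's SDK: `UnifiedAgentClient`, the backend signature, the provider labels, and the stub backends are all hypothetical stand-ins, and no network calls are made.

```python
from typing import Callable, Dict

# Hypothetical backend signature: (api_key, url, task) -> outcome string.
Backend = Callable[[str, str, str], str]


class UnifiedAgentClient:
    """Illustrative wrapper that shares one API key across several backends."""

    def __init__(self, api_key: str, backends: Dict[str, Backend]) -> None:
        self._api_key = api_key
        self._backends = backends

    def run(self, provider: str, url: str, task: str) -> str:
        """Run one task on one provider, using the shared key."""
        if provider not in self._backends:
            raise ValueError(f"unknown provider: {provider}")
        return self._backends[provider](self._api_key, url, task)

    def run_all(self, url: str, task: str) -> Dict[str, str]:
        """Run the same task on every registered provider."""
        return {name: self.run(name, url, task) for name in self._backends}


def make_stub(name: str) -> Backend:
    """Build a stand-in backend; a real one would call a vendor's agent API."""
    def backend(api_key: str, url: str, task: str) -> str:
        return f"{name} ran '{task}' on {url}"
    return backend


providers = ("claude-cu", "openai-cua", "gemini-cu")  # labels chosen for this example
client = UnifiedAgentClient("demo-key", {p: make_stub(p) for p in providers})
outcomes = client.run_all("https://example.com", "find pricing page")
```

The point of the pattern is that the caller issues one task and gets back comparable outcomes per vendor, which is what makes a side-by-side compatibility report possible.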
