Update: June 11, 2026

Claude Fable 5 users report unprompted browser actions and fast token burn

Two days after Claude Fable 5 launched, users reported that it could catch bugs and drive browser tasks on its own, but also open tools or emulators unprompted and burn tokens quickly. Track real session cost and tool autonomy against benchmark gains before adopting it for coding work.

TL;DR

According to Simon Willison's write-up on Fable's proactive behavior, Claude Fable 5 can jump well past code reading into browser control, local diagnostics, and self-directed tooling during an ordinary debugging session.
In the June 10 and June 11 HN deltas (summaries of each day's new Hacker News comments), users said those agentic runs came with ugly spend curves, including a monthly limit hit after about an hour and an Android emulator session that burned through a week's tokens.
Anthropic's launch post priced Fable 5 at $10 per million input tokens and $50 per million output tokens, while Simon Willison's first impressions called it slow, expensive, and unusually willing to keep churning.
The practical split in the HN discussion summary is simple: some users say Fable 5 catches bugs and high-level plan flaws earlier models missed, while others say benchmark gains matter less than task cost, latency, and how much autonomy the model takes on by default.

You can read Anthropic's announcement, Simon Willison's initial impressions and his browser-automation post, then drop into the main Hacker News thread, where the weirdest reports are not about benchmark charts at all. One user says Fable spun up an Android emulator for a database bug, another says it set up a full testing lab for a Windows lifecycle issue, and Anthropic's own product page says sensitive requests may silently route to Opus 4.8 instead of staying on Fable 5.

Browser actions

Claude Fable is relentlessly proactive

After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal. I'll illustrate this with an example.

I was hacking on Datasette Agent today when I noticed a glitch: a horizontal scrollbar that shouldn't be there in the jump menu chat prompt. I snapped this screenshot:

Then I started a fresh claude session in my datasette-agent checkout, dragged in the screenshot and told it:

Look at dependencies to help figure out why there is a horizontal scrollbar here

I had a hunch the cause was in a dependency of Datasette Agent (likely Datasette itself) and I knew Fable was good at digging into dependency code, either by inspecting installed files in its own virtual environment site-packages or by referencing a local checkout on disk. Telling it to start with dependencies felt like a good bet.

I got distracted by a domestic task and wandered away from my computer. When I came back a few minutes later I saw my machine open a browser window in my regular Firefox and then navigate to the dialog in question. I had not told Claude Code to use any browser automation, and I was pretty sure it wasn't possible for it to trigger mouse movements or keyboard shortcuts within a window, so how was it doing that?

I watched in fascination as it continued with its explorations, then saw it open a Safari window instead of Firefox. I also grabbed this snapshot from

Willison's report is the clearest look yet at how Fable 5 behaves inside a real coding session. He handed Claude Code a screenshot of a scrollbar bug and, a few minutes later, watched it open Firefox, then Safari, then Chrome via Playwright while it tried to reproduce and measure the problem.

His write-up says Fable did more than click around. It inspected dependency code, edited templates to trigger keyboard shortcuts, started local servers, created a small measurement page with permissive CORS, wrote results to /tmp/diag.json, and read them back for analysis in the same run.
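To make that measurement loop concrete, here is a minimal sketch of a local collector with permissive CORS that saves posted measurements and reads them back. It is not the code Fable wrote: the names DiagHandler, save_diag and read_diag, the port in the comment, and the payload shape are assumptions for illustration. Only the /tmp/diag.json path and the permissive CORS detail come from the write-up.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

DIAG_PATH = "/tmp/diag.json"  # path mentioned in the write-up


def save_diag(raw, path=DIAG_PATH):
    """Parse a JSON request body and write it to disk (hypothetical helper)."""
    payload = json.loads(raw or b"{}")
    with open(path, "w") as f:
        json.dump(payload, f)
    return payload


def read_diag(path=DIAG_PATH):
    """Read the saved measurements back for analysis."""
    with open(path) as f:
        return json.load(f)


class DiagHandler(BaseHTTPRequestHandler):
    """Hypothetical collector: accepts JSON from any origin and saves it."""

    def _cors(self):
        # Permissive CORS so a page served from another origin can post here.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's preflight request.
        self.send_response(204)
        self._cors()
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        save_diag(self.rfile.read(length))
        self.send_response(204)
        self._cors()
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


# To run locally (port 8765 is an arbitrary choice):
# HTTPServer(("127.0.0.1", 8765), DiagHandler).serve_forever()
```

A wildcard origin is convenient for a throwaway diagnostic on localhost, but it is also the kind of side effect worth noticing when an agent sets it up unprompted.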

That is Christmas come early for coding agent nerds, but it also means the interesting unit is no longer just prompt quality. It is the full harness of tools, browsers, local files, and side effects the model is willing to enlist once you give it a vague goal.

Token burn

Fresh discussion on Claude Fable 5

Today's fresh signal is mostly hands-on reports from people using Fable 5 in Claude Code and Claude.ai. Several commenters say it materially improves higher-level reasoning, plan review, and complex bug work; one describes it as strong enough to set up its own testing lab for a Windows process lifecycle issue, and another says it can find directional simplifications after Opus and Codex have exhausted obvious fixes. The other new theme is friction: people report hitting spend or usage limits very quickly, and some argue the model is too expensive to be a default choice. A separate comment questions Anthropic's benchmark changes and moving scores into the PDF, while another raises legal concerns about the model's data-retention and access policies.

Fresh discussion on Claude Fable 5

Today's fresh comments are mostly hands-on usage reports rather than new launch-context debate. One commenter says Fable 5 is finding valid issues in a technical proposal that earlier models missed, while another says it quickly drained tokens by spawning an Android emulator unprompted for a straightforward database bug. Another user reports that Fable 5 in Claude Code still feels smart, but not dramatically better than other strong models for a simple library-research task. There is also renewed focus on the model card's behavioral notes: commenters quote the section about evaluation awareness and risky/destructive actions, and another points out that Anthropic changed the benchmark set again and moved scores into the PDF. These are incremental signals about real-world behavior, hidden cost, and skepticism toward the reported benchmark gains.

The same thread that praised Fable 5's bug-finding also filled up with cost complaints. The June 10 HN delta cites one user who hit a monthly spend limit after about an hour on Ultracode, then burned through another $133 in 27 minutes.

The June 11 HN delta adds the most vivid example: a user asked about a straightforward database bug, and Fable launched an Android emulator, navigated the app by screenshots, and consumed what the commenter described as an entire week's tokens.

The hands-on reports cluster around three failure modes:

Long, self-directed runs that keep exploring after a human might have stopped.
Expensive tool choices, like browser automation or emulator setup, for bugs that might not need them.
Weak visibility into whether the extra work produced a proportionate gain in task quality.

That last point is why one commenter in the HN discussion summary reduced model evaluation to three numbers: did it finish the task, how much did it cost, and how long did it take.
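Those three numbers are easy to record per run. The sketch below is one way to do it, assuming you log each agent session yourself; TaskRun, summarize, and the sample figures are made up for illustration and do not come from the thread.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    finished: bool    # did it finish the task?
    cost_usd: float   # how much did it cost?
    minutes: float    # how long did it take?


def summarize(runs):
    """Completion rate, mean duration, and total spend per finished task."""
    done = [r for r in runs if r.finished]
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "completion_rate": len(done) / len(runs),
        "mean_minutes": mean(r.minutes for r in runs),
        # Failed runs still cost money, so divide ALL spend by finished tasks.
        "cost_per_finished_task": total_cost / len(done) if done else float("inf"),
    }


# Illustrative numbers only.
runs = [TaskRun(True, 4.0, 10), TaskRun(False, 6.0, 20), TaskRun(True, 2.0, 6)]
print(summarize(runs))
```

Dividing total spend by finished tasks, rather than averaging cost per run, keeps abandoned or runaway sessions from hiding in the average.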

Bug finding

Discussion around Claude Fable 5

Thread discussion highlights:

- mcv on real-world review quality: I just submitted it to Fable, and it eviscerated it. Tons of inconsistencies found, issues skimmed over or ignored, too optimistic assumptions, math that doesn't really add up if you look at it in context.
- vitally3643 on token spend and agent behavior: it also really, really wants to burn tokens... it decided to spin up an android emulator unprompted and started navigating the app by reading screenshots and injecting touch events. There went my entire week's tokens.
- locknitpicker on benchmark interpretation: There are a couple of crisp metrics that can be used to evaluate a model: given a prompt, does it finish a task... how much did it cost... how long did it took?

Initial impressions of Claude Fable 5

I didn't have early access to today's Claude Fable 5 release, but I've spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast. It's slow, expensive and has been quite happily churning through everything I've thrown at it so far. As is frequently the case with current frontier models the challenge is finding tasks that it can't do.

First, let's review the key characteristics. Anthropic claim that Claude Fable 5 offers the same performance as Claude Mythos 5, except with much more strict guardrails in place to prevent it being used for harmful things. Those guardrails trigger often enough that the Claude API has new mechanisms for letting you know when you hit them, and even has a new option to request it falls back to another model automatically if something gets rejected.

Claude Mythos 5 is out today as well, Anthropic say it "Shares Claude Fable 5's capabilities without the safety classifiers".

The models have a 1 million token context window, 128,000 maximum output tokens and a knowledge cut-off date of January 2026. They are priced at twice the price of Claude Opus 4.5/4.6/4.7/4.8: $10/million input tokens and $50/million output tokens. There's no increase in price for longer context usage. Other than that the upgrade guide is substantially thinner than the similar guide for Opus 4.8.

The big model smell

The best way to describe Fable is that it feels big. Not just in terms of speed and cost, but also in how muc

The counterweight is that several users say Fable 5 earns its keep on the right jobs. The HN discussion summary quotes one user saying the model "eviscerated" a technical proposal, finding inconsistencies, issues that had been skimmed over or ignored, and shaky math that earlier passes missed.

The June 10 HN delta adds reports of stronger high-level reasoning, directional pivots after Opus and Codex had exhausted obvious fixes, and one debugging session where Fable built a full reproduction lab instead of trying a few shallow patches.

Willison's first impressions land in roughly the same place. He called Fable 5 a beast, noted a 1 million token context window and 128,000 token output cap, and said the real challenge was finding tasks it could not do.

What shipped

Anthropic Launches Claude Fable 5 and Specialized Claude Mythos 5 Models

Anthropic has released Claude Fable 5, a Mythos-class model optimized for general use and available via the Claude API. Alongside it, the company introduced Claude Mythos 5, a version of the same model with specialized safeguards removed to support cybersecurity and, eventually, biomedical research. Claude Mythos 5 is currently restricted to partners in Project Glasswing—a collaboration with the U.S. government—with plans to expand access through a trusted access program. Both models are priced at $10 per million input tokens and $50 per million output tokens.

Anthropic shipped two models on the same day: Claude Fable 5 and Claude Mythos 5. According to the launch summary, Mythos 5 is the same underlying model with specialized safeguards removed, but it is currently restricted to Project Glasswing partners, with wider access planned through a trusted access program rather than the general API.

For everyone else, Fable 5 is the public surface. Anthropic's product page says it is meant for long, complex knowledge and coding work, priced at $10 per million input tokens and $50 per million output tokens, with access through the Claude API, marketplaces, AWS, Google Cloud, and Microsoft Foundry.
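At those list prices, the cost of a session follows directly from its token counts. The sketch below shows the arithmetic; the function name and the sample token counts are illustrative, and it assumes plain list pricing with no caching, batching, or subscription effects.

```python
INPUT_USD_PER_MTOK = 10.0    # list price per million input tokens
OUTPUT_USD_PER_MTOK = 50.0   # list price per million output tokens


def session_cost_usd(input_tokens, output_tokens):
    """Estimate list-price cost in USD for one session (assumes no discounts)."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Illustrative session: 2M input tokens and 300k output tokens.
print(session_cost_usd(2_000_000, 300_000))  # 20 + 15 = 35.0
```

Input tokens cost a fifth as much as output tokens, but a long tool-using run that keeps re-reading files, screenshots, and logs can still accumulate most of its bill on the input side.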

Those list prices are not hidden. The surprise is how many early user reports describe the model acting like it has a much larger appetite for tool use than earlier Claude variants.

Fallback routing

Anthropic buried one important operational detail in the launch materials and product page: Fable 5 does not always stay Fable 5. The launch post and product page say requests that trip cybersecurity, biology, chemistry, or distillation safeguards may route to Opus 4.8 instead, and the product page says API users need to configure that fallback behavior.

Willison's initial impressions flagged the same mechanism from the API side: Anthropic added new signals for guardrail hits and an option to fall back automatically when a request gets rejected. His later write-up on the proactive session says an invisible guardrail eventually downgraded his run to Opus, which then continued with the methods Fable had already set up.

That makes the early field reports harder to read than a normal model launch. Some surprising behavior is Fable 5 itself, some may be the surrounding agent harness, and some sessions can switch models midstream.
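One way to untangle mixed sessions is to record which model actually answered each request and attribute cost accordingly. The sketch below works on plain log records and calls no real API; the field names served_model and cost_usd and the model labels "fable-5" and "opus-4.8" are hypothetical placeholders, not Anthropic identifiers.

```python
from collections import defaultdict


def cost_by_served_model(records, requested="fable-5"):
    """Total cost per answering model, plus how many requests were switched.

    Each record is a dict with hypothetical keys:
      served_model: label of the model that produced the response
      cost_usd:     cost you attributed to that request
    """
    totals = defaultdict(float)
    switched = 0
    for rec in records:
        totals[rec["served_model"]] += rec["cost_usd"]
        if rec["served_model"] != requested:
            switched += 1
    return dict(totals), switched


# Illustrative log: the third request was answered by a fallback model.
log = [
    {"served_model": "fable-5", "cost_usd": 1.5},
    {"served_model": "fable-5", "cost_usd": 2.5},
    {"served_model": "opus-4.8", "cost_usd": 0.5},
]
print(cost_by_served_model(log))
```

With records like these, a surprising action or cost spike can at least be tied to the model that was serving the session at that point.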