AI Primer

Mozilla reports Claude Mythos Preview fixed more Firefox bugs in April than the prior 15 months

Mozilla says Claude Mythos Preview helped it fix more Firefox security bugs in April than in the previous 15 months combined. Teams building large codebases should watch this as a strong production example of frontier models accelerating defensive vulnerability work.


TL;DR

  • Mozilla says alexalbert__'s post was not hype: Firefox fixed 423 security bugs in April 2026, more than in the prior 15 months combined, according to Mozilla's own chart.
  • According to Mozilla's official write-up, the jump came from pairing Claude Mythos Preview with a bug-finding harness, not from raw model output dropped straight into triage.
  • Simon Willison's summary highlighted Mozilla's bigger claim: a few months ago AI bug reports were mostly maintainer spam, then stronger models plus better filtering flipped that dynamic.
  • The bug list surfaced by emollick's screenshot featured old, ugly browser bugs, among them a 15-year-old <legend> issue and a 20-year-old XSLT flaw.

You can read Mozilla's full post and skim Simon Willison's notes; the screenshots in emollick's post show the kind of exploit descriptions Mozilla says the system was producing. The weird bit is not just the April spike: Simon Willison's summary says Mozilla's existing defense-in-depth blocked many harness attempts, which turned the write-up into both a model capability story and a stress test of Firefox's own mitigations.

The April spike

Mozilla's chart shows a baseline of 20 to 30 fixes per month jumping to 61 in February, 76 in March, and then 423 in April. alexalbert__'s post and WesRoth's repost both point to the same breakpoint: April was the month Mythos-assisted work hit the graph hard.

The official post at Mozilla Hacks is explicit about scope: the chart is for all Firefox security bug fixes, all sources, all severities. That matters because Mozilla is not claiming every April fix was an AI-found zero day. It is claiming its overall defensive throughput changed dramatically.

The harness

Behind the Scenes: Hardening Firefox with Claude Mythos Preview

Simon Willison's summary links Mozilla's post under that title, calling it fascinating, in-depth detail on how Mozilla used its access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox. He quotes the section Mozilla heads "Suddenly, the bugs are very good":

"Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it's cheap and easy to prompt an LLM to find a 'problem' in code, but slow and expensive to respond to it. It is difficult to overstate how much this dynamic changed for us over a few short months. This was due to a combination of two main factors. First, the models got a lot more capable. Second, we dramatically improved our techniques for harnessing these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise."

Willison adds that the post includes some detailed bug descriptions, including a 20-year-old XSLT bug and a 15-year-old bug in the <legend> element, and notes that a lot of the harness's attempts were blocked by Firefox's existing defense-in-depth measures, "which is reassuring". Mozilla had been fixing around 20 to 30 security bugs in Firefox per month through 2025; that jumped to 423 in April (via Lobste.rs).

The most useful line in Mozilla's account is the process change. According to Simon Willison's summary, the shift came from two things: better models and better techniques for harnessing them, specifically steering, scaling, and stacking outputs to generate more signal and filter out the noise.

That framing is more interesting than the headline number. Mozilla describes a system for producing candidate reports at volume, then separating plausible findings from the slop that made earlier AI security reports expensive to review. The official write-up at Mozilla Hacks reads like a production lesson in triage architecture, not just a model eval.
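Mozilla has not published the harness itself, but the "scale, then filter" shape it describes can be sketched as a simple pipeline. Everything below (the `Candidate` type, function names, and the reproduce-then-rank stages) is a hypothetical illustration of that architecture, not Mozilla's code:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """A hypothetical AI-generated bug report flowing through triage."""
    report: str
    reproduced: bool = False
    scores: list = field(default_factory=list)

def generate(model, target, n):
    # "Scaling": ask the model for many candidate reports against one target.
    return [Candidate(report=model(target)) for _ in range(n)]

def filter_reproducible(candidates, try_repro):
    # First filter: drop anything that cannot be reproduced mechanically.
    # This is where "plausible but wrong" slop gets discarded before a
    # human maintainer ever sees it.
    kept = []
    for c in candidates:
        if try_repro(c):
            c.reproduced = True
            kept.append(c)
    return kept

def rank_by_reviewers(candidates, reviewers):
    # "Stacking": score each surviving candidate with several independent
    # checks (other models, sanitizers, static analysis) and sort by agreement,
    # so the strongest signal reaches human triage first.
    for c in candidates:
        c.scores = [review(c) for review in reviewers]
    return sorted(candidates, key=lambda c: sum(c.scores), reverse=True)
```

The point of the sketch is the ordering: cheap mechanical reproduction before expensive human review, and multiple independent scorers instead of trusting any single model's output.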

Bug classes

The screenshot in emollick's post is where this stops looking like generic "AI found bugs" marketing. Mozilla's examples include:

  • a WebAssembly GC fake-object primitive with potential arbitrary read and write
  • a 15-year-old <legend> bug triggered by edge cases across recursion limits, expando properties, and cycle collection
  • IPC race conditions that could lead to sandbox escape
  • a raw NaN crossing an IPC boundary and masquerading as a tagged JS object pointer
  • a DNS parsing edge case that leaked parent-process stack memory
  • a 20-year-old XSLT reentrancy bug

Those examples are detailed enough to show what Mozilla means by signal. They are not one-line lint findings. They are exploit-shaped reports touching IPC, parser edges, refcounting, nested event loops, and legacy browser subsystems.
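One of those, the raw NaN masquerading as a tagged JS object pointer, is easy to illustrate. JS engines commonly "NaN-box" values, packing a type tag and a pointer payload into the unused bits of IEEE-754 NaNs; a raw double that happens to carry the right bit pattern can then alias a tagged pointer. The tag layout below is a made-up example, not SpiderMonkey's actual encoding:

```python
import struct

# Hypothetical NaN-boxing scheme (illustrative only): non-double values are
# encoded as NaNs whose top 16 bits are a type tag and whose low 48 bits
# carry a payload such as a pointer.
TAG_SHIFT = 48
TAG_OBJECT = 0xFFFC  # an assumed "object pointer" tag inside the NaN space

def box_object(ptr):
    # Pack a 48-bit pointer under the object tag.
    return (TAG_OBJECT << TAG_SHIFT) | (ptr & 0xFFFFFFFFFFFF)

def is_object(bits):
    # The engine decides "this is an object" purely from the tag bits.
    return (bits >> TAG_SHIFT) == TAG_OBJECT

# A raw NaN arriving from outside (e.g. deserialized off an IPC channel)
# can carry arbitrary payload bits and still be a valid IEEE-754 NaN...
raw_nan_bits = (TAG_OBJECT << TAG_SHIFT) | 0x00DEADBEEF00
raw_nan = struct.unpack("<d", struct.pack("<Q", raw_nan_bits))[0]

assert raw_nan != raw_nan       # it really is a NaN (NaN compares unequal to itself)
assert is_object(raw_nan_bits)  # ...yet the boxing scheme reads it as an object pointer
```

If the receiving side trusts those bits without revalidating them, attacker-chosen payload bits become a fake object pointer, which is exactly why a NaN crossing a process boundary is a security bug rather than a numeric curiosity.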

Defense in depth

One of the most revealing lines came through Simon Willison's summary, which notes that many harness attempts were blocked by Firefox's existing defense-in-depth measures. Mozilla's post at Mozilla Hacks therefore doubles as a mitigation report: the system found real weaknesses, but it also ran into layers that prevented some candidate exploits from landing.

That gives the story a cleaner shape than the usual "model found vulnerabilities" discourse. Mozilla is describing both higher bug-finding throughput and a way to probe whether hardening work still holds up when a frontier model can generate much more inventive test cases at scale.
