Skip to content
AI Primer
update

Report: Claude Mythos reportedly solves Erdős problem #90 in air-gapped test

Anthropic staff and outside observers said a Mythos-powered Claude Code setup solved Erdős problem #90 in an internet-blocked test. The result is still based on harnessed runs and social-thread disclosures, so watch for fuller verification before treating it as settled.

4 min read
Report: Claude Mythos reportedly solves Erdős problem #90 in air-gapped test
Report: Claude Mythos reportedly solves Erdős problem #90 in air-gapped test

TL;DR

  • kimmonismus' report said Claude Mythos solved Erdős problem #90 in an air-gapped test, and WesRoth's summary added that the setup blocked internet access to avoid leakage from OpenAI's earlier result.
  • The Anthropic-side description that deredleritt3r quoted was not a one-shot run, it used multiple Claude Code instances that ideated, summarized avenues, and fanned work back out.
  • kimmonismus' post said Mythos repeatedly found a different route than OpenAI's reported solution, and arohan's repost linked a public PDF of a TeXed-up version of the argument.
  • The OpenAI comparison is narrower than some early posts made it sound: Sebastien Bubeck said Mythos and GPT-5.5 can reproduce the result with an appropriate harness, while his follow-up reply called the jump from that to an internal model's one-shot solve "the biggest lift."
  • Public evidence is still mostly social-thread reporting plus a proof PDF, not a full formal writeup of the evaluation protocol, as altryne and haider1 both framed the result against OpenAI's earlier disclosures.

A weekend math test turned into a weirdly revealing AI benchmark story. kimmonismus' post says Mythos solved Erdős problem #90 with a cleaner argument than the previously known route, a quoted Anthropic description says the run used a multi-instance Claude Code harness, and TeXed-up proof PDF gives the public something more concrete than a screenshot war.

Air-gapped setup

The core claim is narrow and specific. According to WesRoth, Anthropic tested Mythos in an isolated setup where Claude Code instances had no internet access, explicitly to rule out contamination from OpenAI's earlier published result.

kimmonismus' post adds the stronger detail engineers will care about: Mythos did not just reproduce the known approach, it repeatedly converged on a different argument that the mathematician involved described as cleaner and free of the earlier analytic complications.

Claude Code harness

The most concrete process description came via deredleritt3r, who quoted an Anthropic explanation of "isolated claude code instances hitting mythos." The workflow was:

  • one instance got the problem and explored positive and negative avenues
  • another instance summarized those avenues
  • more instances received a summary plus a candidate idea
  • the run then branched from there

That makes this a harnessed system result, not a single raw completion from a bare model prompt.

OpenAI's Sebastien Bubeck leaned into exactly that distinction. He wrote that Mythos and GPT-5.5 can reproduce what OpenAI's internal model did on the unit distance problem "with an appropriate harness," then in a follow-up reply called the move from harnessed reproduction to one-shot solving the biggest lift.

Different proof route

The striking part of kimmonismus' report is not just that Mythos reached a solution, it is that the model reportedly kept landing on a different line of attack than OpenAI's earlier one. That is the detail that makes this feel less like memorization theater and more like an independent search trace, assuming the air-gap and protocol details hold up.

The public artifact is a TeXed-up version of Mythos's argument, linked by arohan. haider1 separately described the proof as a "cute, simple proof," which matches the cleaner-route framing in the earlier reports.

Internal model versus GPT-5.5

Part of the online confusion is that "OpenAI solved it" can refer to two different claims. Sebastien Bubeck said OpenAI had an internal model that solved the problem in one shot, while Mythos and GPT-5.5 could reproduce that result with a harness.

haider1 made the same distinction more bluntly, saying OpenAI used an internal model rather than GPT-5.5 for the original solve, and that GPT-5.5 only matched it later with minimal human guidance. That does not weaken the Mythos result, but it does change the apples-to-apples comparison people are making in timeline posts.

Public artifact

The most concrete thing available right now is not a benchmark card or a polished launch post, it is the proof PDF itself. arohan linked a document titled as Opus 4.7's TeXed-up version of Mythos's argument, which gives outside readers an actual mathematical object to inspect even though the fuller evaluation methodology is still being reconstructed from tweets.

That mismatch between artifact quality and disclosure style is part of why this story landed oddly. altryne noted that Anthropic did not wrap the result in a large announcement, while daniel_mac8 argued the company had already spent its Mythos attention budget on cybersecurity framing instead.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 4 threads
TL;DR1 post
Claude Code harness1 post
Internal model versus GPT-5.51 post
Public artifact1 post
Share on X