Google DeepMind reports AlphaProof Nexus solved 9 Erdős problems with Lean verification
A new paper says AlphaProof Nexus resolved 9 of 353 open Erdős problems and 44 OEIS conjectures using Gemini-guided search plus Lean checks. The strongest results came where Lean libraries are already mature, so those libraries remain the bottleneck to watch.

TL;DR
- Google DeepMind's AlphaProof Nexus reportedly solved 9 of 353 open Erdős problems, including two that had been open for 56 years, according to kimmonismus's paper summary and the linked arXiv paper.
- The core loop paired Gemini models with Lean, where chetaslua's breakdown says Gemini 3.0 Flash handled high-throughput evaluation while Gemini 3.1 Pro did harder proof generation, and rohanpaul_ai's thread describes Lean as the system that forces every step to compile.
- A simpler agent that just alternated generation with compiler feedback matched all 9 Erdős solves, according to kimmonismus's summary, which adds that the heavier evolutionary and RL machinery mainly helped on the hardest cases.
- The paper's reach went beyond Erdős: kimmonismus and WesRoth's recap say the system also proved 44 of 492 open OEIS conjectures and contributed results in algebraic geometry, optimization, graph theory, and quantum optics.
- The bottleneck stayed concrete, not mystical: kimmonismus's thread says wins clustered where Lean's math library is already mature, which lines up with the paper's HTML version and the broader Lean project.
You can read the paper PDF, skim the HTML version, and watch rohanpaul_ai's Terry Tao clip on proof-writing turning into search. The weirdly useful detail is that the fancy system was not required for the headline result: kimmonismus's thread says a basic agent reproduced all 9 Erdős solves. Another detail worth bookmarking, from the same summary, is that the system sometimes caught bad formalizations in the problem statements themselves, which made it a debugging tool for the math and not just a proof generator.
Proof loop
The architecture looks more like a harness than a monolith. According to chetaslua's summary, DeepMind used Gemini 3.0 Flash for cheap rating and evaluation, Gemini 3.1 Pro for harder prover work, a formal verifier, and AlphaProof-style search.
The crucial piece is Lean. As rohanpaul_ai's thread puts it, the model keeps editing a formal proof, reads compiler errors, and tries again until the proof compiles or dies. That changes the job of the model from writing plausible math to generating candidates that can be killed quickly.
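The edit, compile, retry cycle described above can be sketched in a few lines. This is a minimal illustration, not DeepMind's implementation: `refine_until_verified`, `CheckResult`, `toy_generate`, and `toy_check` are hypothetical names, and the toy functions stand in for an LLM call and a Lean compiler run.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CheckResult:
    ok: bool          # True when the checker accepts the candidate
    error: str = ""   # checker message fed back into the next attempt


def refine_until_verified(
    generate: Callable[[str], str],
    check: Callable[[str], CheckResult],
    max_attempts: int = 8,
) -> Tuple[Optional[str], int]:
    """Alternate generation and checking until a candidate passes or the budget runs out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        result = check(candidate)
        if result.ok:
            return candidate, attempt
        feedback = result.error  # the next attempt sees the last error
    return None, max_attempts


# Toy stand-ins (assumptions): a real system would call a model and the Lean compiler.
def toy_check(candidate: str) -> CheckResult:
    if "sorry" in candidate:
        return CheckResult(False, "proof contains an unfinished step")
    return CheckResult(True)


def toy_generate(feedback: str) -> str:
    # The first attempt leaves a gap; once it has seen an error it returns a complete proof.
    return "by sorry" if not feedback else "by rfl"


proof, attempts = refine_until_verified(toy_generate, toy_check)
print(proof, attempts)
```

The point of the shape is that the checker, not the generator, decides when the loop stops, and a generator that never passes simply exhausts its budget and returns nothing.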
That filter matters because the failure modes were exactly the ones informal proof demos tend to hide. kimmonismus's summary says the agent hallucinated known lemmas and sometimes buried the real difficulty inside helper lemmas, both of which Lean rejects immediately.
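A small Lean 4 illustration of the hallucinated-lemma case, written for this article and not taken from the paper. `Nat.add_comm` is a real lemma in Lean's core library, while `Nat.add_swap_trivially` is a deliberately made-up name.

```lean
-- Accepted: `Nat.add_comm` exists, so the proof term type-checks.
theorem add_swap (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Rejected if uncommented: `Nat.add_swap_trivially` does not exist,
-- so Lean reports an unknown name instead of trusting the citation.
-- theorem add_swap' (a b : Nat) : a + b = b + a :=
--   Nat.add_swap_trivially a b
```

An informal proof can cite a plausible-sounding result and move on. Here the citation has to resolve to an actual declaration with the right type.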
Simple agent
The most surprising line in the evidence is that the basic loop was enough for the marquee result. According to kimmonismus's thread, a simple agent that alternated LLM generation with compiler feedback replicated all 9 Erdős successes.
The full AlphaProof Nexus stack still added machinery:
- Shared pools of partial proof attempts, as rohanpaul_ai describes
- Rating of promising branches, per chetaslua's component list
- Evolutionary search and reinforcement learning, according to kimmonismus's summary
But the paper summary in kimmonismus's thread says those extras delivered meaningful gains mainly on the hardest problems. That is Christmas come early for agent-loop nerds: stronger base models plus verifier feedback are closing ground that used to demand much more bespoke search infrastructure.
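To make the extra machinery concrete, here is a toy sketch of one piece of it: a shared pool of partial attempts that is expanded, rated, and pruned each generation. It is an assumption-laden illustration, not the paper's algorithm. `evolve_pool`, `toy_rate`, `toy_mutate`, and `TARGET` are hypothetical, and a string prefix match stands in for a model-based rating.

```python
import heapq
from typing import Callable, List


def evolve_pool(
    pool: List[str],
    rate: Callable[[str], float],
    mutate: Callable[[str], List[str]],
    keep: int = 2,
) -> List[str]:
    """One generation: expand every attempt, rate all candidates, keep the best few."""
    candidates = set(pool)
    for attempt in pool:
        candidates.update(mutate(attempt))
    # Sort first so the result does not depend on set iteration order.
    return heapq.nlargest(keep, sorted(candidates), key=rate)


TARGET = "abc"  # hypothetical stand-in for a finished proof


def toy_rate(attempt: str) -> float:
    # Reward the matching prefix, lightly penalize anything after the first mismatch.
    matched = 0
    for got, want in zip(attempt, TARGET):
        if got != want:
            break
        matched += 1
    return matched - 0.1 * (len(attempt) - matched)


def toy_mutate(attempt: str) -> List[str]:
    # Extend a partial attempt by one step in every possible way.
    return [attempt + step for step in TARGET]


pool = [""]
for _ in range(3):
    pool = evolve_pool(pool, toy_rate, toy_mutate)
print(pool)
```

In a real system the rating would come from a model and the final candidates would still have to pass the verifier. The sketch only shows why pooling and ranking help when a single linear retry loop would keep wandering.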
Solved set
The headline number is 9 solved Erdős problems out of 353 attempted. haider1's post adds that two had been open for 56 years.
The broader result set is easier to scan as a list:
- 9 of 353 open Erdős problems, per kimmonismus's summary
- 44 of 492 open OEIS conjectures, per WesRoth's recap
- A resolved 15-year-old question in algebraic geometry, according to kimmonismus
- A new algorithmic parameter in optimization theory that humans had not found, again per that summary
That spread is why the paper lands harder than a single benchmark spike. The system is not only retrieving known tricks inside one narrow domain; it is also pushing through formally specified open questions across several areas where the library support is already deep enough.
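For scale, the two headline counts can be turned into solve rates. The snippet below uses only the numbers reported in the summaries. `solve_rate` is a helper name introduced here for illustration.

```python
# Counts as reported in the summaries cited in this article.
results = {
    "Erdős problems": (9, 353),
    "OEIS conjectures": (44, 492),
}


def solve_rate(solved: int, attempted: int) -> float:
    """Return the share of attempted problems that were solved, in percent."""
    return 100.0 * solved / attempted


rates = {name: round(solve_rate(s, n), 1) for name, (s, n) in results.items()}
for name, pct in rates.items():
    print(f"{name}: {pct}%")
# Erdős problems: 2.5%
# OEIS conjectures: 8.9%
```

Put differently, roughly 97% of the Erdős set and 91% of the OEIS set stayed open.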
Cost and checking
The paper summary in kimmonismus's thread pegs the spend at a few hundred dollars per solved Erdős problem. That price tag only makes sense because most candidate proofs die cheaply: chetaslua's breakdown says the system used Flash for high-throughput rating and evaluation, reserving Pro for harder proof work.
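A back-of-the-envelope model shows why cheap screening matters. It assumes a simple two-stage cascade in which every candidate pays for a cheap check and only the survivors pay for the expensive model. The function name and the unit costs are hypothetical, not figures from the paper.

```python
def expected_cost_per_candidate(cheap_cost: float, pro_cost: float, pass_rate: float) -> float:
    """Every candidate pays the cheap check; only the fraction that passes pays the expensive one."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    return cheap_cost + pass_rate * pro_cost


# Hypothetical unit costs chosen for illustration only.
CHEAP, PRO = 1.0, 50.0
filtered = expected_cost_per_candidate(CHEAP, PRO, pass_rate=0.05)
unfiltered = expected_cost_per_candidate(0.0, PRO, pass_rate=1.0)
print(filtered, unfiltered)
```

Under these made-up numbers the cascade costs about 3.5 units per candidate against 50 without screening. The real ratio depends on pricing and pass rates that the summaries do not give.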
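Formal verification also changes what “solved” means here. kimmonismus says no human review was needed to confirm correctness because Lean checked each logical step automatically, while WesRoth describes the final workflow as AI search, formal verification, then human review of the result rather than manual proof checking. The two accounts fit together: Lean certifies that the proof establishes the formal statement, and people still judge whether that statement and result are the ones intended.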
Library coverage
The limits are not subtle. rohanpaul_ai's earlier thread says the system works inside carefully constrained worlds, and kimmonismus's summary says successes clustered in combinatorics, number theory, and optimization, where Lean's math library is already mature.
That also explains why the paper can claim both a real jump and a real ceiling. Problems that require substantial new theory mostly stayed out of reach, and most of the 353 Erdős set remained unsolved, according to kimmonismus.
One more useful detail: that same summary says the agent sometimes detected misformalizations in the literature and corrected ambiguous statements before solving the fixed version. That is a different kind of capability from brute-force proof search, and it stands out even next to the benchmark numbers.