An 80-Year-Old Mathematical Conjecture, Disproved by a General-Purpose AI Model

Place some points on a plane. How many pairs can be exactly distance 1 apart?

This question was posed by Paul Erdős in 1946, and for 80 years there was no real progress. Two days ago, an internal OpenAI reasoning model disproved Erdős’s conjecture. Fields Medalist Tim Gowers, after reviewing the proof, wrote: if this were a paper by a human submitted to Annals of Mathematics, “I would have recommended acceptance without any hesitation.”

But the real signal here isn’t about mathematics.

A Problem Anyone Can Understand

Let’s start with the question itself. You have n points scattered on a plane. What’s the maximum number of pairs exactly distance 1 apart?

Intuitively, putting them in a line doesn’t help — adjacent points are distance 1 apart, but that yields only n-1 pairs, growing linearly with n. A better approach: arrange the points in a square grid, aligned horizontally and vertically with spacing 1. Each point now has more neighbors at unit distance, giving roughly 2n pairs — still linear.
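The two counts above are easy to check by brute force. The following is a minimal sketch, not anything from the original problem literature; the helper names `count_pairs_at_sq_distance`, `line_points`, and `grid_points` are made up for illustration. It compares squared distances so that all arithmetic stays in exact integers.

```python
from itertools import combinations

def count_pairs_at_sq_distance(points, d2=1):
    """Count unordered pairs of points whose squared distance equals d2."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )

def line_points(n):
    """n points on a line with spacing 1."""
    return [(i, 0) for i in range(n)]

def grid_points(k):
    """k x k square grid with spacing 1, so n = k * k points."""
    return [(x, y) for x in range(k) for y in range(k)]

line_pairs = count_pairs_at_sq_distance(line_points(100))  # n - 1 pairs
grid_pairs = count_pairs_at_sq_distance(grid_points(10))   # 2 * k * (k - 1) pairs
print(line_pairs, grid_pairs)  # 99 180
```

With n = 100 points in both cases, the line gives 99 pairs and the 10 × 10 grid gives 180. The grid falls a little short of 2n = 200 because points on the boundary have fewer neighbors.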

But you can do better. Rescale the entire grid so that a specific distance between grid points equals exactly 1, then count how many times that distance appears. In a √n × √n grid, certain distance values appear far more often than others, because an integer can be written as a sum of two squares in multiple ways. For example, 5 = 1² + 2² = 2² + 1², so the offsets (1, 2) and (2, 1) both give distance √5; and 25 = 3² + 4² = 5² + 0², so offsets as different as (3, 4) and (5, 0) both give distance 5. The more such representations a number has, the more point pairs in the grid share that one distance. Erdős himself calculated that this grid construction achieves roughly n^(1 + c/log log n) unit-distance pairs for some constant c > 0: slightly faster than linear, but not by much.
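To see the rescaling idea at small scale, one can tally every squared distance in a grid and ask which occurs most often. This is only an illustrative sketch (the function name `squared_distance_counts` is hypothetical), and a 10 × 10 grid is far too small to show the asymptotic n^(1 + c/log log n) behavior; it only shows that some distance other than the grid spacing already wins.

```python
from collections import Counter
from itertools import combinations

def squared_distance_counts(k):
    """Tally squared distances over all unordered point pairs of a k x k integer grid."""
    pts = [(x, y) for x in range(k) for y in range(k)]
    return Counter(
        (x1 - x2) ** 2 + (y1 - y2) ** 2
        for (x1, y1), (x2, y2) in combinations(pts, 2)
    )

counts = squared_distance_counts(10)
best_d2, best_pairs = counts.most_common(1)[0]

print(counts[1])            # pairs at the grid spacing itself: 180
print(best_d2, best_pairs)  # most frequent squared distance and its pair count: 5 288
```

In this small grid, shrinking everything by a factor of √5 turns the 288 pairs at squared distance 5 into unit-distance pairs, compared with 180 for the unscaled grid. For much larger grids the best choice shifts to numbers with many representations as a sum of two squares.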

log log n grows extremely slowly. Even when n equals the number of atoms in the universe, log log n is still in the single digits. But it does keep growing, so the extra exponent c/log log n creeps toward 0 as n grows, and the growth ends up only barely faster than linear.
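A quick calculation makes the "single digits" claim concrete. The sketch below rests on several assumptions that are not from the article: natural logarithms, 10^80 as a common order-of-magnitude estimate for the number of atoms in the observable universe, and a placeholder constant `C = 1.0`, since the article does not give the value of c.

```python
import math

def log_log(n):
    """Natural log of the natural log of n (meaningful for n > e)."""
    return math.log(math.log(n))

ATOMS_IN_UNIVERSE = 10**80  # assumption: rough order-of-magnitude estimate
C = 1.0                     # hypothetical placeholder; the true constant c is not given

for n in (10**6, 10**20, ATOMS_IN_UNIVERSE):
    ll = log_log(n)
    exponent_label = len(str(n)) - 1  # n is a power of 10, so this recovers the exponent
    print(f"n = 10^{exponent_label}: log log n = {ll:.2f}, C / log log n = {C / ll:.3f}")
```

Under these assumptions log log n is a little over 5 at n = 10^80, so the extra exponent does shrink, but only very gradually.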

In 1946, Erdős conjectured that this was optimal. In mathematical terms: the number of unit-distance pairs is at most n^(1+o(1)), where o(1) denotes a term that tends to 0 as n grows. For 80 years, nearly all mathematicians believed this conjecture was true.

Figure: Classic construction of unit-distance pairs in a square grid. In a rescaled square grid, each point can form unit-distance pairs with multiple neighbors.

Why 80 Years Without Progress

This problem has one key number: O(n^(4/3)). In 1984, Spencer, Szemerédi, and Trotter proved this upper bound: no arrangement of n points can have more than a constant multiple of n^(4/3) unit-distance pairs. Think of it as a ceiling that the true answer must sit beneath. The technical details aren’t important here. What matters is what happened in the 42 years that followed: the best mathematicians tried every method they could think of, and nobody managed to lower that ceiling.

In that same 1946 paper, Erdős posed another question: given n points, what is the minimum number of distinct distances between pairs? This is the sibling problem to the unit distance question. In 2010, Guth and Katz nearly solved it using an entirely new set of methods. János Pach wrote at the time: “This is a festive day.” But he immediately added: “I am afraid that it will take much longer to settle the Unit Distance Problem.”

His intuition was dead right. Guth and Katz’s methods tore through the sibling problem, but they were powerless against the unit distance problem. The same tools, applied to a different question, produced nothing. The O(n^(4/3)) ceiling hung there from 1984 all the way to 2026.

Making matters worse, all indirect evidence pointed toward the conjecture being true. Matoušek and others studied non-Euclidean variants of the problem and found that in the vast majority of cases, the conjecture holds — if you replace Euclidean distance with a different metric, the square grid is indeed optimal. It was as if every signpost pointed in the same direction, but no one could find the road.

Noga Alon called this one of Erdős’s favorite problems, one he mentioned repeatedly in his lectures. “Every mathematician working in combinatorial geometry thought about this problem.”

What the AI Did

GPT-5 chose the direction most human mathematicians didn’t: it tried to disprove the conjecture, not prove it.

In the model’s chain of thought — spanning over 100 pages of reasoning — most of its steps were attempts to construct a counterexample. As number theorist Arul Shankar put it: “The model has some combination of good intuition, willingness to try approaches considered long-shot by the community, and a predisposition to attempt constructions.”

The key step came from a place no one expected: algebraic number theory. Specifically, the proof uses infinite class field towers and Golod-Shafarevich theory. These tools have almost no intersection with combinatorial geometry — they were developed to study the algebraic structure of numbers, dealing with things like factorization in extensions of the integers. No one imagined they could answer a geometric question about points and distances on a plane.

Erdős’s own construction relied on Gaussian integers — numbers of the form a+bi, where a and b are integers and i is the square root of -1. GPT-5’s breakthrough was to replace Gaussian integers with more complex algebraic number fields, exploiting their richer symmetries to create many more unit-length differences. The model constructed infinitely many point configurations achieving n^(1+δ) unit-distance pairs, where δ is a fixed constant greater than 0. Princeton’s Will Sawin later refined δ to 0.014.
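Putting the article's figures side by side may help. The summary below is a sketch: the notation u(n) for the maximum number of unit-distance pairs among n points in the plane is introduced here for convenience, constant factors are suppressed, and "for infinitely many n" is one reading of "infinitely many point configurations."

```latex
% u(n): maximum number of unit-distance pairs among n points in the plane
% (notation assumed for this summary; it does not appear in the article)
\begin{align*}
u(n) &\ge n^{1 + c/\log\log n}
  && \text{(Erd\H{o}s, 1946: rescaled square grid)} \\
u(n) &= O\!\left(n^{4/3}\right)
  && \text{(Spencer, Szemer\'edi, Trotter, 1984)} \\
u(n) &\le n^{1 + o(1)}
  && \text{(Erd\H{o}s's conjecture, now disproved)} \\
u(n) &\ge n^{1 + \delta} \ \text{for infinitely many } n,\ \delta > 0 \text{ fixed}
  && \text{(the new construction)}
\end{align*}
```

Going by these figures, with δ = 0.014 the known exponents are 1.014 from below and 4/3 ≈ 1.333 from above, so the conjecture is refuted but the true growth rate is not pinned down.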

This means Erdős was wrong. For 80 years, everyone believed the square grid was essentially optimal. It wasn’t.

The Signal Is Bigger Than Mathematics

This is easy to misread as “AI is smarter than human mathematicians.” Tim Gowers’s assessment is genuinely striking: Annals of Mathematics is among the most prestigious journals in mathematics. Arul Shankar said AI has moved “beyond just helpers to human mathematicians — they are capable of having original ingenious ideas, and then carrying them out to fruition.” Jacob Tsimerman, who “actually briefly worked on this problem and tried to make a counterexample, but failed to make progress,” called the construction “intimidating to see through even if you know what is going on.”

But the signal worth focusing on isn’t “winning.” It’s three more specific things.

First, this model was not specifically trained for mathematics. It’s not AlphaGeometry, not some scaffolded proof-search system. Noam Brown explicitly called it a “general-purpose LLM,” not targeted at this problem or even at the field of mathematics. This means reasoning capability, past a certain threshold, can transfer across domains — it can do original research in math not because it was specialized, but because it can reason.

Second, this isn’t an isolated case. OpenAI for Science simultaneously released 13 case studies spanning mathematics, physics, biology, materials science, and computer science. In physics, it helped analyze symmetries around black hole equations. In biology, it identified an unexpected mechanism change in an immune cell experiment and proposed an experiment that was later validated in the lab. In mathematics, there were four proofs (including a separate Erdős number-theory problem). If this were a single case, you could say “math happens to be an LLM-friendly domain.” Thirteen cases across five disciplines makes that harder to argue.

Third, the timeline is compressing. Noam Brown’s exact words: “Less than 1 year ago frontier AI models were at IMO gold-level performance. I expect this pace of progress to continue.” IMO gold means solving problems with known answers; original research means creating new knowledge. That jump took one year.

Humans Are Still in the Loop

GPT-5’s chain of thought runs over 100 pages. Human mathematicians extracted the key parts, rewrote them in standard paper format, verified, and supplemented. The companion paper is signed by Tim Gowers, Noga Alon, Arul Shankar, Jacob Tsimerman, and Thomas Bloom — not GPT-5.

But the pattern itself is shifting. Thomas Bloom’s summary in the companion paper may best capture where things stand: “The frontiers of knowledge are very spiky, and no doubt the coming months and years will see similar successes in many other areas of mathematics, where long-standing open problems are resolved by an AI revealing unexpected connections and pushing the existing technical machinery to its limit.”

Mathematics is the cleanest testbed for reasoning: problems are precise, proofs are verifiable, and a long argument only holds if every link in the chain is correct. If a model can produce original contributions that survive scrutiny by top human experts in this kind of testbed, its potential in other domains requiring long-chain reasoning is not theoretical speculation.

Direction still comes from people. Choosing which problems matter, interpreting what the results mean, deciding where to go next: these remain human judgments. But the value of those judgments is shifting. When a general-purpose reasoning model disproves a conjecture that stood for 80 years, more things need re-examining than one might think.


Primary sources: OpenAI Blog · Proof PDF · Mathematician Remarks · Chain of Thought · OpenAI for Science Paper · János Pach on Guth-Katz (2010)