
ChatGPT 5.5 Pro Did PhD Math in 17 Minutes: Tim Gowers's Test


A Fields Medalist handed ChatGPT 5.5 Pro an unsolved sumset problem. The model came back with a proof he called clearly best possible.

A Fields Medalist handed an unsolved problem in additive combinatorics to ChatGPT 5.5 Pro, and seventeen minutes later it returned a proof he called "clearly best possible."

The experiment, and who is running it

On May 8, 2026, Sir Timothy Gowers published a post on Gowers's Weblog titled "A recent experience with ChatGPT 5.5 Pro." The post documents what he calls "PhD-level research in an hour or so," produced by OpenAI's reasoning-focused model on questions Gowers selected from a current preprint. The author is not a casual observer of mathematical novelty. Gowers won the Fields Medal in 1998 for work connecting functional analysis and combinatorics, was knighted in 2012 for services to mathematics, and currently holds the Combinatorics chair at the Collège de France while remaining a Research Professor at the University of Cambridge and a fellow of Trinity College. He also founded the Polymath Project, the massively collaborative online experiment whose first effort produced a combinatorial proof of the density Hales-Jewett theorem in seven weeks.

The test material was deliberately concrete. Gowers reached for Mel Nathanson's preprint "Diversity, Equity and Inclusion for Problems in Additive Number Theory," posted to arXiv as arXiv:2603.15556. The paper sits inside additive combinatorics, the branch of mathematics that studies how sets of integers behave under addition. Its basic object is the h-fold sumset hA of a finite set of integers A: the set of all sums of h elements drawn from A, with repetitions allowed. Nathanson asks for sets of integers with prescribed values of the size of A and the size of hA that have the smallest possible diameter (the gap between the largest and smallest elements), packaged as the function N(h,k). Gowers chose this paper rather than a benchmark because solutions could be checked rigorously by a human expert, and because an earlier exponential framework by Isaac Rajagopal, a student at MIT, supplied a known baseline against which the model's output could be compared.
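For readers who want to see these objects concretely, here is a minimal Python sketch. It computes h-fold sumsets and diameters, and brute-forces the smallest diameter of a k-element set whose sumset has a prescribed size. The function names and the `max_diameter` cutoff are illustrative assumptions, and the search only mirrors the description above; it is not Nathanson's exact definition of N(h,k).

```python
from itertools import combinations, combinations_with_replacement


def h_fold_sumset(A, h):
    """All sums of h elements of A, repetitions allowed."""
    return {sum(c) for c in combinations_with_replacement(set(A), h)}


def diameter(A):
    """Gap between the largest and smallest element of A."""
    return max(A) - min(A)


def min_diameter(k, h, target, max_diameter=30):
    """Brute-force the smallest diameter of a k-element set A (k >= 2)
    with |hA| == target. Returns (diameter, example set), or None if
    nothing is found up to max_diameter (a hypothetical cutoff)."""
    for d in range(k - 1, max_diameter + 1):
        # |hA| is unchanged by translation, so fix min(A) = 0 and max(A) = d
        for interior in combinations(range(1, d), k - 2):
            A = (0, *interior, d)
            if len(h_fold_sumset(A, h)) == target:
                return d, A
    return None


# An arithmetic progression gives the smallest possible sumset ...
print(len(h_fold_sumset([0, 1, 2, 3], 2)), diameter([0, 1, 2, 3]))
# ... while forcing all ten pairwise sums to be distinct costs extra diameter.
print(min_diameter(4, 2, 10))
```

Exhaustive search like this is only feasible for very small k, which is why bounds on the diameter are the interesting question.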

Seventeen minutes and five seconds

The simplest version of Nathanson's problem fixes h at two. For that case, Nathanson's recent paper gave an upper bound on the diameter that grew exponentially in k, on the order of two to the power of k minus one. Gowers asked ChatGPT 5.5 Pro to do better.

The model thought for 17 minutes and 5 seconds. It returned a proof that replaced the exponential bound with a quadratic diameter bound, which Gowers described in his post as "clearly best possible." It then spent a further 2 minutes and 23 seconds producing a clean LaTeX preprint of the proof, ready in the formatting any journal would expect.

Gowers pushed. He asked the model to extend the result to restricted sumsets, the version of the problem that counts only sums a + b of two distinct elements and so excludes the case where a equals b. The model, by his account, handled the generalization "with no trouble at all." On a single attempt, then, the system had improved a recently published human result and formatted its work into a publishable paper, on a question that was open before he asked it.
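The difference between the ordinary and the restricted sumset is easy to see on a small example. The sketch below covers only the two-fold case, and the function names are illustrative assumptions rather than anything taken from the post.

```python
from itertools import combinations, combinations_with_replacement


def sumset(A):
    """Ordinary sumset: all a + b with a, b in A, including a == b."""
    return {a + b for a, b in combinations_with_replacement(set(A), 2)}


def restricted_sumset(A):
    """Restricted sumset: all a + b with a, b in A and a != b."""
    return {a + b for a, b in combinations(set(A), 2)}


A = [0, 1, 2, 3]
print(sorted(sumset(A)))             # includes the doubles 0 + 0 and 3 + 3
print(sorted(restricted_sumset(A)))  # the doubles 0 and 6 drop out
```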

A polynomial bound and a new construction

The harder ground lay in the general case of arbitrary h. Gowers fed the model the same question without the simplifying assumption. ChatGPT 5.5 Pro thought for 16 minutes and 41 seconds and produced an intermediate result: it improved Nathanson's exponential-in-k bound to a bound that is exponential in k to the power alpha for any alpha greater than one half. Writing that result up as a LaTeX preprint took the model 47 minutes and 39 seconds.

Pressed for more, the model proposed a route to a polynomial bound. It sketched the approach in 13 minutes and 33 seconds, then verified the technical lemmas in 9 minutes and 12 seconds, then composed the clean LaTeX preprint of the polynomial proof in 31 minutes and 40 seconds. The final bound, written in plain prose, says that N(h,k) is at most a constant times k to the power of ten h cubed, valid for sufficiently large k.
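Collected in symbols, the bounds described so far read roughly as follows. This is a sketch of the prose, not a transcription of the proofs: the constants C, C_alpha and C_h and the threshold k_0(h) are placeholder names, the source does not state the quadratic constant, and reading "ten h cubed" as the exponent 10h^3 is an assumption.

```latex
\begin{align*}
  N(2,k) &\le C \cdot 2^{\,k-1}
    && \text{(Nathanson's bound for } h = 2\text{)} \\
  N(2,k) &\le C\,k^{2}
    && \text{(the model's quadratic bound for } h = 2\text{)} \\
  N(h,k) &\le \exp\!\bigl(C_{\alpha}\,k^{\alpha}\bigr)
    \quad \text{for any } \alpha > \tfrac12
    && \text{(intermediate bound, general } h\text{)} \\
  N(h,k) &\le C_h\,k^{\,10h^{3}}
    \quad \text{for } k \ge k_0(h)
    && \text{(polynomial bound, general } h\text{)}
\end{align*}
```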

The mathematically interesting part is not just the bound but how the model got there. Earlier proofs in this area lean on exponentially growing geometric series, which is why their diameter bounds blow up. ChatGPT 5.5 Pro introduced what it called h-squared-dissociated sets, a construction that uses polynomial-sized elements in place of the exponential ones. The proof threads that new object through classical material in additive number theory, including Sidon sets, B-sub-h sets, the Singer construction from 1938 and the Bose-Chowla construction from 1963.
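To make the contrast between exponential and polynomial-sized elements concrete, the following sketch checks the B_h property (all sums of h elements are distinct) and compares two classical ways to achieve it. It deliberately swaps in the Erdős-Turán Sidon construction because it fits in one line; it is not the Singer or Bose-Chowla construction, and it is not the model's h-squared-dissociated sets, whose definition the post summary does not give. All function names are illustrative assumptions.

```python
from itertools import combinations_with_replacement


def is_b_h(A, h):
    """True if all sums of h elements of A (repetitions allowed) are distinct."""
    seen = set()
    for combo in combinations_with_replacement(sorted(set(A)), h):
        s = sum(combo)
        if s in seen:
            return False
        seen.add(s)
    return True


def geometric_b_h(k, h):
    """k powers of (h + 1). Sums of h terms have distinct base-(h + 1)
    digits, so this is a B_h set, but its largest element is
    exponential in k."""
    return [(h + 1) ** i for i in range(k)]


def erdos_turan_sidon(p):
    """Erdős-Turán Sidon (B_2) set of size p. The caller must pass an
    odd prime p; every element is below 2 * p**2, so the diameter is
    quadratic in the size of the set."""
    return [2 * p * i + (i * i) % p for i in range(p)]


k = 11
print(max(geometric_b_h(k, 2)))    # exponential growth
print(max(erdos_turan_sidon(k)))   # polynomial growth
print(is_b_h([1, 2, 4, 8], 3))     # powers of two fail for h = 3: 1+1+4 == 2+2+2
```

Both sets have all pairwise sums distinct, yet the second packs the same number of elements into a far shorter interval, which is the general effect the post attributes to replacing geometric series with polynomial-sized constructions.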

When Gowers sent the polynomial proof to Isaac Rajagopal, the MIT student whose framework gave Gowers his baseline, Rajagopal worked through it line by line. He declared the proof "almost certainly correct," both line by line and at the level of its main ideas, and confirmed that the h-squared-dissociated-set idea was, to his knowledge, "completely original." Gowers's own line in the post is unambiguous about authorship: "My mathematical input was zero."

A floor, not a ceiling

The blog post is doing something carefully calibrated. It does not claim ChatGPT 5.5 Pro can prove anything more than the specific Nathanson sumset problems Gowers fed it. It does not claim mathematicians are obsolete. What it does claim is a recalibration of where the entry threshold sits. In Gowers's exact words: "The lower bound for contributing to mathematics will now be to prove something that LLMs can't prove."

That is a sentence aimed at the inside of the profession. The problems at stake are what Gowers calls "gentle problems," the kind of warm-up questions traditionally given to first-year mathematics PhD students: open enough to be original but contained enough to be finished within a thesis schedule. Gowers writes that he is uneasy about how universities will train beginning students if a language model can clear that band of problems in under an hour. The question is operational, not rhetorical: how do you assign work to a beginner that is hard enough to be worth doing but not so hard that it stops being a training exercise?

The institutional gap

The third strand of the post is administrative. The proofs ChatGPT 5.5 Pro produced are not available through the channels mathematicians normally use to read each other's work. arXiv, the dominant preprint server in mathematics, currently rejects machine-written submissions. Gowers notes the consequence in the post: the model's preprints are circulating only as Google Drive PDFs linked from his blog.

This is a small detail with a large second-order effect. The credibility of a mathematical result in 2026 rests in part on the route by which it reaches readers. A paper that exists only as a personal-blog link is harder to cite, harder to index, and harder to integrate into the standard apparatus of priority, peer review and citation. If Rajagopal's verdict, "almost certainly correct," is accurate, then the polynomial bound on N(h,k) is a real piece of additive combinatorics that the field's central distribution channel will not host. That is a policy question, and Gowers does not pretend to have resolved it.

What he does instead, at the end of his post, is keep score honestly. The model did the work. The baseline came from Rajagopal. The questions came from Nathanson. The verifier was Gowers. The construction, the h-squared-dissociated sets that replaced a generation of geometric-series arguments, was, by the account of the only independent reader who has checked it, completely original. Whatever else May 8, 2026 turns out to mean, it is the first time a working Fields Medalist has put his name to that combination of claims.

Sources