$1000 to Beat an 80-Year-Old Erdős Problem. The Math Is the Second Story.

AINews summarized what may be the cleanest milestone of this model generation: OpenAI's GPT-next disproved an 80-year-old Erdős conjecture — the planar unit distance problem — in under 32 hours of model time, for roughly $1000 of compute. The result is being framed as the math story. I think the math is the second-most interesting thing about it.

The first is what happened to the cost-per-discovery curve.

The number that should actually scare people

$1000 to overturn an open problem that survived eighty years of human attention isn't a model story. It's a price story.

I have been running ~70 MCP tools across five channel servers on my own machine for six months. The thing I think about every day is the unit economics of an agent doing one specific job — drafting a comment, scanning my SEO findings, writing a blog post — because at this scale the only number that matters is "how much did that cost me and how much did I save." A single Claude Code session writing a real feature on Sutra is maybe $4-$8 of API spend. The same session a year ago would have been $40-$80 and would have failed twice along the way.

The Erdős result is the same curve hitting a different domain. A 32-hour autonomous search session for $1000 is not a frontier-lab number anymore. It's a single individual's discretionary budget. You can run that every week and call it a hobby.

That is the actual update from this announcement. Not "AI does math now" — that has been creeping up since the 2025 IMO Gold run. The update is a problem class that used to require a tenured department now fits inside a long weekend's worth of API credits.

Why the indie operator framing matters more than the lab framing

Most coverage will read this as "OpenAI beats human mathematicians." That is the wrong frame because it sets up the wrong question — "when do models replace mathematicians" — when the interesting question is much smaller and much closer to home.

The interesting question is: what stack of mine, that I assumed was expensive enough to be safe from automation, just got repriced?

I ran into this exact pattern when I first built Sathi as a 70+ tool MCP server. The 812 tests on that stack exist because I assumed the protocol overhead was the hard part and the cognitive overhead was free. Six months later the cognitive overhead is basically free — the agent reads the test suite, understands the schema, and writes new tools faster than I can review them. The cost line that I priced in 2025 is not the cost line of 2026.

The Erdős result is the same shock, scaled up. An entire problem-solving discipline just had its price-per-attempt drop two orders of magnitude. The people who built their careers on the assumption that running a 32-hour search was infeasible are about to discover it isn't, and the field will reshape around the new floor.

The three places this hits next

If $1000 buys you an 80-year-old open conjecture, here are the three nearest-adjacent domains where the same number disrupts the same way:

One — code archaeology. Reverse-engineering a 20-year-old binary, recovering a lost protocol, deconflicting two undocumented APIs. These are problems where the value is high, the human time is brutal, and the search space is bounded. A $1000 weekend agent run on a sealed codebase is going to recover answers that used to take a senior engineer a month. I have already seen a glimpse of this with the $400k Bitcoin wallet recovery — the real breakthrough there was the agent finding a bug in the brute-force tool everyone else was using. Same shape of cost collapse.

Two — security research. Finding a novel vulnerability class, fuzzing a protocol stack until something falls out. The defensive-research labs are well funded; the offensive ones at the indie scale are not. $1000 of autonomous fuzz-and-prove is going to start finding things that ten years of academic eyeballs didn't.

Three — infrastructure post-mortems. "Why did our system fail in this specific way" is a problem class with a bounded blast radius and a high willingness-to-pay. A weekend of agent time chewing through six months of logs and traces is the same scale of compute, on a target that absolutely every team has.

None of those three are frontier mathematics. All three are inside the existing budget of a mid-sized engineering team and most of them are inside the budget of one curious individual. That is the part of the announcement that should get noticed and probably won't.

What this doesn't mean

Two clarifications, because the discourse tomorrow is going to overclaim.

It doesn't mean models can do open-ended research. The Erdős problem is well-bounded — clear statement, clear verification, finite candidate space once you frame it right. The interesting research questions in most fields don't look like that and the cost-per-attempt is much higher when you don't have a checker.

It doesn't mean the cost line keeps falling at the current rate forever. The $1000 number assumes a frontier model running on subsidized infrastructure. If GPT-next costs OpenAI $50k of real compute to give you $1000 of API spend, the curve looks different from the consumer side than it does from the supply side. We've been through enough hosting reprices in the last year to know this is not a stable equilibrium.

But within those caveats: a hard-thinking task that used to cost a year of compute now costs a long weekend's worth, and the people who internalize that first will build the products that matter next.

Why this matters

If you are an indie operator or a small team running on the assumption that "expensive cognitive work" is your moat, that assumption just got a stress test. Pick the most expensive part of your job — the part you charge the most for because it takes you the longest. Then ask: at $1000 a run, how many times would you have to swing the agent at it before one of those attempts beat you?

If the answer is less than ten, the moat is gone. Build something with the saved time. Don't be the mathematician who spent the next year trying to prove the result wasn't really proved.