GPT-5.2 derives a new result in theoretical physics
- outlace - 12368 seconds ago
The headline may make it seem like AI just discovered a new result in physics all on its own, but reading the post: humans started off trying to solve a problem, it got complex, and GPT simplified it and found a solution with the simpler representation. It took GPT Pro 12 hours to do this. In my experience LLMs can make new things when they are some linear combination of existing things, but I haven't been able to get them to do something totally out of distribution from first principles yet.
- smj-edison - 251 seconds ago
Regardless of whether this means AGI has been achieved, I think this is really exciting, since we could theoretically have agents look through papers and work on finding simpler solutions. The complexity of math is dizzying, so I think anything that can be done to simplify it would be amazing (I think of this essay[1]), especially if it frees up mathematicians' time to focus even more on the state of the art.
- Davidzheng - 12379 seconds ago
"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."
When I use GPT 5.2 Thinking Extended, it gives me the impression that it's consistent enough / has a low enough rate of errors (or enough error-correcting ability) to do math/physics autonomously for many hours if it were allowed to [but I guess Extended cuts off around the 30-minute mark, and Pro maybe at 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists and mathematicians at large will soon be able to play with tools that think at this time-scale and see how much capability these machines really have.
- square_usual - 9711 seconds ago
It's interesting to me that whenever a new breakthrough in AI use comes up, there's always a flood of people who come in to handwave away why it isn't actually a win for LLMs. Like with the novel solutions GPT 5.2 has been able to find for Erdős problems - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, LLMs have driven these proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
- cpard - 7972 seconds ago
AI can be an amazing productivity multiplier for people who know what they're doing.
This result reminded me of the C compiler case that Anthropic posted recently. Sure, agents wrote the code for hours, but there was a human there giving them directions, scoping the problem, finding the test suites needed for the agentic loops to actually work, and so on - in general, making sure the output actually works and that it's a story worth sharing with others.
The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding. It works great for creating impressions and building brand value, but it also does a disservice to the actual researchers, engineers, and humans in general who do the hard work of problem formulation and validation and, in the end, solve the problem using another tool in their toolbox.
- computator - 1800 seconds ago
I have a weird long-shot idea for getting GPT to make a new discovery in physics: ask it to find a mathematical relationship between some combination of the fundamental physical constants[1]. If it finds (for example) a formula that relates electron mass, Bohr radius, and the speed of light to a high degree of precision, that might indicate an area of physics to explore further, if those constants were thought to be independent.
[1] https://en.wikipedia.org/wiki/List_of_physical_constants
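That search can actually be mechanized without any LLM. Below is a minimal brute-force sketch (plain Python; the constant values are CODATA 2018, and the four-constant selection is an assumption of the sketch): enumerate small integer powers of a few constants, keep only the products whose SI dimensions cancel, and report the dimensionless values.

```python
from itertools import product

# Brute-force search for dimensionless combinations of constants,
# as a toy version of the idea above. Values are CODATA 2018;
# dimensions are exponents of the SI base units (kg, m, s, A).
CONSTANTS = {
    "e":    (1.602176634e-19,  (0, 0, 1, 1)),    # elementary charge [A*s]
    "hbar": (1.054571817e-34,  (1, 2, -1, 0)),   # reduced Planck [kg*m^2/s]
    "c":    (2.99792458e8,     (0, 1, -1, 0)),   # speed of light [m/s]
    "eps0": (8.8541878128e-12, (-1, -3, 4, 2)),  # vacuum permittivity
}

def dimensionless_combos(max_exp=2):
    """Yield ({name: exponent}, value) for every non-trivial product
    of the constants whose SI dimensions cancel exactly."""
    names = list(CONSTANTS)
    for exps in product(range(-max_exp, max_exp + 1), repeat=len(names)):
        if all(k == 0 for k in exps):
            continue  # skip the empty product
        dims = [0, 0, 0, 0]
        value = 1.0
        for name, k in zip(names, exps):
            v, d = CONSTANTS[name]
            value *= v ** k
            for i in range(4):
                dims[i] += d[i] * k
        if any(dims):
            continue  # dimensions don't cancel
        yield dict(zip(names, exps)), value
```

Running it turns up, among others, e² ε₀⁻¹ ħ⁻¹ c⁻¹ ≈ 0.0917 ≈ 4π/137.036, i.e. the fine-structure constant - a sanity check that the dimensional filter works, and a reminder that the "relationships" such a search finds are usually either already known or pure numerology.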
- nilkn - 10762 seconds ago
It would be more accurate to say that humans using GPT-5.2 derived a new result in theoretical physics (or, if you're being generous, that humans and GPT-5.2 together derived a new result). The title makes it sound like GPT-5.2 produced a complete or near-complete paper on its own, but what it actually did was take human-derived datapoints, conjecture a generalization, then prove that generalization. Having scanned the paper, this seems a significant enough contribution to warrant legitimate author credit, but I still think the title on its own is an exaggeration.
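That workflow - datapoints in, conjectured generalization out, then verification - has a toy analogue that is easy to sketch. The snippet below (illustrative only, not the paper's method) "conjectures" a closed form for some human-computed datapoints by exact polynomial interpolation, then tests the conjecture on held-out cases:

```python
from fractions import Fraction

# Toy analogue of "take human-derived datapoints, conjecture a
# generalization, then check it". The datapoints here are sums of
# cubes 1^3 + ... + n^3 (purely illustrative).
def datapoint(n):
    return sum(k ** 3 for k in range(1, n + 1))

def conjecture(points):
    """Return the unique polynomial through (1, y1), (2, y2), ...,
    built by exact Lagrange interpolation over the rationals."""
    xs = list(range(1, len(points) + 1))
    def p(x):
        total = Fraction(0)
        for xi, yi in zip(xs, points):
            term = Fraction(yi)
            for xj in xs:
                if xj != xi:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p

# "Conjecture" a formula from the first five datapoints...
p = conjecture([datapoint(n) for n in range(1, 6)])
# ...then validate it on held-out cases (the step that, in the
# actual paper, is replaced by a proof rather than spot-checking).
holds = all(p(n) == datapoint(n) for n in range(6, 40))
```

Here the interpolant through five points happens to be the degree-4 polynomial (n(n+1)/2)², so the held-out check passes; for datapoints with no polynomial closed form, that check is exactly where the conjecture would fail.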
- Insanity - 12859 seconds ago
They also claimed ChatGPT solved novel Erdős problems when that wasn't the case. I'll take this with a grain of salt until more external validation happens. But very cool if true!
- mym1990 - 6494 seconds ago
Many innovations are built off cross-pollination of domains, and I think we are not too far from having a loop where multiple agents, each well grounded in a specific domain, can find intersections and optimizations by communicating with each other, especially if they are able to run for 12+ hours. The truth is that 99% of attempts at innovation will fail, but the 1% can yield something fantastic; the more attempts we can take, the faster progress will happen.
- pear01 - 1854 seconds ago
If a researcher uses an LLM to get a novel result, should the LLM also reap the rewards? Could a Nobel Prize ever be given to an LLM, or is that like giving a Nobel to a calculator?
- vbarrielle - 9177 seconds ago
I'm far from being an LLM enthusiast, but this is probably the right use case for the technology: conjectures that are hard to find, but whose proofs can be checked with automated theorem provers. Isn't that what AlphaProof does, by the way?
- major4x - 4089 seconds ago
Can't help thinking of https://en.wikipedia.org/wiki/Bogdanov_affair
- elashri - 11639 seconds ago
Of all particle physics concepts, I would be less interested in scattering amplitudes as a test case, because the scattering amplitude has one of the most concise definitions and its solution is straightforward (not easy, of course). Once you have a good grasp of QM and scattering, it is a matter of applying your knowledge of math to solve the problem. Usually the real problem is to define your parameters from your model and set up the tree-level calculations. If an LLM solved those steps it would be impressive, but here the researchers defined everything and came up with the workflow.
So I would read this (with more information available) with less emphasis on the LLM discovering a new result. The title is a little misleading, but "derives" is the operative word here, so it is technically correct for people in the field.
- crorella - 12637 seconds ago
The preprint: https://arxiv.org/abs/2602.12176
- another_twist - 4164 seconds ago
That's great. I think we need to start researching how to get cheaper models to do math. I have a hunch it should be possible to get leaner models to achieve these results with the right sort of reinforcement learning.
- jtrn - 2125 seconds ago
This is my favorite field to have opinions about without having any training or skill in it. Fundamental research is just something I enjoy thinking about, even though I am a psychologist. I try to pull in my experience from the clinic and from clinical research when I read theoretical physics. Don't take this text too seriously; it's just my attempt at understanding what's going on.
I am generally very skeptical about work at this level of abstraction. The claimed simplicity appears only after choosing Klein signature instead of physical spacetime, complexifying momenta, restricting to a "half-collinear" regime that doesn't exist in our universe, and picking a specific kinematic sub-region. Then they check the result against internal consistency conditions of the same mathematical system. This pattern should worry anyone familiar with the replication crisis. The conditions this field operates under are a near-perfect match for what psychology has identified as maximising systematic overconfidence: extreme researcher degrees of freedom (choose your signature, regime, helicity, and ordering until something simplifies), no external feedback loop (the specific regimes studied have no experimental counterpart), survivorship bias (ugly results don't get published, so the field builds a narrative of "hidden simplicity" from the survivors), and tiny expert communities where fewer than a dozen people worldwide can fully verify any given result.
The standard defence is that the underlying theory — Yang-Mills / QCD — is experimentally verified to extraordinary precision. True. But the leap from "this theory matches collider data" to "therefore this formula in an unphysical signature reveals deep truth about nature" has several unsupported steps that the field tends to hand-wave past.
Compare to evolution: fossils, genetics, biogeography, embryology, molecular clocks, observed speciation — independent lines of evidence from different fields, different centuries, different methods, all converging. That's what robust external validation looks like. "Our formula satisfies the soft theorem" is not that.
This isn't a claim that the math is wrong. It's a claim that the epistemic conditions are exactly the ones where humans fool themselves most reliably, and that the field's confidence in the physical significance of these results outstrips the available evidence.
I wrote up a more detailed critique on Substack: https://jonnordland.substack.com/p/the-psychologists-case-ag...
- emp17344 - 9598 seconds ago
Cynically, I wonder if this was released at this time to ward off any criticism from the failure of LLMs to solve the 1stproof problems.
- pruufsocial - 12429 seconds ago
All I saw was "gravitons" and I thought: we're finally here, the singularity has begun.
- snarky123 - 12215 seconds ago
So wait, GPT found a formula that humans couldn't, then the humans proved it was right? That's either terrifying, or the model just got lucky. Probably the latter.
- getnormality - 3889 seconds ago
I'll believe it when someone other than OpenAI says it.
Not saying they're lying, but I'm sure it's exaggerated in their own report.
- baalimago - 10293 seconds ago
Well, anyone can derive a new result in anything. The question is most often whether the result makes any sense.
- sfmike - 7311 seconds ago
5.2 is the best model on the market.
- PlatoIsADisease - 8282 seconds ago
I'll read the article in a second, but let me guess ahead of time: induction.
Okay, read it: yep, induction. It already had the answer.
Don't get me wrong, I love induction... but we aren't having any revolutions in understanding with induction.
- ares623 - 9664 seconds ago
I guess the important question is: is this enough news to sustain OpenAI long enough for their IPO?
- gaigalas - 10367 seconds ago
I like the use of the word "derives". However, it gets outshone by "new result" in the public's eyes.
I expect lots of derivations (new discoveries whose pieces were already in place somewhere, but which no one had put together).
In this case, the human authors did the thinking and also used the LLM, but this could happen without the original human author too (some guy posts something partial on the internet, no one realizes it is novel knowledge, and it gets reused by AI later). It would be tremendously nice if credit were kept in such scenarios.
- vonneumannstan - 12614 seconds ago
Interesting, considering the recent Twitter froth about AI being incapable in principle of discovering anything.
- mrguyorama - 8460 seconds ago
Don't lend much credence to a preprint. I'm not insinuating fraud, but plenty of preprints turn out to be "actually, you have a math error here", or are retracted entirely.
Theoretical physics is throwing a lot of stuff at the wall and theory crafting to find anything that might stick a little. Generation might actually be good there, even generation that is "just" recombining existing ideas.
I trust physicists and mathematicians to mostly use tools because they provide benefit, rather than because they are in vogue. I assume they were approached by OpenAI for this, but glad they found a way to benefit from it. Physicists have a lot of experience teasing useful results out of probabilistic and half broken math machines.
If LLMs end up being solely tools for exploring symbolic math, that's a real benefit. I wish it didn't involve destroying all progress on climate change, platforming truly evil people, destroying our economy, exploiting already disadvantaged artists, destroying OSS communities, enabling yet another order-of-magnitude increase in spam profitability, destroying the personal computer market, stealing all our data, sucking the oxygen out of investment in real industry, and bald-faced lies to everyone about how these systems work.
Also, last I checked, MATLAB wasn't a trillion dollar business.
Interestingly, the OpenAI wrangler is last in the list of authors and acknowledgements. That somewhat implies the physicists don't think it deserves much credit. They could be biased against LLMs, like me.
When Victor Ninov (fraudulently) analyzed his team's accelerator data using an existing software suite to "find" a novel superheavy element, he got first billing on the author list. He probably contributed to the theory and some practical work, but he alone was literate in the GOOSY data tool. Author lists are often a political game as well as a record of credit, but Victor got top billing above people like his bosses, who were famous names. The guy who actually came up with the idea of how to create the element, with an innovative recipe that a lot of people doubted, was credited 8th:
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83...
- brcmthrowaway - 12206 seconds ago
End times approach..
- starkeeper - 12125 seconds ago
[flagged]
- baggachipz - 12344 seconds ago
[flagged]
- starkeeper - 11662 seconds ago
[flagged]
- longfacehorrace - 10898 seconds ago
Car manufacturers need to step up their hype game...
New Honda Civic discovers Pacific Ocean!
New F-150 discovers Utah Salt Flats!
Sure, it took humans engineering and operating our machines, but the car is the real contributor here!