Harvard mathematicians evaluate AI performance on unpublished research math

Published: 2026-06-14 23:09:44

Harvard’s “First Proof, Second Batch” tests how AI performs on research-level mathematics under strict conditions. Thirty experts blind-graded solutions submitted by four top AI systems (models from OpenAI and Google) to 10 original, unpublished problems drawn from active research (none appear in textbooks or on arXiv). Main result: the expert panel gave passing grades on 7 of the 10 problems across the four systems tested. Earlier trial runs reportedly solved only 2 of the 10, which suggests that improvement can come from multiple attempts or different prompting strategies, while grading remained blind to the source of each submission. The organizers stress why unpublished problems matter: standard benchmarks often have known solution paths, whereas research math can involve not even knowing whether a solution exists at all. This second batch follows the first evaluation, held in February 2026, and forms part of an ongoing framework for tracking whether AI performance is genuinely advancing at the frontier of mathematical research or merely plateauing after early benchmark gains. Overall, the exercise gives a balanced view of AI performance: it can solve meaningful research-level tasks, but reliability is still not uniform across problems.

Neutral

This news is not directly about crypto protocols, tokens, or regulation. It is a technology evaluation of AI performance on unpublished math problems. For crypto traders, the near-term market impact is likely small because there is no immediate link to BTC/ETH liquidity, stablecoins, exchange flows, or specific Web3 catalysts. It may matter indirectly through the wider “AI narrative” that sometimes boosts AI-adjacent assets. Still, the study is framed as nuanced (passes on 7/10, versus 2/10 in early trials), which is unlikely to trigger the kind of strong speculative rush that a clear breakthrough announcement would. Short-term: likely neutral, with no direct trading trigger. Long-term: neutral to slightly constructive for sentiment around AI capabilities, but any effect would be gradual and sector-wide rather than coin-specific. In similar past cases, improved AI benchmarks usually caused short-lived hype, but lasting price impact normally required a follow-up link to deployable products or clear token-ecosystem demand, and neither is present here.