The AI Revolution in Math Has Arrived

Source: https://www.quantamagazine.org/the-ai-revolution-in-math-has-arrived-20260413/
Summary:
The AI wave sweeps through mathematics, and the research paradigm undergoes a historic shift
The summer of 2025 became the turning point for AI's deep involvement in mathematical research. That July, several AI models solved five of the six problems at the International Mathematical Olympiad, a performance that stunned the mathematical community. Although Olympiad problems are challenging puzzles with known answers rather than open research questions, the breakthrough prompted many mathematicians who had dismissed AI as too error-prone to be useful to start working with it in earnest.
From assistant to collaborator: AI drives a revolution in research efficiency
Terence Tao, a prominent mathematician at the University of California, Los Angeles, said: "2025 was the year when AI really started being useful for many different mathematical tasks." Early adopters found that AI was not only good at puzzles but could help open up genuinely new ground. Mathematicians began using AI to discover and prove new results, compressing work that once took weeks or months into a single day.
While no world-beating breakthrough has yet appeared, some AI-generated results are on par with work published in professional mathematical journals. In some cases, algorithms can formulate conjectures, complete proofs, and verify them with minimal human intervention; in others, extended conversations with large language models such as ChatGPT, Claude, or Gemini give rise to novel proof strategies.
Tao offers a vivid image: "This guy's got a shovel. This guy's got a pickax. Together we can bore a tunnel." Daniel Litt of the University of Toronto argues that even by solving easy problems, AI "is changing how mathematics is done."
A fundamental shift in how research is conducted
Traditionally, mathematicians focus on one problem at a time, but AI tools make it possible to work on thousands of problems at once and run statistical studies on them. Tao predicts that mathematical work "will soon look and feel altogether different from the way it was traditionally done." Although no one thinks AI will entirely replace mathematicians, Tao concedes that "there are a lot of institutional changes, cultural changes, we will have to make."
These changes come with controversy. Akshay Venkatesh of the Institute for Advanced Study warns that as AI becomes a powerful tool, mathematicians may lose direct experience of mathematical understanding. He and Tao agree that AI's impact will be profound, but Venkatesh is more cautious: "There are valuable things in our culture which we should try to keep."
Corporate interest surges, and talent begins to move
Some mathematicians have already left academia to join tech giants such as OpenAI and Google, or math-focused AI startups such as Harmonic, Logical Intelligence, Axiom Math, and Math Inc. Jeremy Avigad, director of the Institute for Computer-Aided Reasoning in Mathematics at Carnegie Mellon University, points out: "One key reason there is so much corporate interest in AI for mathematics is that people are recognizing that the key to general intelligence is combining the insights you get from machine learning with the precision you get from mathematics."
Landmark events: from the Olympiad to First Proof
The 2025 Olympiad performance can be likened to AI "entering an ambitious college math program," while the First Proof challenge of February 2026 marked its "graduation from graduate school." In that challenge, entrants had one week to have their AI models solve 10 research-level problems across areas of mathematics, all carefully chosen to be unlikely to have appeared in the algorithms' training data. With varying levels of autonomy, the models solved more than half of them. Analyzing the results, Litt wrote: "It's very likely that this technology is bigger than the computer."
Case studies: how AI enables breakthroughs
Ernest Ryu, an optimization theorist at UCLA, began using large language models more seriously after the 2025 Olympiad results. Using ChatGPT, in about 12 hours of work spread over three days, he made the key advance on a 42-year-old open problem in optimization theory: a proof of the convergence of Nesterov's method. He concluded: "It's a concrete instance where the use of ChatGPT really accelerated the discovery." Ryu has since taken a leave of absence from UCLA to join OpenAI as a member of the technical staff.
In another case, during a fall 2025 special program on algebraic combinatorics at Brown University, more than a hundred mathematicians from around the world used DeepMind's AlphaEvolve system to analyze the structure of Bruhat intervals in permutation groups. The AI did not answer the question they had posed; instead it unexpectedly discovered that, under certain conditions, these intervals form elegant hidden structures, high-dimensional hypercubes, revealing a pattern that had been "sitting there for 50 years in front of our nose" unnoticed. Geordie Williamson of the University of Sydney marveled that with large language models, "I can do an experiment in 20 minutes that two years ago would have taken me two weeks."
Challenges and concerns: a flood of noise and a dilemma for education
Amid the excitement, mathematicians are also wary of the risks. Litt notes that "there is a lot of pollution of the commons by AI-generated nonsense," and Joel David Hamkins of the University of Notre Dame despairs of the "ocean of slop that is overwhelming our journal systems."
To ensure reliability, the community is pinning its hopes on formal proof: converting proofs into a language computers can understand and verifying every logical step with software. Tao stresses: "AI without validation is too unreliable to be of use in any serious application." "Autoformalization" via AI may become the way to get there.
The bigger challenge lies in education. Even the most enthusiastic supporters worry about AI's effect on student training. Tao notes that many of the exercises instructors assign can be solved instantly by AI, which may keep students from exercising their "mental muscles." Hamkins says that because so much homework is now written by AI, he has had to abandon traditional take-home assignments in favor of in-class quizzes: "This is a problem for the entire academic profession." A mathematician at a top research university warns: "While accelerating progress in serious mathematical research, AI also risks hindering our ability to train more mathematical researchers."
Looking ahead: a tool, not a replacement; art as well as science
Despite the rapid changes of the past year, none of the mathematicians interviewed believe the discipline will become obsolete. Tao's metaphor: humans are like climbers who can plan a route to the summit of Everest and ascend it step by step, while today's AI is more like a jumping robot that can leap a 6-foot wall no human can climb but lacks long-term strategic planning. Some mathematical "Everests" (such as number-theoretic questions like whether π + e is rational) may remain beyond AI's reach for centuries to come.
The pace of the technology still defies expectations. Litt predicts: "Within 20 years, we will almost certainly see AI tools producing mathematics that surpasses all human mathematicians in many measurable respects." But as Venkatesh observes, there are endlessly many ways to formulate mathematics, our choices are governed by human values, and mathematics is an art as well as a science. If AI pulls mathematics away from its artistic tradition, the discipline will be impoverished even if the number of theorems proved each month goes up.
The community's best hope is that AI will help humans discover and prove truths that would otherwise have remained eternal mysteries. At the world's largest annual mathematics meeting, held in Washington, D.C., in early 2026, nervous jokes about being made obsolete by AI were everywhere, even as everyone publicly insisted AI would remain humanity's assistant. Williamson, who has long worked at the intersection of AI and mathematics, said in an invited address that responding to AI's advance with ignorance and fear would be a mistake, but that he understands where the fear comes from: mathematics is a craft "people pour their lives into, and there is a real possibility that its value will be greatly diminished in the future."
For now, new results keep arriving, from giants like Google and OpenAI to startups like Axiom, and from academia to hobbyist communities. The scale of the change unsettles many mathematicians, but it also opens unprecedented possibilities for exploring a mathematical "world that has riches beyond our imagination."
Full article:
The AI Revolution in Math Has Arrived
Introduction
The tipping point came in the summer of 2025. That July, several artificial intelligence models solved five out of six problems at the International Mathematical Olympiad, an annual challenge for some of the world's best high school students. But while mathematicians were shocked — few had expected the programs to get that good that quickly — the impressive results didn't necessarily mean that AI would make important strides in research math. After all, Olympiad problems are challenging puzzles with known answers, not open questions.
Nevertheless, the results made people pay attention. Mathematicians who had dismissed AI models as too error-prone to be useful started playing around with them. Those early adopters found, to their surprise, not only that the models were good at puzzles, but that they could help break genuinely new ground. Soon, mathematicians were using AI to discover and prove new results, accomplishing in a day what would have once taken them weeks or months. "2025 was the year when AI really started being useful for many different tasks," said Terence Tao, a prominent mathematician at the University of California, Los Angeles.
While no single new result is a world-beating breakthrough, some of them are on par with discoveries published in professional mathematical journals. In some cases, algorithms formulate a conjecture, prove it, and verify the proof with minimal human intervention. In others, extensive chats with large language models such as ChatGPT, Claude, or Gemini lead to novel proof strategies.
"This guy's got a shovel. This guy's got a pickax. Together we can bore a tunnel," Tao said. There's "a lot of throwing things at the wall to see what sticks."
Though Tao is perhaps the most prominent exponent of AI's utility in mathematics, others agree. Even by solving easy problems, said Daniel Litt of the University of Toronto, AI "is changing how mathematics is done."
Soon, "it will look and feel altogether different from the way mathematics was traditionally done," Tao said. Where before mathematicians studied one problem at a time, "with these tools you can solve thousands of problems at once and start doing statistical studies." Though nobody I spoke with thinks AI will replace mathematicians, Tao added that "there are a lot of institutional changes, cultural changes, we will have to make."
Those changes will be contested, in math as in other academic disciplines wrestling with AI's impact. As AI models become a powerful new tool, they risk causing mathematicians to lose direct experience with mathematical understanding, said Akshay Venkatesh of the Institute for Advanced Study. Like Tao, Venkatesh is a recipient of the Fields Medal, math's most prestigious prize. Both agree that AI's impact will be significant, but Venkatesh is more cautious about it: "There are valuable things in our culture which we should try to keep," he said.
Some mathematicians are now leaving academia to work at big tech firms, like OpenAI and Google, or to join math-focused AI startups such as Harmonic, Logical Intelligence, Axiom Math, and Math Inc. "One reason there is so much interest in AI for mathematics in the corporate world is that people are recognizing that the key to general intelligence is combining the insights you get from machine learning and the precision you get from mathematics," said Jeremy Avigad, the director of the Institute for Computer-Aided Reasoning in Mathematics at Carnegie Mellon University.
By the start of 2026, shock at the power of AI had turned into something more like wonder. A February challenge called First Proof gave entrants a week to have their AI models solve 10 research-level questions in various areas of math. Mathematicians had chosen the questions so that they were unlikely to have appeared in the algorithms' training data. With varying levels of autonomy, the models succeeded in solving over half the problems. If the Olympiad results represented the moment AI entered an ambitious college math program, the First Proof results were arguably the moment they finished graduate school. In a blog post analyzing the results, Litt wrote: "It's very likely that this technology is bigger than the computer."
Creative Evolution
Though the summer of 2025 marked an inflection point in the capabilities of AI, it didn't come out of nowhere. Pushmeet Kohli, Google DeepMind's vice president of science, said DeepMind has been trying to solve math problems with AI since 2018. François Charton, now at Axiom, first started trying to use machine learning to solve mathematical problems back in 2019.
But in those early years, it was a niche area. At first, Charton and a handful of others used AI to solve problems whose solutions were already known, just to see if they could get the new techniques to work. By 2024, they were beginning to forge ahead. They looked for problems where there was a rich set of data to analyze, and then used AI to construct mathematical objects with quantifiable properties — like optimal arrangements of points that could fit on a grid without forming an isosceles triangle.
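The isosceles-triangle arrangement just mentioned is easy to state in code. As a toy illustration only (this is my own sketch, not the researchers' or AlphaEvolve's actual setup, and the convention of counting degenerate collinear triples as isosceles is an assumption), here is an exhaustive Python search for the largest isosceles-free point set on a tiny grid; the instances attacked with AI are far beyond brute force:

```python
from itertools import combinations

def d2(p, q):
    """Squared Euclidean distance between two grid points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def is_isosceles_free(points):
    """True if no three of the points form an isosceles (or equilateral)
    triangle; degenerate collinear triples with two equal gaps also count."""
    for a, b, c in combinations(points, 3):
        sides = sorted([d2(a, b), d2(a, c), d2(b, c)])
        if sides[0] == sides[1] or sides[1] == sides[2]:
            return False
    return True

def largest_isosceles_free_subset(n):
    """Exhaustively find a largest isosceles-free subset of the n x n grid.
    Exponential in n*n, so only sensible for very small n."""
    grid = [(x, y) for x in range(n) for y in range(n)]
    for size in range(len(grid), 2, -1):
        for subset in combinations(grid, size):
            if is_isosceles_free(subset):
                return list(subset)
    return []

if __name__ == "__main__":
    print(largest_isosceles_free_subset(3))
```

On the 3 x 3 grid, for example, {(0,0), (1,0), (2,1), (2,2)} has all six pairwise distances arranged so that no triple repeats a side length.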
In January 2025, Tao and Javier Gómez-Serrano of Brown University began working with two mathematicians at DeepMind, Adam Wagner and Bogdan Georgiev, on an AI system called AlphaEvolve. AlphaEvolve works by using Gemini to write programs in Python code that might be hundreds of lines long. It then "evolves" these programs using so-called genetic algorithms to attempt to find optimal solutions to math problems. The four mathematicians used AlphaEvolve on a new problem every day or two for a few months.
As they did so, they also learned how to improve the prompts they gave AlphaEvolve. One key takeaway: The model seemed to benefit from encouragement. It worked better "when we were prompting with some positive reinforcement to the LLM," Gómez-Serrano said. "Like saying 'You can do this' — this seemed to help. This is interesting. We don't know why."
By late May, the team had tried AlphaEvolve on 67 different problems in several areas of mathematics. On 23 of them, AlphaEvolve improved, in a small way, on the best known solutions. On 36 of the 67, it did as well as what had already been done, and on the remaining handful, it couldn't match the best known result. The mathematicians shared their findings in a November 2025 paper, "Mathematical Exploration and Discovery at Scale." Gómez-Serrano noted that any one of their results might have been obtained by an expert in a given area who worked at it for a few months. But without being experts in many of these fields, "we were able to obtain comparable results in the span of a day or two," he said.
As Tao put it, current AI models are "very good at scouring big lists of problems for low-hanging fruit. It's tedious and thankless and not something humans want to do." He cautioned that models are achieving "scattered successes among a big sea of unreported failures." But the successes are notable.
Gómez-Serrano estimates that he now spends about two-thirds of his time using AI. It is, he said, "getting to the point where it is useful and usable. This is the beginning of the new way we will do mathematics."
Mistaken Identities
In previous years, AI's extra power seemed to come from its ability to resurface long-forgotten proofs buried in obscure references. Igor Pak of UCLA noted that ChatGPT is currently "fantastic in finding the right references, right literature, finding connections that Google Scholar — which doesn't work semantically — can't."
Then, over the course of 2025, said Johannes Schmitt of the Swiss Federal Institute of Technology Zurich, something shifted. "It started becoming useful to talk to LLMs, not because they would give you the full answer," he said, but because "they became good conversation partners."
The LLMs he spoke with inevitably made lots of mistakes, leading some mathematicians to dismiss them outright. Many researchers, he said, decide that if "everything it says is kind of wrong, I will just not talk to it." But others — he puts himself in this camp — have a higher tolerance for "the pain of talking to this bullshitting model. They say, I can still get something out of this conversation; even if not every idea is good, I can ignore the bad ones and take the good ones." And the mistakes, Schmitt noted, are weird ones: There is virtually no way that a person with any training in mathematics would make such a plethora of basic errors while also succeeding in coming up with subtle, original, and correct ideas.
UCLA's Ernest Ryu, who works largely in a branch of applied math called optimization theory, also started paying more attention to LLMs after the Olympiad results. Whereas AlphaEvolve was trying to optimize particular quantities, Ryu wanted to prove things about the conditions under which optimization algorithms work.
In the summer of 2025, he noticed that LLMs' mathematical capabilities had dramatically improved. He started using them to help prepare lecture notes, mostly to fill in gaps in his memory of the details of a particular proof. At times, he said, "it would find an error in my reasoning, sometimes major, sometimes minor. Sometimes it would find a simpler proof than I had in my notes."
He had a sense that AI models were "exhibiting signs of life." He remembers feeling skeptical but optimistic. To make up his own mind about what LLMs can and can't do, he decided to try an experiment. One evening in October, after his young son went to sleep, he set out to solve an open problem in optimization theory that he'd attempted a few times in the past. This time he used ChatGPT. "It's not the most important problem, but I know 10 people who would very much appreciate a solution," he said.
Ryu's problem was first proposed in 1983 by a Russian mathematician named Yurii Nesterov. Nesterov was trying to find the minimum of functions that take many variables as inputs and output a single value that behaves "nicely" in a particular mathematical way. If you think of the outputs as forming an elevation map, you want to prove that you eventually converge on the lowest point, and don't end up endlessly bouncing around in search of it.
This sort of problem arises quite often in applied math, especially in machine learning, where it is central to training neural networks. Say you start somewhere on your map. A widely used technique known as gradient descent uses basic tools from calculus to figure out which way is downhill and how steep the hill is at the point where you're standing. Take a step downward in the steepest direction each time, and you will eventually get to the very bottom.
But although gradient descent will get you to the right answer, it sometimes gets you there very slowly. So mathematicians have long looked for variations that converge to the right answer more quickly. Nesterov developed one technique for doing so, in which the size of each step downhill depends not only on how steep the function is at a given point, but also on the path you've already taken to get there. If you've been taking bigger steps in the past, you'll continue to do so.
It seems intuitively obvious that this will get you to the bottom of the hill more quickly. But what if you go too fast and overshoot? You might risk endlessly oscillating around the true minimum, and never attaining it. Nesterov couldn't prove that his algorithm would eventually converge to the optimal value. And for 42 years, no one else could either.
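The two methods just described can be sketched in a few lines. This is a standard textbook formulation on a simple quadratic, with the usual step size 1/L and the common (k-1)/(k+2) momentum schedule; it is not the general setting of Nesterov's open question, only an illustration of the mechanics:

```python
# Ill-conditioned quadratic f(x, y) = 0.5*(x^2 + 100*y^2), minimized at (0, 0).
def f(p):
    x, y = p
    return 0.5 * (x * x + 100.0 * y * y)

def grad(p):
    x, y = p
    return (x, 100.0 * y)

ETA = 1.0 / 100.0  # step size 1/L, where L = 100 is the largest curvature

def gradient_descent(p0, steps):
    """Plain gradient descent: step downhill along the steepest direction."""
    x, y = p0
    for _ in range(steps):
        gx, gy = grad((x, y))
        x, y = x - ETA * gx, y - ETA * gy
    return (x, y)

def nesterov(p0, steps):
    """Nesterov-style acceleration: each step also extrapolates along the
    path already traveled (the momentum idea described in the text)."""
    cur, prev = p0, p0
    for k in range(1, steps + 1):
        beta = (k - 1) / (k + 2)          # momentum grows toward 1
        y0 = cur[0] + beta * (cur[0] - prev[0])
        y1 = cur[1] + beta * (cur[1] - prev[1])
        g = grad((y0, y1))
        prev, cur = cur, (y0 - ETA * g[0], y1 - ETA * g[1])
    return cur

if __name__ == "__main__":
    print(f(gradient_descent((1.0, 1.0), 2000)), f(nesterov((1.0, 1.0), 2000)))
```

Both drive f toward its minimum; the accelerated schedule comes with the classical O(1/k²) convergence guarantee for smooth convex functions, and the question Ryu settled concerned proving convergence guarantees of this flavor for Nesterov's method.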
When Ryu asked ChatGPT, "it kept giving me incorrect proofs," he said. "But the lead-up to the inevitable error had interesting steps, correct partial results that seemed potentially useful." As the LLM made incremental progress, he would check its answers, keep the correct parts, and feed them back into the model with a new prompt. "I had to play the role of the verifier," Ryu said. "With ChatGPT, I felt like I was covering a lot of ground very rapidly, much more quickly than I could do on my own. That's what kept me going."
Within about 12 hours of work spread over three days, he had arrived at a proof of a simplified version of the problem. After a few more days, he finally proved that Nesterov's method converges. It wasn't, Ryu said, "the most creative thing, not the most complicated. But certainly it wasn't that easy." While it's not a life-changing result, he added, "it's something that could be published in a top optimization journal without the AI component. It's a good result."
"It's a concrete instance where the use of ChatGPT really accelerated the discovery," he said. And he thinks the capabilities of LLMs are only going to continue to improve. "If you look at the velocity of the improvement, that's staggering. If we continue one year later, two or three model releases down the line, we are going to get really, really impressive, substantial discoveries assisted by AI. It's going to come."
A few months after sharing his paper on Nesterov's method, Ryu took a leave of absence from UCLA to take a job at OpenAI, where he is now a member of the technical staff.
Order in the Court
Over the course of 2025 and into the first months of 2026, AI has been used to prove increasingly abstract results.
In September 2025, more than 100 mathematicians from around the world gathered at Brown University for a special program on algebraic combinatorics. Nicolás Libedinsky and David Plaza had come from Chile, José Simental from Mexico, Geordie Williamson from Australia, and Jordan Ellenberg from Wisconsin.
All of them were interested, for different reasons, in computing a quantity called the d-invariant, which appears in many areas of math. To understand what the d-invariant is, it helps to first look at a well-studied object in one of these areas, called the permutation group. This object is a way of describing the different ways one can shuffle a set of items, like cards in a deck.
It starts out simple. If you have a deck with just one card, you can't shuffle it. So the permutation group S1 has one element. S2 has two elements: If you have two cards, they can appear in two possible orders. S3 becomes a little more complicated; there are six different ways to order a deck of three cards.
The different ways of ordering the cards can be arranged into a network of vertices and edges called a graph. The starting arrangement, 123, goes at the bottom. Each edge of the graph (drawn as an arrow) represents a swap of two cards.
As the number of cards n gets larger, Sn grows very quickly — making this graph next to impossible to draw for groups after S4. (S60 has about as many elements as there are atoms in the observable universe.)
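The factorial growth of these groups is easy to check directly. A quick sketch (my own illustration, not from the article):

```python
from itertools import permutations
from math import factorial

# The permutation group S_n consists of all orderings of n cards,
# so it has exactly n! elements.
for n in range(1, 5):
    assert len(list(permutations(range(1, n + 1)))) == factorial(n)

# The growth is explosive: 60! has 82 decimal digits, a number
# comparable to the count of atoms in the observable universe (~10^80).
digits_in_60_factorial = len(str(factorial(60)))
print(digits_in_60_factorial)
```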
Mathematicians want to understand the structure of these graphs both as objects in and of themselves and as tools for analyzing other things.
Consider again the graph of the permutation group S3, which has six elements, or permutations. We want to explore the relationship between these permutations. One way to do so is to look at all the ways to get from one permutation to another by following the arrows. A given permutation is "smaller" than another one (using a definition of size called the Bruhat order) if it is possible to travel along the arrows from the first permutation to the second. So 213 is smaller than 321.
We can then look at the "Bruhat interval" between the two permutations — the set of all the different permutations that lie between them when you follow the graph's arrows. For example, the interval between 213 and 321 includes 231 and 312. (If you can't get from one permutation to another by following the arrows, like from 213 to 132, then neither one is smaller than the other, and the interval between them is not defined.)
The d-invariant associated with two permutations is, loosely speaking, a measure of the complexity of their Bruhat interval's underlying structure. The same quantity appears in a number of mathematical questions that otherwise seem unrelated, making it of great interest to mathematicians.
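For small groups, the Bruhat order and its intervals can be computed directly from the description above. In this sketch (an illustration of the standard definitions, not the researchers' code), an arrow swaps two entries so that the number of out-of-order pairs rises by exactly one, and one permutation is below another if a chain of arrows connects them:

```python
from itertools import combinations, permutations

def inversions(w):
    """Number of out-of-order pairs in w (the permutation's length)."""
    return sum(1 for i, j in combinations(range(len(w)), 2) if w[i] > w[j])

def covers(w):
    """Permutations reached from w by one arrow: swap two entries so the
    inversion count goes up by exactly one."""
    out = []
    for i, j in combinations(range(len(w)), 2):
        v = list(w)
        v[i], v[j] = v[j], v[i]
        v = tuple(v)
        if inversions(v) == inversions(w) + 1:
            out.append(v)
    return out

def leq(u, w):
    """Bruhat order: u <= w if some chain of arrows leads from u to w."""
    if u == w:
        return True
    return any(leq(v, w) for v in covers(u) if inversions(v) <= inversions(w))

def bruhat_interval(u, w, n):
    """All permutations z in S_n with u <= z <= w (endpoints included)."""
    return [z for z in permutations(range(1, n + 1)) if leq(u, z) and leq(z, w)]

if __name__ == "__main__":
    print(bruhat_interval((2, 1, 3), (3, 2, 1), 3))
```

Running this on S3 reproduces the example from the text: between 213 and 321 lie exactly 231 and 312, while 213 and 132 are incomparable.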
In bigger permutation groups, it's hard to say in any general way what the Bruhat interval between two given permutations looks like. "Intervals are just super complicated things," Libedinsky said. He, Simental, Plaza, Williamson, and Ellenberg — all hoping for different reasons to find the biggest possible d-invariant for a given permutation group — set out to get AI to help them.
They ended up finding something else entirely.
In October 2025, Ellenberg asked Wagner at DeepMind to use AlphaEvolve (which is not publicly available) to analyze the structures of the Bruhat intervals of dozens of permutation groups. It ran overnight. "In the morning, we were like, this program is really doing something interesting," Williamson said. "And then I remember a flurry of emails back and forth that day."
The LLM talked to itself while performing calculations. "I'm about to propose something truly outlandish, a 'Crazy Ivan' maneuver for this problem," it mused, referring to a sharp turn submarines sometimes take to detect their adversaries, popularized in the Tom Clancy novel The Hunt for Red October.
Ultimately, AlphaEvolve generated something like 50 lines of Python code in its attempts to find intervals with large d-invariants. As the mathematicians tried to figure out what this code was doing, Ellenberg realized that if the number of cards in the deck was a power of 2 (like 16, which is 2⁴), then the program became much shorter — about five lines long. "You can analyze it very explicitly," Williamson said. "It's doing something very beautiful."
As they related in a preprint on January 3, 2026, AlphaEvolve had found that the Bruhat intervals in these particular permutation groups had a surprisingly special structure. When the researchers studied the intervals, they found that they formed higher-dimensional cubes called hypercubes. "If you look at what AlphaEvolve was thinking, I was super surprised," Libedinsky said. "If it was a human, it would be an extremely creative human."
AlphaEvolve had answered a question they didn't know they had. "We didn't ask AlphaEvolve to find big hypercubes," Ellenberg said. "We asked it to find something else, and we thought about it and realized it was a gigantic hypercube which we had not anticipated was there."
As Williamson put it, "It's a structure that's been sitting there for 50 years in front of our nose. We just hadn't noticed it."
Older machine learning methods had previously enabled such serendipitous mathematical discoveries, too — uncovering patterns no one had thought to look for. But in the past, Williamson said, it was a "real engineering effort. … You need to know how to code, spend a lot of time looking at details of neural network training. It was basically extremely difficult for a mathematician with no significant machine learning background to do this."
With LLMs, "I can suddenly do an experiment in 20 minutes that two years ago would have taken me two weeks," he said. Though "most of the time it doesn't work," AI can now be used like never before "to discover the world that has riches beyond our imagination."
Around Sphere
Though Bruhat intervals seem like purely combinatorial objects, they also play an important role in a particularly abstract area of math called algebraic geometry, which Ravi Vakil, a mathematician at Stanford University and the current president of the American Mathematical Society, specializes in.
Algebraic geometry is the study of shapes defined by polynomial equations like x³ + 2x²y + xz = 5, which involve a sum of variables raised to whole-number exponents. The degree of the equation is the highest exponent the polynomial has, in this case 3.
Vakil and his colleagues, Balázs Elek of the University of New South Wales and Jim Bryan of the University of British Columbia, were interested in studying how spheres can be embedded in special spaces called flag varieties. (Flag varieties appear in the Bruhat team's paper as well.) Each embedding — a way of associating each point on the sphere to a point within the flag variety — can be defined by a polynomial equation.
There are lots of ways to embed the sphere. Mathematicians represent each embedding as its own point in a separate high-dimensional space. They then study the embeddings defined by polynomials of different degrees by analyzing the different spaces they form.
As the degree increases, mathematicians want to understand how these spaces change. They knew that when the degree gets arbitrarily large — as it goes to infinity — the space resembles the space of all continuous embeddings, not just those defined by polynomials. But when does this resemblance come to pass?
Vakil and his colleagues had found examples that suggested, to their surprise, that it happens very quickly. "There was some consistency that was not supposed to happen until you reached infinity, and it already happened," he said.
So, together with Freddie Manners and George Salafatinos, who were then working for DeepMind, they set out to prove it using two specialized modules built atop Google Gemini: DeepThink, which is publicly available, and a system developed by Salafatinos, called FullProof, which is not. They started with a simpler case. "The proof it gave was very elegant, correct, beautifully written. We could follow it line by line," Vakil said. "It made clear a structure that was not obvious at the time. From that, we realized how the whole argument and significant generalization should potentially work."
Vakil and his colleagues then went back to the AI model, sketching a proof of the general case and asking it to fill in the details. As they reported in a preprint on January 12, 2026, it succeeded. "To me," Vakil said, "the real thing was the first thing" — DeepMind's proof of the simpler case. "The clarity of the argument gave us a new idea." But he wonders: "Who is that idea due to? Is it due to us? Is it due to the model?"
However one ascribes credit, Vakil said, "I believe I would have come up with the proof given enough time."
But then he hesitated. "I think so. I'm not sure. I don't know. Maybe I would have done it in a clunky way. Very possibly, the paper wouldn't have happened without the assistance."
And finally: "We needed to go back and forth. AI models will help us do mathematics by letting us do things we did not have time to do before."
This is perhaps a paradigmatic example of how AI can be useful today. A group of expert mathematicians, with help from a big tech company, figures something out faster than they likely would have otherwise — and they are sure it is correct, because they can check it line by line.
All Ye Need To Know
In asking what AI is doing to mathematical research, we shouldn't only look at the successes. Litt cautioned that "there is a lot of pollution of the commons by AI-generated nonsense." Joel David Hamkins of the University of Notre Dame said he is "despairing of this ocean of slop that is overwhelming our journal systems."
Mathematicians are pinning their hopes on formal proof as the way to navigate this ocean of slop. They're converting proofs into a language that computers can understand, and then using computer programs to verify that all the logic in the proof pans out. "AI without validation is too unreliable to be of use in any serious application," Tao said.
Currently, formalizing mathematical proofs in this way is a time-consuming, intricate process that itself takes substantial mathematical knowledge and is a bit of a craft. And so mathematicians are increasingly turning to "autoformalization," in which AI models translate mathematical statements into formal, logical ones and then prove them. "For the first time," Tao said, "it does feel like we could formalize a significant fraction of mathematics through AI."
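To give a flavor of what formalization looks like, here is a toy statement machine-checked in Lean 4, assuming the Mathlib library for the `ring` tactic; once the file compiles, every logical step has been verified by the proof checker. Real formalization projects involve vastly more work than this:

```lean
import Mathlib.Tactic

-- A small formalized statement: the binomial identity (a + b)^2 = a^2 + 2ab + b^2
-- over the natural numbers. The `ring` tactic discharges it automatically,
-- and the kernel re-checks the resulting proof term down to the axioms.
theorem add_sq (a b : ℕ) :
    (a + b) * (a + b) = a * a + 2 * (a * b) + b * b := by
  ring
```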
The other major challenge that many mathematicians see as a consequence of AI's increasing ability to do math is how it will affect the way students learn. Even the most ardent proponents of AI are concerned. Ken Ono, a professor at the University of Virginia who recently took a leave of absence to become the "founding mathematician" at Axiom, told me he sees "a rosy picture about how AI can help mathematics research, but I am deeply concerned about the role of AI in the future of work and training at all levels."
"A lot of the exercises we assign can be solved instantly by AI," Tao noted. "That may keep students from exercising their mental muscles."
Hamkins feels the same way. "I used to assign a lot of homework. I can't do that anymore," he said, since a substantial fraction of what students now hand in is written by AI. "I don't want to read that, and I don't want to act as the AI police." Homework has real pedagogical value, but now "all assessment has to take place in class, through quizzes and in-class work. That's a problem for the entire academic profession."
Another mathematician at a top research university told me: "While accelerating the progress of serious mathematical researchers, AI runs a serious risk of hindering the training of more mathematical researchers."
Despite the rapid changes of the past year, none of the mathematicians I spoke with worry that the discipline will become obsolete. Tao reaches for a climbing metaphor: Mathematicians are trying to scale "a huge mountain range with many tall peaks and hills." Humans can only climb step by step, but they can plan a route to the summit of Everest. Current AI, by contrast, is like a jumping robot that can sometimes "parkour up a 6-foot wall" no human can climb, yet it lacks the ability to plan strategically over the long term. The 6 feet may become 10 feet, or even 100 feet, Tao imagines, but "these little jumping robots are nowhere near the Mount Everests of mathematics."
Pak believes that some of those "Everests," such as the major questions in number theory about whether sums like π + e can be written as fractions, will remain open for centuries. "I very much doubt that AI can, on these
英文来源:
The AI Revolution in Math Has Arrived
Introduction
The tipping point came in the summer of 2025. That July, several artificial intelligence models solved five out of six problems at the International Mathematical Olympiad, an annual challenge for some of the world’s best high school students. But while mathematicians were shocked — few had expected the programs to get that good that quickly — the impressive results didn’t necessarily mean that AI would make important strides in research math. After all, Olympiad problems are challenging puzzles with known answers, not open questions.
Nevertheless, the results made people pay attention. Mathematicians who had dismissed AI models as too error-prone to be useful started playing around with them. Those early adopters found, to their surprise, not only that the models were good at puzzles, but that they could help break genuinely new ground. Soon, mathematicians were using AI to discover and prove new results, accomplishing in a day what would have once taken them weeks or months. “2025 was the year when AI really started being useful for many different tasks,” said Terence Tao, a prominent mathematician at the University of California, Los Angeles.
While no single new result is a world-beating breakthrough, some of them are on par with discoveries published in professional mathematical journals. In some cases, algorithms formulate a conjecture, prove it, and verify the proof with minimal human intervention. In others, extensive chats with large language models such as ChatGPT, Claude, or Gemini lead to novel proof strategies.
“This guy’s got a shovel. This guy’s got a pickax. Together we can bore a tunnel,” Tao said. There’s “a lot of throwing things at the wall to see what sticks.”
Though Tao is perhaps the most prominent exponent of AI’s utility in mathematics, others agree.
Even by solving easy problems, said Daniel Litt of the University of Toronto, AI “is changing how mathematics is done.”
Soon, “it will look and feel altogether different from the way mathematics was traditionally done,” Tao said. Where before mathematicians studied one problem at a time, “with these tools you can solve thousands of problems at once and start doing statistical studies.” Though nobody I spoke with thinks AI will replace mathematicians, Tao added that “there are a lot of institutional changes, cultural changes, we will have to make.”
Those changes will be contested, in math as in other academic disciplines wrestling with AI’s impact. As AI models become a powerful new tool, they risk causing mathematicians to lose direct experience with mathematical understanding, said Akshay Venkatesh of the Institute for Advanced Study. Like Tao, Venkatesh is a recipient of the Fields Medal, math’s most prestigious prize. Both agree that AI’s impact will be significant, but Venkatesh is more cautious about it: “There are valuable things in our culture which we should try to keep,” he said.
Some mathematicians are now leaving academia to work at big tech firms, like OpenAI and Google, or to join math-focused AI startups such as Harmonic, Logical Intelligence, Axiom Math, and Math Inc. “One reason there is so much interest in AI for mathematics in the corporate world is that people are recognizing that the key to general intelligence is combining the insights you get from machine learning and the precision you get from mathematics,” said Jeremy Avigad, the director of the Institute for Computer-Aided Reasoning in Mathematics at Carnegie Mellon University.
By the start of 2026, shock at the power of AI had turned into something more like wonder. A February challenge called First Proof gave entrants a week to have their AI models solve 10 research-level questions in various areas of math. Mathematicians had chosen the questions so that they were unlikely to have appeared in the algorithms’ training data. With varying levels of autonomy, the models succeeded in solving over half the problems. If the Olympiad results represented the moment AI entered an ambitious college math program, the First Proof results were arguably the moment they finished graduate school. In a blog post analyzing the results, Litt wrote: “It’s very likely that this technology is bigger than the computer.”
Creative Evolution
Though the summer of 2025 marked an inflection point in the capabilities of AI, it didn’t come out of nowhere. Pushmeet Kohli, Google DeepMind’s vice president of science, said DeepMind has been trying to solve math problems with AI since 2018. François Charton, now at Axiom, first started trying to use machine learning to solve mathematical problems back in 2019.
But in those early years, it was a niche area. At first, Charton and a handful of others used AI to solve problems whose solutions were already known, just to see if they could get the new techniques to work. By 2024, they were beginning to forge ahead. They looked for problems where there was a rich set of data to analyze, and then used AI to construct mathematical objects with quantifiable properties — like optimal arrangements of points that could fit on a grid without forming an isosceles triangle.
In January 2025, Tao and Javier Gómez-Serrano of Brown University began working with two mathematicians at DeepMind, Adam Wagner and Bogdan Georgiev, on an AI system called AlphaEvolve. AlphaEvolve works by using Gemini to write programs in Python code that might be hundreds of lines long. It then “evolves” these programs using so-called genetic algorithms to attempt to find optimal solutions to math problems. The four mathematicians used AlphaEvolve on a new problem every day or two for a few months.
As they did so, they also learned how to improve the prompts they gave AlphaEvolve. One key takeaway: The model seemed to benefit from encouragement. It worked better “when we were prompting with some positive reinforcement to the LLM,” Gómez-Serrano said. “Like saying ‘You can do this’ — this seemed to help. This is interesting. We don’t know why.”
By late May, the team had tried AlphaEvolve on 67 different problems in several areas of mathematics. On 23 of them, AlphaEvolve improved, in a small way, on the best known solutions. On 36 of the 67, it did as well as what had already been done, and on the remaining handful, it couldn’t match the best known result. The mathematicians shared their findings in a November 2025 paper, “Mathematical Exploration and Discovery at Scale.” Gómez-Serrano noted that any one of their results might have been obtained by an expert in a given area who worked at it for a few months. But without being experts in many of these fields, “we were able to obtain comparable results in the span of a day or two,” he said.
As Tao put it, current AI models are “very good at scouring big lists of problems for low-hanging fruit. It’s tedious and thankless and not something humans want to do.” He cautioned that models are achieving “scattered successes among a big sea of unreported failures.” But the successes are notable.
Gómez-Serrano estimates that he now spends about two-thirds of his time using AI. It is, he said, “getting to the point where it is useful and usable. This is the beginning of the new way we will do mathematics.”
Mistaken Identities
In previous years, AI’s extra power seemed to come from its ability to resurface long-forgotten proofs buried in obscure references. Igor Pak of UCLA noted that ChatGPT is currently “fantastic in finding the right references, right literature, finding connections that Google Scholar — which doesn’t work semantically — can’t.”
Then, over the course of 2025, said Johannes Schmitt of the Swiss Federal Institute of Technology Zurich, something shifted. “It started becoming useful to talk to LLMs, not because they would give you the full answer,” he said, but because “they became good conversation partners.”
Aitor Iribar-López
The LLMs he spoke with inevitably made lots of mistakes, leading some mathematicians to dismiss them outright. Many researchers, he said, decide that if “everything it says is kind of wrong, I will just not talk to it.” But others — he puts himself in this camp — have a higher tolerance for “the pain of talking to this bullshitting model. They say, I can still get something out of this conversation; even if not every idea is good, I can ignore the bad ones and take the good ones.” And the mistakes, Schmitt noted, are weird ones: There is virtually no way that a person with any training in mathematics would make such a plethora of basic errors while also succeeding in coming up with subtle, original, and correct ideas.
UCLA’s Ernest Ryu, who works largely in a branch of applied math called optimization theory, also started paying more attention to LLMs after the Olympiad results. Whereas AlphaEvolve was trying to optimize particular quantities, Ryu wanted to prove things about the conditions under which optimization algorithms work.
In the summer of 2025, he noticed that LLMs’ mathematical capabilities had dramatically improved. He started using them to help prepare lecture notes, mostly to fill in gaps in his memory of the details of a particular proof. At times, he said, “it would find an error in my reasoning, sometimes major, sometimes minor. Sometimes it would find a simpler proof than I had in my notes.”
He had a sense that AI models were “exhibiting signs of life.” He remembers feeling skeptical but optimistic. To make up his own mind about what LLMs can and can’t do, he decided to try an experiment. One evening in October, after his young son went to sleep, he set out to solve an open problem in optimization theory that he’d attempted a few times in the past. This time he used ChatGPT. “It’s not the most important problem, but I know 10 people who would very much appreciate a solution,” he said.
Ryu’s problem was first proposed in 1983 by a Russian mathematician named Yurii Nesterov. Nesterov was trying to find the minimum of functions that take many variables as inputs and output a single value that behaves “nicely” in a particular mathematical way. If you think of the outputs as forming an elevation map, you want to prove that you eventually converge on the lowest point, and don’t end up endlessly bouncing around in search of it.
This sort of problem arises quite often in applied math, especially in machine learning, where it is central to training neural networks. Say you start somewhere on your map. A widely used technique known as gradient descent uses basic tools from calculus to figure out which way is downhill and how steep the hill is at the point where you’re standing. Take a step downward in the steepest direction each time, and you will eventually get to the very bottom.
But although gradient descent will get you to the right answer, it sometimes gets you there very slowly. So mathematicians have long looked for variations that converge to the right answer more quickly. Nesterov developed one technique for doing so, in which the size of each step downhill depends not only on how steep the function is at a given point, but also on the path you’ve already taken to get there. If you’ve been taking bigger steps in the past, you’ll continue to do so.
It seems intuitively obvious that this will get you to the bottom of the hill more quickly. But what if you go too fast and overshoot? You might risk endlessly oscillating around the true minimum, and never attaining it. Nesterov couldn’t prove that his algorithm would eventually converge to the optimal value. And for 42 years, no one else could either.
When Ryu asked ChatGPT, “it kept giving me incorrect proofs,” he said. “But the lead-up to the inevitable error had interesting steps, correct partial results that seemed potentially useful.” As the LLM made incremental progress, he would check its answers, keep the correct parts, and feed them back into the model with a new prompt. “I had to play the role of the verifier,” Ryu said. “With ChatGPT, I felt like I was covering a lot of ground very rapidly, much more quickly than I could do on my own. That’s what kept me going.”
Within about 12 hours of work spread over three days, he had arrived at a proof of a simplified version of the problem. After a few more days, he finally proved that Nesterov’s method converges. It wasn’t, Ryu said, “the most creative thing, not the most complicated. But certainly it wasn’t that easy.” While it’s not a life-changing result, he added, “it’s something that could be published in a top optimization journal without the AI component. It’s a good result.”
“It’s a concrete instance where the use of ChatGPT really accelerated the discovery,” he said. And he thinks the capabilities of LLMs are only going to continue to improve. “If you look at the velocity of the improvement, that’s staggering. If we continue one year later, two or three model releases down the line, we are going to get really, really impressive, substantial discoveries assisted by AI. It’s going to come.”
A few months after sharing his paper on Nesterov’s method, Ryu took a leave of absence from UCLA to take a job at OpenAI, where he is now a member of the technical staff.
Order in the Court
Over the course of 2025 and into the first months of 2026, AI has been used to prove increasingly abstract results.
In September 2025, more than 100 mathematicians from around the world gathered at Brown University for a special program on algebraic combinatorics. Nicolás Libedinsky and David Plaza had come from Chile, José Simental from Mexico, Geordie Williamson from Australia, and Jordan Ellenberg from Wisconsin.
All of them were interested, for different reasons, in computing a quantity called the d-invariant, which appears in many areas of math. To understand what the d-invariant is, it helps to first look at a well-studied object in one of these areas, called the permutation group. This object is a way of describing the different ways one can shuffle a set of items, like cards in a deck.
It starts out simple. If you have a deck with just one card, you can’t shuffle it. So the permutation group S1 has one element. S2 has two elements: If you have two cards, they can appear in two possible orders. S3 becomes a little more complicated; there are six different ways to order a deck of three cards.
Mark Belan/Quanta Magazine
The different ways of ordering the cards can be arranged into a network of vertices and edges called a graph. The starting arrangement, 123, goes at the bottom. Each edge of the graph (drawn as an arrow) represents a swap of two cards:
As the number of cards n gets larger, Sn grows very quickly — making this graph next to impossible to draw for groups after S4. (S60 has about as many elements as there are atoms in the observable universe.)
Mathematicians want to understand the structure of these graphs both as objects in and of themselves and as tools for analyzing other things.
Consider again the graph of the permutation group S3, which has six elements, or permutations. We want to explore the relationship between these permutations. One way to do so is to look at all the ways to get from one permutation to another by following the arrows. A given permutation is “smaller” than another one (using a definition of size called the Bruhat order) if it is possible to travel along the arrows from the first permutation to the second. So 213 is smaller than 321.
We can then look at the “Bruhat interval” between the two permutations — the set of all the different permutations that lie between them when you follow the graph’s arrows. For example, the interval between 213 and 321 (seen below in red) includes 231 and 312. (If you can’t get from one permutation to another by following the arrows, like from 213 to 132, then neither one is smaller than the other, and the interval between them is not defined.)
The d-invariant associated with two permutations is, loosely speaking, a measure of the complexity of their Bruhat interval’s underlying structure. The same quantity appears in a number of mathematical questions that otherwise seem unrelated, making it of great interest to mathematicians.
In bigger permutation groups, it’s hard to say in any general way what the Bruhat interval between two given permutations looks like. “Intervals are just super complicated things,” Libedinsky said. He, Simental, Plaza, Williamson, and Ellenberg — all hoping for different reasons to find the biggest possible d-invariant for a given permutation group — set out to get AI to help them.
They ended up finding something else entirely.
In October 2025, Ellenberg asked Wagner at DeepMind to use AlphaEvolve (which is not publicly available) to analyze the structures of the Bruhat intervals of dozens of permutation groups. It ran overnight. “In the morning, we were like, this program is really doing something interesting,” Williamson said. “And then I remember a flurry of emails back and forth that day.”
The LLM talked to itself while performing calculations. “I’m about to propose something truly outlandish, a ‘Crazy Ivan’ maneuver for this problem,” it mused, referring to a sharp turn submarines sometimes take to detect their adversaries, popularized in the Tom Clancy novel The Hunt for Red October.
Ultimately, AlphaEvolve generated something like 50 lines of Python code in its attempts to find intervals with large d-invariants. As the mathematicians tried to figure out what this code was doing, Ellenberg realized that if the number of cards in the deck was a power of 2 (like 16, which is 24), then the program became much shorter — about five lines long. “You can analyze it very explicitly,” Williamson said. “It’s doing something very beautiful.”
As they related in a preprint on January 3, 2026, AlphaEvolve had found that the Bruhat intervals in these particular permutation groups had a surprisingly special structure. When the researchers studied the intervals, they found that they formed higher-dimensional cubes called hypercubes. “If you look at what AlphaEvolve was thinking, I was super surprised,” Libedinsky said. “If it was a human, it would be an extremely creative human.”
AlphaEvolve had answered a question they didn’t know they had. “We didn’t ask AlphaEvolve to find big hypercubes,” Ellenberg said. “We asked it to find something else, and we thought about it and realized it was a gigantic hypercube which we had not anticipated was there.”
As Williamson put it, “It’s a structure that’s been sitting there for 50 years in front of our nose. We just hadn’t noticed it.”
Older machine learning methods had previously enabled such serendipitous mathematical discoveries, too — uncovering patterns no one had thought to look for. But in the past, Williamson said, it was a “real engineering effort. … You need to know how to code, spend a lot of time looking at details of neural network training. It was basically extremely difficult for a mathematician with no significant machine learning background to do this.”
With LLMs, “I can suddenly do an experiment in 20 minutes that two years ago would have taken me two weeks,” he said. Though “most of the time it doesn’t work,” AI can now be used like never before “to discover the world that has riches beyond our imagination.”
Around Sphere
Though Bruhat intervals seem like purely combinatorial objects, they also play an important role in a particularly abstract area of math called algebraic geometry, which Ravi Vakil, a mathematician at Stanford University and the current president of the American Mathematical Society, specializes in.
Algebraic geometry is the study of shapes defined by polynomial equations like x3 + 2x2y + xz = 5, which involve a sum of variables raised to whole-number exponents. The degree of the equation is the highest exponent the polynomial has, in this case 3.
Rod Searcey
Vakil and his colleagues, Balázs Elek of the University of New South Wales and Jim Bryan of the University of British Columbia, were interested in studying how spheres can be embedded in special spaces called flag varieties. (Flag varieties appear in the Bruhat team’s paper as well.) Each embedding — a way of associating each point on the sphere to a point within the flag variety — can be defined by a polynomial equation.
There are lots of ways to embed the sphere. Mathematicians represent each embedding as its own point in a separate high-dimensional space. They then study the embeddings defined by polynomials of different degrees by analyzing the different spaces they form.
As the degree increases, mathematicians want to understand how these spaces change. They knew that when the degree gets arbitrarily large — as it goes to infinity — the space resembles the space of all continuous embeddings, not just those defined by polynomials. But when does this resemblance come to pass?
Vakil and his colleagues had found examples that suggested, to their surprise, that it happens very quickly. “There was some consistency that was not supposed to happen until you reached infinity, and it already happened,” he said.
So, together with Freddie Manners and George Salafatinos, who were then working for DeepMind, they set out to prove it using two specialized modules built atop Google Gemini: DeepThink, which is publicly available, and a system developed by Salafatinos, called FullProof, which is not. They started with a simpler case. “The proof it gave was very elegant, correct, beautifully written. We could follow it line by line,” Vakil said. “It made clear a structure that was not obvious at the time. From that, we realized how the whole argument and significant generalization should potentially work.”
Vakil and his colleagues then went back to the AI model, sketching a proof of the general case and asking it to fill in the details. As they reported in a preprint on January 12, 2026, it succeeded. “To me,” Vakil said, “the real thing was the first thing” — DeepMind’s proof of the simpler case. “The clarity of the argument gave us a new idea.” But he wonders: “Who is that idea due to? Is it due to us? Is it due to the model?”
However one ascribes credit, Vakil said, “I believe I would have come up with the proof given enough time.”
But then he hesitated. “I think so. I’m not sure. I don’t know. Maybe I would have done it in a clunky way. Very possibly, the paper wouldn’t have happened without the assistance.”
And finally: “We needed to go back and forth. AI models will help us do mathematics by letting us do things we did not have time to do before.”
This is perhaps a paradigmatic example of how AI can be useful today. A group of expert mathematicians, with help from a big tech company, figures something out faster than they likely would have otherwise — and they are sure it is correct, because they can check it line by line.
All Ye Need To Know
In asking what AI is doing to mathematical research, we shouldn’t only look at the successes. Litt cautioned that “there is a lot of pollution of the commons by AI-generated nonsense.” Joel David Hamkins of the University of Notre Dame said he is “despairing of this ocean of slop that is overwhelming our journal systems.”
Mathematicians are pinning their hopes on formal proof as the way to navigate this ocean of slop. They’re converting proofs into a language that computers can understand, and then using computer programs to verify that all the logic in the proof pans out. “AI without validation is too unreliable to be of use in any serious application,” Tao said.
Currently, formalizing mathematical proofs in this way is a time-consuming, intricate process that itself takes substantial mathematical knowledge and is a bit of a craft. And so mathematicians are increasingly turning to “autoformalization,” in which AI models translate mathematical statements into formal, logical ones and then prove them. “For the first time,” Tao said, “it does feel like we could formalize a significant fraction of mathematics through AI.”
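To give a flavor of what a formalized proof looks like, here is a toy example in the Lean proof assistant, one of the systems used for this kind of verification. (The theorem and its phrasing are our own illustration, not drawn from any of the work described in this article.) Lean's kernel mechanically checks every logical step; nothing is taken on faith.

```lean
-- Toy formal proof in Lean 4: the sum of two even numbers is even.
-- "Even" is spelled out directly as "equal to twice something" so the
-- example is self-contained, with no external library dependencies.
theorem add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k := by
  obtain ⟨a, ha⟩ := hm   -- unpack: m = 2 * a
  obtain ⟨b, hb⟩ := hn   -- unpack: n = 2 * b
  -- The witness is a + b; linear arithmetic closes the goal.
  exact ⟨a + b, by omega⟩
```

If any step were wrong, the proof would simply fail to compile; this all-or-nothing checking is what makes formal verification a plausible filter for AI-generated mathematics.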
The other major challenge that many mathematicians see as a consequence of AI’s increasing ability to do math is how it will affect the way students learn. Even the most ardent proponents of AI are concerned. Ken Ono, a professor at the University of Virginia who recently took a leave of absence to become the “founding mathematician” at Axiom, told me he sees “a rosy picture about how AI can help mathematics research, but I am deeply concerned about the role of AI in the future of work and training at all levels.”
Tao said, “Many of the problems we assign, AI can solve instantly. This can discourage a lot of the students from building up their mental muscles.”
Hamkins agreed. “I used to assign quite a bit of homework. I just can’t do it anymore,” he said; a substantial fraction of the assignments students turn in are written by AI. “I don’t want to read it. I don’t want to be the AI cop.” Though homework was highly pedagogically valuable, now “everything has to be in-class quizzes and work. It’s a problem for the entire academic profession.”
As another mathematician at a leading research university told me, “There is a serious risk that, in parallel with accelerating the progress of serious mathematical researchers, AI prevents us from making more mathematical researchers.”
Even with the rapid changes of the past year, none of the mathematicians I spoke to in reporting this piece fear that the subject will become obsolete. Tao gave the analogy of mathematicians trying to climb “a big mountain range with lots of tall mountains and lots of foothills.” Humans can only climb one step at a time, but they can plan a route to the top of a mountain like Everest. Meanwhile, Tao said, current AIs are like jumping robots. They can sometimes “parkour their way to the top of a 6-foot wall” that a human couldn’t climb. But they can’t do long-term strategic planning. Those 6 feet might become 10 feet, or 100, Tao imagines, but “the little jumping robots are nowhere near the Mount Everests of math.”
Pak thinks that certain Everests — such as a major problem in number theory about whether sums like π + e can be written as fractions — will remain unresolved for centuries. “I’m really doubtful AI can make any dents there at all,” he said. “This is not something that AI would be able to do. But I’m quite positive that if humanity survives, eventually we will figure it out.”
Of course, a lot depends on how the capabilities of AI algorithms change and improve in coming years. Even the most astute and careful observers can’t say for sure how the models will develop. Few see signs of stagnation. “Things are moving very fast. I don’t see any sign they are slowing down,” Litt said. The first few months of 2026 have already seen a steady stream of new results from big companies like Google and OpenAI and small ones like Axiom, as well as from academics and even hobbyists.
“My expectation is surely in 20 years we are going to see AI tools generating mathematics that in many measurable ways are better than every human mathematician,” Litt said. “I would be shocked if that doesn’t happen.”
But as Venkatesh told me, “In the end, there are infinitely many ways to formulate any piece of math.” The choices we make, he said, are governed by human values and shaped by the fact that mathematics is not only a science but also an art.
That balance between science and art is in large measure what gives math its beauty — one of the “valuable things in our culture” that Venkatesh wants to retain. If AI pushes mathematics away from its artistic heritage, the discipline will be diminished, even if more theorems are proved each month. After all, no poet talks seriously about doing statistical regression on sonnets to find the optimal ones.
The best hope for AI is that it will help mathematicians find and prove things that would otherwise have remained mysteries. Most mathematicians agree that that’s what computers have done for the past 80 years. But the scale of the change now underway has left many feeling unsettled.
The world's biggest mathematics conference is held every year in early January. In 2026, in Washington, D.C., nervous jokes about being made obsolete by AI were plentiful, even if, on the record, everyone insisted that AI will be a helpmate to human mathematicians. Williamson — who has been working with AI for years and is very excited by it — was chosen to deliver a series of prestigious lectures about AI and math to the entire conference. He told the audience that it's a mistake to react to AI developments with ignorance and fear.
But he said he understands where the fear comes from. He sees mathematics as a “craft that people have spent their lives — dedicated their lives — towards. There is some possibility that its value may be greatly diminished in the future.”