The AI Industrial Revolution

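Source: https://nav.al/industrial
Summary:
The AI Industrial Revolution: From Writing Code to Building Factories, Humans Are Becoming "Verifiers"
A recent in-depth conversation on the Naval Podcast brought together three frontier founders: Guillermo Rauch of Vercel, Blake Scholl of Boom Supersonic, and Max Hodak of Science. They discussed how AI is reshaping software development, hardware engineering, medical regulation, and even human creativity. The central claim is striking: the era of pure software is ending, and humans are shifting from "executors" to "verifiers."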
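1. Software Development: From "Writing Code" to "Building Factories"
Guillermo Rauch introduced a disruptive concept: the software factory. Engineers used to be measured by how much output they could ship; now an engineer's core value is whether they can build a factory that automatically produces many different outputs. Guillermo argued that 100x and even 1,000x engineers now clearly exist; Naval added that in intellectual and digital domains the gap has always been that large, and AI leverage simply makes it impossible to ignore.
During the discussion, Naval shared his "waste tokens, save time" strategy: instead of fussing over prompting tricks, he throws several models at the same problem again and again, trading compute for time. "The models will keep getting smarter. Rather than learning how to use them, let them adapt to you."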
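2. Hardware Engineering: Software Thinking Upends Traditional Workflows
Blake Scholl used jet-engine design at Boom Supersonic to show how AI is reshaping hardware development. Designing a turbine blade used to mean an engineer converting by hand, in Excel, between the blade's cold and hot shapes, at a rate of one blade (for one piece of the analysis) per day. Now software engineers build the architecture, hardware engineers "vibe-code" their pieces, and two engineers can design an entire jet engine.
Max Hodak predicted that AI will soon generate STEP files and PCB layouts directly, probably within 2026, at which point mechanical and electrical engineering will be transformed as well.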
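3. The Regulatory Bind: AI Is Breaking the "Approval Shackles"
Blake noted that AI has cut the compliance paperwork behind aircraft certification from months to minutes: a design change used to mean rewriting a 200-page compliance document (for example, the test plan showing the aircraft can withstand a lightning strike), and now it takes a few minutes. Naval warned that once regulators also start generating documents with AI, it will turn into an "AI versus AI" arms race.
In medicine, Max said bluntly that China's drug and device regulator (the NMPA, formerly the CFDA) is pulling ahead of the US FDA: China already has an approved, paid implantable brain-computer interface (BCI), while in the US the high cost of bringing products to market is severely suppressing innovation. Naval put it sharply: "Healthcare is essentially a little communist society embedded inside a larger capitalist one, with no real market mechanism."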
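4. Autonomous Companies: The Employee's Job Is Training the AI
Blake described an experiment: he had everyone in the company, including the receptionist, use AI to build whatever tool they considered most important. A week later, almost none of the projects were useless, and several directly changed the company's direction. The conclusion: in the future, an employee's core task will no longer be doing the work personally but training the AI that does it for them.
Guillermo revealed that Vercel's infrastructure is already highly autonomous: AI agents detect anomalies, investigate causes, and recommend fixes on their own, and they even completed, for $14,000 in tokens, a task that used to take an entire security team months.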
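5. What Remains Uniquely Human: Creativity, Taste, and Intent
When the discussion turned to whether AI can truly "innovate," Naval held that the unique value of humans lies in "intent": art is "I want to make you feel a particular emotion," and that intent is something AI cannot replicate. Max argued that returns will shift from 70% intelligence plus 30% agency to 70% agency plus 30% intelligence, and that in the end humanity's core competitive advantage will be "taste and judgment."
Naval used "Homer Simpson's car" as a metaphor: let users design their own car and you get a monstrosity fitted with an umbrella, a flashlight, and a clown horn. Judging what should and shouldn't be built is itself a scarce skill.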
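6. The Future: An Explosion of Small Teams
Guillermo predicted that the future will belong to "a huge number of tiny teams." "Building an engine used to take a thousand people; now two are enough. That doesn't mean 998 people lose their jobs. It means we can build many different engines at the same time." AI has lowered the barrier to entry and given generalists an unprecedented stage.
Naval concluded: "The best investment is to become fluent in these tools and always know what they can and can't do, and that is a moving target."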
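Translated transcript:
The AI Industrial Revolution
Full episode, with 20 minutes of new material at the end. Guests: Guillermo Rauch (Vercel), Blake Scholl (Boom Supersonic), and Max Hodak (Science).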
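Part 1: Waste Tokens, Save Time
Nivi: Welcome to the Naval Podcast, your authoritative source for new knowledge. We're trying something new today. I have three frontier founders with us: three good-looking guys, plus a fourth good-looking guy, Naval.
Let me introduce everyone.
Guillermo Rauch, known as "the G." He's building Vercel into an AI cloud for the world of agents and whatever comes after that.
Blake Scholl. He's building Boom Supersonic, making supersonic aircraft in his own factory, and jet engines as well.
And Max Hodak from Science. He's building a biohybrid brain-computer interface that grows living neurons on silicon to restore senses such as sight, with the eventual goal of exploring new parts of the brain and new senses.
None of these three is assembling products from off-the-shelf parts. They're all building their own factories. And what we care about is less what exactly they're building than what they've learned about how to build.
What new knowledge are they generating?
What's their "alpha" (their core edge)?
What principles have they discovered that other founders can learn from?
What are they trying to figure out right now?
Naval, anything to say before I turn to Guillermo?
Naval: Yeah, let's just enjoy the conversation.
Nivi: Feel free to jump in anytime.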
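AI Software Factories
Guillermo Rauch: I can't remember the exact quote, but I've been deeply influenced by the idea of the "software factory." The engineer's job used to be showing up to work and shipping output directly, and everything inside the company came down to "how good is person A at shipping output B?"
What's changing now is that I judge you as an engineer by: "Are you building the factory that multiplies output, producing B through Z?"
That's a pretty significant shift. We used to believe, and it was somewhat controversial at the time, that 10x engineers exist.
Now there are clearly 100x and even 1,000x engineers, and the world hasn't fully adjusted to that.
Naval: I used to get flamed on Twitter for saying there are 10x engineers, because it cuts against the egalitarian philosophy that everyone is equal. But the reality is that when you work in the domain of ideas, in intellectual and virtual digital domains, the gap isn't even 10x; it's 100x or 1,000x, and it always has been.
Satoshi Nakamoto. Notch (the creator of Minecraft). The inventor of JavaScript, the Brendan Eichs of the world. John Carmack. These are all 1,000x programmers.
Not to mention that if you pick the right thing to work on instead of the wrong thing, the difference is infinite. That isn't necessarily about being a better programmer; it's about having better judgment about what to work on in the first place.
And now, because of AI leverage, the idea is obviously far less controversial.
Guillermo: What's controversial is token leaderboards. People are still a bit confused: "Well, I have a bunch of 100x engineers. Look at all the tokens I'm paying for."
I'm curious whether you've seen the same thing. How do you measure ROI?
Blake Scholl: It's like measuring lines of code in the old days. Token consumption and lines of code both feel like indirect measures.
Max Hodak: My observation is that Claude or ChatGPT is basically as good as you are in a given domain. If you're a very capable developer, these things are extremely powerful. If you're a junior developer, you'll find it behaves more like a junior developer. The occasional feedback you give them seems incredibly important; those small updates seem to completely determine the performance you get out of them.
Guillermo: There's a new kind of support I give now: you come to me saying you didn't get good output from the model, and I tell you how to prompt it. The quality of the reprompting matters enormously.
Max: To be clear, I think this will matter less over time. As the models get smarter, you'll put in less and get more out. But at this stage, it really does seem to reflect back the judgment the user brings.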
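Waste Tokens, Save Time
Naval: I've kind of resisted learning all the tricks and tips. "Use Ralph Wiggum. Use OpenClaw. Use Hermes. Use this prompt engine. Use that scaffolding. Plug in this piece. Always use plan mode."
I've ignored all of it. I assumed the models would improve faster than I could figure out how to use them: they would figure out how to use me, rather than me figuring out how to use them. So I've been completely ham-fisted with them.
I get frustrated with them, and I find myself typing less and less information and doing less and less work, because I just assume I can brute-force my way through. I'll throw Codex, Claude, and Gemini at the same problem over and over, just wasting tokens to save time. No matter how expensive these models seem, they're still far cheaper than a human. So I'd say: waste tokens, save time. Don't look at tokens as either inputs or outputs. Just look at your time and at the final output.
Even if they write low-quality code, which I know they often do, when the time comes to ship to production I'll just throw more tokens at it: "Go through it, look at it, rewrite it."
They'll get better with every generation. I don't see where this necessarily stops. As long as we have verifiable domains and solved problems, they'll solve those problems. In the domain of unsolved problems, say you're Terence Tao at the frontier of creativity, you need to work with the model very collaboratively, carefully, and closely. But I'm not at that level in software engineering.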
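Models Instructing Humans
Naval: Guillermo, you're probably the most extreme software engineer here. Where are you finding the limits of these models?
Guillermo: Something happened recently that resonates strongly with what you're saying. It used to be that you'd give the model a prompt, it would do the classic next-token prediction thing, and it would run away with your idea. Now the models have an intuitive planning mode, without you even asking them to plan, where they come back and say: "Look, for what you're asking, there are three routes we can take. Here's the set of trade-offs."
That's the moment when people on X start exclaiming, "Now we have PhD-level engineer models."
At some point the models graduated. They used to be junior engineers. Now they're principal engineers, because they come back to you with a set of trade-offs. Sometimes they bullshit, which is hilarious: it'll tell you "this will take three weeks" and "this many tokens." Its predictions are really bad. But I now respect the models a lot more, as peers I can go back and forth with intellectually.
There are still a lot of gaps, though. If you're a highly proficient engineer or architect, you still extract more value from the models.
So, the question Max raised: if you're junior, do you get a junior model back?
Clearly not, because a junior engineer gets more advanced code knowledge out of it than they could have written on their own. But does an experienced architect get 10x from the model where a junior engineer gets 2x? That's what I'm trying to figure out.
Max: Architectural decisions come into it. I'm seeing this now with some of the junior software engineers on our team: what's the next step in their career progression? It's moving from writing the implementation of a feature to choosing technologies. Picking Postgres versus some other database. Picking ZeroMQ versus some other queuing system. The models can suggest options, but here's the thing: you see the suggestion and you say, "No, no, I want to use this other thing."
That small feedback is what really matters, and it shapes the kind of output you seem to get at this stage.
Naval: That's taste and judgment, right? That said, you can ask the models "which one should I use and why," and they know everything. They'll give you a really good trade-off matrix.
Guillermo: That's what has changed recently. You might say, "Hey, put this super-high-cardinality telemetry data into Postgres." And it says, "No, no, bro. We don't put that kind of data into Postgres. You should consider ClickHouse or Athena or something like that."
That's happened to me a lot. Impressive.
What I'm still struggling with is this: clearly, humans are still completing the model. When does it flip? When do humans start receiving the instructions, like "Go get me this API key, because it's something only you can do"?
Or "Get me this much capital for my next round of investments." You just watch. Clearly we're not there yet.
Naval: That's a temporary aberration. Pretty soon every good SaaS company or hosting provider will have a CLI and an API that models can use directly. They don't even necessarily need an API: as long as it's text-based and Unix-based, the agent can hack together its own. As for the money, you put in crypto tokens, Bitcoin, whatever, and the model goes and pays for what it needs. People are working on this.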
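Is Pure Software Dead?
Naval: What I'm thinking through now is this: is pure software dead? Is pure software engineering obsolete? It's like speaking English. The models speak English now. We used to have to learn code to communicate with them; now they speak English, fuzzy and sloppy English, the way a human does, and they understand things. So where's the moat for a founder? Hardware? For hardware, this is a boon. You had to build the hardware, and it used to be hard to build a software company alongside it. Patrick Collison says, "Software is art, and it's hard to hire artists."
Now, as a hardware founder: great, you can get really good software developed fairly quickly.
If you're building models, maybe that's the new software engineering: training, tweaking, post-training, fine-tuning. But classic software engineering: is that dead? Is pure software still investable? Can you still organize a company and a team around pure software and try to get some leverage from it?
Guillermo: Did you see the article on X by Mitchell Hashimoto called "The Building Block Economy"? His argument is that the most useful thing for agents right now is powerful, reusable building blocks. To Max's example, you wouldn't expect your clanker (your AI agent) to reinvent a queue infrastructure system every time it needs to send an email. It needs to bring in the right building block, right-sized for the task: "Okay, for this one we use BullMQ."
I challenge the notion that I'd want the agent to reinvent the entire universe from first principles, in a way that's incompatible with the rest of society and civilization. That would be like reinventing highways, laws, and policies just for yourself. Even if there's room for extra optimization, there's still large-scale cooperation value in being able to say "we both depend on Postgres 13.2."
This category of infrastructure software and building blocks that agents will use is (and I'm obviously biased, since it's what we're building) extremely valuable. I don't see agents reinventing all of it any time soon.
Another metaphor I've been using: anything that has already been created and that models can reuse is like a token cache. You don't want to burn a trillion tokens reproducing what already exists. The model always has a starting point it can fork from. This is going to change things quite profoundly.
Naval: So these are like libraries and dependencies, but for models.
Guillermo: Yes, for agents specifically.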
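You Don't Get Stuck Anymore
Max: Back to Naval's question, though. I learned to program when I was very young. All through my teens and twenties I'd get sucked in and code for twenty hours straight. It was super fun. I knew all sorts of things about different programming languages.
I haven't written a single line of code in quite a while. Partly that's because my job has changed. But also, since last December, I've built a huge amount of software with AI that I now use every day. All these projects I'd fantasized about for years, I'm now using; I actually built them. I didn't write any of the code. I can't imagine going back to writing code by hand. I have a hard time seeing that as part of the future.
Guillermo: What's really cool is that you understand how the pieces fit together. Anyone who understands what an API is, how data flows, inputs and outputs, performance (because you have to orient the model around "this is the level of expectation I have for this operation") has something that has always been far more useful than writing code. A highly proficient engineering leader has always been, so to speak, vibe coding through people on Slack or in one-on-ones: transmitting their will, intent, and experience, and letting others run with it. Now we do the same thing, but with agents. That's why you've succeeded. I don't know whether everyone sees the same level of success.
Naval: I went from not writing code for twenty years to writing code all the time now, through agents. Building tons of software. It turns out that just understanding the basic principles of software engineering and algorithms gets you a long way. I stopped coding because I didn't have time to figure out the latest language, the latest architecture, the infrastructure pieces to plug into. Vercel makes it much easier, but even so, just getting started was a pain. Plugging pieces together and assembling infrastructure was so annoying.
Max: What really changed is this. It used to be that you could build a lot, much of it would go smoothly, but then you'd hit some random problem and spend an indefinite amount of time debugging some narrow issue. Now, with agents, you just don't get stuck anymore. Which is pretty amazing. They find the right way to do things relatively quickly. I remember when other friends tried to learn to program, the attitude was: "No, it's intrinsically frustrating. That's part of the deal. That's how you learn."
And that just isn't true anymore.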
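Part 2: Vibe Coding Hardware
Vibe Coding a Turbine Blade
Nivi: Hey Blake, how are you applying all this at Boom Supersonic?
Blake Scholl: It completely changes the roles of software and hardware developers. From day one, we tried to take many traditional engineering workflows, hardware engineering workflows, and turn them into software. If you haven't been around hardware engineering, let me try to make this clear. A lot of hardware engineering happens in Excel spreadsheets on engineers' laptops, in silos. Very complex spreadsheets, sometimes with VBScript code. All of this is software, but it's treated as if it isn't. There's no source control and no automated testing. If you want to hand work off from an aerodynamicist to a structures engineer, it's done manually by emailing a spreadsheet. It's like being back in the nineties. It's terrible.
So we started building software frameworks to automate hardware engineering flows and make them repeatable, with the idea that we could reduce the cost of iteration. But progress was slow: we could never afford enough software engineers. Now we've moved to a mind-blowingly different model. Software engineers create the architecture, because they understand systems, algorithms, and separation of concerns. Then hardware engineers vibe-code their pieces, because they know hardware engineering. The result is mind-blowing productivity for small teams.
Here's an example. Say you're designing a turbine blade. A turbine blade starts out cold, but it heats up when it runs, so it expands. You have to design both the aerodynamics and the structure to work in the cold shape and the hot shape, converting between cold and hot and between structures and aerodynamics. That takes one engineer one day for one blade and one piece of the analysis. A jet engine has about a thousand blades. You can't get much done. Now, with software and hardware people building the solution together, you can change the blade geometry and see the structural and aerodynamic results in real time. Two engineers can design an entire jet engine. Night and day.
Guillermo Rauch: One thing you mentioned is that software engineers are creating tools and architectures for the rest of the engineers. To me, that's the biggest cataclysm for enterprise software: no startup can sell you hardware collaboration tools anymore. Inside the company, you're just coding exactly what you need at any given time. Even spreadsheets are basically obsolete. Spreadsheets succeeded because nobody could build custom software; the closest thing to custom software was a spreadsheet full of VBScript functions.
Naval: Right, they're lightweight programming.
Max Hodak: I've personally moved almost entirely from Excel to Python models, where I can run believable simulations. There's one thing AI hasn't done yet, but I think it will within a year or two, probably within 2026, and it will be very exciting: right now it generates software, but soon it will generate STEP files and PCB layouts. When it reaches mechanical and electrical engineering, that will be something we've never seen before. Very cool.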
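Open Source Amplifies China's Advantage
Naval: On the hardware side, this is a boon for all the gadget and component companies that were forced to ship bad software because they couldn't build good software. Now they'll be able to make software that's good enough. It may not even have a human-facing front end; it could be entirely agentic, with an agent accessing it while you control the hardware by talking to it.
This is one reason China is investing so heavily in open-source models. They're going all in because they have the hardware advantage: very sophisticated supply chains and component chains. They're essentially saying, "Hey, if I can generate software on demand, then I no longer have this disadvantage relative to Silicon Valley."
That's not the only reason they're doing open source. They're also behind: they're distilling models, catching up, and pooling resources. But the Chinese government has a history of funding projects that help the whole ecosystem, especially in network-effect businesses. They want to pool everything to catch up in AI and use it to strengthen their hardware advantage.
Ironically, they're doing all this open-source work because OpenAI doesn't open-source. Grok releases models, but they're a version or two behind. Google has some local models, but nothing really competitive. Anthropic, as far as I know, doesn't have any open-source models at all. So the whole burden of open source falls on China. That helps our hardware founders, but it helps their hardware founders and factories even more. All the junk software that ships with the random gadgets and doodads you buy on Amazon to tinker with on a lazy Saturday afternoon is about to get a lot better.
Guillermo: Everyone has gotten the wake-up call: without an excellent frontier coding model, you can't achieve self-improvement. Imagine all of China unable to produce frontier everything. It's not only about producing software: at every step of the hardware pipeline, as Blake said, you need to generate software. If you fall behind in generating software, you fall behind in generating everything.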
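You Always Want the Smartest Model
Guillermo: I'm curious about one thing. Everyone loves talking about Chinese models. Do you use Chinese models? Do you know anyone who does?
Naval: No. I had this argument at dinner last night. Someone at the table claimed you'd use DeepSeek for 97% of things because it's so cheap, and if you need more intelligence you just run it repeatedly on the same problem; you'd only use OpenAI, Anthropic, and the like for the most advanced tasks. My reaction was, "I don't know." I think intelligence is a pure good. You always want more of it. When these models make mistakes, you don't know it. And they're always cheaper than a real person, and they work in real time.
So you'll just use the smartest model available. That's not great news, because it means AI will end up a monopoly or an oligopoly. But I always want the smartest programmer. I always want the most correct answer. I always want the best judgment. Given how much leverage I'm going to put behind it, through capital, code, people, and marketing, I want to make the right decision every time. When I have two models, both give answers, and I often can't tell which one is right. So if I know one model is slightly smarter, I'll go with its answer, and eventually I'll stop asking the one I think is less smart. Have you found uses for these supposedly "less smart" models?
Guillermo: We do see uses. We have data from our AI Gateway, which basically every application agent goes through. There's definitely open-source model usage, but the top is heavily dominated by frontier intelligence models.
One caveat: at reasonable cost and performance, frontier models do very well at scale. Take Gemini: people aren't very excited about it, but the models they've shipped are very smart at the right price-performance point. Interestingly, for many tasks other than coding, they're the best workhorse production models. You can use them for support tasks or browser automation. I'd always put Gemini models there, and that's the kind of use I'd look for with Chinese models.
But whenever I'm pushing the frontier, I need the best coding models. That's basically two or three models, and Chinese models definitely aren't among them.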
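Software Still Needs Hands
Nivi: Max, you've pushed hard on vertical integration and extreme efficiency. Want to talk about that?
Max: A lot of things you can't buy, so you have to make them yourself. Obviously we don't do that for things like frontier models; I subscribe to Anthropic. We do use some Chinese models, to Naval's point. We use some Qwen and DeepSeek models. Internally we have a large fine-tuned model based on version 3.2 that I use for a lot of things, and we'll soon look at porting it to version 4. But that's at the personal level, not the company level.
Our preference is always to buy. If a vendor can provide something at a good price, like PCBs: we don't make PCBs. They're basically free; you can buy unlimited quantities from Asia. But the closer our products get to a single covalently bonded monolith of matter, the better they get: lower power, smaller size, higher performance, longer life. You can't buy those components. To do that kind of integration, to truly innovate beyond simply stitching together off-the-shelf parts (which is very limiting), you have to learn to do it yourself. That shows up as vertical integration. So we have our own MEMS fab on the East Coast. There's no other way to do the kind of packaging and assembly we want.
All of this will be profoundly affected by AI over the next few years. It isn't quite there yet. Ironically, one of the biggest impacts of AI we've seen in the company is in dealing with regulators. Generating documents, or asking, "We want to improve this product, and there may be thousands of ISO standards that apply; which ones do we have to comply with? Trace it": that used to take a whole regulatory and quality team several months. Now AI basically just knows.
When I think about things like the surgical program or the MEMS fab, in the end, software still needs hands. It will be smarter than us, but if it can't make things, those are the real boundaries. We've instrumented the fab and many other parts of the company in various ways, and as these models improve, that should quickly show up in the cell engineering we're doing and the materials science we're developing. Our protein engineering group does use deep learning heavily; I think we may be ahead there. But it's very application-specific. It means different things in different parts of the company. There's no single answer.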
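Humans Are Becoming Verifiers
Naval: Max's point about regulation made me realize that I haven't had a lawyer draft basic legal documents in a long time. I no longer go to lawyers for NDAs, agreements, signatures, or research. All the basic legal tasks are gone too. There's an old joke that law is spaghetti code: very complex code that people try to express in English. It contradicts the code over here and has to accommodate the code over there. There's no real API.
For junior engineers and junior engineering: junior engineers have basically been promoted to senior engineers, and junior engineering work has been taken over by agents. Likewise in law, you could say "paralegals just got fired," or you could say "paralegals just got promoted to senior lawyers, and now they can spend their time thinking about legal questions."
Guillermo: It's actually interesting to think about the parallels between how software engineering and law are evolving. You never really know what a lawyer put in those documents; you just trust them. "Hey, lawyer, can you look at this document? Can you tell me whether it's legal? Can you redline it?" The value of your relationship with a lawyer is that they're a trusted authority. They went to law school. They're staking their reputation on it.
Software engineering has something similar. The biggest problem today is the mountain of sloppy code that ends up in a pull request. There are lots of memes on Twitter: "We used to read PRs line by line." Well, in my domain, infrastructure, I want engineers to be able to say "I understand" every line of that PR. That doesn't necessarily mean you read every line. It means you can say, "I understand the consequences of this PR, and I'm signing off with those consequences in mind." Or: "I wrote the test harness, the simulation, the proofs, the type checker; even without reading this, I'm confident signing off that it will be safe in production."
There's a world where we accept that everything will be spaghetti code we don't fully understand, but we write evals that give us confidence, and we rely on people, infrastructure production engineers, to say, "Okay, I agree to ship this to production." If your system goes down, someone gets paged. Another thing people underestimate: creating software, going from 0 to 1, is very easy. But think about a thousand days from now. What does your software look like? Is it secure? Does it have tests? Is it production-grade? Does it perform well? And are you still motivated to spend all those tokens keeping it running in production?
Naval: Humans are becoming verifiers. That's how we trained these models, with good verification data, and now we need human verifiers. The old roles of many people, lawyers, engineers, operators, are turning into verifying the whole stack and saying, "Yes, this is roughly right, I roughly vouch for it, and if something goes wrong I've got your back."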
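Guillermo's version of sign-off, where an engineer approves a change because the harness around it passes rather than because they read every line, can be sketched as a gate: run every automated check, and let a named human approve only when all of them are green. This is a hypothetical Python illustration, not Vercel's process; `ChangeRequest`, `review`, and `sign_off` are made-up names, and the lambda checks stand in for a real test suite, type checker, or simulation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A check is any zero-argument callable that returns True on success
# (hypothetical stand-in for a test suite, type checker, or simulation).
Check = Callable[[], bool]


@dataclass
class ChangeRequest:
    title: str
    checks: Dict[str, Check] = field(default_factory=dict)


def review(change: ChangeRequest) -> Tuple[bool, List[str]]:
    """Run every check and report which ones failed."""
    failures = [name for name, check in change.checks.items() if not check()]
    return (len(failures) == 0, failures)


def sign_off(change: ChangeRequest, approver: str) -> str:
    """The human verifier signs only when the harness is green; the name on
    the approval is whoever gets paged if production breaks."""
    ok, failures = review(change)
    if not ok:
        raise RuntimeError(f"cannot sign off '{change.title}': failed {failures}")
    return f"{approver} approved '{change.title}'"


# Hypothetical change with two toy checks.
pr = ChangeRequest(
    title="Move telemetry writes to a new store",
    checks={
        "unit_tests": lambda: sum([1, 2, 3]) == 6,
        "latency_budget_ms": lambda: max([12, 18, 25]) <= 50,
    },
)
print(sign_off(pr, "on-call infra engineer"))
```

The design choice mirrors the transcript: the approver does not need to understand every line, but the approval is tied to a person and to checks that were actually run.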
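Part 3: The Regulatory Frontier
The Regulatory "Red Queen" Race
Blake: One thing we've seen on the regulatory side is that AI dramatically reduces aversion to change and increases the ability to iterate. Example: say you're certifying an aircraft. One of the countless things you have to do is show that it can withstand a lightning strike. The regulatory document for the test plan can run 200 pages. Traditionally, you hire an engineer who, frankly, isn't especially brilliant and is willing to sit there pounding the keyboard like a monkey, writing 200 pages of compliance documentation. It takes months. And if you change the aircraft design, you want to cry, because redoing that rote compliance paperwork takes another two months.
We found we could build a RAG (retrieval-augmented generation) system that lets us do all of that in minutes, essentially by prompting. The first-order effect is that you save a huge amount of time. The second-order effect is that when you change the aircraft's specifications, updating the paperwork takes minutes instead of months, so you're actually willing to make changes. The third-order effect is that you can do without the less brilliant engineers and keep a few truly creative people who iterate quickly, because the cost of change has fallen. In a sense, the whole regulatory burden, which badly hurt the ability to iterate, disappears.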
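The retrieval half of a RAG system like the one Blake describes can be sketched in a few lines: score each requirement against the task, keep the top matches, and paste them into the prompt that drafts the compliance section. This is a toy, not Boom's system. It ranks by word overlap so that it runs on the standard library alone, whereas a production pipeline would typically use embeddings and a vector index; the requirement IDs and texts are invented placeholders, not real certification rules, and the generation step (the model call) is left out.

```python
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: Dict[str, str], k: int = 2) -> List[str]:
    """Return the IDs of the k documents sharing the most words with the query."""
    q = bag_of_words(query)

    def overlap(doc_id: str) -> int:
        d = bag_of_words(corpus[doc_id])
        return sum(min(count, d[word]) for word, count in q.items())

    # Highest overlap first; ties broken by ID so results are deterministic.
    return sorted(corpus, key=lambda doc_id: (-overlap(doc_id), doc_id))[:k]


def build_prompt(query: str, corpus: Dict[str, str], k: int = 2) -> str:
    """Assemble the context-stuffed prompt that would be sent to a model."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}"
                        for doc_id in retrieve(query, corpus, k))
    return ("Using only the requirements below, draft the compliance section.\n"
            f"{context}\n\nTask: {query}")


# Invented placeholder requirements (not real regulations).
REQUIREMENTS = {
    "REQ-A": "Show the structure tolerates a direct lightning strike without catastrophic failure.",
    "REQ-B": "Prevent ignition sources inside fuel tanks, including from lightning.",
    "REQ-C": "Cabin seat cushions must meet flammability limits.",
}

task = "Draft the lightning strike test plan for the revised wing structure."
print(build_prompt(task, REQUIREMENTS))
```

When the aircraft specification changes, the same pipeline is simply rerun instead of rewriting 200 pages by hand, which is the second-order effect Blake points to.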
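Max: This is a badly underrated story in AI right now. The Silicon Valley consensus is that regulation is terrible: we want to move faster, we want this amazing future, we want abundance and prosperity, and anything that slows that future down should be avoided. We have overregulated. We've made it impossible to build things. In many places, what's involved in building anything physical is completely insane.
But much of the regulation itself isn't the problem. If you actually read a lot of it: having cities without smog is good. Being able to swim in lots of rivers is good. A lot of it is progress. The problem is that it's very hard for humans to understand and comply with, and every time you have to exchange a letter with the government, you wait months. If you could take a lot of what we've learned and make it completely frictionless, that would be really cool. I think that's an underrated story.
Naval: Until the regulators start spitting tokens back at us. Then you'll start getting huge volumes of documents from regulators that you have to comply with, and it becomes a war between agents. But at least it's a fair fight.
Max: That's basically where we are now.
Blake: I actually think that would be an improvement over the status quo. One awful thing today is that if you build anything physical, you need a building permit. You're guilty until proven innocent. The worst we've dealt with is the fire department, because they carry the moral halo of rescuing people from burning buildings, when what they actually do is spend months messing with your building plans. If we could replace the fire marshal with an agent that quickly critiques your plans, even if its feedback is excessive, that would be far better than today's delays.
Guillermo: When Max said all these regulations might be a good thing, here's what I thought: the key to making agents succeed is that humans or other agents set up the right test guardrails. People are very excited about things like /goal or Ralph loops, where you tell the model, "Go do this; here are your exit criteria." I'd say to Blake: "Go make all of us fly supersonic. Your exit criterion is that you've complied with all these regulations." There's absolutely a world where we say regulations are great: they're like our test suite. As long as passing them doesn't involve contradictions, and the regulations themselves are reasonable, they're great guardrails. Otherwise we'd just be dumping junk products into the sky.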
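Guillermo's framing, with regulations as the test suite and compliance as the exit criterion of an agent loop, maps onto a simple pattern: check every rule, stop when none fail, otherwise ask the agent to revise and try again, with a hard cap on iterations. This Python sketch is a hypothetical illustration of that Ralph-style loop, not any particular tool's implementation; the rule names, thresholds, and the `naive_revise` stand-in for the agent are invented.

```python
from typing import Callable, Dict, List, Tuple

Design = Dict[str, float]
Rule = Callable[[Design], bool]
Reviser = Callable[[Design, List[str]], Design]


def run_until_compliant(design: Design, rules: Dict[str, Rule],
                        revise: Reviser, max_iters: int = 10) -> Tuple[Design, int]:
    """Loop until every rule passes (the exit criterion) or give up.

    Returns the compliant design and the number of revisions it took.
    """
    if max_iters < 0:
        raise ValueError("max_iters must be non-negative")
    failing: List[str] = []
    for attempt in range(max_iters + 1):
        failing = [name for name, rule in rules.items() if not rule(design)]
        if not failing:
            return design, attempt  # exit criterion met
        if attempt < max_iters:
            design = revise(design, failing)  # in practice: the agent's next try
    raise RuntimeError(f"still failing after {max_iters} revisions: {failing}")


# Invented rules and thresholds, for illustration only.
RULES: Dict[str, Rule] = {
    "noise": lambda d: d["noise_db"] <= 100.0,
    "fuel_reserve": lambda d: d["fuel_reserve_min"] >= 30.0,
}


def naive_revise(design: Design, failing: List[str]) -> Design:
    """Stand-in for an agent: nudge whichever parameters are out of bounds."""
    fixed = dict(design)
    if "noise" in failing:
        fixed["noise_db"] -= 5.0
    if "fuel_reserve" in failing:
        fixed["fuel_reserve_min"] += 10.0
    return fixed


final, revisions = run_until_compliant(
    {"noise_db": 110.0, "fuel_reserve_min": 15.0}, RULES, naive_revise)
print(final, revisions)  # {'noise_db': 100.0, 'fuel_reserve_min': 35.0} 2
```

The iteration cap matters: if two rules contradict each other (the case Guillermo flags), an uncapped loop would never terminate.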
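Naval: This turns into a Red Queen race. They'll have agents, and we'll have agents. I think our agents will probably be better, which beats humans against humans. But their cycle times and response times may get longer. The App Store is being flooded with spam right now. I'm sure the patent office is too. These institutions will be slow to adopt AI. Clever entrepreneurs will DDoS them with mountains of filings. Approval times may actually grow, because they'll suddenly be drowning in a flood of information.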
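Why There's No Innovation in Healthcare
Blake: This creates an opportunity to truly change the regulatory model. Imagine if we drove cars the way we build things today. Before you could go anywhere, you'd have to write a plan, mail it to some regulator, and wait. Your plan would have to spell out: "We'll take such-and-such route, drive at this speed, use turn signals, stop at every stop sign, never run a red light," and so on. Three months later you'd get critical feedback: "We think you should take a different street." Eventually you'd get approval and drive somewhere. That's insane; you'd never get anywhere. Yet that is exactly how we build physical infrastructure in this country. We should make more of this enforcement-based rather than pre-approval-based.
Max: I don't want to loosen things up too much. If I ship a medical device to a lot of people, there needs to be some... there are unknowns. We're accountable, we ran the clinical trials, we reported all the data, but...
Naval: Max, that's why there's so little innovation in healthcare right now. The FDA approval process is a nightmare. In fact, Silicon Valley's two big tech advances of the past decade, AI and crypto before it, are both in the domain of math, because that's the last unregulated domain. Once they start regulating frontier models and GPUs, that will stop too. Peter Thiel laments the lack of innovation in the world of atoms. Well, it's been held back by enormous regulatory barriers.
You can always find a horror story, some vaccine or a famous medical disaster, but regulation is everywhere, reaching in every direction, and there are all these conflicting regulators. SpaceX got sued for not hiring enough (I forget exactly) immigrants or refugees or something, while on the other hand government rules didn't allow them to hire those people because they weren't citizens. This isn't logical code that compiles in one place. These are arbitrary rules made up all over the place. You can comply with one state's law and violate another's, or violate federal law, or annoy someone who then decides to sue you because one of fifty people is his friend. It's arbitrary. It's capricious.
Blake: And the idea that this makes us safer is a complete myth. Look at Boeing. They certified the 737 MAX, which had a single sensor with full authority over the aircraft's pitch (its nose-up/nose-down attitude). No intern would be dumb enough to think that was a good idea. Yet it went all the way through the certification system. These things don't actually make us safer; they just make us slower.
Max: Well, there's definitely dysfunction here. I do think some of it makes us safer. The Nuclear Regulatory Commission, for example, makes us safer: their job is to keep nuclear power safe, and they achieved that by not approving any new nuclear plants from the seventies until about a year ago. If we never build anything, we'll certainly be perfectly safe.
I want to be very clear: in many ways I support deregulation. I agree with Blake that a lot of this could be done much more efficiently. But I also think it's a bit glib to simply say "it's just the FDA, it's these agencies." The problem runs deeper. If the FDA approves ten very important drugs, they get no credit. If one patient dies, they get hauled in front of Congress and chewed out. Their incentives are very negative. The reality is that this reflects what the American people believe. There's a trade-off between the perceived risk of human research and how fast we get new drugs.
Blake: It's completely asymmetric. If you approve one bad thing, your career is over. If you block a good thing, nobody notices. That creates an asymmetric drag. I think it's the most important problem to fix in regulatory agencies.
Max: It's a very deep problem, because it comes down to what voters think. We've polled Americans about some of the future projects we're working on to understand how they feel. If you push too hard, you can route around it: go to Próspera, and there are various ways to try to speed things up. But if you're seen as a bad actor, the society we live in will shun you. That's something you have to figure out. It's much deeper than just saying "we need regulatory reform."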
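We Need a Real 50-State Experiment
Naval: Max, you've hit a deep issue: it comes down to what voters, what citizens, think. We like to blame politicians. You see it all the time on X: "This politician, that politician, the other politician." They were elected by majority vote. That's where the people really are. That's the package; that's the bundle they chose. You might not like this particular instance, but if you remove this one, something very similar will take its place, because voters will elect it again.
Culturally, most people find it hard to grasp what we've lost and what we've missed out on. France: there's a French entrepreneur on X lamenting that the government absorbs 57% of GDP, so you can't build a company. But for the average French citizen, this is invisible. They don't notice what they're missing. They just know they're a bit poorer than the US.
The Economist just ran a short piece (economists finally coming back around to capitalism after thirty years) about how the US has pulled ahead of everyone, growing faster and getting bigger. But they immediately pivot: "It's because of the oceans, because of natural resources." They mention everything except capitalism. They don't want to say the dirty C-word, because for some reason all these magazines turned Marxist at some point. They can't imagine what things might look like if we had been a bit more laissez-faire, a bit more open.
I'd love to see a real experiment across the fifty states. Different regulations, different tax structures. Right now federal taxes and federal regulations dominate everything. But imagine if you got cancer and could go to some small state and try every drug anyone is developing. "Buyer beware": you'd have to do your own research. That's called an experimental zone. Same for drones. Same for aircraft; that's a bit harder because you need to cross a lot of zones, but yes.
Blake: There's something magical here: the concept of innovation zones. We have a huge NIMBY problem. But if you create opt-in YIMBY zones, they create that experimental framework. By definition, it happens where people have agreed to it. You can try different rules, or no rules, or different enforcement (innocent until proven guilty), and see what actually happens. What are the innovation consequences? What are the safety consequences? Then the successes can spread.
Max: To Naval's point, an innovation zone doesn't solve the drug discovery problem. The Right to Try Act passed a while ago. We've had the single-patient IND pathway for much longer than that. If your doctor calls the FDA and says, "I want to give my patient an unapproved drug," they approve more than 99% of those requests. They can even approve it over the phone.
The problem is that to give the patient the drug, you still need clinical-grade drug product. Usually the only party that has it is the IP owner running the clinical trial, who has put hundreds of millions of dollars into making it. If something bad happens to your patient (who may have been very sick to begin with), the FDA draws a negative inference: it's treated as a property of the drug and applies globally, regardless of your innovation zone. So there are two problems. First, you need the IP owner to give you some of their drug, and they won't. Second, you need to keep global regulators from casting doubt on the IP owner's clinical trial results because of what happens after they give you some.
Blake: How would you solve that in medicine?
Max: This gets into insider territory. For example, the FDA would have to be barred from drawing negative inferences across different users of the same capsid. There are specific, relatively light-touch regulatory measures that could really accelerate innovation, simply by keeping this kind of paranoia from driving our decisions.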
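China's FDA Is Beating Us
Guillermo: Is there anything better than the FDA? What benchmark do we use to compare these regulators?
Naval: Everyone copies the FDA. Everyone emulates the FDA.
Max: Two extensions. First, Europe: not necessarily better than the FDA, but they have a different system. They have notified bodies, basically private companies authorized by their host-country governments to certify things: trains, aircraft, medical devices. The notified-body system creates somewhat better incentives at the review level, because they can hire, they can grow, and there's competition. They have to meet conditions set by their host governments, but it means there can be thousands of times more reviewers than in the US.
Second, as of today there is an approved, paid implantable brain-computer interface in China. China's regulator is thinking for itself. I think their system is going to give us serious trouble if we're not careful. The cost of bringing a drug or device to market is much lower. You can try things in humans, and you can try them in the market.
This is something I spend a lot of time thinking about. Twenty years ago we bought far fewer laptops and phones, and each one was much more expensive. Now they're cheaper, there are far more of them, we buy more of them, and total spending has gone up. That's great. Qualcomm, Samsung, and Apple stock have soared. Everyone's happy. People use the surplus wealth that phones and laptops create to buy more phones and laptops.
That doesn't happen in healthcare. Because of reimbursement (it's essentially a B2B sale), the pool of money we use to buy healthcare is basically fixed. It doesn't grow as more things produce better health outcomes, the way it does in tech growth industries. Healthcare spending grows at roughly the same rate as tax revenue. If AI takes off and makes big advances, and two years from now we're spending 10x on AI what we spend today, that could be great. But if two years from now we're spending 10x on healthcare, that would be a disaster. That's fundamentally at odds with being a tech growth industry.
Healthcare has this pervasive problem, and it all comes down to one thing: bringing these things to market costs too much. That's the problem China is solving. The answer isn't single-payer or some fix to health insurance. It's lowering costs so that people can pay with a credit card, finance it, or in the worst case buy it like a car, and you charge them in the transaction. To get there, we have to lower the cost of bringing these things to market. China is doing that. It will let them sell these things for $10,000 instead of $100,000. That's deregulation.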
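Healthcare Is a Communist Society Inside Capitalism
Naval: Fundamentally, there's no private market in healthcare. People sometimes use this analogy: imagine that instead of paying at restaurants, you eat at all of them, and at the end of the month you send every receipt to your insurer or the government, and they reimburse you. Every good restaurant would have a long line out the door. Every bad restaurant would always have tables. Wait times would be terrible. Quality wouldn't improve. You'd basically be running a little communist society inside a larger capitalist society. That's what we're doing in healthcare.
Blake: That's also what we do with roads, which is why we have traffic jams. Highways have no variable pricing, so they're always congested.
Naval: If you want to briefly touch the "third rail" of healthcare (the untouchable topic), consider this plan and tell me what's wrong with it. Imagine the first 20% of your annual income is your medical deductible. If you're broke and homeless, that's zero. If you're rich, it's millions of dollars. Whatever your annual income, the first 20% is your medical deductible. The rest is paid by the government or the insurance system, up to their current usual caps.
You'd quickly create a private market, with the kind of competition we see in dentistry, cosmetic surgery, and many elective procedures. You'd get improvement. Look at LASIK in ophthalmology. Look at veneers, braces, and dental surgery in dentistry. Look at cosmetic surgery. These fields do seem to be improving, because the payers are private: people vote with their own money.
We need to do something similar in the mainstream healthcare system. But people lose their minds. They won't think even one step further. "No, no, no, what about someone who's broke?" Someone who's broke has no income. "20% is too much for some people." Fine, you can build some adjustments into the deductible. But in general, without a private market where people pay for procedures out of pocket, you don't get that feedback loop. You don't get the ability to put more money into the system.
Today, very wealthy people can voluntarily put money into the system. But the prices are a complete mess. The rate cards are a complete mess. The system isn't designed for it. If you seek care and want to pay out of pocket, they'll sometimes quote you a price ten times higher than what they charge insurers.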
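Naval's proposal reduces to a small formula: the patient pays medical costs up to 20% of annual income, and the payer covers the rest. The Python sketch below encodes that rule in whole dollars so the split can be checked. One assumption is ours, not Naval's: the plan only says the payer covers the remainder "up to their current usual caps," so anything above an optional annual `payer_cap` is reported as an uncovered amount rather than assigned to anyone.

```python
from typing import NamedTuple, Optional


class CostSplit(NamedTuple):
    patient: int    # paid out of pocket, up to the income-based deductible
    payer: int      # paid by the government or insurer
    uncovered: int  # above the payer cap (assumption: left unassigned)


def split_annual_costs(annual_income: int, total_bills: int,
                       rate_percent: int = 20,
                       payer_cap: Optional[int] = None) -> CostSplit:
    """Deductible = rate_percent of annual income; the payer covers the remainder."""
    deductible = max(annual_income, 0) * rate_percent // 100
    patient = min(total_bills, deductible)
    remainder = total_bills - patient
    payer = remainder if payer_cap is None else min(remainder, payer_cap)
    return CostSplit(patient, payer, remainder - payer)


# No income: the deductible is zero, so the payer covers everything.
print(split_annual_costs(0, 50_000))        # CostSplit(patient=0, payer=50000, uncovered=0)
# $100k income: the first $20k of care is paid out of pocket.
print(split_annual_costs(100_000, 50_000))  # CostSplit(patient=20000, payer=30000, uncovered=0)
# Bills below the deductible are entirely private spending.
print(split_annual_costs(100_000, 5_000))   # CostSplit(patient=5000, payer=0, uncovered=0)
```

The third case is the point of the plan: routine care below the deductible is paid for privately, which is where Naval expects the LASIK- or dentistry-style price competition to appear.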
Sid's Story: Medicine for a Single Patient
Max
English source:
The AI Industrial Revolution
Full episode with 20 minutes of new material at the end. With Guillermo Rauch (Vercel), Blake Scholl (Boom Supersonic), and Max Hodak (Science).
Part 1: Waste Tokens, Save Time
Nivi: Welcome. You’re listening to Naval Podcast, your authoritative source for new knowledge. We’re trying something new today. I have three frontier founders with us—three good-looking guys, actually, and a fourth good-looking guy, Naval.
Let me just introduce everybody.
Guillermo “the G” Rauch. He’s building Vercel into an AI cloud for the world of agents and whatever comes after that.
Blake Scholl. He’s building Boom Supersonic—supersonic aircraft, in his own factory, and jet engines as well.
And Max Hodak from Science. He’s building a biohybrid brain interface that grows living neurons on silicon to restore sensory functions like sight—but eventually to explore new parts of the brain and new senses.
All three of these guys are not composing their products with off-the-shelf parts. They’re building their own factories. And we don’t care as much about what they’re building exactly as we do about what they’re learning about how they’re building.
What’s the new knowledge they’re generating?
What’s their alpha?
What principles are they discovering that other founders can learn from?
What are they trying to figure out right now?
Naval, any reactions before I jump in to Guillermo?
Naval: Yeah, let’s just have fun.
Nivi: You guys should just jump in.
AI Software Factories
Guillermo Rauch: I can’t remember my exact quote, but I’ve been really pilled with this idea of software factories. The job of the engineer being something where you just show up to work, you ship the output directly, and everything inside the company was—“how good is person A at shipping output B?”
And now what’s happening is, the way I’m judging you as an engineer is, “are you producing the factory that will produce multiplicative outputs B through Z?”
That’s a pretty significant change. We used to believe—and it used to be somewhat controversial—that there are 10x engineers.
Now clearly there’s 100x or 1,000x engineers, and the world hasn’t fully adjusted to this.
Naval: I used to get flamed on Twitter for saying there are 10x engineers, because it flies in the face of so much equality philosophy that everyone’s equal. But the reality is, when you’re operating in idea domains, in intellectual and virtual digital domains, it’s not even 10x—it’s 100x or 1,000x, and it always has been.
Satoshi. Notch. The guy who invented JavaScript, the Brendan Eichs of the world. John Carmack. These are 1,000x programmers.
Not to even mention—if you choose the right thing to work on versus the wrong thing to work on, that’s an infinity difference. And it could just be not necessarily a better programmer, just one who had better judgment on what to work on in the first place.
And now obviously it’s less controversial because of AI leverage.
Guillermo: What’s controversial is the token leaderboards. People are still getting a little confused—“Well, I have a bunch of 100x engineers. Look at all these tokens that I’m paying for.”
I’m curious if you guys have seen the same—how do you measure ROI?
Blake Scholl: It’s like measuring lines of code in the old days. Token consumption and lines of code both feel like indirect measures.
Max Hodak: My observation has been that Claude or ChatGPT is basically as good as you are in a domain. If you’re a really capable developer, these things are really powerful. If you’re a junior developer, you’ll find it to be more of a junior developer. The feedback you give them sporadically seems to be incredibly important—these little updates seem to totally determine the types of performance you get out of them.
Guillermo: There’s a new kind of support I give now—you come to me, you didn’t get good output out of the model, and I tell you what to prompt the model with. The quality of the reprompting is extremely important.
Max: To be clear, I think this will become less important over time. As the models get much smarter, you’ll be able to put in less and get more out. But at this stage, it really seems to reflect back the judgment that the user brings in.
Waste Tokens, Save Time
Naval: I’ve kind of resisted learning all the tricks and tips. “Use Ralph Wiggum. Use OpenClaw. Use Hermes. Use this prompt engine. Use this scaffolding. Plug in this piece. Always use plan mode.”
I just ignored all of that. I assumed the model is going to get better faster than I would figure out how to use it. It would figure out how to use me faster than I would figure out how to use it. So I’ve just been completely ham-fisted with them.
I get frustrated at them and have found myself typing less and less information, doing less and less work as time goes on, because I just assume I can brute-force my way through it. I’ll throw Codex, Claude, and Gemini at the same problem over and over and just waste tokens to save time. No matter how expensive these models might seem, they’re still way cheaper than a human. So I would say—just waste tokens, save time. Don’t look at the tokens either as inputs or outputs. Just look at your time, and look at the final output.
Even if they’re writing low-quality code—which I know in many cases they are—when the time comes and I want to ship to production, I’ll just throw more tokens at it. “Go through, look at it, rewrite it.”
They’re just going to get better every generation. I don’t see where this necessarily stops. As long as we have verifiable domains and solved problems, they’re going to resolve those problems. Now in the unsolved problems domain—maybe you’re Terence Tao, at the cutting edge of creativity—you need to be working very collaboratively and carefully and closely with the model. But I’m not at that level in software engineering.
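Naval's "waste tokens, save time" habit can be sketched as a fan-out: send the same prompt to several models, several times over, and keep the first answer that passes a check. This is a minimal Python sketch, not Naval's actual setup; the model callables are hypothetical stubs standing in for real Codex, Claude, or Gemini clients, and `verify` is an assumed task-specific check (tests passing, output parsing, and so on).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional

# Hypothetical model interface: takes a prompt, returns a candidate answer.
# In practice each of these would wrap a real API client.
Model = Callable[[str], str]


def fan_out(prompt: str, models: List[Model], rounds: int,
            verify: Callable[[str], bool]) -> Optional[str]:
    """Send `prompt` to every model `rounds` times in parallel and return the
    first candidate that passes `verify`, or None if none does.

    Trades extra tokens (redundant calls) for less wall-clock time.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(models) * rounds))
    try:
        futures = [pool.submit(model, prompt)
                   for _ in range(rounds) for model in models]
        for future in as_completed(futures):
            candidate = future.result()
            if verify(candidate):
                return candidate
        return None
    finally:
        # Requires Python 3.9+. Cancels calls that have not started yet and
        # returns without waiting for the ones still running.
        pool.shutdown(wait=False, cancel_futures=True)


# Usage with stub "models" (hypothetical names): only model-b gets it right.
def make_stub(name: str, answer: str) -> Model:
    return lambda prompt: f"{name}: {answer}"


stubs = [make_stub("model-a", "41"), make_stub("model-b", "42"),
         make_stub("model-c", "43")]
best = fan_out("What is 6 * 7?", stubs, rounds=2,
               verify=lambda c: c.endswith(": 42"))
print(best)  # model-b: 42
```

Because `as_completed` yields results in the order they finish, the fastest acceptable answer wins; if the goal were the best answer rather than the first good-enough one, a scoring function and a `max` over all candidates would replace the early return.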
Models Instructing Humans
Naval: Guillermo, you’re probably the most extreme software engineer on the team. How are you finding these models at the edge of their capability?
Guillermo: There’s one thing that’s happened recently that resonates strongly with what you’re saying. It used to be that you’d give a prompt to the model and it kind of does a classic next-token prediction thing and runs away with your idea. Models now have been doing this intuitive planning mode—without you even having to ask them to plan—where it comes back to you and says, “Look, what you’re asking me for, there are these three routes we can take. Here’s the set of trade-offs.”
That’s the moment where people on X do the whole thing—“Now we have a PhD-level engineer model.”
The models at some point graduated. They used to be junior engineers. Now they’re principal engineers, because they come back to you with a set of trade-offs. And obviously, sometimes they bullshit, which is hilarious—it tells you “this is going to take three weeks” and “this many tokens.” It makes really bad predictions. But I respect the models a lot more as a peer that I’m going back and forth intellectually with.
There are still a lot of gaps. If you’re a really proficient engineer or architect, you’re still extracting more juice.
So the question Max was positing—if you’re junior, do you get junior back?
Clearly not, because a junior gets more advanced knowledge in code than they would have been able to write by themselves. But doesn’t an experienced architect get 10x where a junior engineer gets 2x? That’s what I’m trying to figure out.
Max: There are architectural decisions. I’m seeing this now with some of our junior software engineers on the team—what’s the next step in their career progression? It’s going from writing implementation for a feature to picking technologies. Choosing between Postgres versus some other database. Picking between ZeroMQ versus some other queuing system. The models can suggest them, but that’s the thing—you’ll see it and you’ll go, “No, no, I want to use this other thing.”
That’s the type of little feedback that really matters and the types of output you seem to get at this point.
Naval: It’s taste and judgment, right? That said—you can ask the models “which one should I use and why,” and they know everything. They’ll give you a really good trade-offs matrix.
Guillermo: That’s the change that’s happened recently. You’d say, “Hey, go put this super-high-cardinality telemetry data into Postgres.” And it goes, “No, no, bro. We don’t put that kind of data into Postgres. You should consider ClickHouse or Athena or whatever.”
That’s happened to me a lot. Really impressive.
The thing I’m still struggling with is—clearly the human is still completing the model. At what point is it the other way around? The human is the one starting to get the instructions back: “Go get me this API key, because it’s something only you can do.”
Or “Get me this amount of capital for my next set of investments.” You just watch. Clearly we’re still not there yet.
Naval: That’s a temporary aberration. Pretty soon every good SaaS company or hosting provider will have a CLI and API interface the models can use directly. They don’t even necessarily need an API. As long as it’s text-based, Unix-based—the agent can hack its own API. And the money part—you insert crypto tokens, put in Bitcoin, put in whatever, and the model goes and pays for whatever it needs. People are working on this.
Is Pure Software Dead?
Naval: The thing I’m now thinking through is—is pure software dead? Is pure software engineering an obsolete thing? It’s like saying speaking English. The models now speak English. We had to learn code to communicate with them. Now the models speak English—fuzzy, sloppy English, like a human—and they understand things. So where’s the moat for a founder? Hardware? It’s a boon. You had to build hardware, and it was hard to build a software company alongside. Patrick Collison says, “Software is art, and it’s hard to hire artists.”
Now, as a hardware founder—great, you can have really good software developed fairly quickly.
If you’re creating models, maybe that’s the new software engineering—training, tweaking, post-training, fine-tuning. But classic software engineering—is that dead? Is pure software investable? Is pure software something you can organize a company and a team around, and try to get some leverage?
Guillermo: Did you guys see—there was an article on X by Mitchell Hashimoto called “The Building Block Economy”? His argument is that the most useful thing for agents to have now is really powerful reusable building blocks. To Max’s example, you wouldn’t expect your clanker to reinvent a queue infrastructure system every time it needs to send an email. It needs to bring in the right building block, right-sized for the task—“Okay, for this one it’s BullMQ.”
I challenge the notion that I’d want the agent to reinvent the entire universe from first principles in a way that’s incompatible with the rest of society and civilization. It’s almost like reinventing highways, laws, policies—just for you. Even if there’s potential for extra optimization, there’s still cooperation-at-large-scale value of saying “we’re both depending on Postgres 13.2.”
The category of infrastructure software and building blocks these agents are going to use is (obviously I’m biased, since this is what we’re building) extremely valuable. I don’t see the agent reinventing all of that any time soon.
Another metaphor I’ve been using: anything that’s already been created that the models can reuse is like a token cache. You don’t want to churn through a trillion tokens to reproduce what’s already existing. There’s always a starting point the model can fork off from. It’s going to change things quite profoundly.
Naval: So these are like libraries and dependencies, but for models.
Guillermo: Yes—for agents specifically.
You Don’t Get Stuck Anymore
Max: To Naval’s question, though—I learned to program when I was really little. Through all of being a teenager and in my twenties, I’d get sucked into it and code for like twenty hours. It was super fun. I knew all this stuff about different programming languages.
I haven’t written a single line of code in quite a while now. Partly that’s because my job is different. But also—since December, I’ve built a huge amount of software that I now use every day. All these projects I’d kind of fantasized about for years that I’m now using—that I’ve actually built. I didn’t write any of that. And I just can’t imagine going back to actually writing code by hand. I have a hard time seeing that as part of the future.
Guillermo: What’s really cool is that you understand how the pieces click together. Anyone who understands what an API is, how data flows, inputs and outputs, performance—because you have to orient the model around “this is the level of expectation I have out of this operation.” That has always been infinitely more useful than writing code. A really proficient engineering leader has been quote-unquote vibe coding through people on Slack or one-on-ones—you’re transmitting your will, your intent, your experience, and letting others run with it. Now we do the same, but with agents. That’s why you’ve been successful with it. I don’t know that everyone sees the same level of success.
Naval: I went from not having written code in twenty years to coding all the time now—through agents. Building tons of software. It turns out that just understanding the basic principles of software engineering and algorithms gets you a long way. The reason I stopped coding was that I didn’t have time to figure out the latest language, the latest architecture, the infrastructure pieces to plug into. And Vercel makes it a lot easier, but even then—just getting started was a bear. Plugging pieces together, assembling infrastructure was just so annoying.
Max: The thing that really changed is—it used to be you could build a lot, a lot would go straightforward, but then you’d hit some random thing and you could spend an indefinite period of time debugging some narrow thing. Now, with agents, you just don’t get stuck anymore. Which is pretty amazing. Relatively quickly they can find the right way to do things. It used to be that—I remember when other friends would try to learn to program, it was like—“Nope, it’s intrinsically frustrating. That’s part of the deal. That’s how you learn.”
And that just isn’t true anymore.
Part 2: Vibe Coding Hardware
Vibe Coding a Turbine Blade
Nivi: Hey Blake, how are you applying all this at Boom Supersonic?
Blake Scholl: It completely changes the role of software and hardware developers. From day one we tried to take a lot of traditional engineering workflows—hardware engineering workflows—and turn them into software. If you haven’t been around hardware engineering, let me try to make this clear. A lot of hardware engineering happens in Excel spreadsheets on engineers’ laptops in a silo. Very complex spreadsheets, sometimes with VBScript code. All of this is actually software, but it’s treated as if it’s not. There’s no source control, no automated testing. If you want to hand something off from an aerodynamicist to a structures engineer, that’s done manually with a spreadsheet over email. It’s the nineteen-nineties. It’s terrible.
So we started building software frameworks to automate hardware engineering flows and make them repeatable, with the idea that we could reduce the cost of iteration. But it was slow going; we could never afford enough software engineers. What we’ve moved to now is a mind-blowingly different model: the software engineers create the architectures, because they understand systems, algorithms, and separation of concerns. Then the hardware engineers can vibe-code their pieces, because they know hardware engineering. The result is dramatically higher productivity for small teams.
Here’s an example. Say you’re designing a turbine blade. A turbine blade starts cold, but when it runs it gets hot, so it expands. You have to make both the aerodynamic design and the structural design work in its cold shape and in its hot shape, which means converting between cold and hot, and between structures and aerodynamics. Classically, that takes one engineer one day, for one blade, for one piece of the analysis. There are about a thousand blades in a jet engine, so at that rate you can’t iterate much. Now, with software and hardware people building the solution together, you can change the blade geometry and see the structural and aerodynamic results in real time. Two engineers can design an entire jet engine. Wildly different.
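The cold-to-hot conversion Blake describes can be illustrated, in a heavily simplified form, with linear thermal expansion, where every length grows by a factor of (1 + αΔT). This sketch assumes a single uniform temperature rise, a constant expansion coefficient α, and the blade root fixed at the origin. The function names and the example α of 13e-6 per kelvin (roughly the order of magnitude for nickel alloys) are illustrative assumptions, not Boom’s method. Real blades also stretch and untwist under centrifugal and aerodynamic loads and see non-uniform temperatures, which is why the real analysis is so expensive.

```python
# Simplified cold <-> hot shape conversion via uniform linear thermal expansion.
# Assumptions (illustrative only): one uniform temperature rise, a constant
# expansion coefficient, and the blade root fixed at the origin. Centrifugal
# stretch, untwist, and non-uniform temperature fields are ignored.

Point = tuple[float, float, float]  # (x, y, z) in metres


def expansion_factor(alpha_per_k: float, delta_t_k: float) -> float:
    """Length multiplier for linear thermal expansion: 1 + alpha * dT."""
    return 1.0 + alpha_per_k * delta_t_k


def cold_to_hot(points: list[Point], alpha_per_k: float, delta_t_k: float) -> list[Point]:
    """Predict the running (hot) shape from the manufactured (cold) shape."""
    s = expansion_factor(alpha_per_k, delta_t_k)
    return [(x * s, y * s, z * s) for x, y, z in points]


def hot_to_cold(points: list[Point], alpha_per_k: float, delta_t_k: float) -> list[Point]:
    """Invert the mapping: which cold shape must be built to get this hot shape?"""
    s = expansion_factor(alpha_per_k, delta_t_k)
    return [(x / s, y / s, z / s) for x, y, z in points]


# Illustrative numbers: a 0.1 m tall blade, alpha = 13e-6 per K, a 1000 K rise.
cold_tip = [(0.0, 0.1, 0.0)]
hot_tip = cold_to_hot(cold_tip, 13e-6, 1000.0)
print(f"tip height grows by {(hot_tip[0][1] - 0.1) * 1000:.2f} mm")  # prints: tip height grows by 1.30 mm
```

Designers usually work in the hot direction: they know the shape the blade must have while running, and hot_to_cold tells them what to manufacture.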
Guillermo Rauch: One thing you mentioned is that software engineers are creating the tools and architectures for the rest of the engineers. To me, that’s the biggest cataclysm for enterprise software: no startup building hardware collaboration tools can sell you anything anymore. Internally, you just code exactly what you need at any given time. Even spreadsheets are kind of cooked. The reason spreadsheets were successful is that no one could build custom software. The thing that approximates custom software the most is a spreadsheet with a bunch of VBA functions.
Naval: Right—they’re lightweight programming.
Max Hodak: I’ve personally moved almost entirely from Excel to Python models, where I can get believable simulations of things. What AI hasn’t reached yet, though I think it will within the next year, probably in 2026, is hardware design files: right now it can generate software, but soon it will generate STEP files and PCB layouts. That will be very exciting. When it comes for mechanical and electrical engineering, that’s a whole other thing we haven’t seen yet. Very cool.
Open Source Compounds China’s Advantage
Naval: On the hardware side, this is a boon for all the little gadget companies and parts companies that write really bad software because they can’t make great software. Now they’ll be able to make good-enough software. Or it may not even be software with a human front end: it might be completely agentic, with an agent accessing the hardware and you talking to that agent by voice to control it.
This is one of the reasons China is big into open-source models. They’re going all in on it because they have hardware superiority. They have these very complex supply chains and component chains. They’re basically saying—“hey, if I can just generate software on demand, then I don’t have this disadvantage anymore against Silicon Valley.”
That’s not the only reason they’re doing open source. They’re also behind, they’re distilling models, they’re catching up, they’re collaborating on resources. But the Chinese government has a history of funding efforts that help their entire ecosystem along, especially in network-effect businesses. They want to pool all their resources, catch up on AI, and use it to give their hardware stuff an advantage.
Ironically, they’re doing all the open-source stuff because OpenAI is not open. Grok publishes models, but they’re a model or two behind. Google has some local models, nothing really competitive. Anthropic, to my knowledge—I don’t even know of any open-source models from them. So all the open-source heft is coming from China. It helps our hardware founders, but it helps their hardware founders and factories that much more. All the crappy little software that goes with all the random knickknacks and thingamajigs you buy off Amazon to tinker with on a lazy Saturday afternoon—that software’s getting a lot better very quickly.
Guillermo: Everyone’s had the wake-up call that without great frontier coding models, you don’t have self-improvement. Imagine China as a whole not having the ability to produce frontier everything. It’s not just about producing software—in any piece of this hardware pipeline, like Blake was saying, you need to generate software. If you fall behind in your ability to generate software, you fall behind in your ability to generate everything.
You Always Want the Smartest Model
Guillermo: One thing I’m curious about: everyone loves to talk about Chinese models. Do you guys use Chinese models? Do you know anybody who uses Chinese models?
Naval: No. This is an argument I had yesterday at dinner. One person at the table was claiming you’ll just use DeepSeek for 97% of things because it’s so cheap, and if you need more intelligence you’ll just run it over and over again on the same problem. You’ll only use OpenAI, Anthropic, and so on for the most advanced tasks. I was kind of like, “I don’t know.” I think intelligence is an unalloyed good. You always want more intelligence. When these models make a mistake, you don’t know it. And even the smartest model is always cheaper than a real person, and it works in real time.
So you’ll just use the most intelligent model available. Which isn’t great news, because it means you’ll end up creating a monopoly or oligopoly situation in AI. But I always want the most intelligent programmer. I always want the most correct answer. I always want the best judgment. Given the amount of leverage I’m going to pour into it—through capital and code and people and marketing—I want to make the right decision every time. When I have two models, one I know is a little smarter than the next, and they both give me answers, often I don’t actually know which is the correct answer. So if I know one model is a little smarter, I’m going to go with that answer, and eventually I’m going to stop asking the model I think is less intelligent. Have you guys found a use for these so-called less intelligent models?
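The cheap-model strategy from the dinner argument, running the same problem over and over, is commonly implemented as repeated sampling with a majority vote (sometimes called self-consistency). Below is a minimal sketch; ask is a hypothetical stand-in for any model call, not a real client API. It also shows the limit Naval is pointing at: voting only helps when the model’s mistakes are independent, and a model that is wrong the same way every time will simply outvote the right answer.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def majority_vote(ask: Callable[[str], str], prompt: str, n: int = 5) -> tuple[str, float]:
    """Ask the same prompt n times and return (most common answer, its vote share)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [ask(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n


# Hypothetical stand-in for a cheap model that is right two times out of three.
replies = cycle(["42", "42", "41"])
answer, share = majority_vote(lambda _prompt: next(replies), "What is 6 * 7?", n=3)
print(answer, round(share, 2))  # prints: 42 0.67
```

The vote share is a rough confidence signal, but it measures agreement, not correctness, so it cannot tell you when every run shares the same blind spot.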
Guillermo: We see uses. We have AI Gateway data, and basically every application and agent goes through it. There’s definitely usage of open models, but the top of the usage rankings is heavily dominated by frontier intelligence.
There’s a caveat: frontier intelligence at reasonable cost and performance slaps at scale. Gemini—people don’t get really excited about Gemini, but they put out models that are super smart at the right performance-cost combination. For a lot of tasks other than coding, interestingly enough, they’re the best industrial production models. You can throw them at support tasks or browser automation. I’d always put a Gemini model there, and I’d look to Chinese models for those kinds of things.
But any time you’re working to push the frontier, you need the best possible coding model. That’s basically two or three models, and Chinese models are certainly not among them.
Software Still Needs Hands
Nivi: Max, you’re pushing pretty hard into vertical integration and extreme urgency. Want to talk about that?
Max: For many things, you can’t buy it, so you have to make it somehow. We obviously don’t do this for things like frontier models; I have an Anthropic subscription. We actually do use some of the Chinese models, to Naval’s point: some Qwen models and DeepSeek models. We have a big internal fine-tune of 3.2 that I use for a bunch of things, and we’re going to look into porting it to 4 soon. But that’s on the personal side, not on the company side.
Our preference would always be to buy something if there’s a vendor that offers it at a great price. PCBs, for example: we don’t make PCBs. Those are basically free; you can buy them in unlimited quantity from Asia. But the closer our products get to being a single block of covalently bonded matter, the better they’ll be: lower power, smaller, higher performance, longer lasting. The components for that aren’t available. To do that type of integration, to actually innovate beyond piecing together things you can buy off the shelf, which is very limiting, you have to learn to do it yourself. That shows up as vertical integration. So we own a captive MEMS foundry on the East Coast. There was no other way to do the type of packaging and assembly we wanted to do.
All of this is going to be affected heavily by AI over the next few years. It’s not quite there yet. Ironically, one of the biggest impacts we’ve seen of AI inside the company is in regulatory interactions. If we can generate documentation, or if we can ask—“we want to evolve this product, there are thousands of ISO standards that might apply, which ones do we have to comply with, trace this through”—that used to require a whole regulatory and quality team for several months. Now the AI just kind of knows.
When I think about stuff like the surgical program or the MEMS fab—ultimately the software still needs hands. It’s going to be smarter than us, but if it can’t make things, those are real boundaries. We’ve instrumented our foundry as well as many other parts of the company in ways where, as these models get better, that should show up pretty immediately in things like the cell engineering we’re doing and the material science we’re developing. Our protein engineering group really uses deep learning a lot—I think we’re probably state of the art there. But it’s very application-specific. It means different things in different parts of the company. There’s not one answer.
Humans Are Becoming Verifiers
Naval: What Max was talking about with regulatory stuff makes me realize—it’s been a while since I generated a basic legal document using a lawyer. I stopped asking lawyers for NDAs, agreements for this, sign that, research this. All the basic legal tasks are gone too. There’s the old joke that law is like spaghetti code—very complicated code they try to put in English. It contradicts this code over here, has to fit into that code over here. There are no real APIs for it.
In engineering, you could say junior engineers basically got promoted to senior engineers, and junior-level engineering work got taken over by agents. In the same way, in law, you can say “paralegals just got fired,” or you can say “paralegals just got promoted to senior lawyers, and now they can spend their time thinking about the law.”
Guillermo: It’s actually interesting to think about the parallels between how software engineering is evolving and lawyers. You never know exactly what lawyers put into these documents—you just trust them. “Hey, lawyer, can you look at this document? Can you tell me if it’s legit? Can you do red lines?” What you’re valuing in the relationship with a lawyer is that they’re a trusted authority. They went to law school. They’re putting their reputation on the line.
There’s a parallel with software engineering. The biggest problem today is the mountain of slop that ends up in a PR. There are all these memes on Twitter: “way back in the day we used to read every line of code of a PR.” Well, in my world, infrastructure, I want engineers to be able to say they understand every line of that PR. That doesn’t necessarily mean you’ve read every line. It means you can say, “I understand the consequences of this PR, and I’m signing off on them.” Or, “I wrote the test harness, the simulations, the proofs, the type-checkers, so even without reading this, I’m confident signing off that it’ll be safe in production.”
There’s a world in which we embrace that everything is going to be spaghetti code we don’t fully understand, but we write the evaluators that give us confidence, and we rely on people—the infrastructure production engineers—to say, “Okay, I’m fine sending this into prod.” Someone is going to get paged if your systems go down. Another thing people are underestimating: creating software is really easy, zero to one. But think about a thousand days from now. What does your software look like? Is it secure? Is it tested? Is it production-grade? Is it performant? And are you still motivated to invest all those tokens in maintaining it in prod?
Naval: Humans are becoming verifiers. That’s how we train these models—with good verification data—and now we need human verifiers. A lot of the old function of people, lawyers, engineers, operations people, moves to verifying the stack and saying, “Yeah, this is roughly correct, I’ll roughly stand behind it, I’ll support you if it goes wrong.”
Part 3: The Regulatory Frontier
The Regulatory Red Queen Race
Blake: One thing we’ve seen on the regulatory side is that AI massively reduces change aversion and speeds up iteration. Example: let’s say you’re going to certify an airplane. One of the zillions of things you have to do is prove it can withstand a lightning strike. The regulatory documentation for the test plan stretches on for, say, 200 pages. What you would classically do is hire an engineer who, let’s be honest, is not super-bright but is willing to be there, monkey at keyboard, writing 200 pages of regulatory compliance documentation. It takes a couple of months. And by the way, if you change the airplane, now you want to cry, because there’s another two months of rework on this rote compliance documentation.
What we’ve found is that we can build a RAG (retrieval-augmented generation) system that lets us basically prompt our way through all of that work in, let’s call it, minutes. The first-order effect is that you save a lot of time. The second-order effect is that if you change the specification of the airplane, updating the documentation now takes minutes, not months, so you’re actually willing to change. And the third-order effect is that you can get rid of the not-very-great engineers and keep a small number of really creative ones who can iterate rapidly, because the cost of change goes down. In a certain sense, the entire regulatory burden, which really hurts the ability to iterate, drops away.
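The retrieval half of a RAG system like the one Blake describes can be sketched in a few lines: split the regulations into chunks, rank the chunks by similarity to the task, and put only the top matches into the model’s prompt. This is a simplification under stated assumptions. It uses bag-of-words cosine similarity where a production system would typically use vector embeddings, the requirement snippets are invented placeholders rather than real certification text, and the generation step (the model call) is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Ground the drafting task in the retrieved requirements only."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Using only these requirements:\n{context}\n\nTask: {query}"


# Placeholder requirement text, invented for illustration.
requirements = [
    "Lightning strike protection: the structure must withstand a lightning strike without catastrophic damage.",
    "Cabin pressurization: the cabin must maintain pressure at maximum operating altitude.",
    "Fuel system: tanks must be protected against ignition from lightning strike attachment.",
]
print(build_prompt("Draft the lightning strike test plan", requirements))
```

When the airplane changes, only the task or the requirement set changes; rerunning retrieval and generation is what turns months of rework into minutes.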
Max: This is a really undersold story in AI right now. The consensus in Silicon Valley is that regulation sucks—we want to go faster, we want to realize this amazing future, we want abundance, prosperity, and stuff that slows down that future is to be avoided. Certainly we’ve over-regulated. We’ve made it impossible to build stuff. It’s totally crazy what goes into building any physical thing in a lot of places.
But a lot of the regulations themselves are not the problem. If you’ve actually read a lot of these things—having non-smog-choked cities is great. Being able to swim in many rivers is great. A lot of these things were progress. The problem is that it’s really difficult for humans to deal with understanding and complying with this, and every time you have to exchange a letter with the government, you wait months. If you could take a lot of the things we’ve learned and make them totally frictionless, that would be pretty cool. I think that’s an under-sold story.
Naval: Until the regulator starts spewing tokens back at us. Then you start getting huge amounts of documents from the regulators that you have to comply with, and it’s agent-on-agent wars. But at least it’s a fair fight.
Max: That’s basically what we have now.
Blake: I’d actually argue that would be an improvement from where we are now. One of the terrible things right now is, if you’re going to build anything physical, you have to get a building permit. You’re guilty until proven innocent. The worst thing we’ve run into is the fire department, because they have the moral imprimatur of people pulling people out of burning buildings—and yet what they actually do is just screw with your design for buildings for months. If we could replace the fire marshal with an agent that would critique your building plan quickly—even if its feedback were overdone—it would be massively better than the delays that exist today.
Guillermo: When Max was talking about all this regulation potentially being a good thing, my head went to this: what makes agents successful is humans or other agents setting up the right testing guardrails. People are really excited about slash goal, or Ralph loops, where you tell the model, “Go do this, and this is your exit criteria.” I’m telling Blake, “Go make us all supersonic. Your exit criteria is that you’ve complied with all of these regulations.” There’s totally a world where we say the regulations are great: they’re like our test suite. As long as they don’t contradict each other and are actually reasonable, they’re an awesome guardrail. Otherwise we’d be shipping slop directly into the air.
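The Ralph-loop idea Guillermo describes, where an agent keeps iterating until an explicit exit criterion holds, reduces to a short control loop with the regulations playing the role of a test suite. In this sketch, propose stands in for an agent call and the two checks are toy placeholders for real compliance tests; both are hypothetical. The iteration cap matters: a loop facing requirements that contradict each other would otherwise never terminate.

```python
from typing import Callable

Check = Callable[[str], bool]
Agent = Callable[[str, list[str]], str]


def run_until_compliant(propose: Agent, checks: dict[str, Check],
                        draft: str = "", max_iters: int = 10) -> tuple[str, list[str]]:
    """Revise the draft until every check passes or the iteration budget runs out.

    Returns the final draft and the names of any checks still failing.
    """
    failing = list(checks)  # nothing has been verified yet
    for _ in range(max_iters):
        draft = propose(draft, failing)
        failing = [name for name, check in checks.items() if not check(draft)]
        if not failing:  # exit criterion: the whole "test suite" passes
            break
    return draft, failing


# Toy compliance checks and a toy agent that fixes one failing item per pass.
checks = {
    "lightning": lambda d: "lightning" in d,
    "fuel": lambda d: "fuel" in d,
}


def toy_agent(draft: str, failing: list[str]) -> str:
    return f"{draft} {failing[0]}".strip() if failing else draft


final_draft, still_failing = run_until_compliant(toy_agent, checks)
print(final_draft, still_failing)  # prints: lightning fuel []
```

Returning the still-failing checks, rather than raising, leaves the decision about an unfinished run to a human, which matches the sign-off role described earlier in the conversation.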
Naval: This is going to turn into a Red Queen’s race. They’re going to have agents, we’re going to have agents. I think we might have better agents—that’s good, as opposed to human-versus-human. But their cycle time, their response time, may get longer. The App Store is drowning in spam right now. I’m sure the patent office is drowning in spam. These agencies are going to be slow adopters of AI. They’re going to get DDoSed by clever entrepreneurs just overloading them with documents. It’s possible the approval time for this stuff may extend out as it suddenly gets flooded.
Why There’s No Innovation in Healthcare
Blake: It creates an opportunity to really shift the regulatory model. Imagine if we drove around a city the way we build things today. Before you could go anywhere, you’d have to write a plan, ship it to some regulator, and wait. Your plan would have to specify, “We’re going to take such-and-such a route, drive this speed limit, use our blinker, stop at every stop sign, never run a red light,” blah blah blah. Three months later you get a critique back: “We think you should drive on this other street.” Eventually you get approval and you go drive somewhere. It’s insane—you can never go anywhere. And yet that is absolutely the way we build physical infrastructure in this country. We should actually make more of these things enforcement-based, rather than pre-approval-based.
Max: I don’t want to be under too much—if I ship a medical device to a lot of people, there needs to be—there are unknowns. We were responsible, we did clinical trials, we reported all the data, but—
Naval: Max, this is why there’s so little innovation in medical right now. The FDA approval process is a nightmare. In fact, the two biggest advancements in tech in Silicon Valley in the last decade—AI and, before that, crypto—they’re both in the math domain, because that’s the last unregulated domain. When they start regulating frontier models and start regulating GPUs, that stops as well. Peter Thiel laments that there’s no innovation in the physical domain. Well, it’s been held back by huge regulatory barriers.
You can always find a scary case—a vaccine, or a famous medical disaster—but the regulations spread everywhere, the tentacles are everywhere, and there are all these contradictory regulatory bodies. SpaceX got sued for not having enough—I forget what—migrants or refugees or whatever, but they’re not allowed to hire them, by government regulation on the other side, because they’re not citizens. This is not like logical code that has to compile in one place. These are made-up random regulations all over the place. You might comply with one state and violate another, violate federal over here, annoy this guy over here, that guy chooses to prosecute one out of fifty people who are his friend. It’s arbitrary. It’s capricious.
Blake: And the idea that this makes things safer is a complete myth. Look at Boeing. They certified the 737 MAX, whose MCAS flight-control software could push the nose down based on a single angle-of-attack sensor. No intern is dumb enough to think that’s a good idea. Yet it got all the way through the certification system. This stuff doesn’t actually make us safer; it just makes us slower.
Max: Well, there’s definitely dysfunction here. I think some of this makes us safer in the sense that the NRC makes us safer—which is that their job was to make sure nuclear energy was safe, and they did this by permitting zero plants from the seventies until I think a year ago. It will be perfectly safe if we never build any of it.
I want to be really clear: I’m on the side of deregulation on a lot of this. I agree with Blake that a lot of this can be done more efficiently. But I also think it’s a little too dismissive to say, “This is just the FDA, the agencies.” The problem is deeper. If the FDA approves ten really important drugs, they don’t get any credit. One patient dies, and they get hauled before Congress and yelled at. Their incentives are heavily biased toward saying no. The reality is that this reflects the beliefs of the American people. There’s a trade-off between how much risk we’re seen to take in human-subjects research and the rate at which we get new medicines.
Blake: It’s totally asymmetric. If you approve a bad thing, your career is over. If you block a good thing, nobody notices. It creates an asymmetric slowdown. I think that is the most important problem to solve in the regulatory state.
Max: This is a very deep problem because it’s where the voters are. We poll people on some of the things we’re working on for the future, to understand where the American people stand. If you push too hard, you can try to work around it: go to Próspera, or find all kinds of other ways to go faster. But if you’re seen as a bad actor, you’re rejected by the society we live in. That’s the thing you need an answer for, and it runs deeper than just saying, “We need regulatory reform.”
We Need a True 50-State Experiment
Naval: You make a deep point there, Max. It’s where the voters, the citizens, are. We like to blame politicians. You’ll see this on X all the time: people say, “This politician, that politician, the other politician.” But they’re elected by majority vote. This is literally where the people are. That’s the package, the bundle they’ve chosen. You may not like this instantiation, but if you removed this one, something very similar would take its place, because the voters would just vote them right back in.
Culturally, it’s very hard for most people to understand what we lost, what we missed. Take France: there’s a French entrepreneur on X lamenting that 57% of GDP gets sucked up by the government, so you can’t create companies. But to the average French citizen, that’s not visible. They don’t notice what they’re missing. They just know they’re slightly poorer than Americans.
The Economist just did a little piece on how the US is outstripping everybody, growing faster, getting bigger. Economists are finally coming back around to being capitalists after thirty years. But they immediately turn around and say, “It’s because of the oceans, because of natural resources,” everything but capitalism. They don’t want to say the dirty C-word, because for some reason all these magazines became Marxists at some point. They can’t envision what could have been if we had just been a little more laissez-faire, a little more open.
I would love to see a true experiment among the fifty states, with different regulations and different tax structures. Right now, federal tax structure and federal regulations dominate everything. But imagine you could go to some small state if you had cancer, and you could try every drug anyone was cooking up. Caveat emptor: you’ve got to do your research. Call it an experimental zone. Same for drones. Same for aircraft, which is a little harder because you’ve got to cross a lot of areas, but yeah.
Blake: There’s something magical in there—the notion of innovation zones. We have a huge NIMBY problem. But if you create opt-in YIMBY zones, they create that experimentation framework. By definition, it happens where people are consenting. You can try different rules, or no rules, or different ways of enforcing—innocent until proven guilty—and see what actually happens. What are the innovation consequences? What are the safety consequences? Then the successes can spread.
Max: To Naval’s point, an innovation zone would not solve the problem in drug discovery. The Right to Try Act passed a little while ago. We’ve had this pathway called Single Patient IND for a lot longer than that. If your doctor calls the FDA and says, “I want to give my patient an unapproved drug,” they approve over 99% of those. They can even grant them over the phone.
The problem is that to dose a patient you still need clinical-grade drug. The only entity with that is typically the IP owner who’s in the middle of running a clinical trial—they’re investing hundreds of millions of dollars into making this thing. The FDA will draw an adverse inference if something bad happens to your patient who’s probably really sick to begin with, and that’s seen as a property of the drug, which is global—not related to your innovation zone. So there are two problems. One, you need to get the IP owner to give you some of their drug—they’re not going to do that. Two, you need to prevent the global regulator from casting doubt on what might happen with their clinical trial if they give you some.
Blake: How would you address that in medicine?
Max: This is inside baseball. The FDA has to be prohibited from drawing adverse inferences across different users of a capsid, for example. There are specific ways you could really accelerate innovation with a relatively light regulatory touch by just preventing this paranoia from driving our decisions.
China’s FDA Is Beating Ours
Guillermo: Is there anything better than the FDA out there? What are we benchmarking these regulators against?
Naval: Everyone follows the FDA. Everyone copies the FDA.
Max: Two exceptions. First, Europe. It’s not really better than the FDA, but it has a different system. They’ve got these notified bodies, basically private businesses blessed by their host governments to certify things: trains, planes, medical devices. The notified-body system creates slightly better incentives at the review layer because the bodies can hire people, they can grow, and there’s competition. They themselves have to comply with conditions set by their host governments, but it means there can be many thousands more reviewers than in the US.
Second, there is actually one approved, paid-for implantable BCI today, and it’s in China. The CFDA (since renamed the NMPA) is thinking for itself. They have a system that I think is going to give us a run for our money if we’re not careful. The costs to bring a drug or device to market are just much lower. You can try things in humans and try things on the market.
Here’s the thing I’ve been spending a lot of time thinking about. Twenty years ago we were buying far fewer laptops and phones; each one was much more expensive. Now they’re cheaper, there are far more of them, we buy more of them, total spending has gone up. This is great. Stock prices of Qualcomm and Samsung and Apple are way up. Everybody’s happy. They’re using the excess wealth generated by phones and laptops to buy more phones and laptops.
This doesn’t happen in healthcare. Because of the reimbursement mechanism—there’s this enterprise sale happening—the bucket of money we use to buy healthcare is basically fixed. It is not increasing as there’s more stuff producing better healthcare outcomes, the way we see in technological growth industries. The rate of spending on healthcare grows at roughly the rate of growth of tax receipts. If AI is booming and there are major advances, and two years from now we’re spending ten times as much on AI, this could be great. But if in two years we’re spending ten times as much on healthcare, this would be a catastrophe. This is fundamentally at odds with being a technological growth industry.
There’s this omni-problem in healthcare, all related to the same thing: it’s just too expensive to bring these things to market. That’s what China is getting at. The way out of this is not single-payer or some revision to health insurance. It’s to bring down the costs so that someone can buy this with a credit card, finance it, maybe like a car, worst case—and then you charge them in the transaction. To do that, we have to make it cheaper to bring these things to market. China is doing that. That will allow them to sell these things for $10,000 instead of $100,000. That is deregulation.
Healthcare Is a Communist Society Inside Capitalism
Naval: Fundamentally, there’s no private market in healthcare. The analogy people make sometimes—imagine that instead of going to restaurants and paying, you’d go to all the restaurants, and at the end of the month you’d send all the receipts and bills to your insurer or to the government, and they would reimburse you. There’d be a line outside every good restaurant. Every bad restaurant would be available. The waits would be terrible. The product wouldn’t improve. You’re basically running a small communist society inside a larger capitalist society. That’s what we’re doing in healthcare.
Blake: It’s also what we’re doing on roads, which is why we have traffic. There’s no variable pricing for getting on the highway, which is why it’s always clogged.
Naval: If you want to step on the third rail of healthcare for a moment, think about this plan. Tell me what’s wrong with it. Imagine that the first 20% of your annual income was your healthcare deductible. If you’re broke and homeless, it’s zero. If you’re rich, it’s millions of dollars. Whatever your annual income is, the first 20% is your healthcare deductible. The rest is paid by the government and the insurance system, up to the usual caps they have today.
You’d create a private market pretty quickly. In dental, plastic surgery, a lot of optional medical procedures, you’d get a competitive situation. You get improvement. Look at optometry with LASIK. Look at dental with veneers and braces and dental surgery. Look at plastic surgery. Those fields do seem to be advancing because they’re private payers—people voting with their money.
We need to do some equivalent of that in the normal healthcare system. But people lose their minds. They don’t even want to think one step ahead. “No, no, no, what about the broke person?” The broke person has no income, so their deductible is zero. “Twenty percent is too much for some people.” Okay, you can tune the deductible. But generally, if you don’t have some private market where people pay out of pocket for medical procedures, you’re just not going to get this feedback loop. You won’t get the ability to spend more money into the system.
Right now, very wealthy people can spend voluntarily into the system. But the prices aren’t published anywhere. The rate cards aren’t available anywhere. The system’s not designed for it. If you go shopping for medical care and want to pay out of pocket, sometimes they’ll quote you a price that’s 10x what they charge the insurance company.
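The arithmetic of Naval’s income-scaled deductible is simple enough to write down. In this sketch the deductible is 20% of annual income, the patient pays bills out of pocket until the deductible is used up, and the government or insurer pays the rest. The function name and the example incomes and bills are hypothetical, and the “usual caps” he mentions are left out for simplicity.

```python
def split_annual_bills(annual_income: float, bills: list[float],
                       rate: float = 0.20) -> tuple[float, float]:
    """Return (patient_pays, payer_pays) for one year under an income-scaled deductible.

    The deductible is rate * annual_income; the patient covers bills up to that
    amount and the government/insurer covers the remainder (caps ignored).
    """
    deductible = max(0.0, rate * annual_income)
    total = float(sum(bills))
    patient = min(total, deductible)
    return patient, total - patient


# Hypothetical households, each facing the same $53,000 of bills in a year.
bills = [3_000.0, 50_000.0]
for income in (0.0, 60_000.0, 10_000_000.0):
    patient, payer = split_annual_bills(income, bills)
    print(f"income {income:>12,.0f}: patient pays {patient:>9,.0f}, payer covers {payer:>9,.0f}")
```

With these numbers, the zero-income household pays nothing, the $60,000 household pays its $12,000 deductible, and the $10,000,000 household pays the whole $53,000 because its $2,000,000 deductible is never reached. That last case is the private market Naval is after: above some income, routine care is paid out of pocket and prices start to matter.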
Sid’s Story: N-of-1 Medicine
Max: Have you heard Sid’s story from GitLab? He had a massively successful IPO, then was diagnosed with a rare cancer. He has lived way past the prognosis. He really took it into his own hands. He did frontline chemo, then there was one alternative available, he exhausted it, and the doctors were like, “We’ve got nothing for you.” Since then, six or seven companies have come out of it. There are now twenty or thirty drugs in his escalation ladder. He’s still alive.
Guillermo: He’s doing great. I saw him the other day. He basically created his own personalized medicine and treatment plan.
Max: I’ve heard a handful of these anecdotes now. It’s really clear to me that at the high end, if you’re not dealing with insurance, you have the resources, and you say, “I want the full toolbox of modern science,” crazy outcomes are possible. If you ask your doctor, “What will happen if I do this?” they’ll start shouting and throwing things. But at the high end, crazy things are possible. This type of N-of-1 medicine is going to end up being a really rich source of research for understanding how to build more broadly translatable treatments.
Guillermo: It requires a ton of agency from the patient at the moment they’re at their weakest, which is pretty ironic. My friend passed away from cancer, and the last thing he wanted to do was research N-of-1 medicine; he was declining week by week. This is where AI should really shine and democratize what you can actually do when you find yourself in that situation. It’s kind of crazy how few people get access to this, not just financially but simply from a knowledge perspective.
Part 4: The Autonomous Company
Autonomous Infrastructure
Nivi: How much software in your organizations is running autonomously, or nearly autonomously, and improving on its own?
Guillermo: A lot of our infrastructure is already autonomous. We have a capability that fires when it finds anomalies. I recommend everyone build a version of this, or use the one Vercel offers. Today most engineering organizations handle anomalies by setting up alarms or monitoring thresholds by hand, which is pretty insane, but that’s how the entire industry works.
We’ve automated a lot of the SRE (site reliability engineering) job. Any metric that slows down, speeds up, or changes in throughput fires an anomaly alert, an agent investigates, and the agent can decide to create an incident. If an incident is filed, people get looped in and the agent begins working on remediation. We’re doing everything except giving the agent the tools to change prod; we’re serving solutions to engineers on a silver platter.
The other thing working really well: autonomous optimization and autonomous security research. We open-sourced a tool called deepsec. It’s incredible, like Mythos, but you get it today. We run it against our entire monorepo using ten thousand concurrent agents in the cloud. In a couple of days, for fourteen thousand dollars of tokens, it made several quarters’ worth of security-research progress: work that would otherwise take months of red-teaming by entire teams of people.
Cybersecurity is becoming a nightmare: too many vulnerabilities, too much work, adversaries too powerful. You have to invest proactively. On the optimization side, you’ve probably seen people on Twitter translating codebases from one language to another. Once you’ve done the work to get a working program, optimizing it or rewriting it in a native language is now quite doable with frontier models.
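Guillermo doesn’t say how Vercel detects anomalies, only that it replaces alarms and thresholds set by hand. As a rough illustration, and assuming a simple statistical baseline rather than whatever Vercel actually runs, the sketch below flags a metric sample when it sits more than a few standard deviations from its own recent history, in either direction. The function name `detect_anomalies`, the 20-sample window, and the 3-sigma cutoff are hypothetical choices for this example.

```python
from statistics import mean, stdev


def detect_anomalies(samples, window=20, z_cutoff=3.0):
    """Return indices of samples that deviate sharply from recent history.

    Hypothetical sketch: each sample is compared with the mean and standard
    deviation of the `window` samples before it (window must be >= 2), so no
    absolute threshold is set by hand. Deviations in either direction count,
    so a metric that suddenly drops is flagged just like one that spikes.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # A perfectly flat history: any change at all is unusual.
            if samples[i] != baseline:
                anomalies.append(i)
            continue
        z_score = abs(samples[i] - baseline) / spread
        if z_score > z_cutoff:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical latency series (ms): steady, one spike, then steady again.
    latencies = [100.0 + (i % 3) for i in range(30)] + [400.0]
    latencies += [100.0 + (i % 3) for i in range(5)]
    print(detect_anomalies(latencies))
```

In the setup Guillermo describes, each flagged index would be handed to an agent to investigate and decide whether to open an incident; that step isn’t modeled here. A rolling z-score is only one possible baseline: it ignores seasonality such as daily traffic cycles, and a large spike inflates the spread for the next `window` samples, which can mask a follow-up anomaly.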
Naval: Here’s an example from my own vibe-coded app. I built a bug-reporting queue for my TestFlight users: they report bugs from inside the app, and it uploads the logs and a screenshot. Of course they use it for feature requests too. A simple daemon compiles all the bug reports, proactively analyzes and fixes them in the background, then ships me a TestFlight build to try before I ship it to the testers. I could see a future app literally built by its users. I’m not saying that’s a good idea; it might be a mess.
Guillermo: We should ship that, just to see what happens.
Naval: As a social experiment. You’d end up with a Homer Simpson car—an umbrella, a flashlight, a clown horn, every feature. But for bug-fixing, you could definitely do it.
Your Job Is to Train the Agent
Blake: We did a version of that experiment. I stopped all project work across the entire company for a week and said, “Everybody, from the receptionist to the engineers, build whatever you think is the most important thing to build. Your only requirements: you have to use AI, and you have to demo it for the whole company when you’re done.” I expected a large number of silly projects and a small number of needle-movers. We got the opposite—a large number of needle-movers and very few silly projects. Two or three were trajectory-changing; they’d absolutely change the direction of the company.
What surprised me most: the receptionist, the shipping-and-receiving associate whose job was to take packages off a truck and email people when their stuff hit inventory, built an automation for that. We’re actually using it.
The conclusion I came to: everybody has some idea of what could exist that would make the world better, but their first-order ideas are often stupid, and they can’t project them out far enough to see what they’d become. But if they can go from idea to an actual thing, they can react to it and iterate. Give them a week, and by the end they’ve built something that makes sense.
Guillermo: Imagine if all work were like that. How do you set up a workforce that doesn’t do the work directly, where all anyone does is train the agent that does it for them? You have to remind people, create hackathons. There’s a culture change happening: a lot of people coming in intuitively know their job isn’t to work on the thing, it’s to train the agent that works on the thing.
Naval: It could get a lot crazier. Maybe you just turn on all the cameras, and the agent watches everything happening, sees that the shipping-and-receiving process is inefficient, and writes the app and presents it.
Guillermo: We’re likely going to ship a feature into AI Gateway that lets people opt in to preserving inputs and outputs. Then you can say, “For all my inputs and outputs, extract the skills—learn from my work and dump it as skills I can download for myself.”
You could imagine people in companies wanting to share and pool this together.
Naval: It’s funny, for me that’s unimaginable, because my own work isn’t repetitive. I look for things to automate, and there’s almost nothing left to automate in my own work. I hope that’s where everybody ends up: you work in your maximum zone of creativity and interest at all times. If there’s anything left to automate, automate it. Get it out of your life; that frees you to be creative, and that’s where you generate all the value.
That’s hard to see in the job-career mindset, because you hire people to do the same thing over and over, and that’s going away. It’s scary—people ask, “What am I going to do?” You’re going to do creative things. You don’t have to come up with a new thing every day—that’s impossible—but once in a while you come up with a new thing that creates a point of leverage.
The Next Lord of the Rings
Max: Historically the returns were maybe 70% intelligence, 30% agency. Now it’s going to be 70% agency, 30% intelligence—and that’ll shift further as the models get better.
Naval: I’ll take the counterpoint, Max. I think it’s 99% intelligence and 1% agency—because the agents will exercise the agency. You’ll literally say, “Hey agent, I’m making smart decisions and thinking big thoughts; just go implement stuff.” Sometimes I want to build a feature on an app I’m vibe-coding, and I’ll ask the agent, “What feature should I build next? Go look at the logs.”
Max: To be clear, I’m talking about the returns to humans. The humans best fit for the future are the ones who are more agentic—the ones who can open Claude and think, “What should I build?” instead of watching YouTube.
Naval: Here’s a fun experiment. We all know a lot of people now who are coding who weren’t before—including, in many cases, ourselves. The percentage of coders has probably gone up 10x.
Guillermo: It’s why our sign-up numbers are through the roof—a whole new class of people who aren’t engineers.
Naval: But the majority of people still aren’t creating code. I tell people, “Vibe coding is so much fun.” I had a gaming group I used to play first-person shooters with to blow off steam; I completely stopped. That time went to vibe coding instead. It’s more entertaining, you get something real out of it, and the feedback loop is just as tight or better.
I tell my friends, “You should be vibe coding instead,” and they give me a blank look. To them it was always a black box—they assume you’re just talking to the computer. They don’t realize it’s a lot easier now. So we might’ve gone from 0.01% of the population writing code to maybe 1%—call it 100x—but 99% still never will.
Guillermo: It’s crazy. It’s like a video game—a great video game—but real stuff comes out.
Naval: The normies have gotten a little more into it, but through media models, specifically video models. More people have fooled around making videos and images than writing code and apps. But video has its own issues. Someday, “make me a great movie about X” will spit out a good documentary, but right now the models don’t have the taste or the judgment.
Max: This is a bet I have with Andrej Karpathy: in what year can you dump in a book and get a movie out? I think it’s close, and he’s come down substantially on his timeline. By 2030 we’re going to have dozens of versions of Lord of the Rings, each from some fan saying, “He did it wrong, I’m making my own take.”
One of my benchmarks: I’m a huge fan of The Expanse. There are nine books and a TV series; the show adapted the first six books but not the last three, and there are meaningful divergences. I’m looking forward to dumping in the last three books, conditioned on the TV series, and saying, “Generate the last three seasons.”
Guillermo: That’s a great feature. When you said, “Get me the next Lord of the Rings,” I got excited, because we haven’t had a breakthrough in imagination and culture on the scale of Harry Potter or Lord of the Rings.
What’s Your Definition of Art?
Naval: So what can humans uniquely do? This gets to the core issue. Max, you’re an AGI maximalist—so for you it’s nothing; agents will do everything.
Max: I’m not anti-human, but if your identity is how smart and creative you are, you’re going to have a bad time.
Naval: I’m still on the other side of that. Creativity is the thing that surprises you—you step out of the system and do something that wasn’t even imaginable within it. It’s outside the training data, out of the distribution that was fed into the system. There’ll always be room for that.
Guillermo: Have you noticed every Claude website looks the same? People can now spot what a Claude website looks like: serif font, brown and cream, monospace with a certain spacing. After a while you get a distribution, and you say, “This isn’t creative. This is slop that came out of Claude.”
Max: To be clear, I don’t think it’s human versus computer; it’s human with computer versus just computer. But the computer’s going to produce crazy super-stimuli; it’s going to make the entertainment. We see a weak form of this in TikTok. My personal definition of art is meaningful out-of-distribution behavior: something surprising, as if you’re moving along the Z-axis. And “meaningful” means it changes your future trajectory through the universe; your life is somehow different for having thought it and reflected on it.
My definition is broad. There can be military maneuvers you’d call art. We’re going to see Move 37s all over the place. What’s your definition of art?
Naval: I have multiple definitions. I think of art as conveying emotion—something you felt, transmitted to another person; you create an object that captures an emotion you felt inside. By that definition a computer is almost incapable of it: the exact same piece of art without intent behind it is meaningless. You can argue nature is art—a sunset—but that’s pure intelligence working without motive, so no ego gets involved, and your brain recognizes the complex system. Art in the human sense is: someone felt something and wanted you to feel it. So the identity of who created it matters.
Max: So take a beautiful photo. If a person takes it, and AI generates the exact same photo down to the last pixel, the one the person took has more meaning for you?
Guillermo: Do you remember ControlNet, a year or two ago? There was an AI-generated medieval-village scene with a swirl in it. That was one of the first times I looked at AI-generated art and thought it was really cool.
Naval: But doesn’t that break your premise? A human came up with the training and the prompt to arrive at that riddle. It’s possible AI does that itself in the future, but I give whoever came up with that optical-illusion ControlNet idea the credit.
The bar is going to be raised massively—it’ll take more and more to surprise you. Like Studio Ghibli: OpenAI destroyed Studio Ghibli for everybody. Nobody wants to see another Studio Ghibli work again. It’s been done.
Naval: Right, but art has to be out of distribution. Once you’ve seen tons of Studio Ghibli everywhere, it’s in distribution: no longer surprising, and the art value is gone. Humans are the ones who generate surprise completely outside the data distribution, and they do it with intent, and intent matters for meaning. Take an AI trained to be perfect at mathematics within a formal system. Then Kurt Gödel comes along with something completely outside the system, the incompleteness theorems, stepping outside it to break it. I don’t think an AI can get to that kind of thing. The meaning comes from the fact that a human did it for a purpose and conveyed something.
Can AI Have New Ideas?
Max: The really deep question: is it possible for an LLM or transformer to go out of distribution—to have a new idea that wasn’t present in the training set?
Naval: The training sets are so large it’s hard to imagine ideas that aren’t in them somewhere. But if such ideas exist, they probably lie in the natural domain (physics, interaction, feeling, emotion, evolution), things language doesn’t fully capture. There are still things outside of language, though language is a great compressor for a lot of it.
Max: I think the question is how you go out of distribution without randomness. In reinforcement learning you can sample an action from a distribution and get randomness that walks you into new territory. Can humans go out of distribution—where does any new idea come from? Are we also dependent on randomness?
Naval: We’re not dependent on pure randomness. Natural selection works through pure randomness: mutate a gene and see what happens. But humans seem able to cut through an infinite search space and eliminate huge swaths of it, so our creative leaps still make sense within the larger scheme. That’s one of our unique capabilities. Maybe AI is starting to do that at the edges, as we’re seeing with some math problems, but math is a very bounded domain. At the moment, truly stepping outside and surprising people is still the domain of humans. Human plus AI is where it’s all moving. Human without AI, forget it; pure AI isn’t there yet. But human plus AI: we’re in that era, and I’m betting we stay there longer than people think.
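Max’s point about reinforcement learning is that a stochastic policy doesn’t always take its highest-scoring action; it samples from a probability distribution, so lower-scoring actions still get tried some of the time. A minimal sketch, assuming a softmax policy over raw action scores (the helper names `softmax` and `sample_action` and the example scores are illustrative, not taken from any particular RL library):

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw action scores into probabilities that sum to 1.

    A higher temperature flattens the distribution, giving
    low-scoring actions a larger share of the probability mass.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, temperature=1.0, rng=random):
    """Sample an action index from the softmax distribution.

    Unlike argmax, this sometimes returns a non-best action,
    which is how the policy wanders into new territory.
    """
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # hypothetical scores for three actions
    rng = random.Random(0)  # seeded so the run is repeatable
    picks = [sample_action(scores, rng=rng) for _ in range(1000)]
    print({a: picks.count(a) for a in range(len(scores))})
```

Greedy selection would pick action 0 every time. Sampling occasionally picks actions 1 and 2, and raising `temperature` flattens the distribution so those less likely actions come up more often. That injected randomness is the exploration mechanism Max is describing; whether humans rely on something similar is exactly the open question in the conversation.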
A Very Large Number of Small Teams
Naval: Humans will have an enormous amount of value—more value. Everyone here, our productivity has gone through the roof. Basic economics says that when productivity is higher, you’re wealthier and you hire more people, not fewer. If someone’s really good with AI and really smart and creative, I want to hire them more than ever, for the leverage.
Guillermo: That’s a new requirement. We’re hiring juniors and super-seniors, as long as they’re really good with agents and quick to adapt. My hypothesis is we end up with a larger number of smaller teams. The number of people required for any given task drops a lot. People who only see first-order effects say, “All the jobs disappear—I can do a jet engine with two people, not a thousand; 998 jobs gone.”
But what it actually means is you can create a lot of different jet engines. We’ll get an explosion of entrepreneurship, an explosion of founders, and a very large number of very small teams.
Naval: AI provided base-level intelligence and domain knowledge and cut through the jargon; now agents provide a lot of agency. So what’s left is creativity, taste—and yes, you need enough agency to get started and to stick with it, but you don’t need to spend twenty years learning one thing before you can contribute. That barrier going down means generalists are having a field day.
At the end of the day we’re all generalists—we like to think about everything. Max is here talking about consciousness and the FDA and brain science and creativity. The people on Twitter fond of saying “experts, credentials, sources” are the ones getting hurt, because the expertise matters less now.
You spend five or ten years getting a PhD—hopefully it developed your creativity, instincts, taste, and judgment, because if all it did was help you memorize jargon and scaffolding, AI cuts right through that. It’s a bicycle for the mind, accelerated. So it’s people with AI versus people without AI—and the single best thing you can do for yourself is get really good with these tools, and always know the edges of what they can and can’t do. And that’s a moving target.