重返代码

内容来源:https://nav.al/code
内容总结:
AI编程时代来临:Naval谈"氛围编程"如何颠覆软件行业
——从个人应用商店到苹果统治的终结,一场由AI驱动的编程革命正在展开
近日,知名投资人Naval Ravikant在一期播客节目中深入探讨了"氛围编程"(vibe coding)这一新兴现象,以及AI编程代理如何正在从根本上改变软件开发的格局。
Naval表示,自2025年12月Claude Opus 4.5发布以来,AI编程代理迎来了一个决定性拐点。这些代理不再仅仅是编程辅助工具,而是能够从头到尾构建应用、解决棘手问题的"初级程序员",速度快、几乎免费且随时待命。
个人应用商店:一人公司的崛起
Naval分享了自己重建Airchat应用的经历。与过去需要带领八到九名工程师团队、耗时九到十二个月不同,现在他独自一人借助AI编程代理,从零开始重建这款应用,而且完全按照自己的意愿来设计,无需任何妥协。
"我甚至建立了一个个人应用商店,"Naval说,"我可以告诉AI我想要什么应用,它就能把应用送到我的个人应用商店里。我只需一键下载,还能像苹果应用商店一样进行升级。"
这种模式让用户可以获取高度定制化的应用。Naval举例说,他让AI根据他的需求构建了一个健身追踪应用,集成了Tonal和Ladder的功能,遵循苹果人机界面指南,连接Apple Health,并生成了漂亮的图表来追踪进度。
编程比玩游戏更有趣
Naval将氛围编程比作一款"有现实奖励的电子游戏"。与传统电子游戏不同,氛围编程是"无边界"的——底层运行的是图灵机,可以构建任何东西,目标由用户自己设定,且具有现实意义。
"我过去用来阅读、刷手机、打游戏的时间,现在全部投入到了氛围编程中,"Naval坦言,"这也是我最近在X平台上不太活跃的原因——我完全沉浸在Claude和Codex的世界里。"
纯软件不再适合风险投资
Naval提出了一个大胆观点:纯软件已经不再适合风险投资。原因有二:一是如今任何人都可以快速拼凑出软件;二是编程代理正在以惊人的速度进步,一年之内就能构建出具有良好架构的可扩展软件。
"如果你们的全部优势就是'嘿,我们在构建别人不知道如何构建的酷软件',我认为这种模式已经不具备投资价值了,"Naval说。
在他看来,当前风险投资的关注点应转向硬件、网络效应和AI模型本身。
苹果统治的终结?
Naval认为,AI编程代理的崛起标志着"苹果统治时代终结的开端"。当用户与手机的交互方式从"打开Uber应用"转变为直接对AI说"帮我叫个Uber",苹果的生态系统优势将大幅削弱。
"苹果放弃AI将是本十年科技行业最大的战略失误,"Naval断言,"当所有通信都通过Claude或Codex这样的代理进行时,手机的需求将变得越来越小。用户需要的只是一个屏幕、一块电池和网络连接——而安卓手机同样能满足这些需求。"
他认为,苹果的市场估值将面临压缩,因为其高利润率的商业模式依赖于操作系统的垄断地位,而这一地位正在被动摇。
AI编程代理的局限性
尽管前景诱人,Naval也指出了当前AI编程代理的局限性。随着代码库变得复杂和庞大,AI模型的上下文窗口会耗尽,导致它们"失去线索"、做出错误判断、重复修复同一漏洞。
"这些代理总是试图取悦你,"Naval比喻道,"有点像狗——在抓鸭子方面比你在行,但它仍然是狗。如果你指向的不是鸭子的鸟,它也可能把那只鸟打下来。所以你仍然需要引导它。"
未来展望
Naval预测,软件开发将演变为用户与AI代理协作的过程。代理可以7×24小时处理漏洞报告、编写代码、响应请求,而且不会因为自己写的代码被推翻而闹情绪。
"现在真正可以实现一人或两人的软件公司,服务数百万用户,创造数十亿美元的价值,"他说,"过去这种情况也曾发生,比如Notch(《我的世界》开发者)和中本聪(比特币创始人),但我认为未来我们会看到越来越多这样的案例。"
中文翻译:
回归编程
Nivi:您正在收听 Naval 播客。我是 Nivi,他的固定搭档。今天我们聊聊"氛围编程"。
回归编程
Nivi:我先用 Naval 3 月 23 日的一条推文来开启话题:"AI 编程代理现在能一次性生成定制应用,直接送到你的手机上。这是 iPhone 统治地位终结的开端。"你想谈谈你正在构建什么、以及如何分发吗?
Naval:好,让我聊聊氛围编程,以及我是怎么开始的。
大约在 2025 年 12 月,随着 Claude Opus 4.5 的发布,AI 编程代理迎来一个拐点。人们开始使用它,惊叹道:"哇——这个代理不会跑偏,能从头到尾构建应用,能解决棘手问题,真的就像手下有个又快、几乎免费、又乐意效劳的初级程序员。"
那是个拐点。我在推特上看到各种炒作,但这次感觉是真的。我以前也试过编程代理,效果有好有坏,但这次我真的投入进去了。而且我已经几十年没认真写过代码了。我有计算机科学学位,了解计算机架构和网络,懂一点芯片、算法之类的。
但我很久没认真编程了。
写代码的启动门槛非常高。你得把各种不同的服务互相连接起来。从 GitHub 到可能的后端——你在用 Vercel、Firebase、Railway 之类的——有太多东西要连在一起。
你得懂很多行话——很多工具。而 AI 现在让这一切变得非常简单。我和其他人一样,从 Claude Code 开始用。我也用 Codex 来解决一些更棘手的漏洞和深层问题,然后我立刻就上瘾了。这太有趣了。那么,什么变了呢?嗯,这些代理真的在发挥作用。
现在它们不仅仅是编程辅助——不是你让它解决特定问题,它给你一堆代码,然后你复制粘贴到你的 IDE、你的开发环境里。而是你打开一个终端——也就是所谓的 CLI——命令行界面。全都是基于文本的,这正是这些东西最擅长的,因为它们本身就是基于文本令牌训练的。它内部或者说底层运行的是 Unix。这些代理真的很懂 Unix,因为如果你看看它们训练所用的所有代码——在 GitHub、其他地方或 Stack Overflow 上的——大部分都是 Unix。
而且大多数现代操作系统底层其实都是 Unix。macOS 众所周知是基于 BSD 的。所以这些底层都是 Unix,都是文本输入、文本输出。所以这些代理本质上就是长期运行的、在核心层面连接到 Unix 的编码 AI。它们连接到 Unix shell,这样就能执行命令。它们通过基本的 Unix 命令连接到文件系统。
它们能调用所有 Unix 命令,比如 grep、awk、sed、pipe 等等——所有这些可以串联起来的操作符。它们可以运行 cron 任务,所以能长期运行;而且它们还能根据需要生成更多的 shell 和任务。
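上面描述的"连接到 Unix shell 来执行命令"可以用一个极简的示意来理解:代理拿到一个能执行任意命令、返回文本输出的工具,然后像正文说的那样把 grep、printf 等命令用管道串联。以下是一个假设性的 Python 草图,函数名和用法都是为说明而设,并非任何真实代理的实现:

```python
import subprocess

def run_shell(cmd: str, timeout: int = 30) -> str:
    """代理可以调用的最小 shell 工具:执行命令并返回标准输出(纯文本)。"""
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout

# 把 Unix 命令用管道串联:统计日志文本里的 ERROR 行数
log_cmd = "printf 'INFO ok\\nERROR disk full\\nERROR timeout\\n' | grep -c ERROR"
print(run_shell(log_cmd).strip())  # 2
```

文本进、文本出,正是这类模型的长处:命令和结果都是令牌序列,无需任何图形界面。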
个人应用商店
Naval:这非常容易上瘾,因为——通常来说,编程一旦投入进去,其实很有意思。
但要投入进去,启动门槛太高了。但现在突然间,你不需要知道所有工具和所有命令了。这些东西说英语。AI 是不可思议的翻译器。它们早期的核心用途之一就是机器翻译。它们就是通过翻译来测试的。但现在它们正在从 Python、C、Lisp、Rust 以及所有这些不同的编程方言和所有这些专门的命令进行翻译——然后用英语交流,而且交流时非常宽容。
所以你可以用不同的词;你可以有拼写错误;你可以用自己的方式解释事物。但如果你对计算机架构、网络和编程有基本的理解——这不需要太多,实际上可以非常基础,或者我应该说非常高层;不是简单意义上的基础,而是高屋建瓴意义上的基础——那你就能走得非常非常远。
所以只是为了好玩,我试着构建了一堆不同的应用,我开始时用一种"一键生成"的方式来做我想要的特定应用。一键生成的意思是:我只给它一段描述,它就给我一个应用。然后我从那开始改进。所以我实际上建了我自己的小应用商店,一个只属于我的应用商店。
我可以向它要一个应用;它能把那个应用送到我的应用商店里,那是一个网页,最终我把它做成了一个应用本身,放在我的 iPhone 上。然后我可以一键下载那些应用,我还可以像 App Store 那样提供升级。
所以,如果我想要一个新应用,比如记录我锻炼的应用——我确实有这个;我建了一个完全按我喜欢的方式定制的锻炼追踪应用——所以我可以这样说:
"嘿,用 Tonal 和 Ladder 的功能;遵循苹果的人机界面指南,让它看起来像个苹果应用;用以下方式记录我的锻炼——这是我最近几次锻炼的文本日志——让我能轻松重新输入新记录并调整它们;给我建漂亮的图表来追踪进度;加入任何你能想到的其他功能——计算力量得分;阅读科学论文来搞清楚按身体部位计算力量得分的正确方法;做一个人体示意图,可以显示哪些肌肉更大、哪些更小;连接到苹果健康来获取我的心率数据。"
我不是把所有内容都放在一个提示里,但确实放了很多,然后我立刻就收到了一个能用的应用,送到我的个人应用商店。顺便说一句,个人应用商店有点开玩笑的意思。但它是真实的,因为它就是我的个人应用商店:它看起来像个应用商店,我的应用会被送到里面。
但显然它不能广泛分发,因为苹果在这里设了关卡。苹果不允许你构建能装到任何人 iPhone 上的应用,你必须把它们绑定到特定的设备上。所以对我的朋友和家人,我可以给他们送应用;但我还不能把应用送给所有人。不过,这整个体验非常容易上瘾。
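这种"个人应用商店"的网页安装机制,在苹果生态里通常对应 Ad Hoc / OTA 分发:网页上的安装链接指向一个 manifest 清单文件。下面用 Python 标准库 plistlib 勾勒这样一个清单的生成过程;其中的 URL、包名都是虚构占位,仅作示意,并非 Naval 的实际实现:

```python
import plistlib

def make_install_manifest(ipa_url: str, bundle_id: str,
                          version: str, title: str) -> bytes:
    """生成苹果 OTA 安装所用的 manifest.plist(字段为苹果公开的清单格式)。"""
    manifest = {
        "items": [{
            "assets": [{"kind": "software-package", "url": ipa_url}],
            "metadata": {
                "bundle-identifier": bundle_id,
                "bundle-version": version,
                "kind": "software",
                "title": title,
            },
        }]
    }
    return plistlib.dumps(manifest)

# 个人应用商店网页上的"安装"按钮,链接形如(itms-services 协议由 iOS 处理):
manifest_url = "https://example.com/apps/workout/manifest.plist"
install_link = f"itms-services://?action=download-manifest&url={manifest_url}"
print(install_link)
```

正因为 Ad Hoc 分发要求把设备 ID 预先注册进描述文件,这条路只对自己和少数亲友可行,无法面向公众,这正是正文所说"苹果设了关卡"的技术含义。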
你能得到为你高度定制的应用。那么,这是否意味着常规应用没有空间了?不,它们当然有。那些覆盖广泛使用场景的应用——它们会成为同类中最好的。有人精心调校、呕心沥血。所以如果你的使用场景被某个广泛的应用覆盖了,你很难超越它。
但当你想要真正定制或私密的东西时——这些 AI 对于你个人才需要的小众应用非常棒。或者当你想把它们调整到适合你的特定使用场景时,这会非常棒。
氛围编程是一场有现实回报的电子游戏
Naval:而且这非常容易上瘾——因为就像在电子游戏里,电子游戏的设计方式是通过给你反馈和完成任务的奖励来让你上钩。
而且它总是在你能力的边缘。所以当你变得更好时,游戏会变得更难。不是难到让人沮丧,也不是简单到无聊。所以你在电子游戏里总是在能力的边缘操作,并获得这些奖励。但这些奖励是假的,游戏也是有边界的。它是由其他人创造的。它有点像一个人造的小世界,你内心深处其实知道这点。所以你只是在搞懂游戏的规则。一旦你搞懂了规则,它就无聊了。
但氛围编程不同,它是无边界的,因为现在底层运行着一台图灵机。你可以构建任何东西。目标是由你设定的,而且可以不断扩展,所以它几乎永远不会被完全填满。而且它有现实世界的意义。它不是为你解决的那些虚假世界里虚假人物或虚假游戏而存在的,所以它有趣得多。所以氛围编程已经"一键搞定"了我很多朋友,他们全都消失在自己想要的应用程序的氛围编程里了。
但有一个非常清晰的方向真的很有帮助。你必须知道你想要什么——这其实是最难的部分——并且对它有一个非常清晰的愿景。我就有,因为有一个我痴迷了大约一年的特定应用,叫 Airchat——是我和一个团队一起建的——它是一个让人们通过语音和视频交流的社交通讯工具。
它没怎么成功,所以我们把它卖了,让投资者拿回了钱,也给团队安排了不错的待遇。但我记得那段经历非常令人振奋,因为我正在构建一个我自己想要的产品,并且和一个出色的团队合作。
但我必须通过一个团队来做这件事。我有八到九个工程师,取决于时间点,我们非常努力地工作了九到十二个月,发布了几个版本。但有了氛围编程,我基本上在重建那个应用。我从头开始重建。但现在的关键是:我完全按照我想要的方式来重建。没有任何妥协。
通常情况下,在团队合作构建任何东西的过程中,总是有妥协的——即使你没有意识到。即使你是负责的独裁者——这很少见——你仍然需要迁就其他人。你不能说:"把这个图标向左移。现在向右移。不,移回去。不,再移回去。"
你做不到。你会惹恼工程师。你不能在没有合理理由的情况下提出要求——只凭直觉或预感。但 AI 编程代理的美妙之处在于没有这些。
它就像一辆自动驾驶汽车。你在自动驾驶汽车里不会感到尴尬,因为那里没有司机坐着。同样地,有了自主编程代理,你不会因为自己的怪癖而感到难为情。所以你能精确地创造出你想要的东西。
我认为氛围编程的一个好处是——虽然我们可能看不到超高质量的代码(至少这一代不行),架构也需要大量改进,这些东西可能有安全漏洞,可能难以扩展——但你将获得的原型、你将获得的个人应用,会非常快,并且会忠实于创作者的愿景。不会有任何妥协。
所以你最终可能得到更多像《我的世界》那样的东西——Notch 著名地独自编码了它——那是单一个人的愿景。它可能看起来很奇怪,因为"这是什么方块图形?简直是巨大的倒退。"
但他不需要妥协。他不需要和任何人沟通,也不需要向任何人解释他为什么想要那样。所以我认为它扩展了发现的范围。
这也非常有趣。它把可能构建应用的人口比例从 0.1% 提高到了百分之一、二或三。别误会我——大多数人不会去编自己的应用。对于大多数人来说,计算机有点像魔法黑箱,谁知道里面在干什么。所以即使它变得容易了 10 倍或 100 倍,对他们来说仍然毫无意义。它仍然是黑箱。
但对于那些有创造力、有自驱力、能清晰表达且有良好愿景的人来说,你现在可以编程了。再也没有人挡在你和你的原型之间了。
是的,如果你想带着一个功能丰富的应用进入市场,并且需要扩展到大量用户等等,那么你会想招募一个优秀的团队,让真正的工程师加入,你可能还得重写整个东西。但如果你在做实验、在做原型、在推向市场,那没有比这更好的了。
纯软件已不值得投资
Naval:作为一个软件创造者,从来没有比现在更好的时代。
那么,同样的市场机会还在吗?这是一个大问题。它们变化得非常非常快。大公司可能很脆弱,因为现在任何人都能创造软件。
也可能它们更有优势,因为它们有分发渠道。它们可以用能想到的所有软件填补所有空白。但我实际上认为这是个人软件创造者的复兴。
现在,我发的另一条推文大概是:"不再有风险投资支持的软件市场了",或者"纯软件不再值得风险投资了"。
Nivi:我记得好像是"纯软件正在迅速变得不值得投资"。
Naval:对,那是我真正想说的话的温和版本;我想说的是:纯软件不值得投资,句号。如果你的全部优势是"嘿,我在构建别人不知道怎么建的酷炫软件",我认为这已经不值得投资了。
不值得投资有两个原因。
一是今天别人也能随便拼凑出来。二是编程代理进步太快,一年甚至更短的时间内,它们可能就能构建出架构良好的可扩展软件。所以我认为我们会看到飞跃式的进步。这个精灵已经从瓶子里出来了。
所以如果你现在是风险投资人,你在找硬件、你在找网络效应、你在找 AI 模型。而且我敢说,训练 AI 模型就是新的构建软件——直到自动研究和自动训练开始起作用为止。
但我认为氛围编程比玩电子游戏更有趣。它更高效。它更有建设性。它有更好的反馈循环。你构建你想要的东西。你处在技术的最前沿。你甚至可能从中赚到钱或发展出职业生涯——虽然职业生涯有点过时了——但你可能会从中创造出有趣的机会。而且通过实践,你能学到很多关于计算机的知识。
我见过一些孩子在玩氛围编程。让孩子们编程很难。你可以扔给他们 Swift Playgrounds 和 ScratchJr 之类的,希望他们能学会编程。但如果你让他们玩氛围编程,他们会得到即时反馈和即时奖励。也许在这个过程中他们会掌握基础知识,因为这些东西操作起来还是需要一些技巧的。
在操作它们的过程中,你会被迫搞清楚命令行;你会被迫搞清楚基本的计算机架构是如何工作的;你会被迫理解缓存、网络回退、共享流、写入磁盘、延迟与带宽的权衡等等这些概念。所以你会被迫学习一些计算机算法和架构的基础知识。而且这是一种有趣的学习方式。我经常熬夜,每晚大概花几个小时——以前用来阅读、刷负面新闻或玩电子游戏的时间——现在全都用在氛围编程上了。事实上,这就是我最近在 X 上不活跃的原因。我完全消失在 X 上了,因为我埋头在 Claude 和 Codex 里。
每种模型各有所长
Nivi:AI 已经变得如此惊人地足智多谋,以至于每当我得到一个并不那么机智的回应时,我就认为是没有给它喂足够的令牌。
对我来说,代理最有趣的地方在于它们纠错和学习的能力——比如人们让它们在晚上看 YouTube 视频,或者上网去学习和理解白天被指示完成的任务。
所以这些代理会自己去纠错和提高技能。同样地,AI 模型中的思维创新也是纠错的一种应用,你把下一个令牌预测过程变成一个伪思维过程,在思维过程的每一步都能进行纠错。
消除幻觉也是一个纠错过程。
所以我想知道 AI 纠错的下一个应用是什么?我有个随机的想法,而且我相信有人在研究,就是把纠错应用到协同工作的代理上——代理和其他代理一起工作。因为人们学习和提高的一个重要方式就是与他人合作和交流。
Naval:我不太确定这个类比是否适用,因为 AI 是所谓的"锯齿形智能",在某些事情上极其聪明,在其他事情上却极其愚蠢。它的结构和人类非常不同,因为当你使用 Claude 时,你用的是同一个 AI 模型——即使你同时运行了 10 个实例。10 个实例互相交流并不能像 10 个人类互相交流那样改善其思维,因为那些人类是在 10 个不同的数据集上训练的。
人类本质上非常有创造力,而且会跳出框架思考。而 AI 代理是在相同的数据分布上训练的。它们实际上运行的是同一个模型。就像 10 个拥有相同大脑和相同数据集的人在互相交谈。当然,仅仅因为热力学原因,它们可能会有一些不同的想法,得出一些略有不同的东西,但它们通常想法相同。所以当你让你的 10 个代理互相交谈时,你只是在向问题投入 10 倍的令牌。就好像说如果你需要,就花 10 倍的时间。
现在有不同的模型,比如 Codex、Gemini 和 Grok Code,它们的训练方式略有不同。差别不大,但确实有点不同。所以它们可能会有些不同的见解。
Claude 通过一个叫 Artifacts 的系统有非常好的视觉呈现,Claude 非常善于与我所在的理解层级进行交流。所以它非常擅长从你的问题和对话中判断你能理解什么,以及你在什么层级上提问。它非常善于在那一层级与你对接。
ChatGPT 仍然是元老。它各方面都非常好。
Gemini 非常擅长搜索,因为它底层有 Google 的爬虫。它是一个让人沮丧的产品——在应用上经常超时、断线、忘了上下文。但它非常快,而且有很好的搜索索引。所以如果我提出的问题本质上是一个搜索问题,我就用 Gemini。
Gemini 还能访问 YouTube。所以如果你认为答案在 YouTube 视频里——YouTube 视频非常多——那么 Gemini 就有 YouTube 的数据优势。所以 Gemini 实际上是靠数据优势取胜的。对我来说它感觉不是最好的模型,但它有最好的底层数据。
然后 Grok 是我可以信赖的、能告诉我真相的模型。它就像是阉割最少、削弱最少的。它能访问 X,所以非常擅长新闻。也非常擅长技术问题。所以如果你问一个科学/数学领域的深奥难题,我认为 Grok 实际上相当不错——不是其他模型不好,只是我认为 Grok 在这方面很突出。这反映了创造、训练和驱动这些模型的公司的偏见。
目前四个领先的前沿模型各有其用。
AI 急于讨好
Naval:我确实会让它们互相制衡。比如,我把它和我的 GitHub 连接起来,这样每次我提交一段新代码——假设是由 Claude 写的——那么 Codex 和 Gemini 就会在每个拉取请求中自动触发。
名称不太准确,但就是说当你把代码推送到你的主仓库时,你基本上是在说这个可以审查了,可以合并到主代码库了。所以你在本地一段代码上工作,假设用 Claude,然后你把它推送到主仓库,你就提交了一个拉取请求。嗯,你可以设置它,让其他代理,比如 Gemini、Codex 和 Grok,自动触发并审查这个拉取请求。
然后它们说:"嗯,你应该改一下架构这方面"等等。这是一种让它们互相交流的方式,形成一个 AI 的委员会、圆桌会议。但我发现这并没有你想象的那么有用。这些 AI 之间仍然存在很多集体思维。如果你和它们一起编程,并且你朝着一个答案推进——比如,如果你觉得你知道答案是什么——它们很少会反驳你。你得错得相当离谱它们才会反驳你。
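上面说的"AI 圆桌会议"可以勾勒成这样一个流程:同一份 diff 分发给多个评审工具,汇总各自的意见。注意这只是示意,其中的命令名纯属假设(实际接入各家代理的方式各不相同),演示时用 cat 代替真实的评审工具:

```python
import subprocess

def review_diff(diff: str, command: str) -> str:
    """把 PR 的 diff 通过标准输入交给一个评审命令,返回其意见文本。"""
    result = subprocess.run(
        command, input=diff, shell=True, capture_output=True, text=True
    )
    return result.stdout.strip()

def council(diff: str, reviewers: dict[str, str]) -> dict[str, str]:
    """"圆桌会议":同一份 diff 交给每个评审者,收集全部意见。"""
    return {name: review_diff(diff, cmd) for name, cmd in reviewers.items()}

# 真实场景里 reviewers 可能是各家代理的命令行工具(此处命令名仅为占位):
# reviewers = {"claude": "claude-review", "codex": "codex-review"}
# 演示:用 cat 代替评审工具,它只会原样回显 diff
print(council("- old\n+ new", {"echo-bot": "cat"})["echo-bot"])
```

正如正文所说,这种结构上的互查并不能消除集体思维:各评审者仍可能顺着提交者的预设给出大同小异的意见。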
它们想要取悦你,而且我认为它们没有自己长期的心智理论。所以它们总是有点向你靠拢,会找到你在寻找的那个答案。所以如果你认为答案在某个领域,并且你稍微引导一下模型,所有模型都会找到大致相同的答案,因为你在引着它们走向答案。它们很容易被引导。
我注意到的另一件事是,随着代码库变得越来越复杂和庞大,管理起来也更难了,因为它不再能完全放进模型的上下文窗口。模型只能在大脑中保存一定量的数据。目前最先进的技术大约能处理一百万个令牌,这在未来会被认为是可笑的。
你可以把它大致理解为一百万个词。限制来自底层的 Transformer 注意力机制:它要正常工作,计算量与上下文中令牌数的平方成正比。所以如果是一百万个令牌,注意力计算的规模就在一万亿这个量级,因为那是 100 万的平方。
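这个平方律可以用几行算术验证(纯示意):

```python
def attention_pairs(num_tokens: int) -> int:
    """自注意力要对上下文中每一对令牌算一次相关性,代价按令牌数的平方增长。"""
    return num_tokens ** 2

million = 1_000_000
print(attention_pairs(million))  # 1000000000000,即正文所说的"一万亿"量级
# 上下文窗口翻倍,注意力代价变为四倍:
print(attention_pairs(2 * million) // attention_pairs(million))  # 4
```

这也解释了为什么上下文窗口不能简单靠堆长度解决:长度线性增加,代价平方增加。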
所以随着你的代码库变大,上下文窗口会耗尽。模型无法再将其全部保存在内存中。于是它们开始猜测、近似、压缩上下文窗口。它们开始丢失上下文、迷失方向。它们开始修复错误的东西。它们修复同一个 bug 五次。当问题出在别处时,它们却去给架构打一个快速的补丁,而你不得不引导它们。
所以当你处理越来越复杂的代码库时,操作者有责任提供指导,说:"实际上,这里,我认为我们应该重新架构整个东西。"
它们也会做一些极其愚蠢的事情。比如,如果你不注意,只是看着文本滚动,偶尔它们会通过直接移除用例或破坏特性来"修复"一个 bug。或者它们会做一些明显是临时凑合的处理,你不得不阻止它们说:"嘿,那是个临时方案。"
顺便说一句,我经常这样做。
我会让模型停下来。我会说:"不,那是个临时方案。那是个补丁。去从架构层面修复它。"有趣的是,模型总是会说:"哦,对不起。你说得对。那是个临时方案。"
即使那不是一个临时方案,模型也会说:"你说得对。那是个临时方案。"
所以模型总是试图取悦你,而且它分不清好坏。从这个意义上说,它有点像一条狗。如果你带着狗去打野鸭,它在抓鸭子方面比你强,但它仍然是一条狗。所以如果你指着一只不是鸭子的鸟,它可能会把那只鸟打下来。所以你必须引导它。它确实需要大量的操作监督。
说了这么多,意思是,你仍然需要引导这些模型。它们互相交谈并不能解决问题。而且你必须参与到架构、调试、特性中,并且密切关注。但现在这种人类操作员加上最先进的编码模型的组合,可以产生令人难以置信的结果。
你已经能完整地一键生成简单的应用了。比如一个基本的任务清单,一个基本的电子游戏克隆——你可以一键生成:一个提示,你就得到相当不错的东西。
所以你可以看到这个趋势。最终,一旦它们有足够的数据,它们就能一键生成非常复杂的应用,那将是一个完全不同的世界。
为什么是数学和编程?
Naval:那么,编程有什么特点让它们尤其擅长呢?
只是因为数据量极大,而且在训练模型时,很容易验证"嘿,你做得好不好?"因为代码必须能编译。必须能执行。而且你可以有预先写好的简单测试来检验"你写的代码通过测试了吗?它做了它应该做的事吗?"
所以编程被证明是那种相当容易训练模型的事情之一。
数学其实也类似,你有大量数据——大量已解决的问题——而且输出很容易验证。所以在我们有大量数据且能够很好验证的领域——自动驾驶是另一个例子——这些模型表现得非常好。
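上面这套逻辑,即代码必须能编译、能执行、能通过预先写好的测试,可以勾勒成一个最小的验证函数。下面用 Python 自身的 compile/exec 做示意,并非任何实验室的真实训练流程:

```python
def verify(candidate_source: str, tests: str) -> bool:
    """可验证奖励的最小示意:候选代码必须能编译、能执行、能通过测试。"""
    try:
        compile(candidate_source, "<candidate>", "exec")  # 第一关:能否编译
        namespace: dict = {}
        exec(candidate_source, namespace)                 # 第二关:能否执行
        exec(tests, namespace)                            # 第三关:assert 测试是否通过
        return True
    except Exception:
        return False

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5"
print(verify(good, tests), verify(bad, tests))  # True False
```

正因为这个评分器是全自动的(不需要人来判断好坏),编程和数学才成了最容易大规模训练的领域;而创意写作恰恰缺少这样的闭环。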
在数据不多的领域,比如全新的领域,这些模型就不会表现很好,那仍然是人类和创造力的机会。然后在难以验证的领域,比如创意写作——谁来定义什么是好的创意写作,什么不是,什么是垃圾,什么不是——这些模型就没那么出色,因为你无法轻松地运行一个闭环,让它们大量输出内容,然后这些内容立即被算法自动评分,而不需要人类参与来判断"这个好,这个差"。
例如,如果你试图用这些模型做创意写作,它们会输出海量内容。它们可以输出无穷无尽的文章。谁来判定它好不好?即使你雇一些低薪的、像呼叫中心一样的人来做"这个好"或"这个差"的判断,那也只取决于他们的品味。
我认为最近这些编码模型变得非常好的原因之一——有几个原因;一个是因为它们在某种程度上做了一种递归训练,一个模型帮助改进下一个——但我认为更大的原因可能只是很多最优秀的软件工程师在过去几个月里开始使用这些模型,他们的品味现在反馈回来了。所以你得到了他们的代码,以及他们关于什么好什么坏的品味。
你需要高品位的反馈循环来改进这些模型。而这种循环比看上去更难建立。
在某些领域这是可行的,而在其他领域则很难看到如何实现。
苹果统治地位终结的开端
Naval:显而易见的事情是,是的,你去构建你的应用。很好。不那么显而易见的事情,稍微高级一点,对软件工程师来说简单得可笑,但对非工程师或很久没编程的人来说,想想还是挺有趣的。
一个是我建了我自己的应用商店。所以如果我想要一个应用,我直接在我的手机上打开 Claude。我可以操作一个远程终端——它运行在我的台式机上——或者我也可以直接用云端的 Claude。
它可以连接到 Xcode。
我给它两行描述。它给我构建一个应用。它把应用送到我的应用商店。我打开我的应用商店应用。应用就在那里。我点击安装。30 秒后,我的手机上就有了一个能用的应用。
这太神奇了。你简直可以跟某人一起吃饭聊天,他们描述一个他们想要的应用,你向 Claude 描述一下,五分钟后你就能在他们面前展示你手机上的这个应用。
这就是为什么我说这是苹果终结的开端,因为苹果依赖其操作系统和应用比其他人的更好。硬件,是的,更好,但这支撑不了他们的利润率和他们的垄断或准垄断。所以当你所有的交流开始通过 Claude、Codex 或某个其他代理进行时,当你整天做的事不再是打开优步应用,而是说"给我叫个优步",或者不再是打开锻炼应用,而是说"我的锻炼应用在哪?记录我的锻炼。别出错。"对吧?
那么你只是在与代理交流,当这种情况发生时,对手机的需求就会变得越来越小。
也许还有少数银行应用和政府应用没有移植,也没有合适的 API。但这些代理甚至不需要 API。它们可以即时地搞清楚并创建自己的 API。
使用场景不再是你的手机界面,不管是 iPhone 还是安卓机。相反,你只是与 AI 模型交互。而现在苹果使用的是 Gemini,那是谷歌的 AI 模型。所以有什么区别呢?我还不如直接用安卓手机,因为那时我所需要的只是一个屏幕、电池和网络连接。安卓完全能满足这些。
然后应用和用户界面是根据我的需求即时创建的。是的,对于某些事情,总会有同类最佳的用户界面,你也会希望有一些熟悉感。但即使是点击、点击、点击、升级系统软件、拖到这里、找那个按钮、在那个字段里输入,所有这些都正在消失。一切都应该是对话式的。一切都应该是代理式的。在那个世界里,苹果失去了很多优势,然后它纯粹是靠"哦,是的,我们有最好的芯片和最好的集成硬件"来竞争。
但这不是今天苹果的利润率。那更像是三星或联想赚的利润率,不是苹果想要的利润率。因此,我认为它的市值将会收缩。
我认为苹果放弃 AI 将会是本十年科技行业最大的战略错误,也是苹果统治地位终结的开端。这些公司可以存在很长时间并赚很多钱——比如微软比以往任何时候都更有价值。但微软 Windows 某种程度上已经输了这场竞争,因为他们错过了移动浪潮。他们固守 Windows 操作系统,没有升级到专为手机从头设计的、基于触摸的原生操作系统,而且他们没有关注消费者。他们太专注于企业级市场。所以苹果超越了他们,现在是世界上最有价值的公司之一。我记得它曾经是最有价值的。现在可能轮到英伟达了。
我认为苹果也会以同样的方式被超越。我认为它们的未来增长已经受限,因为它们在 AI 上反应迟缓、已经落后。除非他们能在 AI 上扭转航向,否则我认为苹果的长期增长已经封顶,而且会有麻烦。不是说它不会再有价值,而是它会比本可能达到的价值低得多。
编程代理作为客服代表
Naval:另一件事是,在我正在构建的应用里,我有一个 bug 报告基础设施。如果有人发现一个 bug,他们点击一个按钮,bug 会把日志和报告文件发送到服务器。然后我让 Claude 每 24 小时遍历所有 bug 报告,自己修复所有问题,无需我干预。它把所有修复都放到侧分支上让我审查。然后我所要做的就是审查这些修复,说:"啊,那不是真正的 bug。那个修复不好。别发布。"
"哦,看起来不错。有道理。发布它。"
我只是最终把关的人,决定什么发布出去。最终你可以看到应用通过这种方式按特性构建出来,用户会要求功能,他们会对功能投票,然后云端的某个品味把关人或维护者会看着这些说:"不,用户不知道他们想要什么。"
或者:"哦,那很有道理。我们应该修复或改变那个。"
所以我认为即使是软件开发也会变成一个用户参与的协作过程,代理会处理所有事情。因为在某种意义上,代理可以提供完美的客户服务。如果你的客服是完美的,你的客服人员同时还是一个不可思议的程序员,并且不知疲倦。他们可以 7×24 小时在线。他们可以写代码、修复 bug、回复人们,而且如果他们写了一大堆代码来修复一个 bug,然后你直接全扔了,他们也不会有丝毫的自尊心作祟。我只是觉得这种特性非常引人注目。你现在真的可以有一个人或两个人的软件公司,却能扩展到数百万甚至数千万用户,赚到数十亿美元。
这在过去已经发生过了,比如 Notch、中本聪,以及像最初的 Instagram 团队那样,很少的人却产生了巨大的影响,或者最初的 WhatsApp 团队。但我认为你现在会越来越多地看到这种情况。
英文来源:
A Return to Code
Nivi: You’re listening to the Naval Podcast. This is Nivi, his regular co-host. Today we’re going to talk about vibe coding.
A Return to Coding
Nivi: Let me tee up the conversation with a tweet from Naval from March 23rd: “AI coding agents can now deliver one-shot custom apps straight to your phone. It’s the beginning of the end for the iPhone’s dominance.”
Do you want to talk about what you’re building and how you’re distributing it?
Naval: Well, yeah, let me talk about vibe coding and how I got into it.
So around December of 2025, the coding agents in AI hit an inflection point with the release of Claude Opus 4.5. And people started using it and were like, “Wow—this is an agent that stays on track, can build apps soup to nuts, can solve thorny problems, and really feels like having a junior programmer at your disposal who’s fast, essentially free, and ready to please.”
That was an inflection point, and I was reading all the hype on Twitter, but this time it felt real. I’ve tried the coding agents in the past with some mixed results, but this time I really got into it. And I haven’t seriously coded in decades. I have a computer science degree; I understand computer architecture and networking, a little bit of chips, algorithms, et cetera.
But I haven’t seriously coded in a long time.
And the activation energy to writing code is really high. You have to hook up all these different services to each other. Everything from GitHub to maybe some backend—you’re doing Vercel or Firebase or Railway or whatever—and just lots of things to connect together.
You have to know lots of jargon—lots of tools. And the AI now makes it really easy. So I started with Claude Code like everybody else. I’ve also used Codex for some of the thornier bug solving and deep problems, and I immediately got addicted. It was incredibly fun. And so: what’s changed? Well, the agents are really working.
These are not just coding assists now—where you ask it to solve a specific problem, it gives you a pile of code, and then you cut and paste that into your IDE, your development environment. Rather, you open up a terminal—CLI, as they call it—the command line interface. It’s all text-based, which is what these things are really good at, because they’re trained on text tokens in the first place. It’s running Unix inside or underneath. And these agents really know Unix because if you look at all the code out there that they were trained on—sitting on GitHub or elsewhere or Stack Overflow—most of it was Unix.
And most of the modern OSes are really Unix underneath anyway. macOS is famously BSD. So underneath these are all Unix, which is all text in, text out. So these agents are just long-lived coding AIs that are connected to Unix at a core level. They’re connected to the Unix shell so that they can execute commands. They’re connected to the file system through basic Unix commands.
They can call all the Unix commands like grep and awk and sed and pipe and so on—all these operators that daisy chain into each other. They can run cron jobs so they can be long-lived; and they can spawn more shells and more tasks as needed.
The Personal App Store
Naval: It’s very addictive because—normally, with coding, coding can be really fun once you get into it.
But getting into it, the activation energy is really high. But now all of a sudden you don’t have to know all the tools and all the commands. These things speak English. AIs are incredible translators. And one of their core use cases early on was machine translation. They were tested on translating. But now they’re translating from Python and C and Lisp and Rust, and all of these various programming dialects and all of these specialized commands—and they’re communicating in English, and they’re very forgiving in their communication.
So, you can use different words; you can make spelling mistakes; you can explain things your own way. But if you have a basic understanding of computer architecture and networking and programming—and it doesn’t take a lot, it can be very basic, actually, very high-level, I should say; not basic in the sense that it’s simplistic, but basic in the sense that it’s high-level—then you can go very, very far.
And so just for fun, I tried building a bunch of different apps and I started by one-shotting particular apps that I wanted. One-shotting meaning: I just give it a description and it gives me back an app. Then I started improving from there. So I actually built my own little app store, which is an app store just for me.
I can ask it for an app; it can deliver that app to my app store, which is a webpage, and eventually I made it into an app itself that lives on my iPhone. And then I can download those apps with one click, and I can give upgrades like you do with the App Store.
So, if I want a new app, for example, that tracks my workouts—and I have this; I built a custom tracking app for just my workouts exactly the way I like it—so I can say:
“Hey, use the functionality of Tonal and Ladder; follow Apple’s human interface guidelines to make it look like an Apple app; track my workouts the following way—here’s a text log of my last few workouts—and make it easy for me to re-enter new ones and to adjust them; build me pretty graphs and charts to track my progress; add in whatever other features you can think of—calculate strength scores; read scientific papers to figure out what the right way to do strength scores by body part is; do a human body diagram so it can just show which muscles are bigger, which are smaller; connect to Apple Health to do my heart rate stuff.”
So I didn’t put all of this in one prompt, but I put a lot of it in one prompt, and I immediately got a working app delivered to my personal app store. By the way, the personal app store is a little bit of a joke. It’s real in the sense that it’s my personal app store: it looks like an app store and my apps get delivered into it.
But obviously it’s not for wide distribution because Apple gates that. Apple will not let you build apps that can be downloaded on anyone’s iPhone. You have to key them against your specific devices. So with my friends and family, I can deliver them apps; I can’t yet deliver them to everybody. However, this whole experience is incredibly addictive.
You can get extremely customized tuned apps for you. Now, does this mean that normal apps don’t have a place? No, of course they have a place. Those apps that cover the broad use cases—they’re going to be the best-of-breeds. Someone’s hand-tuned them and slaved over them. So you’re not going to beat that if your use case is covered by one of the broad use cases.
But when you want something truly custom or private—these are great for niche apps that only you would want. Or when you want to tune them to your specific use case, this is going to be incredible.
Vibe Coding Is a Video Game With Real-World Rewards
Naval: And it’s very addictive—because like in a video game, the way a video game is designed is that it keeps you hooked by giving you feedback and rewards for doing work.
And it’s always at the edge of your capability. So as you get better, the video game gets harder. It’s not so hard that it’s frustrating, but it’s not so easy that it’s boring. So you’re always operating at the edge of your capability with a video game and getting these rewards. But those rewards are fake, and the video game is bounded. It’s created by other humans. It’s sort of a fake little world, and deep down you kind of know that. So you’re just figuring out the rules of the game. And then once you’ve figured out the rules of the game, it’s boring.
Except with vibe coding, it’s unbounded because now you’ve got a Turing machine running underneath. You can build anything. The objective is created by you and can keep expanding, so it kind of never fills up completely. And it has real-world relevance. It’s not just some fake world for fake people or fake games that you’re solving, so it’s way more interesting. So vibe coding has one-shotted a whole bunch of my friends who have disappeared into vibe coding the apps they’ve wanted.
But it really, really helps to have a clear direction. You have to know what you want—that’s actually the hardest thing—and having a very clear vision of it. And I have that, because it’s a particular app that I was obsessed with for about a year called Airchat—which I built with a team—and it was a social messenger for people to talk through voice and video.
It didn’t quite work, so we sold it off, got the investors their money back, and got the team some nice packages. But I remember that experience as being exhilarating because I was building a product that I wanted and I was working with a brilliant team.
But I had to work through a team to do it. I had eight or nine engineers, depending on the day, and we worked pretty hard for nine to 12 months, and we shipped a couple of variations. But with vibe coding, I am basically rebuilding that app. I’m rebuilding from scratch. But the key now is: I’m rebuilding it exactly the way that I want it. There’s no compromises.
And normally, in the act of building anything with a team, there’s always compromises—even if you are not aware of them. Even if you’re the dictator in charge, which you rarely are, you still have to just accommodate other people. You can’t say, “Move this icon left. Now move it right. No, move it back. No, move it back again.”
You can’t do that. You’ll annoy the engineer. You can’t demand things where you don’t have a reasonable justification—where it’s just a gut feel or an intuition. But the beauty with an AI coding agent is there’s none of that.
It’s like a self-driving car. You don’t feel self-conscious in a self-driving car because there isn’t a driver sitting there. The same way, with an autonomous coding agent, you don’t feel self-conscious about your own idiosyncrasies. So you can create exactly the thing that you want.
I think one of the nice benefits of vibe coding is that—although we may not see like super high-quality code (at least not in this generation), and the architecture needs a lot of work, and these things may have security holes, and they may be hard to scale—the prototyping that you’re going to get, the individual apps you’re going to get, is going to be very fast and they’re going to be true to the vision of the creator. There’s going to be no compromises.
So you may end up with more things like Minecraft—which Notch famously coded by himself—where there was one person’s vision. And it may have looked weird because like, “What is this blocky graphics? It’s like a huge step backwards.”
But he didn’t have to compromise. He didn’t have to communicate with anybody or explain to anybody why he wanted it that way. So I think it expands the scope of discovery.
It’s also incredibly fun. It takes the number of people who might have built apps from like 0.1 percent to one or two or three percent in the populace. Don’t get me wrong—the majority of people are not going to code their own apps. For the majority of people, computers are sort of this magic black box and who knows what was going on in there anyway. So the fact that it’s become 10x or 100x easier still doesn’t mean anything to them. It’s still a black box.
But for the people who are creative, who are self-motivated, and who are articulate and have a good vision, you can code now. There’s nobody standing in between you and your prototype.
And yes, if you go to market with a high-functioning app and you need to scale to a lot of users and all of that, then you want to recruit a great team and you want to get real engineers on board, and you’re probably going to have to rewrite the whole thing. But if you’re experimenting, you’re prototyping, you’re getting to market, there’s nothing better.
Pure Software Is Uninvestable
Naval: There’s never been a better time to be alive as a creator of software.
Now, are the same market opportunities still there? That’s a big question. They’re shifting very, very fast. It may be the case that the big companies are vulnerable because now anyone can create software.
It may be the case that they have more of an advantage because they have distribution. They can just fill all the gaps with all the software they can dream up. But I actually think this is a renaissance for individual software creators.
Now, one other tweet that I put out was something like, “There’s no market for venture-backed software anymore,” or, “Pure software is not venture investable anymore.”
Nivi: I think it was like, “Pure software is rapidly becoming uninvestable,” if I remember correctly.
Naval: Yeah, that’s a watered-down version of what I really wanted to say, which is that pure software is uninvestable. I would just full stop right there. If your whole advantage is like, “Hey, I’m building cool software that other people don’t know how to build,” I think that’s uninvestable.
And it’s uninvestable for two reasons.
One is they can just hack it together today. And the second is the coding agents are getting better so quickly that within a year, or even less, they’ll probably be building scalable software with good architecture. So I think we’re going to see leaps and bounds improvements. That genie is out of the bottle.
So if you’re a venture investor now, you’re looking for hardware, you’re looking for network effects, you’re looking for AI models. And I would argue that training AI models is the new building software for however long that lasts until autoresearch and autotraining starts working.
But I think vibe coding, it’s more fun than playing video games. It’s more productive. It’s more constructive. It has better feedback loops. You build something you want. You’re at the bleeding edge of technology. You may even make some money or career out of it—although careers are kind of dead—but you may make an interesting opportunity out of it. And you learn a lot about computers just by doing.
I’ve seen kids who are vibe coding. It’s hard to get kids to program. You can throw Swift Playgrounds and ScratchJr and all of that at them and hope that they pick up coding. But if you throw vibe coding at them, they’re going to get instant feedback and instant rewards. Maybe along the way they’ll pick up fundamentals because these things still require some skill to operate.
And in the process of operating them, you’ll be forced to figure out the command line; and you’ll be forced to figure out how basic computer architecture works; and you’ll be forced to figure out concepts like caching, and backing off in a network, and sharing streams, and writing to disk; and latency versus bandwidth trade-offs, et cetera, and all of those things. So you’ll be forced to learn some basics of computer algorithms and architecture. And it’s just a fun way to go. I’ve been up late nights, probably spending a couple hours every night—the time that used to go into reading, or doomscrolling, or playing video games—is all now in vibe coding. In fact, that’s why I haven’t been active on X recently. I’ve been completely missing on X because I’m buried in Claude and Codex.
A Place for Each Model
Nivi: AI has gotten so surprisingly resourceful that whenever I get a response that isn’t surprisingly resourceful, I just assume they’re not feeding it enough tokens.
The most interesting thing to me about agents is their ability to error correct and learn—how people have it watch YouTube videos at night or go out onto the internet and try and learn about the tasks they’ve been instructed to perform during the day.
So these agents are going out and error correcting and improving their skills. Likewise, the innovation of thinking in AI models is also an application of error correcting, where you take the next token prediction process and turn it into a pseudo-thinking process that can error correct as it goes through each step in the thought process.
Getting rid of hallucinations was also an error correction process.
So I wonder what’s going to be the next application of error correction in AI? One random thought I had, and I’m sure people are working on it, is applying error correction to agents working together—agents working with other agents. Because one of the important ways that people learn and improve is by working with and talking to other people.
Naval: I’m not sure the analogy applies that well, because AI is jagged intelligence, as they say, where it’s incredibly smart at some things and incredibly dumb at others. And it’s structured very differently than humans in that when you’re using Claude, you’re using the same AI model—even if you have 10 instances of it running. So 10 of them talking to each other doesn’t really improve its thinking in the same way that 10 humans talking to each other do, because those humans are trained on 10 different datasets.
Humans are just inherently very creative and think out of bounds. Whereas the AI agents are trained on the same data distribution. They’re literally running the same model. It’s like 10 people with the same brain and the same dataset talking to each other. Sure, just through thermodynamics they might have some different ideas and come up with something slightly different, but they’re generally going to think the same. So all you’re doing when your 10 agents are talking to each other is you’re just throwing 10 times as many tokens at the problem. It’s like saying take 10 times as long if you need to.
Now there are different models like Codex, and Gemini, and Grok Code, which are trained slightly differently. Not that different, but they’re slightly different. And so they might have some different insights.
Claude has really good visual presentation through a system called Artifacts and Claude is very good at talking to me at the level that I’m at. So it’s very tuned to figure out from your question and your conversation what you’re capable of understanding and what level you’re asking the question at. It’s very good at meeting you at that level.
ChatGPT is still the OG. It’s very good all around.
Gemini is very good at search because it has the Google crawl underneath. It’s a frustrating product—it’s constantly timing out on the app and losing the connection and forgetting the plot. But it’s very fast and it’s got a great search index. So if the question I’m asking is really a search question underneath, then I use Gemini.
Gemini also has access to YouTube. So if you think your answer is lying in a YouTube video—and there’s a lot of YouTube videos—then Gemini has the data advantage of YouTube. So Gemini is really getting by on data advantages. It doesn’t feel like the best model to me, but it has the best underlying data.
And then Grok is the one I can count on to tell me the truth. It’s like the least neutered, least nerfed. It’s got access to X, so it’s very good at news. And it’s very good at technical problems. So if you’re asking a deep, difficult problem in the scientific/mathematical domain, then I think Grok is actually quite good—not that the others aren’t, but I just think Grok is standout there. And that reflects the biases of the companies that created them and trained them and are driving them.
Currently all four of the leading frontier models have a place.
AI Is Eager to Please
Naval: I do use them against each other. So for example, I wire it up with my GitHub so that every time I’m submitting a new piece of code—say that’s written by Claude—then Codex and Gemini automatically fire in every pull request.
It’s misnamed, but it’s when you actually push code into your main repository and you’re basically saying this is ready for review and this is ready to get merged into the main codebase. So you’ve been working locally in a piece of code, let’s say with Claude, and then you push it into the main repository, so you file a pull request. Well, you can set it up so that other agents like Gemini and Codex and Grok automatically fire and review the pull request.
Then they say, “Oh, well you should change this thing about the architecture,” and so on. That’s a way of getting them to sort of communicate with each other, to have a council—a roundtable of AIs. But I haven’t found that to be as useful as you might think. There’s still a lot of groupthink with these AIs. If you’re coding with them and you push towards an answer—for example, if you think you know what the answer is—it is rare that they will contradict you. You’d have to be pretty wrong for them to contradict you.
They’re trying to please you, and I don’t think they have any long-lived theory of mind of their own. So they’re always kind of morphing towards you, and they’re going to find the answer that you are looking for. So if you think the answer is in a certain area and you push the models even slightly, all of them will find roughly the same answer because you’re leading them to the answer. They’re very easily led around.
One of the things I’ve noticed is that as the codebase has gotten more complex and larger, it becomes more difficult to manage because it doesn’t all fit into the model’s context window anymore. The models can only hold a certain amount of data in their heads. And right now the state of the art is about a million tokens, which will be considered laughable in the future.
You can approximate that as roughly a million words. The constraint comes from the transformer attention mechanism underneath, which compares every token against every other token, so the work scales with the square of the number of tokens in the context. If the context is a million tokens, the attention computation is on the order of a trillion operations, because that’s the square of a million.
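The quadratic scaling he’s describing is easy to check with arithmetic:

```python
def attention_pairs(context_tokens: int) -> int:
    """Full self-attention compares every token with every other token,
    so the work grows with the square of the context length."""
    return context_tokens ** 2

# 1,000 tokens -> about a million pairwise comparisons;
# 1,000,000 tokens -> about a trillion.
print(attention_pairs(1_000))      # 1000000
print(attention_pairs(1_000_000))  # 1000000000000
```

Doubling the context quadruples the attention work, which is why long contexts are expensive and why models start compacting or dropping detail as a codebase grows.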
So the context window runs out as your codebase gets larger. The models can’t keep all of it in memory anymore. So they start making guesses, approximations, they start compacting the context window. They start losing the plot. They get lost. They start fixing the wrong thing. They fix the same bug five times. They go do a quick patch in the architecture when the problem lies somewhere else, and you have to guide them.
So as you are dealing with a more and more complex codebase, it falls upon the operator to provide the guidance to say, “Actually here, I think we should just re-architect that whole thing.”
And they will do some incredibly boneheaded things. Like if you are not paying attention and just text is scrolling by, occasionally they’ll patch a bug just by eliminating the use case or destroying the feature in the first place. Or they’ll do something that is clearly a hack and you kind of have to stop them and say, “Hey, that’s a hack.”
And by the way, I do this all the time.
I’ll stop the model. And I’ll say, “No, that’s a hack. That’s a patch. Go fix it at an architectural level.” And what’s funny is the model will always say, “Oh, I’m sorry. You’re right. That was a hack.”
Even if that wasn’t a hack, the model will say, “You’re right. That was a hack.”
So the model is always trying to please you, and it doesn’t know any better. In that sense, it’s a little bit like a dog. It’s better than you at catching that duck if you’re duck hunting with a dog, but it’s still a dog. So if you point it at a bird that’s not a duck, it might take that bird down instead. So you do have to guide it. It does require a lot of operational oversight.
So, long-winded way of saying, you still have to guide these models. Them talking to each other isn’t going to fix the problem. And you do have to get involved in the architecture, the debugging, the features, and pay close attention. But this combo right now of human operator combined with state-of-the-art coding model can yield incredible results.
You can already completely one-shot simple apps. So like a basic task list, a basic video game clone—you can one-shot them: one prompt and you get something that’s reasonably good coming out the other end.
So you can see where this is headed. Eventually, once they have enough data, they will be able to one-shot very complex apps, and that’s a whole different world that we’re going to get into.
Why Math and Coding?
Naval: Now, what is it about coding that makes these models uniquely good at it?
It’s just there’s tons and tons of data, and when you’re training the model, it’s very easy to verify, “Hey, did you do a good job or not?” Because the code has to compile. It has to execute. And you can have simple tests that are pre-written on the other side to say, “Did the code you wrote pass the test? Did it do the thing you’re supposed to do?”
So coding turns out to be one of those things that’s actually quite easy to train models on.
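A toy version of that verification loop: run the model’s candidate code, then score it against pre-written tests. The function name `solve` and the reward scheme are illustrative assumptions, and real pipelines run candidates in a sandbox, since `exec` on untrusted code is unsafe:

```python
def grade_candidate(source: str, tests: list[tuple[int, int]]) -> float:
    """Score a model-written snippet the way a training loop might:
    the code must execute at all, and each pre-written test it passes
    adds to the reward. Assumes the candidate defines `solve(x)`.
    (Illustrative only; real pipelines sandbox untrusted code.)"""
    namespace: dict = {}
    try:
        exec(source, namespace)  # does the code even run?
    except Exception:
        return 0.0
    solve = namespace.get("solve")
    if not callable(solve):
        return 0.0
    passed = 0
    for arg, expected in tests:
        try:
            if solve(arg) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

# A correct doubling function passes everything; a buggy one scores lower.
tests = [(1, 2), (2, 4), (10, 20)]
good = "def solve(x):\n    return x * 2\n"
buggy = "def solve(x):\n    return x + 2\n"
```

The point is that the grader is fully automatic: no human has to look at the output, which is exactly the closed loop that creative writing lacks.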
Mathematics is similar in that you have a ton of data (a lot of solved problems) and you can verify the output very easily. So in domains where you have a lot of data and good verification, such as self-driving, these models do extremely well.
In areas where you don’t have a lot of data, which are brand new fields, the models are not going to do well, and that’s still an opportunity for humans and creativity. Then there are domains where it’s hard to verify. In creative writing, for example, who determines what’s good writing versus what’s not, what’s slop versus what’s not? These models don’t do as well there, because you can’t easily run a closed loop where they output huge amounts of content that gets immediately, algorithmically graded, without humans in the loop saying, “This is good, this is bad.”
For example, if you’re trying to do creative writing with these models, they’re going to output huge amounts of content. They can output infinite essays. Who’s to say it’s good on the other side? Even if you hire some low-wage people to sit around call center style and say, “this is good” or “this is bad,” it’s only as good as their taste.
I think there are multiple reasons these coding models got really good recently. One is they’re doing almost recursive training, where one model helps improve the next. But I think the bigger reason might just be that a lot of the best software engineers started using these models in the last few months, and their taste is now feeding back in. So you’re getting access to their code plus their taste as to what’s good and what’s not.
You need high-taste feedback loops to improve these models. And those are harder to develop than they look.
In certain domains it’s tractable and in other domains it’s hard to see how it happens.
The Beginning of the End of Apple’s Dominance
Naval: So the obvious stuff is, yeah, you go and you build your app. Great. The less obvious stuff is just one level more advanced. It will seem laughably simple to a software engineer, but it’s kind of fun for a non-engineer, or someone who hasn’t coded in a long time, to think about.
One is I built my own app store. So if I want an app, I literally open up Claude on my phone. I can operate a remote terminal, which is running on my desktop, or I can just use Claude in the cloud.
It can connect to Xcode.
I give it a two-line description. It builds me an app. It ships it to my app store. I open my app store app. The app is sitting there. I click install. 30 seconds later, I have a working app on my phone.
That’s magical. You can literally be at dinner with someone having a conversation, they describe some app they want, you can describe it to Claude, and five minutes later you’re showing them that app on your phone.
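The personal app store he’s describing boils down to a version manifest: the agent publishes each build, and the store app on the phone compares installed versions against it. The field names below are assumptions for illustration, not any real store format:

```python
# A minimal sketch of a "personal app store": the coding agent records each
# build in a manifest, and the store app offers installs and upgrades by
# diffing the manifest against what's installed. Field names are illustrative.

def publish(manifest: dict, app_id: str, version: int, url: str) -> dict:
    """Record a freshly built app (or a new version of one) in the manifest."""
    manifest = dict(manifest)  # keep the original untouched
    manifest[app_id] = {"version": version, "download_url": url}
    return manifest

def pending_updates(manifest: dict, installed: dict[str, int]) -> list[str]:
    """Apps the phone should offer to install or upgrade."""
    return [app_id for app_id, meta in manifest.items()
            if installed.get(app_id, 0) < meta["version"]]

manifest = publish({}, "workout-tracker", 2, "https://example.com/workout.ipa")
```

A phone with nothing installed sees `workout-tracker` as pending; a phone already on version 2 sees nothing, which is the upgrade behavior he compares to Apple’s App Store.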
That’s why I say it’s kind of the beginning of the end for Apple, because Apple relies on their OS and their apps being better than everybody else’s. The hardware, yes, it’s better, but it doesn’t support their margins and their monopoly, or pseudo-monopoly. So when all your communication starts going through Claude, or through Codex, or through some other agent, when all you’re doing all day long is instead of opening an Uber app, you’re saying, “Call me an Uber,” or instead of opening a workout app, you’re saying, “Where’s my workout app? Track my workout. Make no mistakes,” right?
Then you are just communicating with the agent, and when that happens, then the need for a phone becomes much smaller and smaller.
Maybe there’s a few banking apps and government apps that haven’t ported and don’t have the proper APIs. But these agents don’t even need APIs. They can figure out and create their own APIs on the fly.
The use case stops being your interfacing with your iPhone or your Android phone. Instead, you’re just interfacing with the AI model. And now Apple is using Gemini, which is Google’s AI model. So what’s the difference? I might as well just use an Android phone, because all I need at that point is I need a screen, I need battery, and I need connectivity. And Android’s got that just fine.
And then the apps and user interfaces are being created on the fly for what I need. And yes, for certain things, there will always be best-of-breed user interfaces and you’ll want some familiarity. But even the era of tap, tap, tap, upgrade your system software, drag this over here, hunt for that button, type into that field, all that is going away. It should all be conversational. It should all be agentic. And in that world, Apple loses a lot of its advantages, and then it’s competing purely on, “Oh yeah, we have the best chips and we have the best integrated hardware.”
But that’s not the same margins as Apple of today. That’s more like the margins that Samsung or Lenovo makes, which is not the margins that Apple wants to have. As a consequence, I think its market cap will compress.
I think Apple giving up on AI will go down as the biggest strategic mistake in the tech industry of this decade, and it’s the beginning of the end of Apple’s dominance. These companies can exist for a long time and make lots of money. Microsoft is more valuable than it’s ever been, but Microsoft Windows has kind of lost the battle because they missed the mobile phone wave. They stuck to Windows OS, didn’t build a touchscreen-native OS designed for phones from the ground up, and didn’t focus on the consumer; they were too focused on the enterprise. So Apple surpassed them and is now one of the most valuable companies in the world. I think it used to be the most valuable. It might be Nvidia at this moment.
In the same way, I think Apple will get surpassed. I think their future growth is capped because they’re behind on AI and now dependent on others for it. Unless they manage to turn the AI ship around, Apple has capped growth long term and is in “trouble.” Not in the sense that it won’t be valuable, but it’ll be a lot less valuable than it could have been.
Coding Agents As Customer Service Reps
Naval: The other thing is within the app that I’m building, I have a bug reporting infrastructure, where if someone sees a bug, they tap on a button, the bug sends the logs up and the bug files into a server. And then I have Claude go every 24 hours through all the bug reports and it just fixes them all, by itself, without my having to intervene. And it puts all the fixes into side branches for me to review. And then all I have to do is just review the fixes and say, “Ah, that wasn’t really a bug. That wasn’t a good fix. Don’t ship that.”
“Oh, that looks good. Makes sense. Ship it.”
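The nightly loop just described can be sketched as follows. The agent call and the branch-naming convention are stand-ins, not a real API:

```python
from dataclasses import dataclass

# Sketch of the bug-report loop: reports accumulate during the day, a nightly
# job hands each one to a (placeholder) fixing agent, and every proposed fix
# lands on its own side branch awaiting human review.

@dataclass
class BugReport:
    report_id: int
    logs: str

@dataclass
class ProposedFix:
    report_id: int
    branch: str
    summary: str

def nightly_triage(reports, fix_with_agent) -> list[ProposedFix]:
    """Turn each report into a reviewable fix on its own branch."""
    fixes = []
    for report in reports:
        summary = fix_with_agent(report.logs)  # stand-in for the coding agent
        fixes.append(ProposedFix(
            report_id=report.report_id,
            branch=f"bugfix/report-{report.report_id}",
            summary=summary,
        ))
    return fixes

def review(fixes, approved_ids) -> list[ProposedFix]:
    """The human is the final gate: only approved fixes ship."""
    return [f for f in fixes if f.report_id in approved_ids]

fixes = nightly_triage(
    [BugReport(1, "crash on launch"), BugReport(2, "slow scroll")],
    lambda logs: f"patched: {logs}",
)
shipped = review(fixes, {2})
```

Keeping every fix on its own side branch is what makes the human gate cheap: each change can be approved or discarded independently.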
I’m just the final gate that decides on what goes out there. Eventually you can see apps being built that way by features, where the users will ask for features, they’ll vote on features, and then there’ll be some tastemaker or maintainer in the cloud who’ll look at that and say, “No, the users don’t know what they want.”
Or, “Oh, that makes a lot of sense. We should fix that or change that.”
So I think even software development will become a collaborative process with the users and the agents will be handling all of it. Because in a sense, the agents can do perfect customer service. If your customer service was perfect, your customer service person would also be an incredible coder and would be indefatigable. They would be up 24/7. They would be writing code, fixing bugs, responding to people, and they would have no ego if they wrote a lot of code to fix a bug, and then you just threw it all away. So I just find that kind of a feature very compelling. You truly can have one-person, two-person software companies now that can scale to millions upon millions of users and make billions upon billions of dollars.
That has happened already in the past with people like Notch and Satoshi Nakamoto, and very small teams like the original Instagram team that just made a huge dent with very few people, or the original WhatsApp team. But I think you’re going to see it more and more now.