AI Weekly Issue 509: AI productivity works best for the people about to lose their jobs

Posted by qimuai on 2026-6-30 14:01. First-hand compilation.

Source: https://aiweekly.co/issues/ai-productivity-it-works-best-for-the-people-losing-their

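Summary:

The truth about AI productivity: the gains are not shared equally. Novices benefit most, while experts are slowed down.

Three years into the promise that AI would raise productivity, a large body of empirical evidence finally gives a clear answer: working with AI does make people more efficient, spectacularly so for some people on some tasks, but for others it does nothing or even backfires. The gains are real, but they are flowing somewhere quite different from where the marketing said they would.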

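Key finding: the size of AI productivity gains varies enormously

Read together, four of the most-cited workplace studies point to one conclusion: AI's productivity gains are highly uneven and favor novices, well-scoped tasks, and verifiable results, not experts doing hard, familiar work, and not enterprises buying pilot projects.

Customer-support agents became 14% more productive on average, but novices gained 34% and veterans gained almost nothing.
BCG consultants using GPT-4 completed 12.2% more tasks, 25% faster, and at higher quality, but only on tasks inside AI's "jagged frontier"; on a task outside it, AI-assisted consultants were 19% less likely to be right.
Experienced open-source developers were 19% slower when using AI on their own codebases, yet believed they were 20% faster.
95% of enterprise generative-AI pilots had no measurable impact on profit and loss.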

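The most counter-intuitive finding: AI lifts the bottom, not the top

The clearest result is that AI's biggest beneficiaries are novices and below-average performers, not experts. In the study of 5,179 customer-support agents, the 14% average gain came almost entirely from the least-experienced workers (+34%), while veterans barely moved. The AI worked by encoding the tacit knowledge of the best agents and passing it to newcomers, compressing months of on-the-job learning into a single prompt.

The picture for experts is more sobering: in METR's randomized trial, 16 experienced developers took 19% longer when using AI tools on their own codebases. More striking still, they had expected a 24% speed-up, and even after finishing they still believed AI had made them 20% faster. They could not feel the "efficiency tax" they were paying.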

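The harsh reality for enterprises: 95% of AI pilots show no results

MIT NANDA's research found that only 5% of GenAI pilots produced rapid revenue growth; the other 95% stalled. The bottleneck was not model quality but a "learning gap": companies bolted AI onto workflows they had not changed. Projects bought from a vendor succeeded about 67% of the time, while internal builds succeeded only about a third as often. The real return on investment sat in unglamorous back-office automation, not in the sales and marketing tools that consumed most of the budget.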

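Three hidden costs

Hidden human labor: much "AI productivity" is really human work that has been relocated, with data workers in the Global South doing the training and correction.
Freed time gets refilled: once AI clears the busywork, the hours it frees are filled with higher-stakes tasks and a faster pace.
Skill atrophy: engineers have begun to argue publicly that AI is eroding the "flow state", and that offloading the hard parts hollows out the expertise an organization needs when the AI is wrong.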

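The cruelest symmetry

Research by Stanford economist Erik Brynjolfsson shows that AI raises junior workers' productivity the most, while a second analysis of his, based on ADP payroll data, finds that employment is falling for 22-to-25-year-olds in the roles most exposed to AI. The two findings do not conflict: the fact that AI gives novices the largest boost is exactly why companies conclude they can cut those roles. Nobody has answered how the expert pipeline will be sustained once the apprenticeship stage itself has been optimized away.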

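Conclusion: does AI actually work?

AI reliably raises productivity when the task is well-scoped, the result is verifiable, the work sits within the model's competence, and the worker is either a novice doing routine tasks or an expert who treats AI output as a draft to be checked.

AI fails or backfires when the work calls for open-ended judgment, people cannot easily verify the answer, the expert already works faster than the "review the AI output" loop, or the organization buys a tool without redesigning its workflow.

The winners are neither the most technical nor the least technical people. They are the ones who know exactly where the "jagged frontier" lies and never trust AI beyond it.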


English source:

Three years into the productivity promise, there's finally enough hard evidence to answer the question plainly: does working with AI actually make you more productive? Yes — spectacularly, for some people on some tasks. And no, or worse, for others. The gains are real. They're just not flowing where the marketing said they would. This issue maps who wins, who pays, and why the line between them isn't where you think.
Sponsor
Building or deploying AI agents?
Spec27 helps teams validate agent behaviour before it reaches users. We'll be at AI Engineer World's Fair, booth B-3, and early access is open for teams testing agents from anywhere.
The short version
Four of the most-cited workplace studies of the AI era, read side by side, tell one story: AI productivity gains are real but radically uneven, and they bend toward the inexperienced, the well-scoped, and the verifiable — not toward experts doing hard, familiar work, and not toward enterprises buying pilots.

Customer-support agents got 14% more productive on average — but +34% for novices and near-zero for veterans (Brynjolfsson, Li & Raymond, Generative AI at Work, NBER/QJE).
BCG consultants using GPT-4 did 12.2% more tasks, 25% faster, with higher-quality work — but only inside AI's "jagged frontier." On a task chosen to sit outside it, AI-assisted consultants were 19% less likely to be right (Dell'Acqua et al., SSRN).
Experienced open-source developers were 19% slower with AI on their own codebases — while believing they were 20% faster (METR randomized trial, arXiv 2507.09089).
And 95% of enterprise GenAI pilots delivered no measurable P&L impact (MIT NANDA, State of AI in Business 2025, via Fortune).
Put those together and the answer to "for whom?" stops being a slogan and becomes a map.
Who gets faster — and it's not who you'd guess
The cleanest finding in the whole literature is also the most counter-intuitive: AI levels up the bottom, not the top.
In the Brynjolfsson study of 5,179 support agents, the average 14% gain was almost entirely carried by the least-experienced workers (+34%), while the most experienced barely moved. The AI worked by encoding the best agents' tacit know-how and handing it to the newest ones — compressing months of on-the-job learning into a prompt. The BCG/HBS consulting experiment found the same shape from the other direction: below-average performers gained the most, and the technology narrowed the gap between weak and strong consultants.
Then the experts. METR's randomized trial put 16 seasoned developers through 246 real tasks on repositories they maintain. With AI tools allowed, they took 19% longer. The part that should haunt every "AI made me 10x" LinkedIn post: the developers expected a 24% speed-up, and even after finishing slower, still believed AI had sped them up by 20%. They could not feel the tax they were paying. METR is careful about scope — the result is early-2025 tools, expert developers, familiar codebases, and explicitly does not generalize to juniors or unfamiliar code. But that caveat is the finding: the slowdown is concentrated exactly where expertise and context are highest.
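To make the size of that perception gap concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, as a simplification that is not spelled out in the newsletter, that "20% faster" and "24% speed-up" mean 20% and 24% less time per task, and it normalizes the no-AI task time to 1.0. The function and variable names are illustrative only.

```python
# Rough arithmetic on the METR figures quoted above.
# Assumption: a belief of "X% faster" is read as X% less time per task.

def relative_time(change_in_time: float) -> float:
    """Task time relative to a no-AI baseline of 1.0."""
    return 1.0 + change_in_time

actual = relative_time(+0.19)    # measured: tasks took 19% longer
forecast = relative_time(-0.24)  # expected beforehand: 24% less time
believed = relative_time(-0.20)  # believed afterwards: 20% less time

# How much longer the work really took than the developers thought it did.
gap = actual / believed

print(f"actual time:   {actual:.2f}x baseline")
print(f"believed time: {believed:.2f}x baseline")
print(f"actual / believed: {gap:.2f}")
```

Under these assumptions the work took roughly one and a half times as long as the developers believed, which is why self-reports are a poor substitute for measured throughput.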
The unifying idea is the "jagged frontier." AI is wildly good at some tasks and confidently wrong at adjacent ones, and the border between them is invisible and irregular. Inside the frontier, the BCG consultants soared. Pushed just outside it — onto a task that looked similar but required judgment AI lacked — AI-assisted consultants were measurably more likely to be wrong, because the tool's fluent confidence is the same on both sides of the line. Productivity, then, isn't a property of AI. It's a property of the match between the task, the tool, and whether the human can tell when it's lying.
Who's paying for gains that never show up
If individuals see real (if lopsided) gains, the enterprise picture is starker. MIT's NANDA study — 150 executive interviews, 350 employee surveys, 300 public deployments — found just 5% of GenAI pilots produced rapid revenue acceleration; the other 95% stalled with little or no measurable P&L impact. The blocker wasn't model quality. It was the "learning gap" — organizations bolting AI onto unchanged workflows. Tellingly, bought-from-a-vendor projects succeeded ~67% of the time; internal builds barely a third as often, and the real ROI sat in unglamorous back-office automation, not the sales-and-marketing tools that ate most of the budget.
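The vendor-versus-internal comparison is easy to misread, so a quick piece of arithmetic may help. The sketch below takes the "~67%" figure at face value and reads "barely a third as often" as one third of that rate; both are approximations from the text, not exact numbers from the MIT NANDA report, and the variable names are illustrative.

```python
# Approximate success rates implied by the paragraph above.
vendor_success = 0.67                  # "~67% of the time" for bought-from-a-vendor projects
internal_success = vendor_success / 3  # "barely a third as often" for internal builds

print(f"vendor projects:  {vendor_success:.0%}")
print(f"internal builds:  {internal_success:.0%}")
```

On this reading, internal builds succeed a little over one time in five, not one time in three.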
The market is now feeling that gap in cash. Expert-shared reporting this week: companies are "scrambling to stop spending so much on AI" as inference bills balloon (404 Media's "Tokenpocalypse"), and the cautionary tale of the cycle — Ford "hired AI and sacked humans," and it backfired badly — is making the rounds precisely because it's the inverse of the pitch deck.
The part the demos hide
Three costs don't appear in any productivity dashboard but show up everywhere in the reporting:
The hidden human payroll. A lot of "AI productivity" is really human productivity, relocated and made invisible: the systems are trained and corrected by armies of data workers — many in the Global South, paid a fraction of Western wages — whose labor the DAIR Institute documented in detail. The output looks autonomous; the payroll behind it usually isn't.
The gains get spent as more work, not less. Bloomberg's "AI burnout era" documents the people closest to AI working harder than ever — one startup CEO reports sleeping at the office for three straight weeks. When AI clears the busywork, the freed hours get refilled with higher-stakes work and faster cycles. Time saved becomes output demanded.
Skill atrophy. Engineers are starting to write essays about it — one trended on Hacker News this week — arguing AI is eroding the "flow state" that made the craft worth doing, and that offloading the hard parts quietly hollows out the expertise an organization will need when the tool is wrong.
The cruel symmetry
Here's the twist that should anchor how you read all of this. The economist whose research most clearly shows AI making junior workers more productive — Stanford's Erik Brynjolfsson — also runs the dashboard showing AI making junior workers unemployed. His Canaries indicator, built on ADP payroll data covering ~1 in 6 U.S. workers, finds employment for 22–25-year-olds in the most AI-exposed roles falling while it rises for less-exposed peers — early, large-scale evidence that the entry rungs are being pulled up (Fortune, June 27). The two findings aren't in tension; they're the same fact. AI delivers its biggest productivity boost to the novice — and that is precisely why the novice's job is the first one a company decides it can do without. We are most efficiently automating the rung you used to climb to become the expert AI can't replace. Nobody has answered what happens to the expert pipeline when the apprenticeship is the thing we optimized away.
So — does it actually work? A field guide
The honest synthesis isn't "yes" or "no." It's a set of conditions. AI reliably raises productivity when: the task is well-scoped and the output is verifiable (you can tell good from bad fast); it sits inside the model's competence; and the worker is either a novice on routine work or an expert who treats AI as a draft to be checked, not an oracle. It fails or backfires when: the work is open-ended judgment, the human can't easily verify the answer, the expert already moves faster than the review-the-AI loop, or an organization buys a tool without redesigning the workflow around it. The winners aren't the most or least technical people — they're the ones who know exactly where the jagged frontier is and never trust the tool past it.
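The field guide reads naturally as a checklist, and one way to picture it is as a small predicate. The Python sketch below is only an illustration of the conditions listed above: the `Situation` class, its field names, and the all-or-nothing logic are assumptions made for the example, not a scoring method from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical description of one task/worker/organization combination."""
    well_scoped: bool                 # the task has clear boundaries
    verifiable: bool                  # good output can be told from bad quickly
    inside_model_competence: bool     # the task sits inside the jagged frontier
    novice_on_routine_work: bool
    expert_treats_ai_as_draft: bool   # output is checked, not trusted as an oracle
    expert_faster_than_review_loop: bool = False
    workflow_redesigned: bool = True  # the tool was not simply bolted on


def ai_likely_helps(s: Situation) -> bool:
    """True only when every 'works' condition holds and no 'backfires' condition does."""
    worker_fits = s.novice_on_routine_work or s.expert_treats_ai_as_draft
    return (
        s.well_scoped
        and s.verifiable
        and s.inside_model_competence
        and worker_fits
        and not s.expert_faster_than_review_loop
        and s.workflow_redesigned
    )


# A novice handling routine, checkable tickets.
novice_case = Situation(True, True, True, True, False)
# An expert on open-ended judgment work whose answers are hard to verify.
expert_case = Situation(False, False, False, False, True)

print(ai_likely_helps(novice_case))
print(ai_likely_helps(expert_case))
```

Real situations are rarely this binary, so the sketch is best read as a prompt for which questions to ask before adopting a tool, not as a decision procedure.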
Key Takeaways
The gains are real but bottom-weighted. AI's largest measured productivity boosts go to novices and below-average performers; for experts on familiar work, the effect shrinks to zero or reverses.
You can't feel it. METR's developers were 19% slower yet believed they were 20% faster. Self-reported "10x" gains are not evidence; verifiable throughput is.
The org, not the model, is the bottleneck. 95% of enterprise pilots return nothing — and the 5% that work buy focused tools and redesign the workflow instead of bolting AI onto it.
The productivity story and the jobs story are one story. AI helps the junior worker most, which is exactly why the junior role is cut first. Plan for the missing expert pipeline now.
Worth Reading
Generative AI at Work — Brynjolfsson, Li & Raymond. The foundational "novices gain most" study (NBER / Quarterly Journal of Economics).
Measuring the Impact of Early-2025 AI on Experienced Developers — METR. The 19%-slower-but-felt-faster RCT, with its own honest caveats.
Navigating the Jagged Technological Frontier — Dell'Acqua et al., Harvard Business School / BCG. Where AI helps, where it quietly hurts.
The GenAI Divide: State of AI in Business 2025 — MIT NANDA. The 95% number and what the 5% do differently.
Canaries in the Coal Mine — Stanford Digital Economy Lab. Live payroll-data evidence on AI and entry-level employment.
Wait, What?
Starting July 8, Claude may ask to see your passport — and your face. For accounts flagged for abuse, Anthropic will require a government-issued ID plus a facial-recognition selfie, processed by the third-party vendor Persona, which builds a facial-geometry template — data that several US states classify as legally protected biometric information (TechCrunch). It's an unusually physical demand from a chatbot, and a small sign of where the relationship is heading: the tool that's supposed to work for you increasingly wants to know exactly who you are.
Worth Watching
The videos AI practitioners are passing around right now — curated on AI TV.
This week's poll
Where has AI actually made you more productive?
Last week, 119 of you voted:
A content-heavy week across the whole frontier. Which corner of the cutting edge are you watching most closely?
That's the special edition. Reply and tell me which camp you're in — the honest answers are the most useful thing we publish.
— Alexis
