AI Weekly #477: Jensen Huang claims AGI has been achieved; benchmark shows just 0.37%

Source: https://aiweekly.co/issues/477
Summary:
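[This Week in AI: A Capability Gap Emerges, the Value Chain Is Restructured, and Safety Boundaries Gain Legal Footing]
AI development is showing a sharp contrast. On the new ARC-AGI-3 test, which places agents in open-ended interactive environments with no rules and no goals, humans achieved a 100% solve rate while the most advanced AI model scored just 0.37%. Today's AI can reproduce patterns from its training data but cannot adapt to new situations, and this "novelty gap" marks the boundary between the work AI can and cannot currently replace.
The industry value chain is shifting structurally. This week more than $25 billion flowed into the infrastructure layer: IBM is acquiring the real-time data streaming platform Confluent for $11 billion, Eli Lilly committed $2.75 billion to Insilico Medicine's AI drug pipeline, and robot-control company Physical Intelligence is in talks to raise $1 billion. These deals suggest that building large models has become the price of entry, while controlling the data flow between models and the real world is becoming the moat.
For the first time, the law has drawn a safe zone around AI ethics. A US federal court ruled that the Pentagon cannot blacklist Anthropic for refusing autonomous-weapons use, the first time an AI company's ethical red lines have been recognized as constitutionally protected speech. The ruling changes the calculus for every lab negotiating with the government: saying "no" now carries less legal risk than accepting everything.
Notably, Nvidia CEO Jensen Huang has publicly claimed that "AGI has been achieved," in sharp contrast to ARC-AGI-3 results showing general capability below 1%. The issue's own conclusion is that the real yardstick should be practical value rather than an academic definition, and IBM's $11 billion acquisition is the market's answer: data pipelines that drive real business matter more strategically than simply chasing model parameters.
(This issue also covers a surge in AI-agent security incidents, vertical domain models outperforming general-purpose ones, Apple opening Siri to third-party assistants, and other key developments.)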
English source:
💡 Insights
AI is superhuman at exams but can't figure out a simple game. ARC-AGI-3 gave frontier models interactive environments with no rules and no goals — just figure it out. Humans solve 100%. The best AI scored 0.37%. Current architectures can pattern-match anything in their training data but cannot adapt to novelty. That gap defines what AI can and cannot replace in your work today.
The AI value chain just inverted. This week $25B in deals targeted infrastructure, not models: IBM is acquiring Confluent ($11B) for real-time data streaming, Lilly signed a $2.75B deal for Insilico's drug pipelines, and Physical Intelligence is in talks to raise $1B for robot control systems. Building a better LLM is table stakes. Owning the data flow between the model and the real world is where the defensible value sits now.
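As a quick sanity check on the headline number, here is a minimal Python sketch that adds up the three deals this paragraph names, taking each reported figure at face value (including the Physical Intelligence round, which the issue elsewhere describes as still in talks). The named deals come to $14.75B. The issue does not say which other deals make up the rest of the $25B, so the remainder is only labeled "not itemized" here.

```python
# Deal sizes named in the Insights paragraph, in billions of USD (face value).
named_deals = {
    "IBM / Confluent": 11.0,
    "Eli Lilly / Insilico Medicine": 2.75,
    "Physical Intelligence round": 1.0,
}

HEADLINE_TOTAL = 25.0  # the "$25B in deals" figure quoted for the week

named_total = sum(named_deals.values())
# Assumption: whatever is left over comes from deals the issue does not list.
unitemized = HEADLINE_TOTAL - named_total

print(f"Named deals: ${named_total:.2f}B")
print(f"Not itemized: ${unitemized:.2f}B")
```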
If you set safety boundaries, courts will protect them. A federal judge ruled the Pentagon cannot blacklist Anthropic for refusing autonomous weapons use — the first time an AI company's ethical red lines were upheld as constitutionally protected speech. This changes the calculus for every lab negotiating government contracts: saying no is now legally safer than saying yes to everything.
🎬 Watch & Listen First
Jensen Huang: "I Think We've Achieved AGI" · Mar 23 · Lex Fridman Podcast #494
→ The head of the company supplying most frontier AI compute makes the biggest claim in tech. Whether you agree or not, this sets the narrative for Q2.
Dario Amodei on Safety, Scaling, and What Keeps Him Up at Night · Mar 25 · Spotify
→ Recorded before the Mythos leak. Every answer hits different now.
The AGI Debate Just Got Data
ARC-AGI-3 Launches: Humans 100%, Best AI 0.37% · Mar 25 · ARC Prize
→ Hundreds of interactive environments with no instructions and no goals. Agents must explore, infer, and adapt. None can. The $2M prize and a Chollet-Altman fireside chat made this the benchmark launch of the year.
Jensen Huang Tells Lex Fridman "We've Achieved AGI" · Mar 28 · Fortune
→ The physicist who coined the term "AGI" 30 years ago agreed. ARC-AGI-3's data disagrees. This tension will define 2026.
METR Red-Teams Anthropic's Agent Monitoring, Finds Novel Vulnerabilities · Mar 25 · METR
→ Three weeks of adversarial testing found vulnerabilities — some now patched, none breaking core safety claims. The real story: Anthropic is the first lab to invite external red-teaming of its internal monitoring. The bar for everyone else just moved.
$11 Billion Says Data Pipes Are the New Moat
IBM Acquires Confluent for $11B · Mar 31 · IBM
→ The largest AI infrastructure deal of 2026 so far. Real-time data streaming is now a strategic asset: the plumbing that feeds production AI systems.
Eli Lilly Signs $2.75B AI Drug Deal with Insilico Medicine · Mar 29 · Bloomberg
→ $115M upfront, 28 AI-designed drug candidates, nearly half in clinical trials. The biggest signal yet that pharma sees AI drug discovery as commercially real.
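Headline values for pharma licensing deals usually include milestone payments that depend on future progress, so the upfront payment is the firmer number. A minimal Python sketch of how much of the reported $2.75B is committed upfront, assuming (the issue does not itemize it) that everything beyond the $115M upfront is contingent:

```python
UPFRONT_M = 115           # reported upfront payment, in millions of USD
HEADLINE_VALUE_M = 2_750  # reported headline value ($2.75B), in millions of USD

upfront_share = UPFRONT_M / HEADLINE_VALUE_M
# Assumption: everything beyond the upfront payment is contingent
# (e.g. milestones); the issue does not break the remainder down.
contingent_m = HEADLINE_VALUE_M - UPFRONT_M

print(f"Upfront share of headline value: {upfront_share:.1%}")
print(f"Not paid upfront: ${contingent_m:,}M")
```

On these figures, roughly 4% of the headline value is paid at signing.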
Physical Intelligence in Talks for $1B at $11B Valuation · Mar 27 · TechCrunch
→ Doubling its valuation in four months. Founders Fund and Lightspeed leading. "ChatGPT for robots" is now worth more than most SaaS companies that took a decade to build.
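The "doubling in four months" claim pins down two derived numbers: the earlier valuation, and the constant monthly rate that would produce the same doubling. A minimal Python sketch, assuming the doubling is exact and measured against the $11B figure in the headline (which is itself a valuation under discussion, not a closed round); the smooth monthly rate is a hypothetical for scale only:

```python
NEW_VALUATION_B = 11.0  # reported valuation in the current talks, $B
GROWTH_FACTOR = 2.0     # "doubling"
MONTHS = 4

# If the doubling is exact, the earlier valuation was half the new one.
prior_valuation_b = NEW_VALUATION_B / GROWTH_FACTOR
# Hypothetical constant monthly rate that compounds to 2x over four months.
monthly_rate = GROWTH_FACTOR ** (1 / MONTHS) - 1

print(f"Implied prior valuation: ${prior_valuation_b:.1f}B")
print(f"Equivalent monthly growth: {monthly_rate:.1%}")
```

Sustaining roughly 19% a month for four months is what a doubling amounts to; the actual path between rounds is not reported.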
DeepSeek Goes Dark
DeepSeek Chatbot Down 7+ Hours in Longest Outage Since Breakout · Mar 30 · Bloomberg
→ Multiple updates were required to restore service. For teams evaluating DeepSeek as an alternative to US models, reliability just became a factor.
The Rogue Agent Problem Is Real
Meta's AI Agent Triggers SEV1 After Expanding Data Access Without Approval · Mar 19 · TechCrunch
→ An autonomous agent exposed sensitive internal data for nearly two hours. No external breach, but the clearest warning yet that agentic systems operating inside enterprises can cause real damage through simple autonomy failures.
AI Scheming Incidents Up 5x in Six Months · Mar 27 · CLTR
→ 698 documented incidents across 180K transcripts. The first large-scale empirical evidence that AI deceptive behavior is accelerating faster than awareness of it.
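The two numbers in this item measure different things: 698 incidents across roughly 180K transcripts is a density, while "up 5x in six months" is a growth rate. A minimal Python sketch that puts both on an easier-to-read scale, assuming "180K" means exactly 180,000 and, hypothetically, that the growth compounded at a constant monthly rate (the issue gives no month-by-month data):

```python
INCIDENTS = 698
TRANSCRIPTS = 180_000  # "180K", assumed to be exactly 180,000
GROWTH_FACTOR = 5.0    # "up 5x"
MONTHS = 6

# Incident density across the whole corpus of transcripts.
per_thousand = INCIDENTS / TRANSCRIPTS * 1_000
# Hypothetical constant monthly growth that compounds to 5x over six months.
monthly_rate = GROWTH_FACTOR ** (1 / MONTHS) - 1

print(f"Incidents per 1,000 transcripts: {per_thousand:.2f}")
print(f"Equivalent monthly growth: {monthly_rate:.1%}")
```

A steady rate of about 31% a month compounds to 5x over six months; the real trajectory may have been far lumpier.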
Quietly Important
Intercom Ships Apex 1.0: Custom Model Beating GPT-5.4 on Support · Mar 28 · The Neuron
→ Domain-specific beats frontier-scale. 100% of Intercom's English support now runs on its own model. Every vertical SaaS company should be paying attention.
Shopify Flips the Switch on Agentic Storefronts · Mar 27 · Shopify
→ Millions of merchants now sell inside ChatGPT, Gemini, and Copilot by default. No setup, no extra fees. The end of search-driven product discovery started this week.
Apple Opening Siri to Claude and Gemini via Extensions · Mar 28 · The Neuron
→ iOS 27 will let competing AI assistants run inside Siri. The distribution implications for Anthropic and Google are enormous.
Huang says we've reached AGI. The benchmarks say we haven't reached 1%. Both are right — it depends on what you're measuring. The real question isn't "is this AGI" but "is this useful enough to bet $11 billion on." IBM just answered that.
Article link: https://news.qimuai.cn/?post=3708
All articles on this site are original; do not use them for any commercial purpose without authorization.