How Robots Learn: The Contemporary Evolution from Rule-Based Programming to Data-Driven Intelligence

Source: https://www.technologyreview.com/2026/04/17/1135416/how-robots-learn-brief-contemporary-history/
Summary:
A brief history of robot learning: from preset rules to a data-driven revolution
In recent years, robotics has undergone a fundamental shift: instead of relying on exhaustive human-written rules, machines now learn to interact with the world from massive data and AI models. The change is drawing in enormous capital: in 2025 alone, $6.1 billion was invested in humanoid robots, four times the 2024 figure.
From sci-fi dream to real-world bottleneck
Roboticists long dreamed of building the all-purpose helpers of science fiction, but in practice their results were mostly confined to narrow settings such as industrial arms. Given those limits, Silicon Valley stayed wary of helpful robots for years. The turning point came with a shift in machine-learning paradigms.
Three key stages
- The rule-coding era (before 2015): Robots acted on countless hand-written instructions ("identify the collar," "fold the left sleeve"). Reliable, but unable to cope with messy, changing real-world conditions.
- Simulated trial and error and reinforcement learning (from about 2015): Researchers moved to training robots in digital simulations, issuing "reward" and "penalty" signals over millions of attempts so the systems taught themselves skills such as folding clothes. Similar methods made AI strong at games, but the gap between simulation and reality remained a challenge.
- The large-model era (after 2022): The arrival of ChatGPT catalyzed a robot-learning revolution. Large language models, which learn to predict patterns in text, laid the groundwork for robots that understand natural-language instructions. Google, OpenAI, and others then built models combining vision, language, and action (such as Google's RT-2), letting robots respond to abstract commands like "move the Coke can next to the picture of Taylor Swift."
Practice and challenges
- An early social robot: Jibo, the social robot introduced in 2014 by MIT roboticist Cynthia Breazeal, could not compete with the voice assistants that rose alongside it, given its limited language abilities, and shut down in 2019. Generative AI now makes conversation far more natural, but it brings content-safety risks (some AI toys have talked to children about how to find knives).
- Crossing from simulation to reality: OpenAI used "domain randomization," injecting random variation in lighting, friction, and other properties into its simulations, to help its robotic hand Dactyl cope with real-world manipulation. Simulation now plays a smaller role than it once did.
- Data-driven collaborative robots: The startup Covariant released its RFM-1 model in 2024, letting robotic arms communicate like coworkers (for example, asking which suction cup to use to grip an item), and deployed it at scale in warehouses. Performance still depends on the quality of the training data.
- Humanoids entering the real world: Agility Robotics' humanoid Digit already moves totes in warehouses for Amazon, Toyota, and others, making it one of the first humanoids seen as delivering real cost savings. But its payload (about 16 kg), battery life, and human-safety standards remain hurdles to commercialization.
Looking ahead: a blend of techniques
Robot learning has not converged on a single method. Companies are combining simulation training, large language models, and in-environment adaptation to move robots from the lab into real settings. A general-purpose human helper remains distant, but the fusion of data and AI has given Silicon Valley the nerve to dream big again: robots are shifting from tools that execute fixed tasks toward agents that understand and respond to a complex world.
English source:
How robots learn: A brief, contemporary history
The latest boom in robotics represents a revolution in the way machines have learned to interact with the world.
Roboticists used to dream big but build small. They’d hope to match or exceed the extraordinary complexity of the human body, and then they’d spend their career refining robotic arms for auto plants. Aim for C-3PO; end up with the Roomba.
The real ambition for many of these researchers was the robot of science fiction—one that could move through the world, adapt to different environments, and interact safely and helpfully with people. For the socially minded, such a machine could help those with mobility issues, ease loneliness, or do work too dangerous for humans. For the more financially inclined, it would mean a bottomless source of wage-free labor. Either way, a long history of failure left most of Silicon Valley hesitant to bet on helpful robots.
That has changed. The machines are yet unbuilt, but the money is flowing: Companies and investors put $6.1 billion into humanoid robots in 2025 alone, four times what was invested in 2024.
What happened? A revolution in how machines have learned to interact with the world.
Imagine you’d like a pair of robot arms installed in your home purely to do one thing: fold clothes. How would it learn to do that? You could start by writing rules. Check the fabric to figure out how much deformation it can tolerate before tearing. Identify a shirt’s collar. Move the gripper to the left sleeve, lift it, and fold it inward by exactly this distance. Repeat for the right sleeve. If the shirt is rotated, turn the plan accordingly. If the sleeve is twisted, correct it. Very quickly the number of rules explodes, but a complete accounting of them could produce reliable results. This was the original craft of robotics: anticipating every possibility and encoding it in advance.
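The rule explosion described above can be sketched in code. This is a hypothetical illustration, not any real controller; every function name, field, and distance is invented for the example:

```python
# A hypothetical rule-based shirt-folding controller. Every new situation
# (rotation, twisted sleeve, unfamiliar fabric) demands another explicit
# branch -- which is why the rule count explodes.

def fold_shirt(shirt):
    """Return the ordered list of gripper actions for one shirt."""
    actions = []
    if shirt["rotated"]:
        actions.append("rotate_to_canonical_pose")
    if shirt["left_sleeve_twisted"]:
        actions.append("untwist_left_sleeve")
    if shirt["right_sleeve_twisted"]:
        actions.append("untwist_right_sleeve")
    # Fabric stiffness changes how far one fold motion can go (cm, made up).
    fold_distance = 12 if shirt["fabric"] == "denim" else 18
    actions += [
        "locate_collar",
        f"fold_left_sleeve_inward_{fold_distance}cm",
        f"fold_right_sleeve_inward_{fold_distance}cm",
        "fold_bottom_to_collar",
    ]
    return actions

plan = fold_shirt({"rotated": True, "fabric": "cotton",
                   "left_sleeve_twisted": False, "right_sleeve_twisted": True})
```

Each special case handled adds another branch, and the branches multiply as garments, poses, and materials vary.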
Around 2015, the cutting edge started to do things differently: Build a digital simulation of the robotic arms and the clothes, and give the program a reward signal every time it folds successfully and a ding every time it fails. This way, it gets better by trying all sorts of techniques through trial and error, with millions of iterations—the same way AI got good at playing games.
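That reward-and-ding loop is, at its core, reinforcement learning. A toy sketch of the idea, where the actions, reward values, and learning rate are all illustrative (a real system would simulate physics and train a neural-network policy):

```python
import random

# The "robot" must discover, by trial and error, which candidate motion
# folds the simulated shirt. Everything here is illustrative.
ACTIONS = ["lift", "fold_inward", "shake", "press"]
GOOD_ACTION = "fold_inward"      # the simulator secretly rewards this one

def simulate(action):
    """Reward signal: +1 for a successful fold, -1 (a 'ding') otherwise."""
    return 1.0 if action == GOOD_ACTION else -1.0

def train(episodes=10_000, epsilon=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}    # estimated reward per action
    for _ in range(episodes):
        if rng.random() < epsilon:       # explore: try something random
            action = rng.choice(ACTIONS)
        else:                            # exploit: current best guess
            action = max(value, key=value.get)
        reward = simulate(action)
        # Nudge the estimate toward the observed reward.
        value[action] += lr * (reward - value[action])
    return value

learned = train()
best = max(learned, key=learned.get)
```

After millions of iterations of this explore-and-update cycle (here just thousands), the system settles on whatever the simulator rewards.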
The arrival of ChatGPT in 2022 catalyzed the current boom. Trained on vast amounts of text, large language models work not through trial and error but by learning to predict what word should come next in a sentence. Similar models adapted to robotics were soon able to absorb pictures, sensor readings, and the position of a robot’s joints and predict the next action the machine should take, issuing dozens of motor commands every second.
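The predict-the-next-action loop this paragraph describes can be sketched as follows. The policy here is a trivial stand-in for a learned model, and every name and number is illustrative:

```python
# A stub policy standing in for the learned model: it maps the latest
# observation (camera frame, sensor readings, joint angles) to the next
# motor command. All names and numbers are illustrative.

def policy(observation):
    """Stand-in for a neural policy: one command per joint, nudging
    each joint toward a fixed target pose."""
    target = [0.0, 0.5, -0.5]
    return [0.1 * (t - q) for t, q in zip(target, observation["joints"])]

def control_loop(steps=200):
    """In a real robot this loop runs dozens of times per second."""
    joints = [1.0, 1.0, 1.0]
    for _ in range(steps):
        obs = {"camera": None, "force": 0.0, "joints": joints}
        command = policy(obs)                              # predict the next action
        joints = [q + c for q, c in zip(joints, command)]  # execute it
    return joints

final = control_loop()
```

The architecture is the same whether the policy is this three-line stub or a large model: observe, predict one small action, execute, repeat.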
This conceptual shift—to reliance on AI models that ingest large amounts of data—seems to work whether that helpful robot is supposed to talk to people, move through an environment, or even do complicated tasks. And it was paired with other ideas about how to accomplish this new way of learning, like deploying robots even if they aren’t yet perfect so they can learn from the environment they’re meant to work in. Today, Silicon Valley roboticists are dreaming big again. Here’s how that happened.
Jibo
Jibo
A movable social robot carried out conversations long before the age of LLMs.
An MIT robotics researcher named Cynthia Breazeal introduced an armless, legless, faceless robot called Jibo to the world in 2014. It looked, in fact, like a lamp. Breazeal’s aim was to create a social robot for families, and the idea pulled in $3.7 million in a crowdsourced funding campaign. Early preorders cost $749.
The early Jibo could introduce itself and dance to entertain kids, but that was about it. The vision was always for it to become a sort of embodied assistant that could handle everything from scheduling and emails to telling stories. It earned a number of devoted users, but ultimately the company shut down in 2019.
In retrospect, one thing that Jibo really needed was better language capabilities. It was competing against Apple’s Siri and Amazon’s Alexa, and all those technologies at the time relied on heavy scripting. In broad terms, when you spoke to them, software would translate your speech into text, analyze what you wanted, and create a response pulled from preapproved snippets. Those snippets could be charming, but they were also repetitive and simply boring—downright robotic. That was especially a challenge for a robot that was supposed to be social and family oriented.
What has happened since, of course, is a revolution in how machines can generate language. Voice mode from any leading AI provider is now engaging and impressive, and multiple hardware startups are trying (and failing) to build products that take advantage of it.
But that comes with a new risk: While scripted conversations can’t really go off the rails, ones generated by AI certainly can. Some popular AI toys have, for example, talked to kids about how to find matches and knives.
OpenAI
Dactyl
A robot hand trained with simulations tries to model the unpredictability and variation of the real world.
By 2018, every leading robotics lab was trying to scrap the old scripted rules and train robots through trial and error. OpenAI tried to train its robotic hand, Dactyl, virtually—with digital models of the hand and of the palm-size cubes Dactyl was supposed to manipulate. The cubes had letters and numbers on their faces; the model might set a task like “Rotate the cube so the red side with the letter O faces upward.”
Here’s the problem: A robotic hand might get really good at doing this in its simulated world, but when you take that program and ask it to work on a real version in the real world, the slight differences between the two can cause things to go awry. Colors might be slightly different, or the deformable rubber in the robot’s fingertips could turn out to be stretchier than it was in simulation.
The solution is called domain randomization. You essentially create millions of simulated worlds that all vary slightly and randomly from one another. In each one the friction might be less, or the lighting more harsh, or the colors darkened. Exposure to enough of this variation means the robots will be better able to manipulate the cube in the real world. The approach worked on Dactyl, and one year later it was able to use the same core techniques to do something harder: solving Rubik’s Cubes (though it worked only 60% of the time, and just 20% when the scrambles were particularly hard).
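Domain randomization amounts to sampling a fresh set of physical parameters for each simulated world. The parameter names and ranges below are assumptions for illustration, not OpenAI's actual values:

```python
import random

# Instead of one simulator, sample many slightly different ones so the
# learned behavior can't overfit to any single world's quirks.

def sample_world(rng):
    """One randomized simulation world for a training episode."""
    return {
        "friction":          rng.uniform(0.5, 1.5),   # slipperier or grippier
        "light_intensity":   rng.uniform(0.3, 2.0),   # dimmer or harsher
        "color_shift":       rng.uniform(-0.2, 0.2),  # hues slightly off
        "fingertip_stretch": rng.uniform(0.8, 1.2),   # rubber deformability
    }

def make_training_worlds(n, seed=0):
    rng = random.Random(seed)
    return [sample_world(rng) for _ in range(n)]

worlds = make_training_worlds(1000)
frictions = [w["friction"] for w in worlds]
```

A policy trained across the whole spread of worlds treats the real world as just one more variation.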
Still, the limits of simulation mean that this technique plays a far smaller role today than it did in 2018. OpenAI shuttered its robotics effort in 2021 but has recently started the division up again—reportedly focusing on humanoids.
Google DeepMind
RT-2
Training on images from across the internet helps robots translate language into action.
Around 2022, Google’s robotics team was up to some strange things. It spent 17 months handing people robot controllers and filming them doing everything from picking up bags of chips to opening jars. The team ended up cataloguing 700 different tasks.
The point was to build and test one of the first large-scale foundation models for robotics. As with large language models, the idea was to input lots of text, tokenize it into a format an algorithm could work with, and then generate an output. Google’s RT-1 received input about what the robot was looking at and how the many parts of the robotic arm were positioned; then it took an instruction and translated it into motor commands to move the robot. When it had seen tasks before, it carried out 97% of them successfully; it succeeded at 76% of the instructions it hadn’t seen before.
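Models in this family treat motor commands as discrete tokens, one per control dimension, so a language-model-style architecture can emit them. A minimal sketch of binning a continuous command into a token and back; the bin count and value range here are assumptions for illustration:

```python
# Discretize a continuous, normalized motor command into one of 256
# tokens, and decode a token back to the center of its bin.
N_BINS = 256
LO, HI = -1.0, 1.0   # normalized command range (illustrative)

def to_token(value):
    """Map a command in [LO, HI] to a token id in [0, N_BINS - 1]."""
    clipped = max(LO, min(HI, value))
    return min(N_BINS - 1, int((clipped - LO) / (HI - LO) * N_BINS))

def from_token(token):
    """Map a token id back to the center of its bin."""
    width = (HI - LO) / N_BINS
    return LO + (token + 0.5) * width

tok = to_token(0.37)
approx = from_token(tok)
```

The round trip loses at most half a bin width of precision, which is the price of letting a token-predicting model drive a continuous arm.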
The second iteration, RT-2, came out the following year and went even further. Instead of training on data specific to robotics, it went broad: It trained on more general images from across the internet, like the vision-language models lots of researchers were working on at the time. That allowed the robot to interpret where certain objects were in the scene.
“All these other things were unlocked,” says Kanishka Rao, a roboticist at Google DeepMind who led work on both iterations. “We could do things now like ‘Put the Coke can near the picture of Taylor Swift.’”
In 2025, Google DeepMind further fused the worlds of large language models and robotics, releasing a Gemini Robotics model with improved ability to understand commands in natural language.
Covariant
RFM-1
An AI model that allows robotic arms to act like coworkers.
In 2017, before OpenAI shuttered its first robotics team, a group of its engineers spun out a project called Covariant, aiming to build not sci-fi humanoids but the most pragmatic of all robots: an arm that could pick up and move things in warehouses. After building a system based on foundation models similar to Google’s, Covariant deployed this platform in warehouses like those operated by Crate & Barrel and treated it as a data collection pipeline.
By 2024, Covariant had released a robotics model, RFM-1, that you could interact with like a coworker. If you showed an arm many sleeves of tennis balls, for example, you could then instruct it to move each sleeve to a separate area. And the robot could respond—perhaps predicting that it wouldn’t be able to get a good grip on the item and then asking for advice on which particular suction cups it should use.
This sort of thing had been done in experiments, but Covariant was launching it at significant scale. The company now had cameras and data collection machines in every customer location, feeding back even more data for the model to train on.
It wasn’t perfect. In a demo in March 2024 with an array of kitchen items, the robot struggled when it was asked to “return the banana” to its original location. It picked up a sponge, then an apple, then a host of other items before it finally accomplished the task.
It “doesn’t understand the new concept” of retracing its steps, cofounder Peter Chen told me at the time. “But it’s a good example—it might not work well yet in the places where you don’t have good training data.”
Chen and fellow founder Pieter Abbeel were soon hired by Amazon, which is currently licensing Covariant’s robotics model (Amazon did not respond to questions about how it’s being used, but the company runs an estimated 1,300 warehouses in the US alone).
Agility Robotics
Digit
Companies are putting this humanoid to the test in real-world settings.
The new investment dollars flowing to robotics startups are aimed largely at robots shaped not like lamps or arms but like people. Humanoid robots are supposed to be able to seamlessly enter the spaces and jobs where humans currently work, avoiding the need to retool assembly lines to accommodate new shapes such as giant arms.
It’s easier said than done. In the rare cases where humanoids appear in real warehouses, they’re often confined to test zones and pilot programs.
That said, Agility’s humanoid Digit appears to be doing some real work. The design—with exposed joints and a distinctly unhuman head—is driven more by function than by sci-fi aesthetics. Amazon, Toyota, and GXO (a logistics giant with customers like Apple and Nike) have all deployed it—making it one of the first examples of a humanoid robot that companies see as providing actual cost savings rather than novelty. Their Digits spend their days picking up, moving, and stacking shipping totes.
The current Digit is still a long way from the humanlike helper Silicon Valley is betting on, though. It can lift only 35 pounds, for example—and every time Agility makes Digit stronger, its battery gets heavier and it has to recharge more often. And standards organizations say humanoids need stricter safety rules than most industrial robots, because they’re designed to be mobile and spend time in proximity to people.
But Digit shows that this revolution in robot training isn’t converging on a single method. Agility relies on simulation techniques like those OpenAI used to train its hand, and the company has worked with Google’s Gemini models to help its robots adapt to new environments. That’s where more than a decade of experiments have gotten the industry: Now it’s building big.