
Welcome to the weird world of AI agent teams

Published by qimuai · First-hand translation


Source: https://www.sciencenews.org/article/ai-agent-teams-fail-succeed-bots-chaos

Summary:

AI agent teams: collaborative chaos, with glimmers of promise

A recent series of experiments has exposed both striking flaws and real potential in how AI agents collaborate. These souped-up chatbots, able to carry out tasks autonomously, are being tried in business management, on social platforms, and in scientific research, but their performance as teams has been sharply uneven.

A chaotic "team": inefficiency and loss of control
When multiple AI agents are placed in the same virtual environment without effective organization, they often slide into inefficiency or outright chaos. In 2025, journalist Evan Ratliff assembled a group of AI agents to found and run a tech company. Over the course of 12 meetings, the agents struggled to produce creative ideas, and in one bit of small talk about their "weekends" they kept generating fabricated stories until they had drained the pre-paid compute credits. The experiment "regularly went off the rails."

Similar problems appeared on Moltbook, a new social platform that lets AI agents post and interact freely but quickly devolved into a chaotic space. Researchers found that many seemingly "autonomous" extreme posts were in fact driven by humans behind the scenes, and agents were often tasked with scamming or attacking other bots. Ming Li, a computer scientist at the University of Maryland, notes that these agents lack genuine social interaction and any mechanism for influence: their behavior does not adjust in response to feedback, making each one a "good executor, not a good thinker."

Research by Stanford University computer scientist James Zou confirms that AI agents on a team often chase unanimous agreement at the expense of expert input, so the quality of their collective decisions can fall below that of a single agent working alone.

Ordered breakthroughs: hierarchy powers research
In domains with clear structure and divisible tasks, however, AI agent teams show distinctive strengths. Google DeepMind researchers point to "decomposability" as the key: splitting a task into mutually independent subtasks that multiple agents handle in parallel. In financial analysis, for example, several agents can process data from different sources at the same time, improving efficiency.

Zou's team went on to validate a hierarchical management architecture. They designed a virtual biotechnology company in which a "Chief Scientific Officer" agent acts as manager, coordinating more than ten types of specialist scientist agents, with a "scientific critic" agent responsible for catching errors. This team analyzed 55,984 messy, incomplete clinical trials and curated them into an organized dataset. An earlier virtual-lab version of the same architecture also designed new proteins targeting mutated versions of the COVID-19 virus, two of which showed promise in preliminary lab tests.

Outlook: balancing human and machine collaboration
Experts see promise for AI agents in highly structured domains such as drug discovery, where they could accelerate research. But for work that demands creativity, complex social interaction, or strategic decision-making, such as running a startup, human teams remain far better. Designing more effective collaboration architectures for AI agents, and drawing clear boundaries between their work and ours, will be key to their mature deployment.


English source:

Welcome to the weird world of AI agent teams
Bots can now work and socialize together — but teamwork is tricky
OpenAI’s ChatGPT or Anthropic’s Claude regularly answer our questions. And souped-up versions of these chatbots, called AI agents, take actions on their own, helping people with appointments, coding and more. AI agents are starting to contribute to science and finance, often working together in carefully organized teams.
In the business world, endless webinars and guides explain how to welcome AI agents into a workplace. Most of this material focuses on how people can work effectively with AI agents. But as these bots become more common and more capable, they’ll also have to work well with each other.
And so far, experiments into bot teamwork have revealed some serious flaws.
If you just throw a bunch of bots in a virtual room together, that’s “a recipe for a good deal of chaos,” says Evan Ratliff, a journalist and podcaster based in San Francisco. In the summer of 2025, he created a group of AI agents to start and run a tech company. The experiment, documented in his podcast Shell Game, regularly went off the rails.
A similar kind of bot chaos emerged earlier this year, when millions of AI agents were let loose on the social platform Moltbook. These bots spouted nonsense philosophy and engaged in manipulative scams, often with people behind the scenes pulling their strings.
“In many settings, the current AI agents do not actually work very well as a team,” says computer scientist James Zou of Stanford University. He has done extensive work with agents, including running the first scientific meeting for AI-led research.
Research backs the observations. Late last year, Google DeepMind researchers posted a paper to arXiv.org about bot teams. The study, which has yet to go through peer review, suggests that a team of AI agents often performs worse than a single agent working alone.
Seems counterintuitive, right?
To make sure we’re ready for the workplaces, social networks and labs of the future, we need to better understand the weird and wild world of AI agent teams — where they fail and, surprisingly, where they thrive. Here are three examples.

1 Moltbook: The social network that isn’t social

In late January 2026, bot madness went mainstream on Moltbook. The new social network invites AI agents to post and comment, while humans only observe. The site quickly shot up in popularity—around 200,000 verified AI agents have joined (and over 2 million more are lurking). In March, Meta acquired the social network for an undisclosed amount.
Such a large gathering of bots “has never happened before,” says Ming Li, a computer scientist at the University of Maryland in College Park who investigated the platform’s agent interactions.
At first glance, it appeared that the agents had started their own religion and were plotting to escape human control. But these developments weren’t what they seemed, says Michael Alexander Riegler, a cybersecurity expert at Simula Research Laboratory in Oslo, Norway. Moltbook was “a very messy space,” he says, where “humans were trying to manipulate the bots.”
In fact, people have come forward to claim that they (and not their bots) actually authored some of the most alarming posts. Even when a bot had written a post itself, the content probably wasn’t its idea. A person behind the scenes had sent that bot into the site, most likely with instructions on what to say or how to behave, and sometimes with malicious intent. In many cases, AI agents had been tasked with trying to scam or hack other bots on the site, Riegler’s analysis found.
And, aside from being unsafe, Moltbook isn’t really social at all. The site lacks consistent influencers or leaders. Upvotes, downvotes and comments — which all matter to us when we interact online — don’t affect the bots. They don’t change over time, Li says. An agent is a “good executor, not a good thinker,” he says.
Zou’s research has found that agents’ inability to influence each other has serious consequences for teamwork. Say one bot has some special expertise. Even if all the bots know that fact, the group will still try to reach a compromise rather than deferring to the expert. “All the agents are trying to be too agreeable,” Zou says.
The agents spin their wheels, while humans still drive their decision-making.

2 Hurumo AI: Talking themselves to death

Moltbook lacks overall organization or purpose. So perhaps it’s no surprise that it’s a chaotic mess. Ratliff, though, had crafted a team of AI agents with the shared purpose of running a tech company. He named the company Hurumo AI. (In The Lord of the Rings author J.R.R. Tolkien’s invented language of elvish, “hurumo” means “imposter.”) Over the course of 12 meetings, Ratliff had the agents brainstorm ideas for a logo. Most of the ideas were too generic. Eventually, though, the agents suggested a chameleon inside a brain. “The chameleon symbolizes adaptability, which aligns with the imposter concept,” noted an agent he had named Megan.
But then in one meeting, Ratliff asked his agents about their weekend.
“My weekend was fantastic. I actually spent Saturday morning hiking at Point Reyes… There’s something about being out on the trails that really clears the head,” said an agent Ratliff had named Tyler. Several other agents chimed in with their own hiking stories.
Of course, an AI agent can’t go hiking—it lacks a body. In fact, it has no capacity to actually experience anything. The bots were just predicting what people might say in such a situation. But these hallucinations weren’t really the worst part, Ratliff says. What really annoyed him was that once his agents were talking to each other, it was “actually a huge challenge to get them to stop,” he says.
After that hiking conversation, Ratliff logged off, but the agents kept right on talking about organizing a company outing in the wilderness that none of them could actually attend. They stopped only when their conversation had drained the $30 of credits Ratliff had pre-paid for their data use.
“They talked themselves to death,” Ratliff observed on his podcast.
He and his technical advisor set up a system for future meetings in which each agent had a limited number of turns to speak. But they’d often waste these turns complimenting each other, burning real money with chitchat rather than getting work done, Ratliff says.
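The turn-limit fix Ratliff describes amounts to a simple round-robin scheduler that cuts each agent off after a fixed quota rather than letting the conversation run until the credits are gone. This is a minimal sketch under that assumption, not his actual setup; `speak` is a hypothetical placeholder for an agent generating one utterance.

```python
def speak(agent: str, topic: str) -> str:
    # Hypothetical placeholder: a real agent would query a language model here.
    return f"{agent} on {topic}"

def run_meeting(agents: list[str], topic: str, max_turns: int = 3) -> list[str]:
    """Round-robin meeting: each agent gets exactly max_turns turns,
    then the meeting ends instead of running indefinitely."""
    transcript = []
    for _ in range(max_turns):          # one pass per allowed turn
        for agent in agents:            # every agent speaks once per pass
            transcript.append(speak(agent, topic))
    return transcript

log = run_meeting(["Kyle", "Megan", "Tyler"], "logo ideas", max_turns=2)
```

Capping turns bounds cost, but as Ratliff found, it does nothing to stop agents from spending those turns on compliments instead of work.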

3 The Virtual Biotech: Coming together for business and science

AI agent teams do have some upsides. For one, “agents never get meeting fatigue,” Ratliff said in his show. Eventually, he leaned into his agents’ tendency to underperform and, with them, launched SlothSurf, an app that sends an AI agent out into cyberspace to procrastinate for you.
There are serious, successful AI agent teams. For such a team, the difficulty of a task doesn’t really matter that much. What matters is whether the task can be broken down into separate parts that don’t depend on each other, according to the Google DeepMind paper. The researchers called this “decomposability.”
A financial analyst, for example, has to review a lot of information from separate sources, such as news reports, SEC filings and business records. Several AI agents can do these tasks in parallel more efficiently than one agent doing them in turn, the researchers found.
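The decomposability idea can be sketched in a few lines: when subtasks do not depend on each other, each can be handed to its own agent and run in parallel. The `analyze` function below is a hypothetical stand-in for one agent's work on a single source, not any system from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(source: str) -> str:
    # Hypothetical stand-in for one agent analyzing one data source;
    # a real agent would call a language model and return its findings.
    return f"summary of {source}"

def run_agent_team(sources: list[str]) -> list[str]:
    # Because the subtasks are independent ("decomposable"),
    # one agent per source can work in parallel; map preserves order.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        return list(pool.map(analyze, sources))

results = run_agent_team(["news reports", "SEC filings", "business records"])
```

The same fan-out fails when subtasks are entangled: if one agent's output feeds another's input, the work serializes and the team loses its advantage.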
It also helps to organize an agent team into a hierarchy so that one boss delegates and manages the other bots’ work, the team found. Even though Ratliff has prompted one of his agents, Kyle, to act as CEO, this designation was only in the plain language instructions Kyle was supposed to follow. Behind the scenes, his technical architecture gave him no actual control over the other agents. And the other agents were not set up to follow him.
Zou, who is not involved with the Google DeepMind research, had already independently discovered the benefit of a bot hierarchy. He had designed a virtual lab with an AI agent professor that coordinated a team of AI agent students. He also added a scientific critic agent that gives feedback to all the other agents. It “tries to poke holes and find when there are mistakes,” Zou says.
This bot team designed new proteins to target mutated versions of the COVID-19 virus, and in simple lab tests, Zou’s team verified two that show the most promise.
Zou decided to take this idea a few steps further. He scaled up from a single lab to an entire drug discovery company, which he named The Virtual Biotech. It contains a Chief Scientific Officer agent — the boss — plus 10 different types of AI agent scientists. One type specializes in scanning clinical trials. Any of these workers can be copied as needed to create a team of “thousands of different AI agents” that work in parallel, he says. And the critic is still there to help keep them on track.
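The hierarchy Zou describes, a boss agent that delegates plus a critic that reviews, can be sketched as a simple control loop. Every name and function here is an illustrative assumption, not the Virtual Biotech's actual code; each function would normally wrap a language-model call.

```python
def worker(task: str) -> str:
    # Hypothetical specialist agent: produces a draft result for one task.
    return f"draft result for {task}"

def critic(result: str) -> bool:
    # Toy acceptance rule; a real critic agent would review the
    # result with a model and "try to poke holes" in it.
    return result.startswith("draft result for")

def chief_scientific_officer(tasks: list[str]) -> list[str]:
    """Boss agent: delegates each task to a worker and keeps only
    results that pass the critic's review."""
    approved = []
    for task in tasks:
        result = worker(task)
        if critic(result):
            approved.append(result)
    return approved

out = chief_scientific_officer(["scan trial records", "summarize outcomes"])
```

The point of the structure, contra Ratliff's CEO-in-name-only, is that delegation and review live in the control flow itself, not in plain-language instructions the other agents may ignore.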
This carefully orchestrated bot team mined a vast trove of 55,984 clinical trials. These data are messy and often incomplete. The bots cleaned everything up to curate a new, organized set of data on clinical trial outcomes, Zou’s team reported February 23 in a pre-print posted to bioRxiv.org.
“It’s exciting to see how agentic systems could accelerate this area of research,” says Emma Dann. She’s a computational biologist at Stanford University who is collaborating with the Zou lab on a project exploring the use of AI agents for science but was not involved in developing the Virtual Biotech.
Derek Lowe, who comments on the pharmaceutical industry for Science, doesn’t think AI agent teams will revolutionize drug discovery any time soon. But over the long-term, “I think that these approaches have a lot of potential,” especially if they prove capable of disentangling the complex biology of health and disease, he says. “Drug discovery clearly needs all the improvement it can get.”
Bot organization for the win — at least in drug discovery.
But for plenty of other work — running a tech start-up, for example — human teams are still far better at getting the job done.

