Anthropic 为网络合作伙伴推出Mythos升级版,并为其他用户提供“安全”版本

qimuai 发布于 阅读:23 一手编译

Anthropic 为网络合作伙伴推出Mythos升级版,并为其他用户提供“安全”版本

内容来源:https://www.wired.com/story/anthropic-releases-claude-fable-5-mythos-5/

内容总结:

Anthropic发布两款新AI模型:强者“神话”受限开放,“寓言”版本设安全护栏

本周二,人工智能公司Anthropic正式推出了两款全新AI模型——Claude Fable 5(克劳德·寓言5)与Claude Mythos 5(克劳德·神话5)。公司称,这两款模型的能力均超越了今年4月仅向部分技术行业合作伙伴开放的“Mythos预览版”。

此前,Anthropic曾解释,之所以对Mythos预览版进行限制性发布,是出于安全担忧——公司担心该模型的强大能力可能被不法分子利用,开发出令防御方措手不及的黑客攻击工具。目前,Mythos 5仍只面向有限的行业合作伙伴开放,其中多数此前已获得Mythos预览版的使用权。公司表示,正与美国政府合作推进该模型的部署工作。

与此相对,Claude Fable 5将面向公众开放。Anthropic表示,Fable 5与Mythos 5使用相同的基础模型,但发布时已内置“安全护栏”,将拦截涉及网络安全、生物学和化学等领域的大量用户提问。这些请求将被转交给较旧版本的AI模型——Claude Opus 4.8。此外,如果Anthropic怀疑用户试图对Fable 5进行“知识蒸馏”(即利用大模型的输出来训练更小的AI模型),相关请求同样会被引导至Opus 4.8。

Anthropic产品管理主管黛安·佩恩在接受《连线》杂志采访时透露,自4月发布前,公司就一直纠结于如何应对Mythos在软件漏洞发现及其他高级能力上的风险。但此后进行的测试和用户反馈帮助其完善了策略。佩恩表示:“我们希望在尚未为所有使用场景找到完美解决方案的情况下,仍以有益的方式推进改进。在各种方案中,这一做法最终被证明是最可行、也是最好的选择。我们觉得,这是让用户从Fable 5中获取最大价值的最佳产品策略。”

佩恩坦言,目前的安全机制偏向于“宁可错杀一千,不放过一个”,这意味着一些原本无害的用户查询也可能被转交给能力较弱的AI模型。Anthropic希望未来能提高分类器的精确度,但佩恩强调,在当前阶段,这是公司能够广泛发布该模型的唯一安全途径。

Anthropic周二还表示,除了向“Project Glasswing”合作伙伴提供Mythos 5,公司也为“经过筛选的生物学研究人员”开放了访问权限。此外,公司在发布博文中指出,在上述“可信访问计划”正式上线之前,目前正向这些少数客户群提供不受限制的版本,暗示未来将进一步扩大开放范围。自4月发布Mythos以来,Anthropic反复强调,未来无论是私营企业还是开源领域的竞争对手,都必然会推出具备Mythos级别能力的模型。

Claude Mythos等新型AI模型能够设计出发现并利用新旧软件漏洞的黑客工具,这一能力已迫使全球科技公司和各国政府加强自身软件防御,以免在如此高级别的AI模型被攻击者广泛获取后措手不及。Anthropic最初通过名为“Project Glasswing”的联盟向行业合作伙伴发布Mythos,旨在让成员在更广泛发布前获得先机,为自身系统做好准备,并共同权衡应对这一威胁的全球性方案。

Anthropic上周在一份关于Project Glasswing的更新中写道:“我们正以最快速度推动Mythos级别能力的安全公开发布。为此,我们需要极其强大的防护措施,防止模型的网络能力被滥用——这些防护措施,我们(以及据我们所知,所有其他AI开发者)目前尚未开发完成。”

Anthropic称,Claude Fable 5(与公司现有的Haiku、Sonnet、Opus模型一样,以文学体裁命名)在软件工程和需要视觉理解的任务上性能有所提升。但这提升也意味着更高的价格:Fable 5和Mythos 5对开发者的收费为每百万输入tokens 10美元、每百万输出tokens 50美元——这是Anthropic公开AI模型价格的两倍,但低于Mythos预览版。

削弱版Fable 5的发布,折射出Anthropic在商业上的两难:它希望在技术行业尚未解决这些模型带来的网络安全担忧之前,就推出可供普遍使用的Mythos级别AI模型。今年4月,OpenAI也低调发布了一款自称具备高级网络安全能力的模型,并组建了与Project Glasswing类似的工作组。两家公司均秘密提交了首次公开募股申请,并正竞相在最快今年成为上市公司之前给潜在投资者留下深刻印象。

不过,即便是作为权宜之计,Fable 5的安全护栏在实际应用中的抵抗能力仍有待检验。Anthropic称,在超过1000小时的“红队测试”中,测试人员未能找到该模型的通用越狱方法。但公司最初之所以在4月不向公众发布Mythos级别模型,正是基于对能否开发出充分防护措施的担忧——而这一担忧似乎至今仍未消散。

中文翻译:

Anthropic于周二发布了两款名为Claude Fable 5和Claude Mythos 5的新AI模型,公司称其性能优于今年4月仅向少数科技行业合作伙伴发布的Mythos Preview模型。Anthropic此前表示,最初限制发布是出于担忧:该模型的能力可能被不法分子利用,开发出令防御方措手不及的黑客工具。

目前Anthropic仅向少数行业合作伙伴提供Claude Mythos 5,其中许多已获得Mythos Preview的使用权限。公司表示,正在与美国政府合作推进该模型的发布。

Claude Fable 5作为公开发布版本,与Mythos 5采用相同底层模型,但公司于周二表示,该模型在发布时将设有"护栏",以阻止其回答大量与网络安全、生物学和化学相关的用户问题。这些请求将转由旧版AI模型Claude Opus 4.8处理。公司还表示,若怀疑用户试图对Claude Fable 5进行"蒸馏"(即利用大型AI模型的响应训练更小的AI模型),相关请求也将转接至Claude Opus 4.8。

在接受《连线》杂志采访时,Anthropic产品管理负责人戴安·佩恩表示,自今年4月发布前,公司就一直在思考如何处理Mythos的软件漏洞发现能力及其他高级功能,但此后的测试和用户反馈帮助完善了策略。

"我们试图以有益的方式进行改进,即使最初无法为每个用例提供完美方案,"佩恩表示,"在所有不同方案中,这种方法被认为最可行、最佳。我们最终认为,这是让用户从Fable 5中获得最大价值的最佳产品选择。"

佩恩表示,目前保护机制的设计偏向谨慎,这意味着部分用户查询可能被导向性能较弱的AI模型,即使这些查询本身无害。随着时间推移,Anthropic希望让分类器更加精准,但佩恩表示,这是公司目前能够广泛发布该模型的唯一安全途径。

公司周二表示,除向"玻璃之翼计划"合作伙伴提供Claude Mythos 5外,还允许"特定生物学研究人员"使用。此外,Anthropic在关于周二发布的博文中指出,正向这些小型客户群体提供无限制版本,"直至我们的可信访问计划推出",暗示未来计划进一步扩大访问权限。自4月Mythos发布以来,Anthropic一再强调,私营领域甚至开放权重领域的竞争对手最终也势必会推出具有Mythos级能力的模型。

Claude Mythos及其他新型AI模型能够设计黑客工具,发掘和利用新旧软件中的漏洞,这迫使全球科技公司和政府必须在同类AI模型被攻击者广泛获取之前,加强自身软件防御。Anthropic最初通过名为"玻璃之翼计划"的联盟向行业合作伙伴发布Mythos,希望能让成员在更广泛发布前,提前准备自身系统并权衡全球应对方案。

Anthropic上周在关于"玻璃之翼计划"的更新中写道:"我们正以最快速度安全推出Mythos级别能力的通用访问权限。为此,我们需要极为强大的保障措施,防止模型的网络能力被滥用——这些保障措施我们(据我们所知,所有其他AI开发者同样)尚未开发完善。"

Anthropic表示,Claude Fable 5——以文学体裁命名,与公司现有模型Haiku、Sonnet和Opus风格一致——在软件工程和需要视觉理解的任务上性能有所提升。但性能提升需要付出代价:Claude Fable 5和Claude Mythos 5将向开发者收取每百万输入token 10美元、每百万输出token 50美元的费用——这是Anthropic公开发布AI模型价格的两倍,但低于Mythos Preview。

Claude Fable 5的"阉割版"发布,折射出Anthropic的商业张力:希望在科技行业解决这些模型的网络安全问题之前,发布一款面向通用用途的Mythos级AI模型。今年4月,OpenAI也私下发布了一款自称具有高级网络安全能力的模型,并组建了类似"玻璃之翼计划"的工作组。OpenAI和Anthropic均已秘密提交IPO申请,正力争在今年上市前给潜在投资者留下深刻印象。

然而,即使作为临时方案,Claude Fable 5的防护措施在实际环境中能有多坚固仍有待观察。Anthropic表示,在超过1000小时的"红队测试"中,测试人员未能找到该模型的任何通用越狱方法。尽管如此,对能否开发出充分防护措施的担忧,仍是该公司最初在4月未向公众发布Mythos级模型的根本原因,而这种担忧似乎并未消散。

英文来源:

Anthropic released two new AI models called Claude Fable 5 and Claude Mythos 5 on Tuesday, which the company says have greater capabilities than the Mythos Preview model it released in April to a limited set of tech industry partners. Anthropic has said the initial, limited release stemmed from concerns that the model’s capabilities could be exploited by bad actors to develop hacking tools that could catch defenders off guard.
Anthropic is currently only releasing Claude Mythos 5 to a limited set of industry partners, many of which received access to Mythos Preview, and the company says it is collaborating with the US government on the rollout.
Claude Fable 5, which is being publicly released, uses the same underlying model as Mythos 5, but will have “guardrails” in place at launch, the company said Tuesday, that will block the model from answering many user questions related to cybersecurity, biology, and chemistry. These requests will instead be rerouted to an older AI model, Claude Opus 4.8. If Anthropic suspects a user is trying to conduct distillation—training a smaller AI model off a larger AI model’s responses—on Claude Fable 5, those requests will also be rerouted to Claude Opus 4.8, the company says.
In an interview with WIRED, Anthropic’s head of product management, Diane Penn, says that the company has been grappling with the question of how to handle Mythos’ software vulnerability-discovery abilities and other advanced capabilities since before its April release, but that testing and user input since then helped to hone the strategy.
“We're trying to make improvements in a way that's beneficial, even if we don't have the perfect [solution] for every use case to start,” Penn says. “Out of all the different approaches, this emerged as the most viable and the best one. We just ended up feeling like this was the best product choice for users to get the maximum value out of Fable 5.”
For now, Penn says that the protective mechanism is built to err on the side of caution, meaning some user queries may be routed to the less capable AI model even if they’re benign. Over time, Anthropic hopes to make its classifiers more precise, but Penn says this was the only safe way the company could release the model broadly at this time.
The company said on Tuesday that in addition to offering Claude Mythos 5 to Project Glasswing partners, it is also giving access to “select biology researchers.” Additionally, Anthropic noted in its blog post about Tuesday’s launch that it is providing unrestricted versions to these small groups of customers “until our trusted access program is available,” hinting at future plans to expand access even more. Since the Mythos launch in April, Anthropic has repeatedly emphasized that eventually its competitors in both the private and even open weight spaces will inevitably also offer models with Mythos-level capabilities.
The ability for Claude Mythos and other new AI models to design hacking tools that can find and exploit vulnerabilities in both new and legacy software has forced tech companies and governments around the world to secure their software defenses before AI models of this level are made broadly available to attackers. Anthropic first released Mythos to industry partners under a consortium called Project Glasswing, with the idea that this could give members a head start in preparing their own systems and weighing global solutions to the threat before a broader release.
Anthropic wrote in an update about Project Glasswing last week: “We’re working as quickly as we can to safely release Mythos-level capabilities in general access. To do so, we’ll need highly robust safeguards that prevent the model’s cyber capabilities from being misused—safeguards that we (and, to our knowledge, all other AI developers) have yet to develop.”
Anthropic says Claude Fable 5—named after the literary form, much like the company’s existing Haiku, Sonnet, and Opus models—offers increased performance on software engineering and tasks that require visual understanding. But that added performance comes at a price. Claude Fable 5 and Claude Mythos 5 will cost developers $10 per million input tokens and $50 per million output tokens—twice as much as Anthropic’s publicly available AI models but cheaper than Mythos Preview.
The neutered release of Claude Fable 5 hints at Anthropic’s business tension of wanting to release a Mythos-class AI model for general use before the tech industry has resolved the cybersecurity concerns of these models. In April, OpenAI also privately launched a model that it said has advanced cybersecurity capabilities and convened a working group similar to Project Glasswing. Both OpenAI and Anthropic have confidentially filed for IPOs and are racing to impress prospective investors before they become public companies as soon as this year.
Even as an interim solution, though, it remains to be seen how resistant Claude Fable 5’s safeguards are in the wild. Anthropic says in more than 1,000 hours of red-teaming, its testers found no universal jailbreaks for the model. Still, fears about the ability to develop adequate protections underpinned the company’s original justification for why it did not release Mythos-class models to the public in April, and these fears have seemingly persisted.

连线杂志AI最前沿

文章目录


    扫描二维码,在手机上阅读