Introducing computer use in Gemini 3.5 Flash

Posted by qimuai on 2026-6-25 14:00 | First-hand compilation

Source: https://blog.google/innovation-and-ai/models-and-research/gemini-models/introducing-computer-use-gemini-3-5-flash/

Summary:

Google launches built-in "computer use" in Gemini 3.5 Flash, taking enterprise automation to a new level

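Google has announced that computer use is now a built-in tool in its Gemini 3.5 Flash model. Developers and enterprises no longer need to call a separate model: they can use the tool directly to build custom agents that see, reason and take action across browser, mobile and desktop environments.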
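The "see, reason, act" cycle can be illustrated with a minimal sketch. It does not use the Gemini API: `Action`, `run_agent`, `observe`, `decide` and `act` are hypothetical names introduced here, and a scripted policy stands in for the model. It only shows the shape of the loop an agent of this kind would presumably run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A single UI action proposed by the model (hypothetical shape)."""
    name: str                      # e.g. "click", "type", "done"
    argument: Optional[str] = None


def run_agent(
    observe: Callable[[], str],
    decide: Callable[[str, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the policy for an action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()                # "see"
        action = decide(screen, history)  # "reason"
        if action.name == "done":         # the policy signals completion
            break
        act(action)                       # "act"
        history.append(action)
    return history


def demo() -> List[Action]:
    """Run the loop against a fake screen and a scripted stand-in for the model."""
    screens = iter(["login page", "dashboard", "dashboard"])
    script = iter([Action("click", "Sign in"), Action("type", "report"), Action("done")])
    executed: List[Action] = []
    return run_agent(lambda: next(screens), lambda s, h: next(script), executed.append)
```

The `max_steps` cap is a design choice for the sketch: a long-horizon task still needs an upper bound so that a confused policy cannot loop forever.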

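Previously, the capability was available only as a standalone Gemini 2.5 computer use model. Gemini 3.5 Flash now adds built-in computer use alongside its existing function calling and its Search and Maps grounding tools, which Google says improves performance on long-horizon and enterprise automation tasks such as continuous software testing and knowledge work across professional applications. It is available now through the Gemini API and the Gemini Enterprise Agent Platform.

In the demos, Gemini 3.5 Flash uses computer use to analyse the Gemini app and return a categorized list of features, and to audit its own documentation for accessibility issues.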

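Safety upgrade: two optional safeguards

To mitigate some of the prompt injection risks that agents face when operating in live environments, Google applied targeted adversarial training to computer use in Gemini 3.5 Flash. It is also releasing two optional enterprise safeguard systems:

Sensitive-action confirmation: require explicit user confirmation for sensitive or irreversible actions;
Automatic task stop: automatically stop a task when an indirect prompt injection is identified.

Google describes this as a "defense-in-depth" approach and encourages developers to combine these features with secure sandboxing, human-in-the-loop verification and strict access controls. Further details are in its safety best practices documentation.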
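As a rough illustration of how a developer might layer their own checks around an agent's actions, the sketch below gates each action behind an injection check and a human confirmation. It is not Google's safeguard system and does not call any Gemini API: `SENSITIVE_ACTIONS`, `TaskAborted`, `guarded_execute` and `naive_detector` are hypothetical names, and the detector is a toy string match used only to make the example runnable.

```python
from typing import Callable, Set

# Hypothetical set of action names treated as sensitive or irreversible.
SENSITIVE_ACTIONS: Set[str] = {"delete", "purchase", "send_email"}


class TaskAborted(Exception):
    """Raised when the whole task must stop, e.g. on a suspected injection."""


def guarded_execute(
    action: str,
    page_text: str,
    looks_injected: Callable[[str], bool],
    confirm: Callable[[str], bool],
    execute: Callable[[str], None],
) -> bool:
    """Run one action behind two checks; return True if it was executed."""
    # Check 1: stop the task if the observed content looks like an injection.
    if looks_injected(page_text):
        raise TaskAborted(f"suspected prompt injection before {action!r}")
    # Check 2: sensitive actions need an explicit yes from a human.
    if action in SENSITIVE_ACTIONS and not confirm(action):
        return False
    execute(action)
    return True


def naive_detector(text: str) -> bool:
    """Toy heuristic for illustration only; not a real injection detector."""
    return "ignore previous instructions" in text.lower()
```

In use, `confirm` would prompt a human operator and `execute` would drive the sandboxed browser or desktop. Raising an exception on a suspected injection, rather than skipping the single action, mirrors the idea of stopping the task as a whole.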

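Google says customers are already driving value with the feature. To get started, users can try the capabilities in a demo environment hosted by Browserbase, or consult the reference implementation and documentation via the Gemini API and the Gemini Enterprise Agent Platform.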

Original English text:

Introducing computer use in Gemini 3.5 Flash
Computer use is now a built-in tool supported in Gemini 3.5 Flash, delivering our best performance yet for agentic computer use tasks. Previously only available as a standalone Gemini 2.5 computer use model, computer use is now integrated natively in the main Gemini Flash model. Gemini already excels at function calling and using built-in tools like Search and Maps grounding. With built-in computer use capability, developers can now use 3.5 Flash to reliably build custom agents that can see, reason and take action across browser, mobile and desktop environments. This unlocks improved performance for long-horizon and enterprise automation tasks like continuous software testing and knowledge work across professional applications.
Developers and enterprises can start using computer use in 3.5 Flash via the Gemini API and Gemini Enterprise Agent Platform.
3.5 Flash uses computer use to analyse the Gemini app and return a categorized list of features.
3.5 Flash with computer use audits its own documentation for accessibility issues.
Making computer use safe in 3.5 Flash
To mitigate some of the prompt injection risks for agents operating in live environments, we use targeted adversarial training for computer use in Gemini 3.5 Flash. We’re also releasing two optional enterprise safeguard systems that enable enterprises to:

Require explicit user confirmation for sensitive or irreversible actions.
Automatically stop tasks if an indirect prompt injection is identified.
Taking a “defense-in-depth” approach, we encourage developers to combine these features with secure sandboxing, human-in-the-loop verification and strict access controls. Additional information on safety measures can be found in our best practices documentation.
We are already seeing customers drive value with computer use.
To start building with computer use today:
Try it now: Test the capabilities in a demo environment hosted by Browserbase.
Start building: Dive into our reference implementation and documentation via Gemini API and Gemini Enterprise Agent Platform.
