Gemini API File Search is now multimodal: build efficient, verifiable RAG

Posted by qimuai on 2026-5-6 14:00 | Views: 24 | First-hand compilation

Source: https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/

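Summary:

Gemini API File Search gets a major upgrade: multimodal retrieval and custom metadata are now available

Today, Google announced a major upgrade to the File Search tool in its Gemini API, adding support for multimodal retrieval-augmented generation (RAG). The tool can now process text and image data together and supports custom metadata filtering. It also adds page citations to improve traceability and transparency.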

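Giving apps a "photographic memory"

Built on the Gemini Embedding 2 model, File Search can now process image data natively and retrieve across mixed text and images. An app can match on the emotional tone or visual style described in a natural-language brief, without relying on keywords or filenames, and locate the target image in a large archive. A creative agency, for example, can search for image assets that fit a particular mood or visual style, which makes asset management considerably more efficient.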

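Custom metadata: filtering out irrelevant information

Files at scale are easy to store but hard to find. To address this, the new feature lets users attach key-value labels to unstructured data (for example, department: Legal or status: Final). By applying a metadata filter at query time, an app can restrict retrieval to the required slice of data, which significantly reduces noise from irrelevant documents and improves the speed and accuracy of RAG workflows.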

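Page citations: answers that can be checked

When an app extracts an answer from a large PDF, File Search ties the model's response directly to the original source and captures the page number for each piece of indexed information. This fine-grained traceability lets an app point users to the exact location in the original document, which builds trust and supports rigorous fact-checking.

Developers can get started now through the Gemini API developer guide and official documentation. Google says the File Search tool handles the underlying infrastructure so that developers can focus on building products.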

English source:

Gemini API File Search is now multimodal: build efficient, verifiable RAG
Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata. We’re also introducing page citations to improve grounding and transparency.
Whether you are prototyping a weekend project or scaling a production application for thousands of users, your RAG systems can now natively process and better organize your text and visual data.
Give your apps a photographic memory
File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands native image data, providing your agents contextual awareness.
Think of a creative agency trying to dig up a specific visual asset. Instead of relying on keywords or filenames, your app can search an entire archive for an image matching a specific emotional tone or visual style described in a natural language brief.
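As a rough illustration of how a natural-language brief can be matched against images once both live in the same embedding space, the sketch below ranks a tiny image index by cosine similarity. It is not the File Search implementation and does not call the Gemini API: the three-dimensional vectors, the file names, and the `nearest` helper are all made up for the example, standing in for embeddings a real model would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical image embeddings with made-up values. A real system would
# obtain these from an embedding model instead of writing them by hand.
IMAGE_INDEX = {
    "beach_sunset.jpg": [0.9, 0.1, 0.2],
    "office_meeting.png": [0.1, 0.9, 0.1],
    "rainy_street.jpg": [0.2, 0.2, 0.9],
}


def nearest(query_vector, index):
    """Return the name of the indexed item most similar to the query vector."""
    return max(index, key=lambda name: cosine(query_vector, index[name]))


# Stand-in for the embedded brief "warm, calm, golden light" (assumed values).
brief_vector = [0.8, 0.2, 0.3]
print(nearest(brief_vector, IMAGE_INDEX))
```

The point of the sketch is that no keyword or filename is consulted: the match depends only on how close the brief's vector is to each image's vector.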
Filter the noise with custom metadata
Dumping files into a database is easy. Finding the right one at scale is the real challenge. Custom metadata allows you to attach key-value labels to your unstructured data, things like department: Legal or status: Final.
By applying metadata filters at query time, your application can scope requests to the data slice required. This significantly reduces noise from irrelevant documents, increasing both the speed and accuracy of your RAG workflows.
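To make the idea of scoping a query concrete, here is a minimal, self-contained sketch of filtering by key-value metadata before ranking. It is a conceptual illustration only, not the Gemini API: the `Document` class, the `matches`, `score`, and `search` helpers, the sample files, and the word-overlap scoring are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    text: str
    metadata: dict = field(default_factory=dict)  # key-value labels


def matches(doc, filters):
    """True when every filter key is set on the document with the same value."""
    return all(doc.metadata.get(key) == value for key, value in filters.items())


def score(query, doc):
    """Toy relevance score: count of distinct query words found in the text."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def search(query, docs, filters, top_k=3):
    """Drop documents outside the metadata slice first, then rank what remains."""
    candidates = [d for d in docs if matches(d, filters)]
    return sorted(candidates, key=lambda d: score(query, d), reverse=True)[:top_k]


# Hypothetical sample data using the labels mentioned in the article.
DOCS = [
    Document("nda_v3.txt", "mutual non-disclosure agreement termination clause",
             {"department": "Legal", "status": "Final"}),
    Document("nda_draft.txt", "non-disclosure agreement termination clause draft notes",
             {"department": "Legal", "status": "Draft"}),
    Document("launch_plan.txt", "marketing launch plan termination of old campaign",
             {"department": "Marketing", "status": "Final"}),
]

results = search("termination clause", DOCS, {"department": "Legal", "status": "Final"})
print([d.name for d in results])
```

Without the filter, all three sample documents mention "termination" and would compete for the top slots; with it, only the final Legal document is ever scored.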
Show your work with page citations
When your application pulls an answer from a massive PDF, users need to verify exactly where that answer came from.
File Search now ties the model’s response directly to the original source. It captures the page number for every piece of indexed information. This level of granularity allows you to point users directly to the right spot, which helps build trust and makes your tool immediately useful for rigorous fact-checking.
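The sketch below shows one simple way page-level provenance can be kept: each indexed chunk records its source and page number, so any match can be reported as a citation. It is a conceptual illustration under stated assumptions, not how File Search works internally and not the shape of its API response: the `Chunk` class, the `index_pages` and `cite` helpers, the one-chunk-per-page layout, and the sample text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    page: int  # 1-based page number captured at indexing time
    text: str


def index_pages(source, pages):
    """Store one chunk per page, recording which page each chunk came from."""
    return [Chunk(source, number, text) for number, text in enumerate(pages, start=1)]


def cite(query, chunks):
    """Return (source, page) for every chunk whose text contains all query words."""
    words = query.lower().split()
    return [(c.source, c.page) for c in chunks
            if all(word in c.text.lower() for word in words)]


# Hypothetical page text standing in for an extracted PDF.
PAGES = [
    "Introduction and scope of the agreement.",
    "Payment terms: invoices are due within 30 days.",
    "Termination: either party may terminate with 60 days notice.",
]

chunks = index_pages("contract.pdf", PAGES)
print(cite("termination notice", chunks))
```

Because the page number travels with the chunk from indexing onward, the application can link an answer back to the page it was drawn from without re-scanning the document.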
Get started with File Search
We want to make it as easy as possible to store and retrieve the data that makes your ideas work. The File Search tool handles the heavy infrastructure so you can focus on building the product.
Explore our developer guide and the Gemini API documentation to get started.
