Gemma 4: Byte for byte, the most capable open models

Source: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
Summary:
Google releases Gemma 4, its most capable open model family to date, pushing accessible AI through extreme parameter efficiency
Google today officially launched Gemma 4, its next-generation open model family, which the company calls its "most capable open models" to date. Built from the same world-class research and technology as the flagship Gemini 3 models, Gemma 4 aims to give the developer community high-performance, easily accessible, state-of-the-art AI tools.
Core breakthrough: parameter efficiency and a leap in capability
The Gemma 4 family comes in four carefully designed sizes: Effective 2B (E2B), Effective 4B (E4B), a 26-billion-parameter Mixture of Experts model (26B MoE), and a 31-billion-parameter dense model (31B Dense). Its headline achievement is an unprecedented level of intelligence per parameter. For example, the 31B model already ranks #3 among open models worldwide on the industry-standard Arena AI text leaderboard, with performance rivaling models 20x its size. For developers, this means near-frontier AI capability at much lower hardware cost.
Purpose-built designs for diverse scenarios
The family is deeply optimized for different hardware platforms:
- 26B and 31B models: focused on giving researchers and developers frontier reasoning on personal computers and workstations. The 26B MoE achieves low-latency inference by activating only a subset of its parameters, while the 31B model maximizes raw output quality and is a strong foundation for fine-tuning.
- E2B and E4B edge models: built for mobile and IoT devices, activating only 2 billion or 4 billion parameters at inference time to minimize memory and battery use. Through collaboration with Google's Pixel team and mobile hardware leaders such as Qualcomm and MediaTek, these models run fully offline with near-zero latency on phones, Raspberry Pi boards, and similar devices, with native support for vision and audio input.
Comprehensively upgraded model capabilities
Gemma 4 goes beyond simple chat, with significant gains along several key dimensions:
- Advanced reasoning and agentic workflows: multi-step planning and deep logic, with native support for function calling and structured JSON output, making it easier to build autonomous agents that execute tasks reliably.
- Multimodality and long context: every model in the family natively handles images and video; the edge models offer a 128K context window, and the larger models support up to 256K, enough to process long documents or codebases in a single pass.
- Broad language and code support: coverage of more than 140 languages, plus high-quality offline code generation that can turn a workstation into a local AI coding assistant.
A commitment to an open, collaborative ecosystem
In keeping with the open-source ethos, Gemma 4 ships under the commercially friendly Apache 2.0 license, giving developers full control over their data, infrastructure, and models. Google says the goal is to remove restrictive barriers and build the future of AI collaboratively.
The models are available today through Google AI Studio, Hugging Face, Kaggle, Ollama, and other platforms, with broad support from mainstream toolchains ranging from the Hugging Face ecosystem and vLLM to MLX. Developers can easily download the weights and fine-tune or deploy them using Google Colab, Vertex AI, or even consumer GPUs. For large-scale production deployments, Google Cloud offers a complete solution.
Google emphasizes that Gemma 4 complements its proprietary Gemini models, together giving developers the industry's most powerful combination of open and closed tools, poised to drive the next wave of innovation from academic research to global application development.
English source:
Gemma 4: Byte for byte, the most capable open models
Today, we are introducing Gemma 4 — our most intelligent open models to date. Purpose-built for advanced reasoning and agentic workflows, Gemma 4 delivers an unprecedented level of intelligence-per-parameter. This breakthrough builds on incredible community momentum: since the launch of our first generation, developers have downloaded Gemma over 400 million times, building a vibrant Gemmaverse of more than 100,000 variants. We listened closely to what innovators need next to push the boundaries of AI, and Gemma 4 is our answer: breakthrough capabilities made widely accessible under an Apache 2.0 license.
Open model performance vs size on Arena.ai’s chat arena as of 4/1.
Built from the same world-class research and technology as Gemini 3, Gemma 4 is the most capable model family you can run on your hardware. They complement our Gemini models, giving developers the industry's most powerful combination of both open and proprietary tools.
Industry-leading capabilities and mobile-first AI
We are releasing Gemma 4 in four versatile sizes: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense. The entire family moves beyond simple chat to handle complex logic and agentic workflows. Our larger models deliver state-of-the-art performance for their sizes, with the 31B model currently ranking as the #3 open model in the world on the industry-standard Arena AI text leaderboard, and the 26B model securing the #6 spot. On that leaderboard, Gemma 4 outcompetes models 20x its size. For developers, this new level of intelligence-per-parameter means achieving frontier-level capabilities with significantly less hardware overhead.
At the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal capabilities, low-latency processing and seamless ecosystem integration over raw parameter count.
Powerful, accessible, open
To power the next generation of pioneering research and products, we've sized the Gemma 4 models specifically to run and fine-tune efficiently on hardware — from billions of Android devices worldwide, to laptop GPUs, all the way up to developer workstations and accelerators.
By using these highly optimized models, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks. We've already seen incredible success with this approach; for instance, INSAIT created a pioneering Bulgarian-first language model (BgGPT), and we worked with Yale University on Cell2Sentence-Scale to discover new pathways for cancer therapy, among many others.
Here is what makes Gemma 4 our most capable open model family yet:
- Advanced reasoning: Capable of multi-step planning and deep logic, Gemma 4 demonstrates significant improvements on benchmarks that require math and instruction following.
- Agentic workflows: Native support for function calling, structured JSON output, and system instructions enables you to build autonomous agents that can interact with different tools and APIs and execute workflows reliably.
- Code generation: Gemma 4 supports high-quality offline code generation, turning your workstation into a local-first AI code assistant.
- Vision and audio: All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.
- Longer context: Process long-form content seamlessly. The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass repositories or long documents in a single prompt.
- 140+ languages: Natively trained on over 140 languages, Gemma 4 helps developers build inclusive, high-performance applications for a global audience.
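The agentic-workflow bullet above hinges on the model emitting a structured JSON tool call that a host program can parse, validate, and execute. A minimal, model-agnostic dispatch loop might look like the sketch below; the `get_weather` tool and the example model reply are hypothetical illustrations, not part of any Gemma 4 API:

```python
import json

# Hypothetical tool registry; in a real agent these would call live APIs.
TOOLS = {
    "get_weather": lambda city: {"city": city, "forecast": "sunny"},
}

def dispatch(model_reply: str):
    """Parse a structured JSON tool call emitted by the model and run it."""
    call = json.loads(model_reply)   # the model is constrained to emit JSON
    fn = TOOLS[call["name"]]         # look up the requested tool
    return fn(**call["arguments"])   # execute with the given arguments

# Simulated model output, in the shape a function-calling model would produce.
reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(reply)
print(result)  # {'city': 'Paris', 'forecast': 'sunny'}
```

In a full agent loop, `result` would be appended to the conversation and fed back to the model for the next planning step.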
Versatile models for diverse hardware
We are releasing the Gemma 4 model weights in sizes tailored for specific hardware and use cases, ensuring you get frontier-class reasoning wherever you need it:
26B and 31B models: Frontier intelligence, offline on your personal computers
Optimized to provide researchers and developers with state-of-the-art reasoning on accessible hardware, our unquantized bfloat16 weights fit efficiently on a single 80GB NVIDIA H100 GPU. For local setups, quantized versions run natively on consumer GPUs to power your IDEs, coding assistants and agentic workflows. Our 26B Mixture of Experts (MoE) model focuses on latency, activating only 3.8 billion of its total parameters during inference to deliver exceptionally fast tokens-per-second, while our 31B Dense model maximizes raw quality and provides a powerful foundation for fine-tuning.
These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. See additional benchmarks in our model card.
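The hardware claims above can be sanity-checked with back-of-envelope arithmetic on weight memory alone, assuming 2 bytes per parameter in bfloat16 and roughly 0.5 bytes at 4-bit quantization (KV cache and activations are ignored, so real requirements are somewhat higher):

```python
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * 1e9 * bytes_per_param / 1e9

bf16 = weight_gb(31, 2.0)   # 31B dense in bfloat16
int4 = weight_gb(31, 0.5)   # same model, 4-bit quantized

print(f"31B bf16: {bf16:.1f} GB")  # 62.0 GB -> fits a single 80GB H100
print(f"31B int4: {int4:.1f} GB")  # 15.5 GB -> within consumer-GPU reach
```

The 62 GB bfloat16 figure is consistent with the single-H100 claim, and the ~15.5 GB 4-bit estimate explains why quantized builds target consumer GPUs.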
E2B and E4B models: A new level of intelligence for mobile and IoT devices
Engineered from the ground up for maximum compute and memory efficiency, these models activate an effective 2 billion and 4 billion parameter footprint during inference to preserve RAM and battery life. In close collaboration with our Google Pixel team and mobile hardware leaders like Qualcomm Technologies and MediaTek, these multimodal models run completely offline with near-zero latency across edge devices like phones, Raspberry Pi, and NVIDIA Jetson Orin Nano. Android developers can now prototype agentic flows in the AICore Developer Preview today for forward-compatibility with Gemini Nano 4.
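A rough way to see why the MoE and "effective parameter" designs matter: per-token decode compute scales with *active* parameters, commonly estimated at ~2 FLOPs per active parameter per generated token. The sketch below applies that heuristic to the active-parameter counts quoted in this post; it is a back-of-envelope estimate, not a measured benchmark:

```python
def decode_gflops_per_token(active_params_billions: float) -> float:
    """Rule of thumb: ~2 FLOPs per active parameter per generated token."""
    return 2.0 * active_params_billions  # parameters in billions -> GFLOPs

# Active-parameter counts as stated in the post.
models = {
    "31B Dense": 31.0,  # every parameter participates in each token
    "26B MoE": 3.8,     # only the routed experts are active
    "E4B": 4.0,         # effective footprint at inference
    "E2B": 2.0,
}

for name, active in models.items():
    print(f"{name}: ~{decode_gflops_per_token(active):.1f} GFLOPs/token")
```

By this estimate the 26B MoE does roughly 8x less work per token than the 31B dense model, which is the source of its tokens-per-second advantage.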
An open-source license
You gave us feedback, and we listened. Building the future of AI requires a collaborative approach, and we believe in empowering the developer ecosystem without restrictive barriers. That's why Gemma 4 is released under a commercially permissive Apache 2.0 license.
This open-source license provides a foundation for complete developer flexibility and digital sovereignty, granting you complete control over your data, infrastructure, and models. It allows you to build freely and deploy securely across any environment, whether on-premises or in the cloud.
Built on a foundation of trust and safety
These models undergo the same rigorous infrastructure security protocols as our proprietary models. By choosing Gemma 4, enterprises and sovereign organizations gain a trusted, transparent foundation that delivers state-of-the-art capabilities while meeting the highest standards for security and reliability.
An ecosystem of choices
- Start experimenting in seconds: Get instant access to Gemma 4 and begin building right away. Explore Gemma 4 in Google AI Studio (31B and 26B MoE) or in Google AI Edge Gallery (E4B and E2B). For Android development, use it to power Agent Mode in Android Studio, and start building apps for production on Android with the ML Kit GenAI Prompt API.
- Use your favorite tools: With day-one support for Hugging Face (Transformers, TRL, Transformers.js, Candle), LiteRT-LM, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM and NeMo, LM Studio, Unsloth, SGLang, Cactus, Baseten, Docker, MaxText, Tunix, Keras, you have the flexibility to choose the best tools for your project.
- Download the models: Get the model weights from Hugging Face, Kaggle or Ollama.
- Customize Gemma 4 to your specific needs: Train and adapt the model using your preferred platform, like Google Colab, Vertex AI or even your gaming GPU.
- Scale to production on Google Cloud: While local on-device inference is ideal for offline use, Google Cloud removes all compute ceilings. Deploy your way through Vertex AI, Cloud Run, GKE, Sovereign Cloud, and TPU-accelerated serving, with the highest compliance guarantees for regulated workloads. Learn more about getting started on Google Cloud.
- Accelerate your AI development across multiple hardware platforms: Gemma 4 is optimized for industry-leading hardware out of the box. Experience maximum performance on NVIDIA AI infrastructure from NVIDIA Jetson Orin Nano to Blackwell GPUs, integrate with AMD GPUs via the open-source ROCm™ stack, or deploy on Trillium and Ironwood TPUs for massive scale and efficiency.
- Compete for impact: Join the Gemma 4 Good Challenge on Kaggle to build products that create meaningful, positive change in the world.
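Fine-tuning on Colab or a gaming GPU, as the list above suggests, is plausible mainly because adapter methods such as LoRA train only low-rank updates (W + BA per weight matrix) rather than the full model. The arithmetic below illustrates the savings; the layer count, hidden size, and number of adapted projections are hypothetical placeholders, not Gemma 4's published architecture:

```python
def lora_trainable_params(n_matrices: int, d_in: int, d_out: int, rank: int) -> int:
    """Trainable params for rank-r adapters: each matrix gets B (d_out x r) and A (r x d_in)."""
    return n_matrices * rank * (d_in + d_out)

# Hypothetical transformer: 48 layers, hidden size 5120, 4 adapted attention
# projections per layer -- placeholder numbers, NOT Gemma 4's real dimensions.
n_mat, d, r = 48 * 4, 5120, 16
lora = lora_trainable_params(n_mat, d, d, r)
full = 31_000_000_000  # full fine-tune of the 31B dense model

print(f"LoRA trainable params: {lora:,}")
print(f"fraction of full model: {lora / full:.4%}")
```

Under these assumptions the adapters amount to about 0.1% of the full parameter count, which is why the optimizer state and gradients fit on modest hardware.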
Article link: https://news.qimuai.cn/?post=3715