Catalyzing scientific impact through global partnerships and open resources

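Summary:
Google reports open-science results: global partnerships and open-source resources drive research breakthroughs
On May 1, 2026, the Google Research Science team announced that, by building a global network of partners and openly sharing core tools and data, it has helped achieve major research advances in genomics, neuroscience, climate modeling, biodiversity, and healthcare. The team stressed that the core of open science is enabling the global research community to replicate, extend, and advance frontier discoveries.
On the partnership side, Google works closely with institutions including the University of California Santa Cruz (UCSC) Genomics Institute, Janelia Research Campus, the Institute of Science & Technology Austria (ISTA), CSIRO (Australia's national science agency), and the All India Institute of Medical Sciences (AIIMS). It also supports large international scientific consortia such as the Human Pangenome Research Consortium, the Earth BioGenome Project, and the NIH BRAIN Initiative.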
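On open-source tools and data, Google has released several key resources:
- Genomics: A suite of deep learning tools (DeepVariant, DeepConsensus, DeepPolisher) has helped the global community process the exomes and whole genomes of 2.5 million individuals.
- Neuroscience: Automated reconstruction tools, together with the H01 dataset (a 1.4-petabyte sample of human brain tissue, accessed over 200,000 times) and the MICrONS dataset, help scientists analyze large-scale brain tissue reconstructions.
- Earth and atmospheric modeling: The Open Buildings dataset contains 1.8 billion building detections across 58 million km²; the flood forecasting effort, which includes the Caravan hydrology dataset, provides predictions in 150 countries covering 2 billion people; NeuralGCM is a hybrid atmospheric model; and the FireBench dataset supports wildfire research.
- Biodiversity: The SpeciesNet model classifies 2,498 animal categories in wildlife camera images.
- Healthcare: The HAI-DEF collection (including the MedGemma model) has more than 4.8 million downloads; applications built on Open Health Stack have been deployed in more than 10 countries, with over 65 million beneficiaries.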
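Real-world use cases highlight the impact of open science:
- The University of Chicago used NeuralGCM to forecast the onset of the Indian monsoon up to a month in advance; the forecasts reached 38 million Indian farmers by SMS, helping them optimize planting decisions.
- The UN Refugee Agency (UNHCR) used the Open Buildings dataset to optimize disaster response survey sampling for displaced populations.
- Johns Hopkins University researchers used the H01 dataset to identify a new form of neuronal communication, which may change the current understanding of brain connectivity.
- Working with Stanford University School of Medicine and UCSC, the team achieved genetic diagnosis by whole genome sequencing in less than 8 hours, setting a Guinness World Record.
- Zambia-based Dawa Health used MedSigLP to build an AI cervical cancer screening tool; midwives can upload images via WhatsApp to identify abnormalities in real time.
- AIIMS in India is using MedGemma to develop outpatient triage and dermatology screening applications; in Malaysia, MedGemma powers a conversational decision-support interface to the country's 150+ clinical practice guidelines.
- Wake Forest University researchers used SpeciesNet to analyze more than 11 million wildlife images in days; University of Otago researchers re-trained DeepVariant to create a genetic map of every living kākāpō, helping the population grow from a low of 51 to 252 birds.
- CSIRO is using Google Earth models and Google's open genomics tools to select heat-tolerant giant kelp strains, supporting the recovery of endangered giant kelp populations in Australia and Tasmania.
- Using Google's open-source genomics tools, researchers supporting the Earth BioGenome Project have made full genomes available for 13 iconic endangered species, with another 150 species underway.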
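The Google Research team said that, as the era of AI-enabled science arrives, agentic workflows will let scientists turn their expertise into scalable tools, speeding up global research collaboration and the sharing of results. The team will continue to build infrastructure that supports this new era of discovery and looks forward to what the global scientific community achieves next.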
English source:
Catalyzing scientific impact through global partnerships and open resources
May 1, 2026
The Google Research Science team
Our approach to open science is built on principles of responsible, inclusive, and rigorous research, empowering a global community to drive high-impact discoveries across disciplines and accelerate progress for all.
A scientific breakthrough reaches its full potential only when it empowers others to replicate and expand upon findings, pushing the boundaries of science even further. At Google Research, we recognize that open-source software and open-access datasets are drivers of modern science. We believe that creating these resources responsibly and maintaining them through partnerships with the global scientific community embodies the spirit of collaboration. In this way, we uphold the principles of open science, ensuring that innovation is not a siloed event but a catalyst for worldwide progress.
Whether it’s the Transformer architecture that reshaped automated language processing, or our specialized models transforming medicine, genomics, neuroscience, climate, energy, and a host of other efforts across the physical, life, and social sciences, we are proud of the work we’ve shared and how it’s being used by researchers around the globe to unlock their own groundbreaking discoveries. This open approach complements our breadth of initiatives across Google to engage and strengthen the research and science ecosystem, including through APIs, publications, conferences, trusted tester programs and private partnerships.
Partnerships and ecosystem collaboration
We collaborate with numerous specialized organizations across scientific disciplines and global regions, such as the University of California Santa Cruz (UCSC) Genomics Institute, Janelia Research Campus, Institute of Science & Technology Austria (ISTA), the Centre for Population Genomics, CSIRO - Australia’s national science agency, and the All India Institute of Medical Sciences (AIIMS).
Beyond individual organizations, we actively support widespread scientific consortia undertaking monumental, global challenges, including the Human Pangenome Research Consortium, the Earth BioGenome Project and the NIH BRAIN Initiative.
Ultimately, our open-science philosophy extends to the broader ecosystem and we are investing in building communities of practice for individual scientific developers, starting in India, Korea, Japan and Australia.
Our open-source tools and data
Over the last decade, we have developed, released, maintained and evolved several key open-source technologies and open-access datasets. To date, these have empowered an active ecosystem of more than 250,000 researchers and developers worldwide.
- Genomics: Our suite of deep learning tools, including DeepVariant, DeepConsensus and DeepPolisher, improves DNA analysis from raw sequencing to final assemblies. These methods have collectively enabled the global community to process the exomes and whole genomes of 2.5 million individuals.
- Neuroscience: Our methods and tools for automated reconstruction, analysis, and visualization of connectomic data include flood-filling networks, Neuroglancer, and TensorStore. These technologies allow scientists to seamlessly segment, navigate, and analyze petascale, high-resolution brain tissue reconstructions. This includes two key publicly available datasets: H01, a 1.4 petabyte sample of human brain tissue accessed over 200k times, and MICrONS, the largest wiring diagram and functional map of the mouse visual cortex.
- Earth & Atmospheric Modeling: We have released Open Buildings, which contains 1.8 billion building detections across an inference area of 58M km² covering Africa, South Asia, South-East Asia, Latin America and the Caribbean; Caravan, a community-driven dataset for large-sample hydrology, as part of our flood forecasting effort, which now provides predictions in 150 countries covering 2B people for the most significant floods; the Groundsource dataset for urban flash floods, comprising 2.6 million historical flood events derived using Gemini on 20 years of public data spanning more than 150 countries; and NeuralGCM, a fully differentiable hybrid atmospheric model. These are also part of our geospatial efforts within Google Earth AI. We have also released FireBench, a high-resolution synthetic dataset designed to advance wildfire research, and a dataset of ionosphere conditions measured using phones, along with a paired visualization of the dataset over time.
- Biodiversity: SpeciesNet is a global-scale model that classifies 2,498 animal categories, including mammals, birds, and reptiles, in wildlife camera images.
- Healthcare: Our Health AI Developer Foundations (HAI-DEF) provides a suite of open-weight foundation models — including MedGemma — specialized for multimodal medical text, clinical reasoning, and imaging comprehension. It has more than 4.8M downloads to date. Open Health Stack (OHS) is a suite of open-source tools that make it faster and easier for developers to build secure, offline-capable next-generation digital health solutions based on modern digital healthcare standards. Healthcare applications powered by OHS have been deployed in more than 10 countries with over 65 million beneficiaries.
Real-world impact powered by open science
The true measure of our open-science philosophy is the real-world impact achieved by our partners and end users. Below are some examples detailing how our open tools and datasets have enabled further breakthroughs and been used to help communities across the globe.
Enabling global science
- In partnership with the UCSC Genomics Institute, we have developed methods to improve pangenome references and reduce errors when identifying genetic variants by 50%. This work contributes to the Human Pangenome Research Consortium and their effort to better represent human diversity in genomics references and workflows.
- The Human-Centered Weather Forecasts Initiative at the University of Chicago used NeuralGCM and the European Centre for Medium-Range Weather Forecasts (ECMWF) systems to predict the onset of the Indian monsoon up to a month in advance, even capturing an unusual dry spell in the progression of the monsoon. In partnership with the Indian Ministry of Agriculture and Farmers' Welfare, these advance forecasts were successfully delivered via SMS to 38 million farmers in India, empowering them to optimize their crop planting decisions.
- Global organizations, including the UN Refugee Agency (UNHCR), have optimized disaster response survey sampling for displaced populations using the Open Buildings dataset. This dataset has also enabled additional scientific research, including assessing building risk from sea level rise in the Global South.
Enabling health advances
- Researchers at Johns Hopkins University leveraged the H01 human brain reconstruction dataset to identify a new form of neuronal communication. The discovery suggests that the current understanding of the brain’s organization may be incomplete, overlooking a hidden layer of connectivity, with implications for conditions like Alzheimer's.
- We partnered with Stanford University School of Medicine and UCSC to adapt genome analyses to find the cause of genetic disease in the most time-critical cases. The program enabled life-saving interventions and set a new Guinness World Record for achieving genetic diagnosis by whole genome sequencing in less than 8 hours.
- In partnership with UCSC and the National Cancer Institute at the NIH, we co-created a publicly available set of cancer genome sequences for method development and evaluation. We also collaboratively developed DeepSomatic to more accurately find cancer variants, which Children’s Mercy Hospital deployed to discover previously missed variants in cancer cases.
- HAI-DEF has driven widespread global engagement and tangible clinical impact by providing open-weight models that democratize medical AI development, especially in low- and middle-income countries. For instance, Zambia-based Dawa Health used MedSigLP to build an AI-powered multilingual cervical cancer education and screening tool that allows midwives to upload colposcopy images via WhatsApp to identify abnormalities in real time.
- Open Health Stack has enabled developers globally to address healthcare gaps, particularly in low-resource settings. For example, Ona builds apps that allow health workers to switch from paper-based records to digital solutions. OHS accelerated Ona’s app development and allowed them to adopt interoperable data standards, which healthcare workers then used to deliver better care in underserved communities.
- In New Delhi, AIIMS is using MedGemma to develop applications for outpatient triage and dermatology screening. In Malaysia, MedGemma powers Ask CPG, a conversational interface to the country’s 150+ clinical practice guidelines that the Ministry of Health in Malaysia said has eased navigating the country’s clinical practice guidelines for day-to-day decision support. MedGemma is also empowering individual developers worldwide to build applications for clinical triage, medical document understanding, and diagnostic decision support.
Enabling biodiversity and conservation
- Since 2010, the Snapshot Serengeti camera trapping program has captured over 11 million wildlife images from the African savanna. Using SpeciesNet, researchers at Wake Forest University can now analyze this large dataset in just days, and by running the model from a laptop, they can use the latest wildlife sightings to redeploy cameras in real time to collect targeted data.
- Researchers at the University of Otago are working to preserve the critically endangered kākāpō, a flightless bird of significant cultural importance. Working independently of Google, the researchers re-trained DeepVariant to optimize it for the kākāpō population. This model enabled them to create a genetic map of every living kākāpō to inform breeding strategies and care plans for sick birds, helping to expand the population from a low of 51 to 252 birds.
- Researchers at CSIRO are working with Google to support repopulation efforts for endangered Australian and Tasmanian giant kelp populations. By using Google Earth models and satellite imagery to identify surviving patches, and Google’s open genomics tools to create reference genomes, researchers are linking genetic variants to heat tolerance data. This allows researchers to selectively breed kelp strains that are resilient to rising ocean temperatures.
- The Vertebrate Genomes Project and the Earth BioGenome Project are using our open-source genomics tools to make progress toward their monumental goal of sequencing the genome of every non-bacterial species on Earth. Bolstered by funding awarded by Google.org to The Rockefeller University, researchers have made full genomes available for 13 iconic endangered species, with an additional 150 species underway.
Looking ahead
Our partnership with the open science community is an accelerating mission. As we transition deeper into the era of AI-enabled science, we are inspired by the way generative AI is profoundly changing how researchers work and collaborate. We believe that agentic workflows will allow scientists to encode their knowledge into specialized skills and transform their methods into accessible, scalable tools. This shift will empower the global community to rapidly reproduce findings, extend complex methodologies, and share their work globally.
In this fast-paced new paradigm, communication and collaboration are more critical than ever. Open-source software and open datasets serve as the essential foundation for this ecosystem. The breakthroughs we celebrate today are merely the initial blueprints for a world with faster innovation and universal sharing of scientific knowledge.
At Google Research, we will continue to build the tools and infrastructure that support this new era of discovery. We look forward to seeing what the global scientific community achieves next.
Acknowledgments
We give special thanks to our many global research partners and to the wider scientific community of users that builds upon our open models, infrastructure, datasets, and other tools to make discoveries and to pioneer, pilot, and implement innovations that create positive global societal impact.