Vibe Coding XR: Accelerating AI + XR prototyping with XR Blocks and Gemini

Summary:
Google unveils Vibe Coding XR: fusing AI and extended reality to turn an idea into an interactive app in 60 seconds
On March 25, 2026, Ruofei Du, Interactive Perception & Graphics Lead on Google's XR team, and Product Manager Benjamin Hersh announced a new workflow called "Vibe Coding XR." It aims to dramatically lower the barrier to extended reality (XR) development, letting creators build interactive, physics-aware XR experiences quickly, without deep programming or engine expertise.
Breaking down the XR development barrier: generating apps directly from natural language
Traditional XR prototyping typically requires stitching together complex perception pipelines, game engines, and low-level sensor integrations, a tedious and time-consuming process. Vibe Coding XR combines Google's Gemini multimodal model with the open-source XR Blocks framework to enable "vibe coding": the user simply describes an idea in natural language (for example, "Create a beautiful dandelion" or "Design an interactive physics balance experiment"), and the system generates a fully functional Android XR application in under 60 seconds, ready to run and test on a headset or in a desktop simulated environment.
The core workflow: from description to immersive experience
The flow is deliberately simple:
- Describe the idea: on an Android XR headset or in desktop Chrome, the user enters a prompt by keyboard or voice.
- AI generation: drawing on its understanding of the XR Blocks framework and sample code, Gemini plans and configures the scene, perception modules, and interaction logic.
- Instant experience and iteration: the generated app launches immediately in the headset via gesture interaction. Users can preview the result right away and collaborate with others through a shareable link.
Broad applications in education, research, and creative expression
The demo cases show the technique's flexibility:
- Math teaching: visualize Euler's theorem, highlighting the vertices, edges, and faces of different polyhedra for spatial instruction.
- Physics and chemistry experiments: create an interactive balance scale that teaches the lever principle through placing weights, or safely simulate combustion experiments with gases such as methane and ethylene, rendered as realistic 3D volumetric flames.
- Quantum mechanics demos: vividly present the "Schrödinger's cat" thought experiment, letting users experience the superposition and collapse of quantum states through pinch gestures.
- Interactive games: rapidly adapt the classic Chrome Dino game into an immersive XR version, dramatically shortening prototyping time.
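The balance-scale demo above hinges on the lever principle: the beam is balanced when the torques on both sides match. A minimal, framework-free sketch of that check (the `netTorque` helper and its inputs are illustrative, not XR Blocks code):

```javascript
// Net torque on a two-sided balance: sum of weight × arm length on the
// left minus the same sum on the right. Balanced when the net is zero.
// Illustrative only; a real XR app would feed in tracked 3D objects.
function netTorque(left, right) {
  const sum = (side) => side.reduce((t, o) => t + o.weight * o.arm, 0);
  return sum(left) - sum(right);
}

const left = [{ weight: 2, arm: 1 }];  // 2 kg placed 1 unit from the pivot
const right = [{ weight: 1, arm: 2 }]; // 1 kg placed 2 units from the pivot
console.log(netTorque(left, right));   // 0 → the scale is balanced
```

A generated XR app would evaluate this each frame as the user picks up and drops weights, tilting the beam in proportion to the net torque.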
Technical foundation: a deep coupling of a large model and a specialized framework
Vibe Coding XR rests on two pillars:
- Gemini's long-context reasoning: a carefully designed system prompt shapes Gemini into a domain expert that follows XR best practices (such as spatial layout and interaction distances).
- The open-source XR Blocks framework: built on open web technologies such as WebXR and three.js, its core engine encapsulates the complex spatial-computing subsystems of environmental perception, XR interaction, and AI integration, giving Gemini a stable, reliable base for code generation.
Early tests show promise; the team continues to iterate
The team collected 60 prompts from internal workshops to form the VCXR60 test set. After 11 major releases, the current system substantially reduces the errors seen in early versions, which stemmed from bugs in the framework itself or model "hallucinations." Evaluation shows that for prompts involving complex animation and hand interaction, the more capable Pro mode yields more reliable results.
Outlook: a "creativity-first" era for spatial computing
Vibe Coding XR marks a shift in spatial computing from dependence on specialized expertise toward empowering everyday creativity. By pairing the reasoning abilities of large models with the high-level abstractions of XR Blocks, it bridges the gap between a flash of inspiration and a physics-aware reality.
The Google team invites researchers and developers in HCI, AI, and XR to contribute to the XR Blocks open-source ecosystem. The framework and demos are publicly available and will be shown live at the Google booth at ACM CHI 2026.
Original article (English source):
Vibe Coding XR: Accelerating AI + XR prototyping with XR Blocks and Gemini
March 25, 2026
Ruofei Du, Interactive Perception & Graphics Lead, and Benjamin Hersh, Product Manager, Google XR
Vibe Coding XR is a rapid prototyping workflow that empowers Gemini Canvas with the open-source XR Blocks framework to translate user prompts into fully interactive, physics-aware WebXR applications for Android XR, allowing creators to quickly test intelligent spatial experiences in both simulated environments on desktop and on Android XR headsets.
Large language models (LLMs) and agentic workflows are changing software engineering and creative computing. We are seeing a shift toward “vibe coding”, where LLMs turn human intent directly into working code. Tools like Gemini Canvas already make this possible for 2D and 3D web development. However, extended reality (XR) remains difficult to access. Prototyping in XR typically requires piecing together fragmented perception pipelines, complex game engines, and low-level sensor integrations.
Quick, vibe-coded prototypes can solve this problem. They help experienced developers test new UIs, 3D interactions, and spatial visualizations directly in a headset. This rapid validation can save days of work on ideas that might eventually be discarded. It also makes it easier to build interactive educational experiences that demonstrate natural science and mechanics.
Today, we are announcing Vibe Coding XR to bridge this gap. This workflow uses Gemini as a creative partner alongside our web-based XR Blocks framework. By combining Gemini’s long-context reasoning with specialized system prompts and curated code templates, the system handles spatial logic automatically. It translates natural language directly into functional, physics-aware Android XR apps in under 60 seconds.
Our team will present an onsite demonstration at the Google Booth at ACM CHI 2026. You can also try it out here today.
The Vibe Coding XR workflow
Over the last year, we have been iteratively designing and improving the Vibe Coding XR journey to be seamless and accessible. Here’s an example:
- Users describe what they want without any prior knowledge of XR: A user opens the XR Blocks Gem with Chrome on an Android XR headset (such as Galaxy XR). They type a prompt with a keyboard or their voice, such as “Create a beautiful dandelion.” Optionally, they can use Chrome on desktop to create the XR application and preview with XR Blocks’ built-in simulator.
- Gemini designs and implements the XR experience: Learning from samples of XR Blocks, Gemini uses its multi-step planning abilities and advanced reasoning to configure the scene, perception, and interaction, then builds interactive XR applications.
- Live demo with rapid iteration: In Android XR, the user performs a pinch gesture at the “Enter XR” button to instantly see the result — an animated dandelion that blows away upon a pinch interaction. Users can further click the “Share” button to create a shareable public link for their app.
To facilitate easier testing, we also provide a “simulated reality” environment on the desktop Chrome. This allows creators to rapid-prototype and test interactions prior to deploying them on Android XR devices. Many advanced perceptual features such as depth sensing, hands interaction, and physics are best experienced on Android XR.
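The device-versus-simulator decision described above follows the standard WebXR capability-check pattern: ask whether an immersive session is supported and fall back to a desktop preview otherwise. A minimal sketch, where `launchExperience` and the returned mode strings are hypothetical stand-ins rather than XR Blocks internals (`navigator.xr.isSessionSupported` is the real WebXR API):

```javascript
// Sketch of choosing between an immersive session on an Android XR
// headset and the desktop "simulated reality" preview. In a browser you
// would pass `navigator.xr`; here the argument is injectable for testing.
async function launchExperience(xr) {
  // WebXR exposes isSessionSupported(mode) returning a Promise<boolean>.
  const ok = xr && (await xr.isSessionSupported("immersive-ar"));
  return ok ? "immersive" : "simulated";
}

// Desktop Chrome without an XR device: falls back to the simulator.
launchExperience(undefined).then((mode) => console.log(mode)); // "simulated"
```

On a headset, the same check resolves to the immersive path, which is where depth sensing, hand interaction, and physics behave best.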
Technical brief of Vibe Coding XR
Vibe Coding XR leverages the long-context capabilities and thinking process of Gemini to function as an expert XR designer and engineer. We developed a specialized system prompt that “teaches” Gemini with the XR Blocks architecture and samples, including guidelines for room-scale XR environments, package management, and best practices for XR interaction.
The underlying XR Blocks framework is built upon accessible web technologies like WebXR, three.js, and LiteRT.js. Its core engine manages the complex interplay of subsystems required for spatial computing, including environmental perception, XR interaction, and AI integration. Our prompt context includes the following components:
- Persona & guidelines: Establishes the LLM as a domain expert following best practices for room-scale XR environments (e.g., spatial layout, scale, and interaction distances).
- Package management: Specifies how dependencies within XR Blocks should be handled and enforces recommended default styles.
- Source code & templates: Provides the source code of a curated set of XR Blocks templates and samples within the context window. This grounding reduces hallucination and encourages strict adherence to valid API calls and established design patterns.
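The three context components above amount to assembling one large system prompt. The following sketch shows one plausible way to concatenate persona, package rules, and curated template sources; every name here (`buildSystemPrompt`, the template list, the guideline text) is illustrative, since the actual XR Blocks Gem prompt is not public:

```javascript
// Hypothetical assembly of a system prompt from the three components
// described above. Structure and wording are illustrative placeholders.
const persona =
  "You are an expert XR designer and engineer. Follow room-scale XR " +
  "best practices: comfortable spatial layout, real-world scale, and " +
  "interaction distances within arm's reach.";

const packageRules =
  "Use only APIs exported by the XR Blocks framework. Do not invent " +
  "dependencies; apply the recommended default styles.";

// In the real system these would be full source files of curated samples.
const templates = [
  { name: "dandelion", source: "/* sample: animated dandelion */" },
  { name: "balance-scale", source: "/* sample: physics scale */" },
];

function buildSystemPrompt(persona, rules, templates) {
  const templateSection = templates
    .map((t) => `### Template: ${t.name}\n${t.source}`)
    .join("\n\n");
  return [
    "## Persona & guidelines\n" + persona,
    "## Package management\n" + rules,
    "## Source code & templates\n" + templateSection,
  ].join("\n\n");
}

const systemPrompt = buildSystemPrompt(persona, packageRules, templates);
console.log(systemPrompt.split("\n")[0]); // "## Persona & guidelines"
```

Grounding the model in real template source this way is what discourages hallucinated API calls: generations are steered toward code that already exists in the context window.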
Application scenarios: From prompt to reality
We demonstrated the versatility of the Vibe Coding XR workflow with example prototypes generated via vibe coding:
- Math tutor: Prompted by “Visualize Euler's theorem in geometry. Explain vertices, edges, and facets concepts with highlighting using different examples.” Gemini smartly chooses a tetrahedron, a cube, and an octahedron as three examples, visualizes them in XR, and allows users to pinch to switch between different highlighting strategies.
- Physics lab: Prompted by “Create an interactive physics experiment: given different objects on each side of the scale, use different weights (with labels on them) to balance the scale.” XR users are able to pick and drop different weights to intuitively learn how a basic level-based scale works in the real world.
- Immersive chemistry: Prompted by “Create an interactive chemistry lab that users can pinch to ignite and observe three experiments: Ignite methane in air and place a dry, cold beaker over the flame: the flame is pale blue, and liquid droplets form on the inner wall of the beaker. Ignite ethylene in air: the flame is bright, black smoke is produced, and heat is released. Ignite acetylene in air: the flame is bright, thick smoke is produced, and heat is released.” Gemini designs educational cards and renders 3D volumetric visualizations for each experiment, facilitating a safe, interactive mixed-reality experience.
- Schrödinger's cat: Prompted by “An aesthetically pleasing depiction of Schrödinger's cat in XR. Finger pinch makes a cat (detailed 3D model) go into the box. Approaching the box within 50cm makes the box become two that move to the left and right and the box's front wall becomes transparent. You see both versions of the cat inside (dead and alive), demonstrating the quantum state. When you pinch again, one of the states becomes reality. The box opens and you see it either alive or dead. With another pinch you can start again.” Gemini explains quantum state demonstration where users pinch to guide a 3D cat into a box. Approaching it splits the box to reveal both the alive and dead states simultaneously, while another pinch collapses the superposition into a single reality.
- XR sports: Prompted by “Let me play volleyball with hands and collide with my environment. Volleyballs are textured and launched from a red ring slowly and easier to bounce with the hand.” Gemini creates a textured ball with which to play that reacts to both hands and the physical environment.
- XR dino: Prompted by “Create the Chrome Dino game in XR. Dino is voxelized in front of the user, with every cactus rushing towards the user on a semi transparent lane. Add audio.” Gemini creates the XR version of the classic Chrome Dino game, significantly reducing the prototyping time from hours to minutes.
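The Schrödinger's cat prompt above is, at its core, a small interaction state machine: cat outside the box, cat in the box, superposition on approach, collapse on pinch. A framework-free sketch of that logic, with hypothetical state and event names (a real XR Blocks app would drive this from hand-tracking and proximity events):

```javascript
// Minimal state machine for the Schrödinger's cat interaction described
// above. State and event names are invented for illustration.
const transitions = {
  idle:          { pinch: "catInBox" },       // pinch sends the cat into the box
  catInBox:      { approach: "superposition" }, // user comes within 50 cm
  superposition: { pinch: "collapsed" },      // pinch collapses the state
  collapsed:     { pinch: "idle" },           // pinch again to restart
};

function step(state, event) {
  const next = (transitions[state] || {})[event];
  return next || state; // events with no transition are ignored
}

function collapse() {
  // On collapse, one of the two outcomes becomes reality at random.
  return Math.random() < 0.5 ? "alive" : "dead";
}

let state = "idle";
state = step(state, "pinch");    // cat goes into the box
state = step(state, "approach"); // box splits; both states visible
state = step(state, "pinch");    // superposition collapses
console.log(state);              // "collapsed"
```

Framing an interaction this way is also a reasonable mental model for how a vibe-coded prompt decomposes: each sentence of the prompt maps onto a transition.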
We also prompted with more specific context, such as NASA Exoplanet Data, procedural generation, or high-resolution textures, in the XR Blocks Gem to demonstrate iterative refinement in the Vibe Coding XR process.
Preliminary technical evaluation
Evaluating XR applications has always been a challenge, largely because it typically requires hands-on, on-device testing and subjective human evaluation. To test the effectiveness of our Vibe Coding XR pipeline, we built a preliminary dataset of prompts to create XR apps: VCXR60.
Sourced from four one-hour internal workshops, VCXR60 consists of 60 unique prompts provided by 20 Googler participants. Using this dataset, we measured both inference time and the one-shot success rate, specifically looking for zero-error executions within the XR Blocks simulated reality environment. For example, a simple prompt, “Create a beautiful dandelion that blows away when I pick it up,” will likely finish in under 20 seconds in Gemini Flash, but has a higher chance of runtime errors compared to Gemini Pro, because handling animation and hands interaction requires more tokens during the thought process.
Early on, we found that the majority of initial errors stemmed from bugs within XR Blocks itself or from hallucination of non-existing or deprecated APIs, yielding an approximate 70% success rate. These insights fueled a rapid six-month iteration cycle. Today, after 11 major releases, we are excited to share the preliminary evaluation of XR Blocks Gem v0.11.0 on the VCXR60 dataset as a baseline reference.
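The one-shot success-rate metric described above (zero-error executions in the simulated reality environment) reduces to simple aggregation over per-prompt run records. A small sketch; the records shown are fabricated placeholders, not actual VCXR60 measurements:

```javascript
// Hypothetical per-prompt run records: did the generated app execute
// with zero errors in the simulator, and how long did inference take?
// Values are illustrative placeholders, not real VCXR60 data.
const runs = [
  { prompt: "dandelion",     zeroError: true,  inferenceSec: 18 },
  { prompt: "balance scale", zeroError: true,  inferenceSec: 42 },
  { prompt: "chrome dino",   zeroError: false, inferenceSec: 55 },
];

function oneShotSuccessRate(runs) {
  const ok = runs.filter((r) => r.zeroError).length;
  return ok / runs.length;
}

function meanInferenceSec(runs) {
  const total = runs.reduce((sum, r) => sum + r.inferenceSec, 0);
  return total / runs.length;
}

console.log(oneShotSuccessRate(runs).toFixed(2)); // "0.67"
console.log(meanInferenceSec(runs).toFixed(1));   // "38.3"
```

Tracking both numbers together captures the Flash-versus-Pro trade-off noted above: faster inference is only a win if the zero-error rate holds up.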
Our top takeaway for developers: when diving into advanced XR prototyping, utilizing “Pro Mode” yields the most reliable results.
Conclusion
Vibe Coding XR marks a pivotal step toward a future where spatial computing is limited not by technical expertise, but by creativity. By coupling the reasoning capabilities of LLMs with the high-level abstractions of XR Blocks, we bridge the gap between a fleeting thought and a tangible, physics-aware reality.
Our team continues to work on the XR Blocks framework, benchmarking, and spatial intelligence research. We invite the HCI (human–computer interaction), AI, and XR communities to contribute to this XR Blocks ecosystem on Android XR. You can access the open-source framework and try the live demo in the quick links, or visit our demo at ACM CHI 2026.
Acknowledgements
This work is a collaboration across multiple teams at Google. Key contributors to this project include Ruofei Du, Benjamin Hersh, David Li, Xun Qian, Nels Numan, Zhongyi Zhou, Yanhe Chen, Xingyue Chen, Jiahao Ren, Robert Timothy Bettridge, Faraz Faruqi, Xiang 'Anthony' Chen, Steve Toh, and David Kim. The following researchers and engineers contributed to the XR Blocks framework: David Li and Ruofei Du (equal primary contributions), Nels Numan, Xun Qian, Yanhe Chen, and Zhongyi Zhou (equal secondary contributions, sorted alphabetically), as well as Evgenii Alekseev, Geonsun Lee, Alex Cooper, Brandon Jones, Min Xia, Scott Chung, Jeremy Nelson, Xiuxiu Yuan, Jolica Dias, Tim Bettridge, Benjamin Hersh, Michelle Huynh, Konrad Piascik, Ricardo Cabello, and David Kim. We further thank the Gemini Canvas and AI Studio teams for their support, including but not limited to Tim Bettridge, Yan Li, Daniel Marques, Deven Tokuno, Levent Yilmaz, Saravana Rathinam, Samuel Petit, Mike Taylor-Cai, Ammaar Reshi, and Robert Berry. We would like to thank Mahdi Tayarani, Max Dzitsiuk, Jim Ratcliffe, Patrick Hackett, Seeyam Qiu, Coco Fatus, Alon Hetzroni, Aaron Kim, Yinghua Yang, Brian Collins, Eric Gonzalez, Nicolás Peña Moreno, Yidang Zhang, Jamie Pepper, Yuhao He, Yi-Fei Li, Ziyi Liu, and Jing Jin for their feedback and discussion on our early-stage proposal and WebXR experiments. We appreciate Tim Herrmann and Andrew Helton's thoughtful reviews. We thank Maryam Sanglaji, Max Spear, Adarsh Kowdle, Guru Somadder, and Shahram Izadi for their directional feedback and contributions.
Article link: https://news.qimuai.cn/?post=3651