It's all about the angle: Your photos, re-composed

Source: https://research.google/blog/its-all-about-the-angle-your-photos-re-composed/

Summary:

[Tech News] Google Photos adds AI viewpoint re-composition: generative AI gives your photos a virtual "reshoot"

On April 22, 2026, Google DeepMind and the Platforms & Devices team jointly announced a generative-AI image editing technique, now integrated into the Auto frame feature in Google Photos. The technique frees users from the constraints of the original shooting viewpoint: by adjusting a virtual camera position and focal length, it re-composes an already-taken photo from a new perspective.

Traditional cropping and zooming tools cannot change a photo's inherent perspective. The new technique instead interprets the 2D image as a 3D scene: machine learning models estimate the scene's spatial layout and the original camera parameters, and generative AI fills in background content revealed by the viewpoint change, producing a plausible image from the new angle.

The pipeline has two stages. First, a dedicated 3D point map model reconstructs the three-dimensional structure of faces and bodies and recovers the original shooting viewpoint. Then a latent diffusion model inpaints the blank regions produced by the viewpoint change, generating a natural image from the new angle while preserving the original content. The feature also targets the perspective distortion typical of wide-angle selfies, automatically correcting facial proportions so the result looks more natural.

The feature is now available to Google Photos users. When Auto frame processes a photo containing people, the system automatically generates a re-composed rendition with an optimized viewpoint, which users can apply with a single tap.

The work is a collaboration between Google DeepMind and the Platforms & Devices team; key technical contributors include Thiemo Alldieck, Marcos Seefelder, Pedro Velez, and other researchers and engineers.

Original article:

It's all about the angle: Your photos, re-composed
April 22, 2026
Marcos Seefelder, Staff Software Engineer, Platforms & Devices, and Pedro Velez, Senior Research Engineer, Google DeepMind

We introduced a new approach for editing images, now live in the Auto frame feature in Google Photos, allowing users to re-imagine photos from a new perspective after they have been taken.
Have you ever looked back at your camera roll and wished you had captured a scene slightly differently? Maybe you wish you had caught a bit more of one side of a face, or positioned the camera slightly lower to get the perfect shot. Perhaps it's a selfie with a perfect smile, but the wide-angle lens makes you look somewhat unfamiliar. Usually, these are the "almost perfect" shots we settle for, because the moment has passed and it is not possible to retake the picture.

While cropping and zooming may help, classic image editing tools won't fix the underlying problem: the image is still showing the scene from a fixed, imperfect perspective. Zooming in doesn't change the parallax, and cropping can't show you what was just outside the frame.

Today we are announcing a new approach to fixing scene alignment after a photo has been taken. Our method, now available as part of the Auto frame feature in Google Photos, uses machine learning (ML) models to understand the scene and its spatial layout, and uses generative AI to imagine the photo from a new perspective. In contrast to classical photo editing, our method interprets a photo as a 3D scene — think of a real moment frozen in time — and automatically changes the camera position within that space. To this end, our method keeps what was originally visible and intelligently generates previously hidden content, forming an authentic new perspective of the original scene.

A new perspective

In contrast to other generative image editing solutions, our method consists of two stages: (1) 3D scene and camera estimation, and (2) generative inpainting and retouching. By decoupling 3D estimation from image formation, we can faithfully manipulate the scene in 3D and adjust both camera intrinsics and extrinsics. Further, we utilize ML models to understand scene contents and suggest new camera parameters automatically.
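
To make the split concrete: the two stages communicate through ordinary pinhole-camera parameters. The sketch below is a minimal illustration of that standard parameterization; the class and names are invented for exposition, not taken from the post.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PinholeCamera:
    """Standard pinhole camera: intrinsics (f, cx, cy) and extrinsics (R, t)."""
    f: float        # focal length in pixels (intrinsic)
    cx: float       # principal point x (intrinsic)
    cy: float       # principal point y (intrinsic)
    R: np.ndarray   # 3x3 world-to-camera rotation (extrinsic)
    t: np.ndarray   # 3-vector world-to-camera translation (extrinsic)

    @property
    def K(self) -> np.ndarray:
        """Intrinsic matrix mapping camera-space rays to pixel coordinates."""
        return np.array([[self.f, 0.0, self.cx],
                         [0.0, self.f, self.cy],
                         [0.0, 0.0, 1.0]])

# Re-composing a shot then means changing R/t (move or tilt the virtual
# camera) and/or f (zoom) independently, while the 3D scene stays fixed.
```
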
In the first step, we use an internal 3D point map estimation model specifically configured to faithfully reconstruct human bodies and faces, limiting reconstruction artifacts that could harm identity preservation. For every pixel of the original image, our model estimates a 3D point representing the visible surface patch, and additionally approximates the focal length of the original camera.
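
The post does not spell out the geometry, but standard pinhole back-projection shows what "a 3D point per pixel" means. Below is a minimal numpy sketch under that assumption; the real model regresses the points (and focal length) directly from the image, and all values here are synthetic.

```python
import numpy as np

def unproject_to_point_map(depth, f, cx, cy):
    """Back-project a per-pixel depth map into an (h, w, 3) point map.

    Pinhole geometry: for pixel (u, v) at depth z,
    X = z * (u - cx) / f,  Y = z * (v - cy) / f,  Z = z.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = depth * (u - cx) / f
    y = depth * (v - cy) / f
    return np.stack([x, y, depth], axis=-1)

# Invented example: a 480x640 frame, everything 2 m away, guessed focal length.
points = unproject_to_point_map(np.full((480, 640), 2.0),
                                f=500.0, cx=320.0, cy=240.0)
print(points.shape)  # (480, 640, 3)
```
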
Next, we use classical 3D rendering to generate an estimate of the image as if captured with the altered camera parameters. Importantly, we can modify both the camera pose (position and orientation) and focal length, giving us full control over the image formation process.
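
As a toy stand-in for that classical rendering step, one can z-buffer-splat the point map into the altered camera; pixels that receive no point are precisely the "holes" discussed next. A minimal sketch, assuming the pinhole conventions above (one pixel per point, no filtering):

```python
import numpy as np

def splat_points(points, colors, R, t, f, cx, cy, h, w):
    """Render an (N, 3) point cloud from a new camera by z-buffered splatting.

    Returns the splatted image plus a mask of pixels no point reached,
    i.e., the regions the generative model must fill afterwards.
    """
    cam = points @ R.T + t                      # world -> new camera frame
    keep = cam[:, 2] > 1e-6                     # drop points behind the camera
    cam, colors = cam[keep], colors[keep]
    u = np.round(f * cam[:, 0] / cam[:, 2] + cx).astype(int)
    v = np.round(f * cam[:, 1] / cam[:, 2] + cy).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[ok], v[ok], cam[ok, 2], colors[ok]

    image = np.zeros((h, w, 3))
    zbuf = np.full((h, w), np.inf)
    for ui, vi, zi, ci in zip(u, v, z, colors):
        if zi < zbuf[vi, ui]:                   # keep the nearest surface
            zbuf[vi, ui] = zi
            image[vi, ui] = ci
    return image, np.isinf(zbuf)                # image + hole mask
```
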
However, rendering a 3D point map alone is insufficient: when you move a virtual camera "around" an object, you reveal parts of the background that were never captured by the original lens. Essentially, the point map is an incomplete representation of the scene and rendering it from a new perspective always results in "holes." To fill these areas, we use a generative latent diffusion model to complete and correct the rendered estimate. This model was trained specifically for this task using an internal dataset of image pairs with known camera parameters. During training, we estimate the 3D point map of one image and project it into the camera of the second image. The model then learns to reconstruct the second image from the re-rendered first image. At inference time, we employ classifier guidance with regional scaling to faithfully preserve original content, while allowing the model creative freedom to fill in the blanks.
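
The post names "classifier guidance with regional scaling" without further detail. As an illustration only, the sketch below substitutes the more common classifier-free guidance update and makes its scale a per-pixel map: small where rendered content exists (preserve it), large inside holes (invent it). The denoiser here is a stub, not a real diffusion model.

```python
import numpy as np

def guided_denoise_step(x_t, hole_mask, denoiser,
                        scale_known=1.0, scale_hole=4.0):
    """One denoising step with a spatially varying guidance scale.

    Hypothetical sketch: a classifier-free-guidance-style update whose
    strength varies per pixel, weak over rendered content and strong in
    the holes, so fidelity and creative freedom are traded off regionally.
    """
    eps_cond = denoiser(x_t, conditioned=True)   # sees the rendered estimate
    eps_uncond = denoiser(x_t, conditioned=False)
    scale = np.where(hole_mask[..., None], scale_hole, scale_known)
    eps = eps_uncond + scale * (eps_cond - eps_uncond)
    return x_t - eps                             # noise-schedule math omitted

# Stub model so the sketch runs; real systems use a trained denoising network.
denoiser = lambda x, conditioned: 0.1 * x if conditioned else 0.05 * x
x = guided_denoise_step(np.random.randn(8, 8, 3),
                        np.zeros((8, 8), dtype=bool), denoiser)
```
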
A better point of view

To support fully automatic editing, we utilize ML models to detect the position and 3D orientation of the faces of the main subjects. Together with the 3D point map, this semantic information allows us to compute the camera parameters for the ideal framing. This is particularly useful for portraits. Additionally, images captured with wide-angle front cameras often suffer from strong perspective distortion, which can make features closest to the lens appear unnaturally large. To correct this, our method automatically detects these distortions and adjusts the virtual camera intrinsics to restore natural, flattering proportions, effectively "stepping back" from the subject after the fact.
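
That after-the-fact "stepping back" is essentially a dolly-zoom: retreat the virtual camera and raise the focal length so the subject keeps its size, which evens out the near-to-far magnification ratio. A worked toy example under the pinhole model (all numbers invented):

```python
# Image size of an object at depth z is proportional to f / z, so moving the
# camera back by `step` and setting f' = f * (z_subject + step) / z_subject
# keeps the subject the same size while shrinking near features relative
# to far ones.
f, z_nose, z_ears = 500.0, 0.30, 0.36   # invented selfie geometry (meters)
step = 0.50                              # retreat the virtual camera 50 cm

f_new = f * (z_nose + step) / z_nose     # keep the nose the same size
ratio_before = (f / z_nose) / (f / z_ears)                           # ~1.20
ratio_after = (f_new / (z_nose + step)) / (f_new / (z_ears + step))  # ~1.08
print(f"nose/ear magnification: {ratio_before:.2f} -> {ratio_after:.2f}")
```
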
Now available in Google Photos

This fully automatic solution is now live in Google Photos as part of Auto frame. It seamlessly enhances portraits by using our 3D-aware image editing tool to process eligible photos that contain people. Users can access the re-composed image, which has an automatically adjusted camera viewpoint, as the second rendition option within the Auto frame candidates, making it a single-action improvement to the photo.

Acknowledgments

This feature is the result of a collaboration between Google DeepMind and Google Platforms & Devices teams. Key contributors include: Thiemo Alldieck, Marcos Seefelder, Hannah Woods, Pedro Velez, Michael Milne, Bert Le, Navin Sarma, Jasmin Repenning, and Selena Shang. Advisors include: Steven Hickson, Claudio Martella, Irfan Essa, and Alex Rav Acha. Special thanks to: Mike Krainin, Jan Stria, Neal Wadhwa, Amit Raj, Mauro Rego, Kita Boice, Dennis Shtatnov, Yuan Qi, Julian Iseringhausen, Peter Zhizhin, Jiaping Zhao, Andre Araujo, Jana Ehmann, Keng-Sheng Lin, Isalo Montacute, Brandon Ruffin, Reginald Ballesteros, and Andy Radin.
