OpenAI upgrades ChatGPT's image generation model.

Source: https://www.wired.com/story/openai-beefs-up-chatgpts-image-generation-model/
Summary:
On Tuesday local time, OpenAI released its next-generation image model, ChatGPT Images 2.0. The model can generate multiple related images from a single prompt, such as a complete study booklet, and can render text in multiple languages, including Chinese and Hindi. The release is available to ChatGPT and Codex users worldwide, with a more capable version for paying subscribers.
The core advance is that the model can tap ChatGPT's "reasoning" capability to retrieve up-to-date information from the web and to generate multiple linked images from a single prompt. Its knowledge cutoff has been updated to December 2025, making outputs more current and more detailed. In testing, the model produced a San Francisco weather-forecast infographic with accurate weather data and landmark buildings, demonstrating its ability to handle complex scenes.
On customization, the new model supports aspect ratios from 3:1 ultrawide to 1:3 tall, which users can set directly in the text prompt. Notably, English text rendering has improved markedly, addressing the distorted and misplaced characters common in earlier AI-generated images. In a multilingual test, however, a Chinese fan-style poster, though visually rich, contained large amounts of pseudo-Chinese and mixed characters. In its own assessment, the model admitted that some of the text was "nonsense made to resemble East Asian fan-edit text."
Industry observers note that whenever a leading AI company ships an image-model update, it tends to spark a wave of social media creation: Google's Nano Banana model drove hyperrealistic figurine images last year, and ChatGPT Images' caricature trend earlier this year is another example. While multilingual text generation remains limited, accumulating global user data and continued iteration may enable more accurate cross-cultural output in future versions.
English source:
OpenAI launched a new image generation AI model on Tuesday, dubbed ChatGPT Images 2.0. This model can generate more than one image from a single prompt, like an entire study booklet, as well as output text, including in non-English languages like Chinese and Hindi. This release is available globally for ChatGPT and Codex users, with a more powerful version available for paying subscribers.
When any major AI company releases a new image model, it can revive interest and boost usage, especially if social media users adopt a meme-able trend, transforming images of themselves. Last year, Google's launch of the Nano Banana model was a major moment for the company, especially when users started posting hyperrealistic figurines of themselves online. Earlier this year, ChatGPT Images made waves on social media as users shared AI-generated caricatures.
What’s Different?
Since the new model can tap into ChatGPT’s “reasoning” capabilities, Images 2.0 can search the internet for recent information and generate more than one image at a time. In essence, the bot can use additional steps to output more thorough generations from a single prompt. Images 2.0 also has a more recent knowledge cutoff date: December 2025.
This also means that outputs from the new model are more granular. For example, I generated an infographic with San Francisco’s weather forecast for the next day, as well as activities worth doing. The image ChatGPT generated included accurate weather details for the rainy day, along with accurate-looking drawings of the Ferry Building, Castro Theater, Painted Ladies houses, and Transamerica Pyramid.
Additionally, Images 2.0 is more customizable for users who want unique aspect ratios for image outputs. The new model can generate images ranging from 3:1 wide to 1:3 tall, and users can adjust the image’s size as part of their prompt to the AI tool.
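The article does not document how the model maps a requested ratio to pixel dimensions, but the idea can be sketched numerically. The helper below (a hypothetical illustration, including the fixed pixel budget, not a documented parameter of ChatGPT Images 2.0) converts a ratio string such as "3:1" or "1:3" into concrete width and height values:

```python
import math

def dimensions_for_ratio(ratio: str, pixel_budget: int = 1_048_576) -> tuple[int, int]:
    """Return (width, height) matching `ratio` whose product is near `pixel_budget`.

    The 1,048,576-pixel (1024x1024) budget is an assumption for illustration only.
    """
    w_part, h_part = (int(p) for p in ratio.split(":"))
    # Solve width/height = w_part/h_part with width * height ~= pixel_budget.
    height = math.sqrt(pixel_budget * h_part / w_part)
    width = height * w_part / h_part
    return round(width), round(height)

print(dimensions_for_ratio("3:1"))  # ultrawide
print(dimensions_for_ratio("1:3"))  # tall
print(dimensions_for_ratio("1:1"))  # square
```

The symmetry falls out naturally: the 3:1 and 1:3 results are transposes of each other, which mirrors how the model's supported range spans both orientations.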
First Impressions
After a few hours of generating images with the new model, I was generally impressed with the text rendering capabilities, in English at least. Not that long ago, image outputs featuring text, from any of the major models, often included numerous malformed characters or words with errant extra letters. ChatGPT struggled to label images accurately two years prior, so the cleaner, more complex outputs from Images 2.0 are a sign of continued improvement. Google has also focused on improving image outputs featuring text in its recent iterations of Nano Banana.
Testing the outputs in different languages from the new model, I asked ChatGPT to generate a Timothée Chalamet–themed collage poster, as if it were crafted by someone from his Chinese fan base. The output featured an assortment of photorealistic-looking images of the movie star, some showing him dressed in traditional clothes or with cat ears drawn on. The AI collage was maximalist in its details, with over 20 different snippets of text, as well as images of a dumpling, a cup of boba, and a panda.
I don’t speak the language, so I nudged the bot for a translation, with a basic prompt: “What does that text say?” ChatGPT’s response was critical of its own output.
“A lot of it is fake, or semi-gibberish AI text dressed up to look like Chinese meme-poster writing, so it does not all cleanly translate,” read its output, in part, before ChatGPT went through a list of what looked accurate and what looked off. “There are also a few bits that are clearly malformed or mixed with Japanese-looking characters, like the checklist card and some decorative lines on the right. Those are mostly nonsense made to resemble East Asian fan-edit text rather than accurate sentences.”
So, while the new ChatGPT Images model performed well in my initial tests when generating text in English, I'm unsure whether users around the globe will see similar results when generating in their own languages. Still, given OpenAI's strides in improving its English AI-image outputs, I wouldn't be surprised if data from more users around the world led to further improvements in future iterations of this model.
Article link: https://news.qimuai.cn/?post=3873