OpenAI launches new voice intelligence features in its API

Posted by qimuai on 2026-5-8 10:00 · First-hand compilation

Source: https://techcrunch.com/2026/05/07/openai-launches-new-voice-intelligence-features-in-its-api/

Summary:

OpenAI launches next-generation voice API: upgraded real-time translation, transcription, and conversation

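On Thursday (local time), OpenAI announced that its API now includes several new voice intelligence features, designed to help developers build apps that can hold voice conversations with users and transcribe and translate speech in real time.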

Highlights of the release:

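GPT-Realtime-2 voice model: unlike its predecessor (GPT-Realtime-1.5), this model is built with GPT-5-class reasoning, so it can handle more complex user requests while producing a realistic conversational voice.
GPT-Realtime-Translate real-time translation: supports more than 70 input languages and 13 output languages, and keeps pace with the user as the conversation unfolds.
GPT-Realtime-Whisper transcription: converts speech to text live, capturing it as the interaction occurs.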

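OpenAI said: “Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”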

Use cases and potential risks

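For enterprises, the most direct value of these tools is in customer service, and OpenAI also points to education, media, events, and creator platforms as target areas. The tools could plausibly be misused, however, for example to generate spam, commit fraud, or carry out other forms of online abuse. OpenAI says it has built in guardrails and triggers so that a conversation can be halted once it is detected as violating the company's harmful content guidelines.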

Pricing and availability

All of the new voice models are included in OpenAI's Realtime API. Translate and Whisper are billed by the minute, while GPT-Realtime-2 is billed by token consumption.
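
To make the two billing modes concrete, here is a minimal Python sketch of how a developer might estimate usage costs under them. It is only an illustration: the article states which models are billed per minute and which per token, but gives no prices, so both rates below are hypothetical placeholders. The lowercase model identifiers and the `estimate_cost` helper are likewise assumptions made for this example, not names confirmed by OpenAI.

```python
# Hypothetical placeholder rates: the article does not state any prices.
PER_MINUTE_RATE_USD = 0.01    # assumed, for illustration only
PER_1K_TOKEN_RATE_USD = 0.02  # assumed, for illustration only

# Billing mode per model, as described in the article.
# The lowercase identifier strings are assumed spellings.
PER_MINUTE_MODELS = {"gpt-realtime-translate", "gpt-realtime-whisper"}
PER_TOKEN_MODELS = {"gpt-realtime-2"}


def estimate_cost(model: str, *, minutes: float = 0.0, tokens: int = 0) -> float:
    """Estimate a usage cost in USD under the assumed placeholder rates."""
    if model in PER_MINUTE_MODELS:
        # Translate and Whisper: charged by audio duration.
        return minutes * PER_MINUTE_RATE_USD
    if model in PER_TOKEN_MODELS:
        # GPT-Realtime-2: charged by tokens consumed.
        return tokens / 1000 * PER_1K_TOKEN_RATE_USD
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    print(estimate_cost("gpt-realtime-whisper", minutes=30))
    print(estimate_cost("gpt-realtime-2", tokens=12_000))
```

The practical difference is that per-minute costs scale with call length and can be predicted before a call starts, while per-token costs depend on how much the model hears, reasons, and says during the session.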

Full-text translation:

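OpenAI said on Thursday that its API will gain several new voice intelligence features designed to help developers build apps that can converse with users and transcribe and translate speech.
The company's newly released GPT-Realtime-2 is another voice model, designed to produce a realistic voice simulation that can converse naturally with users. Unlike its predecessor (GPT-Realtime-1.5), however, it is built on GPT-5-class reasoning, which OpenAI says is designed to handle more complex user requests.
OpenAI is also releasing GPT-Realtime-Translate, which, as the name suggests, provides real-time translation that can “keep pace” with the user during a conversation. It supports more than 70 input languages (the languages it can understand) and 13 output languages (the languages it relays to the speaker).
In addition, OpenAI has launched a new transcription capability, GPT-Realtime-Whisper, which gives users live speech-to-text, capturing speech as the interaction happens.
OpenAI said: “Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”
Who benefits from these updates? Companies looking to expand their customer service capabilities are an obvious target. OpenAI also notes, however, that the new features will help in a range of areas, including education, media, events, and creator platforms.
Useful as these tools appear from an enterprise perspective, the potential for misuse cannot be ignored. OpenAI says it has built guardrails to keep the new features from being used to generate spam, commit fraud, or carry out other forms of online abuse. Certain triggers are embedded in the system so that, in OpenAI's words, “conversations can be halted if they are detected as violating our harmful content guidelines.”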
All of the new voice models are included in OpenAI's Realtime API. Translate and Whisper are billed by the minute, while GPT-Realtime-2 is billed by token consumption.

English source:

OpenAI said Thursday that its API will now include a number of new voice intelligence features designed to help developers create apps that can talk, transcribe, and translate conversations with users.
The company’s new GPT-Realtime-2 is another voice model, built to create a realistic vocal simulation that can converse with users. However, unlike its predecessor (GPT-Realtime-1.5), this one is built with GPT-5-class reasoning that OpenAI says was created to deal with more complicated requests from users.
The company is also launching GPT-Realtime-Translate, which, just as it sounds, is designed to provide real-time translation services that “keep pace” with the user, conversationally. The feature includes more than 70 input languages (that is, the languages that it can comprehend) and 13 output languages (the languages it relays to the speaker).
Finally, the company has also launched a new transcription capability, GPT-Realtime-Whisper, which gives users live speech-to-text capabilities that are captured as interactions occur.
“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” the company said.
Who will these updates be good for? Companies that want to expand customer service capabilities are an obvious target. However, OpenAI also notes that its new features will assist with a wide array of areas, including education, media, events, and creator platforms, among others.
As useful as these tools seem from an enterprise perspective, it also seems plausible that they could be misused. The company said it has built guardrails to stop its new features from being abused to create spam, fraud, or other forms of online abuse. Certain triggers have been embedded in the system so that “conversations can be halted if they are detected as violating our harmful content guidelines,” OpenAI said.
All of the new voice models are included in OpenAI’s Realtime API. Translate and Whisper are billed by the minute, while GPT-Realtime-2 is billed by token consumption.
