Cohere releases open-source speech model for edge devices

Published by qimuai on 2026-3-27 11:01 · First-hand compilation

Source: https://aibusiness.com/language-models/cohere-transcribe-open-source-small-speech-model-edge-devices

Summary:

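Cohere has released Cohere Transcribe, an open-source speech recognition model with 2 billion parameters. It is trained on 14 languages, including Chinese, Japanese, Polish, French and Greek, and is released under the Apache 2.0 license. The company says the model outperforms alternatives such as ElevenLabs Scribe and Qwen3 on the Hugging Face Open ASR Leaderboard, and that it will soon be integrated into North, Cohere's AI agent orchestration platform.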

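Enterprises are increasingly embedding automatic speech recognition (ASR) into their applications. Earlier speech models were built on long short-term memory (LSTM) networks and recurrent neural networks (RNNs), and later on transformer-based architectures that struggled to achieve low latency because of their size. Newer models such as Cohere Transcribe are smaller and faster, which makes them suitable for edge devices. As the technology and infrastructure have matured, ASR use cases have expanded in customer service, banking, sales and marketing, and vendors such as IBM and Alibaba have released their own models. Video conferencing provider Zoom also introduced AI Companion 3.0 with real-time voice translation in 2025, extending cross-language interaction.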

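Lian Jye Su, an analyst at market research firm Omdia, notes that speech interaction has always been fundamental to AI, and that an open-source strategy draws developers to test a model and report back, which helps the company refine it for commercialization. Meta succeeded with this approach, and Alibaba, Nvidia and others have followed. Although Cohere has long focused on text generation, it also has an opportunity in speech recognition: some enterprises want to replace traditional transformer-based speech models with lighter ASR models that run on edge devices, and Cohere can use its expertise in speech-to-text to capitalize on this trend.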

Full translation:

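Cohere Transcribe is an open-source speech recognition model with 2 billion parameters, designed for edge deployment. With this model, the company aims to capitalize on the enterprise trend of embedding automatic speech recognition into applications.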

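Cohere Transcribe, released on Thursday, is trained on 14 languages, including Chinese, Japanese, Polish, French and Greek. The model is released under the Apache 2.0 license and outperforms competitors such as ElevenLabs Scribe and Qwen3 on the Hugging Face Open ASR Leaderboard. According to the company, the model will soon be integrated into North, Cohere's AI agent orchestration platform.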

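Cohere Transcribe illustrates how speech recognition models have evolved. Early speech models used deep learning techniques such as long short-term memory (LSTM) networks and recurrent neural networks (RNNs); later models moved to transformer-based architectures but struggled to achieve low latency because of their size. Newer models such as Transcribe are compact enough to be deployed on edge devices. As the technology, infrastructure and capabilities have matured, ASR use cases have expanded into customer service, banking, sales and marketing, prompting vendors such as IBM and Alibaba to release their own models.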

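Even video conferencing company Zoom has joined the competition. In 2025, the platform provider introduced AI Companion 3.0 with real-time voice translation, and later added a feature that lets participants hear the conversation in their native language.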

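"Speech will always be a cornerstone of AI technology," said Lian Jye Su, an analyst at Omdia, a research firm owned by Informa TechTarget. "The entire AI wave began with people being able to interact with Siri." He considers two features of Cohere Transcribe noteworthy: its small size, and the company's decision to make it open source.

"Open source attracts developers to test the model, and when the results meet their expectations, they come back with feedback," Su said. "That clearly helps in building a better commercial model." Meta has succeeded with this business model, leading companies such as Alibaba and Nvidia to follow suit. "Cohere is trying to replicate that model," he added, "but the company is focusing on its own area of strength: speech recognition and speech-to-text models."

Su further noted that although Cohere has traditionally focused on text generation, it still has an opportunity in speech recognition. In particular, as some enterprises look to upgrade traditional transformer-based speech models to small ASR models that can run on edge devices, Cohere has room to grow.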

English source:

Cohere Transcribe is an open source speech recognition model with 2 billion parameters that’s designed to be deployed at the edge.
Cohere is looking to capitalize on an enterprise trend of embedding automatic speech recognition into applications with a 2 billion parameter open source speech model.
Cohere Transcribe, introduced on Thursday, is trained on 14 languages, including Chinese, Japanese, Polish, French and Greek. Cohere released the model under the Apache 2.0 license and said the model outperforms alternatives on the Hugging Face Open ASR Leaderboard, including ElevenLabs Scribe and Qwen3. The model will soon be integrated into Cohere's AI agent orchestration platform, North, according to the company.
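The headline metric on the Hugging Face Open ASR Leaderboard is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn a model's transcript into the reference transcript, divided by the number of reference words, so lower is better. The leaderboard averages WER over several benchmark datasets and also reports throughput. The Python sketch below computes WER with a word-level edit distance. It is illustrative only: the lowercase-and-strip-punctuation `normalize` helper is a simplifying assumption and does not reproduce the leaderboard's own text normalizer, and the sample sentences are made up.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation (keeping apostrophes), split on whitespace.

    Simplified assumption; real benchmarks use more elaborate normalizers.
    """
    text = re.sub(r"[^\w\s']", "", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits turning the first i-1 reference words
    # into the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the model will soon be integrated into North", "the model will be integrated in north")` returns 0.25: one deletion ("soon") plus one substitution ("into" to "in") over eight reference words. Because insertions count too, WER can exceed 1.0 when a model hallucinates extra words.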
Cohere Transcribe is an example of the evolution of speech recognition models. Previously, speech models were built using deep learning techniques such as long short-term memory (LSTM) networks and recurrent neural networks (RNNs), and later transformer-based architectures, which struggled to achieve low latency because of their size.
New models such as Transcribe, however, are small enough to be deployed on edge devices. As the technology, infrastructure and capabilities have matured, ASR use cases have expanded, especially in customer service, banking, sales and marketing, which has led to an increase in ASR models from vendors such as IBM and Alibaba.
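To put "small enough to be deployed on edge devices" into rough numbers, the sketch below estimates how much memory the weights of a 2-billion-parameter model occupy at common numeric precisions. The article does not say which precision Cohere ships or recommends, so the precision table is an assumption, and the figures cover weights only; activations, audio buffers and runtime overhead come on top.

```python
# Back-of-the-envelope weight memory for a 2-billion-parameter model.
# Assumption: the precisions below are common choices, not Cohere's stated format.
PARAMS = 2_000_000_000  # parameter count reported for Cohere Transcribe

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Running it prints 8.0 GB for fp32, 4.0 GB for fp16/bf16, 2.0 GB for int8 and 1.0 GB for int4. Halving the bytes per parameter halves the footprint, which is why quantization is a common lever when fitting models onto phones, laptops and other edge hardware.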
Even video conferencing company Zoom has joined in the competition. In 2025, the video conferencing platform provider introduced AI Companion 3.0, which included real-time voice translation capability. It later introduced a separate feature that allowed participants to hear exchanges in their own language.
"Speech is always going to be fundamental to AI," said Lian Jye Su, an analyst at Omdia, a division of Informa TechTarget. "That's how the whole AI movement started — because humans started to be able to interact with Siri."
He pointed to a couple of noteworthy features of Cohere Transcribe, including its small size and the company's decision to make the model open source.
"When it's open source, you get developers to test it and then they will come back to you if they find the result to be good enough," Su said. "Then you can obviously commercialize a much better model." Meta has found success with this business model, influencing others such as Alibaba and Nvidia to follow suit.
"Cohere is trying to copy that," Su said. But the company is focused on an area where it excels, he added: speech recognition and speech-to-text models.
While Cohere has traditionally focused on text generation, it could find an opportunity within speech recognition, especially as some enterprises look to upgrade traditional speech models that use transformers to the growing line of small ASR models that can be used on edge devices, Su continued.
