Why Google’s AI can’t spell “Google” (or anything else)

Posted by qimuai on 2026-5-28 10:00, first-hand compilation

Source: https://techcrunch.com/2026/05/27/why-googles-ai-cant-spell-google-or-anything-else/

Summary:

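Google’s AI search feature is again making elementary spelling errors. According to user reports, when asked how many Ps are in “Google,” the company’s AI Overview feature answered “two,” and when asked how many Rs are in the word “poop,” it answered “exactly 1.” It also claimed that the word “journalism” contains two Ds, spelling it “j-o-u-r-n-a-d-i-s-m.” Most absurdly, the AI correctly said that the U.S. president’s last name contains one P, yet spelled that name “t-r-p-u-m.”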

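This is not the first time Google’s integration of AI into Search has gone wrong. Earlier, the feature cited satirical content from The Onion and Reddit, advising users to eat rocks and put glue on their pizza. Now that Google is making generative AI the centerpiece of its 29-year-old core search product, basic errors of this kind are drawing attention again.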

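In a statement emailed to TechCrunch, Google acknowledged: “Counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue.” Behind these seemingly simple spelling errors lies a limitation of the underlying architecture. LLMs are built on transformer models, which do not read text letter by letter as humans do. Instead, text is broken into tokens, which can be full words, syllables, or even letters depending on the model, and the tokens are converted into numerical encodings that are then contextualized to produce a response. Matthew Guzdial, an AI researcher at the University of Alberta, explained: “When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”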

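Sheridan Feucht, a PhD student at Northeastern University, said that because the question of what counts as a “word” for a language model is inherently fuzzy, her guess is that no perfect tokenizer exists. Researchers are not optimistic that the spelling problem can be solved. The article also notes that the utility of LLMs does not lie in their ability to spell, and that these blatant failures are a reminder that AI is not all-knowing. Users still need to check the accuracy of AI outputs rather than trust them blindly.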

English source:

How many Ps are in Google? According to Google, there are two.
There is also “exactly 1 ‘r’ in the word ‘poop’,” Google’s AI Overview says, as well as two ‘d’s in the word journalism, which it nonetheless spelled j-o-u-r-n-a-d-i-s-m. Google did at least identify that there is one P in the last name of the U.S. president, but spelled it as t-r-p-u-m.
You didn’t need to be a prophet to predict that Google’s AI-forward Search overhaul was going to go over poorly. We’ve done this before. The first time Google added AI Overviews to Search, the feature ended up citing satirical posts from The Onion and Reddit, advising people to eat rocks and put glue on their pizza.
This time around, as Google doubles down on its commitment to make generative AI the centerpiece of its 29-year-old flagship product, it’s not surprising to see it stumble.
“Counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue,” Google told TechCrunch in an emailed statement.
These basic spelling errors may seem familiar. LLMs, the kind of artificial intelligence that powers chatbots and other text generators, are not built to understand spelling. It’s been a running joke for years that whenever a company unveils a new AI model, you should ask it how many ‘r’s are in the word strawberry. These AI models, which can code an app in seconds or solve problems that have stumped mathematicians for decades, are about as good as a kindergartener at spelling.
Google’s AI Overview woes reach beyond silly spelling mistakes, though. Google already patched an issue from last week in which searching the word “disregard” would yield what looked like a dictionary definition of the word, except that the definition shown was, “Understood. Let me know whenever you have a new prompt or question!” But these spelling errors have remained amusing because they’re so difficult to quash.
As researchers have previously explained when we’ve asked about these spelling conundrums, AI doesn’t perceive sentences as units of language made up of words and letters. Many LLMs are built on transformer models, which break down text into tokens. These can be full words, syllables, or letters, depending on the model. Instead of “reading” like a human would, the AI converts the text into numerical representations, which are then contextualized to help the AI come up with a logical response.
“LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”
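
To make Guzdial's point concrete, here is a minimal Python sketch, not any production tokenizer. The vocabulary `TOY_VOCAB`, the greedy longest-match rule, and the function names are all assumptions made for illustration; real LLM tokenizers learn far larger vocabularies and may split text differently. The sketch only shows that what reaches a model is a list of integer IDs, while counting letters requires access to the characters themselves.

```python
# Hypothetical toy vocabulary; real tokenizers learn many thousands of entries.
TOY_VOCAB = {"straw": 0, "berry": 1, "goo": 2, "gle": 3, "the": 4}


def tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into vocabulary pieces using greedy longest match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and shrink until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i:]!r}")
    return pieces


def encode(word, vocab=TOY_VOCAB):
    """Map a word to the integer IDs a model would actually receive."""
    return [vocab[piece] for piece in tokenize(word, vocab)]


def count_letter(word, letter):
    """Count a letter by looking at characters, which the IDs do not expose."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    print(tokenize("strawberry"))           # the pieces
    print(encode("strawberry"))             # the IDs the model sees
    print(count_letter("strawberry", "r"))  # character-level count
```

In this toy setup, "strawberry" reaches the model as two IDs, and nothing in those IDs records how many r's the word contains. The count is only available to code that looks at the characters directly.
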
The token-based architecture that powers LLM products like Google’s AI Overviews is inherently limiting, and researchers haven’t been optimistic that they can solve the spelling problem.
“It’s kind of hard to get around the question of what exactly a ‘word’ should be for a language model, and even if we got human experts to agree on a perfect token vocabulary, models would probably still find it useful to ‘chunk’ things even further,” Sheridan Feucht, a PhD student studying large language model interpretability at Northeastern University, told TechCrunch. “My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”
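
Feucht's remark about fuzziness can be illustrated with another small Python sketch. The vocabulary below is hypothetical, chosen only so that one word can be split in several ways. The function simply enumerates every split the vocabulary permits and makes no claim about how any real tokenizer chooses among them.

```python
def all_segmentations(word, vocab):
    """Return every way to split `word` into pieces drawn from `vocab`."""
    if not word:
        return [[]]  # one way to split the empty string: no pieces
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in vocab:
            # Combine this head with every valid split of the remainder.
            for tail in all_segmentations(word[end:], vocab):
                results.append([head] + tail)
    return results


# Hypothetical vocabulary, for illustration only.
TOY_VOCAB = {"j", "our", "jour", "nal", "ism", "journal", "journalism"}

if __name__ == "__main__":
    for pieces in all_segmentations("journalism", TOY_VOCAB):
        print(pieces)
```

Even with this seven-entry vocabulary, "journalism" can be divided in more than one valid way, which is a small-scale version of the ambiguity Feucht describes.
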
This isn’t necessarily an urgent problem on researchers’ minds, since the utility of LLMs doesn’t lie in their capacity to spell. But these blatant failures help us remember that AI is not perfect, even if it may sometimes seem like an all-knowing power beyond our comprehension. We cannot blindly trust AI outputs without double-checking their accuracy.
