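A startup claims it broke through a bottleneck that's holding back LLMs

Posted by qimuai on 2026-6-20 07:00 · First-hand compilation

Source: https://www.technologyreview.com/2026/06/19/1139313/a-startup-claims-it-broke-through-a-bottleneck-thats-holding-back-llms/

Summary:

Miami AI startup claims to have cracked the LLM compute bottleneck; third-party test results draw industry attention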

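Miami-based AI startup Subquadratic has published the results of an independent evaluation of its new large language model, SubQ, in an effort to back up the breakthrough it announced earlier. The company says its technology solves a mathematical bottleneck that has held back large language models for almost a decade: the quadratic growth of compute with input length.

According to Subquadratic, SubQ replaces the dense attention at the core of the conventional Transformer architecture with sparse attention, which sharply cuts the amount of computation. Dense attention compares every token in a text with every other token, so the number of operations grows quadratically with the length of the text. Sparse attention computes only a selected subset of those token-to-token relationships.

The company claims that SubQ roughly matches top models from Google DeepMind, OpenAI, and Anthropic on key tasks such as coding while handling a context window of up to 12 million tokens, 12 times that of most leading models, at much lower energy use and cost. In standardized tests run by third-party firm Appen, SubQ was 56 times faster than models using FlashAttention; it scored 89.7% on the LiveCodeBench coding benchmark, in line with frontier models; and it scored 98% on needle-in-a-haystack retrieval tests with contexts of 6 million and 12 million tokens.

The results have not fully dispelled doubts. Critics point out that Subquadratic bootstrapped SubQ with weights from a version of the Chinese open-source model Qwen, which sits awkwardly with its claim to have fully reinvented how LLMs work. The model is also available to only a very small number of users, so large-scale real-world evidence is lacking. As AI engineer Dan McAteer put it: “SubQ is either the biggest breakthrough since the Transformer ... or it’s AI Theranos.”

In response, Subquadratic cofounder and chief technology officer Alex Whedon said that releasing third-party benchmarks with the initial announcement would have preempted much of the skepticism, and that the company is now making sure future results are fully verified before they are published. “We hope we’re kicking off a new age of efficiency,” said cofounder and CEO Justin Dangel. “We don’t think anybody will be building on transformers in a few years.” Despite the controversy, more than 500 enterprise customers have signed up for early access to the model.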

English source:

A startup claims it broke through a bottleneck that’s holding back LLMs
Subquadratic has now shared more details about its new model. But some are still skeptical.
Miami-based AI startup Subquadratic came out of stealth mode last month with a huge claim. It announced that it had solved a mathematical bottleneck that had been holding back large language models for almost a decade.
The details were thin, and many people were unconvinced. But Subquadratic has started to bring the receipts, sharing the results of an independent evaluation of its new tech. The results suggest that the company’s claims might be worth paying attention to.
According to Subquadratic, it has developed a new kind of LLM, called SubQ, that is faster and cheaper and uses a lot less energy than any other model on the market. The company also claims that SubQ is able to process up to 12 times as much text at once as most other models, allowing it to carry out a range of data-heavy tasks, such as analyzing hundreds of documents or entire code bases.
What’s more, Subquadratic says, SubQ does this while more or less matching the performance of the best models put out by Google DeepMind, OpenAI, and Anthropic on key tasks like coding.
The problem was that the company at first provided little evidence for its claims beyond a handful of self-published test scores. And it has yet to make SubQ widely available for people to try out themselves.
So it’s no surprise that Subquadratic’s claims were met with skepticism. Dan McAteer, an artificial intelligence engineer, captured the overall response on X: “SubQ is either the biggest breakthrough since the Transformer ... or it’s AI Theranos.”
A month on, the company has published more information about its model, including the results of additional independent tests run by third-party firm Appen.
“We expected healthy skepticism,” says Subquadratic cofounder and chief technology officer Alex Whedon. “In hindsight, releasing the third-party benchmarks alongside the initial announcement would have preempted much of the skepticism, which is why we’re taking the time to make sure any future results are fully verified before putting them out.”
Subquadratic asked Appen, which evaluates other companies’ models, to run its tests on SubQ. The results seem to back up a lot of Subquadratic’s claims. “That was really exciting to me, it validated their architecture,” says Jeanine Sinanan-Singh, Appen’s director of generative AI research.
“I was like, ‘Wow, this could be a game changer,’ because models struggle with speed and inefficiency,” she adds. “But when you have kind of shocking results, it’s really not as credible when you say it yourself.”
SubQ won’t replace existing top models across the board, but it could offer huge increases in speed at a fraction of the typical cost for certain tasks. Subquadratic insists that in the long run, though, its breakthrough could change how LLMs are built. “We hope we’re kicking off a new age of efficiency,” says Justin Dangel, the firm’s cofounder and CEO. “We don’t think anybody will be building on transformers in a few years.”
Attention!
To understand why Subquadratic’s claims are a big deal, let’s dig into how most LLMs work. The key mechanism inside an LLM is a type of neural network called a transformer, which runs a process known as dense attention. Today’s LLMs typically chain together multiple transformers. (The foundational paper of the LLM era, published by researchers at Google in 2017, was titled “Attention Is All You Need.”)
Dense attention works like this: When a transformer processes a chunk of text, it first encodes each word (or part of a word, known as a token) with a number. To capture the meaning of the full text, it then multiplies each of those numbers with every other number for that text. For example, a piece of text 10,000 words long would kick off almost 50 million individual multiplications. That’s a lot of computation and the main reason that LLMs are notorious power hogs.
“If you want to summarize The Great Gatsby, you have to look at the first word and the last word together, and then you have to look at every other combination,” says Dangel.
As the length of the text increases, the number of computations skyrockets. That’s because each additional number must be multiplied by all other previous numbers. Double the number of words, and you roughly quadruple the number of computations, a rate of increase known as a quadratic expansion.
(You can picture this yourself: Draw a circle and mark dots around its edge. Each dot is a token. Then draw lines between pairs of dots to represent the multiplication of those two tokens. A circle with five dots will have 10 lines crossing it. Make it 10 dots and you will have 45 lines, 20 dots and you will have 190 lines, and so on.)
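The arithmetic behind the circle picture can be checked with a few lines of Python. This is only a sketch of the counting argument in the text: the function name `pair_count` is made up for illustration, and it counts unordered token pairs, not the real workload of any particular model.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n tokens: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# The dot-and-line examples from the text, plus the 10,000-word case.
for n in (5, 10, 20, 10_000):
    print(f"{n} tokens -> {pair_count(n)} pairs")

# Doubling the length roughly quadruples the number of pairs.
ratio = pair_count(20_000) / pair_count(10_000)
print(f"doubling 10,000 tokens multiplies the pair count by about {ratio:.2f}")
```

The 10,000-token case comes to 49,995,000 pairs, which is the "almost 50 million" figure quoted above.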
Slashing costs
Subquadratic’s solution is to ditch dense attention, the core operation of a transformer, in favor of what’s known as sparse attention, which slashes the number of computations needed. Instead of multiplying the number assigned to each token by every other number, sparse attention selects just some of the numbers to multiply. The idea is that not all relationships between words in a piece of text matter.
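To make the savings concrete, here is a rough sketch that compares the number of token pairs under dense attention with the number under one simple, fixed sparse pattern: a sliding window in which each token is paired only with a limited number of tokens before it. The window pattern and the names `dense_pairs`, `window_pairs`, and `window` are assumptions chosen for illustration. Subquadratic has not disclosed its selection method, and this is not it.

```python
def dense_pairs(n: int) -> int:
    """Every token paired with every earlier token: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def window_pairs(n: int, window: int) -> int:
    """Each token paired with at most `window` tokens immediately before it."""
    return sum(min(i, window) for i in range(n))


n = 10_000
print("dense pairs: ", dense_pairs(n))
print("window pairs:", window_pairs(n, window=128))

# With a fixed window, doubling the text roughly doubles the work
# instead of quadrupling it.
growth = window_pairs(2 * n, window=128) / window_pairs(n, window=128)
print(f"growth factor when the text doubles: {growth:.2f}")
```

The catch, as the article goes on to explain, is that a fixed pattern like this one ignores most of the text, which is why earlier sparse approaches struggled to match dense attention on quality.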
“Sparse attention says not all of those relationships are important, because they’re not,” says Whedon. “If you’re reading a book, you’re not going to look at the first and second words, first and third—that’s insane.”
It’s a simple approach, and Subquadratic is not the first to try it. “Pretty much everything under the sun has been attempted,” says Will Depue, an independent AI researcher who previously worked at OpenAI. “It’s not impossible, but it’s akin to running a four-minute mile.”
Previous techniques for selecting which numbers to multiply and which to ignore have not produced a mechanism that can capture the meaning of a document as well as dense attention can.
Subquadratic claims to have cracked the problem at last. It pitches SubQ as the first sparse-attention LLM that rivals mainstream dense-attention models in performance.
“Historically, most mechanisms have used fixed patterns, like always comparing the first word to the fifth,” says Whedon. “That’s pretty limiting. Language is too sophisticated for that. And so, one of the things that makes our mechanism unique is that we dynamically select which ones are important.”
The firm won’t say exactly how SubQ chooses which words to focus on, but the selection is calculated on the fly and differs for each piece of text the model is given. “That’s kind of where the secret sauce is,” says Whedon.
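Since the company has not published its mechanism, the following is only a generic illustration of what "dynamic" selection can mean: each token keeps the `k` other tokens whose vectors score highest against its own, so the chosen pairs change with the input. Everything here is hypothetical, including the name `top_k_partners`, the two-dimensional toy vectors, and the use of a dot product as the score. Note also that this toy version scores every pair before choosing, so it saves no computation; a practical method would need a cheaper way to pick candidates.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_partners(vectors, k):
    """For each token, return the indices of the k other tokens with the
    highest dot-product score (ties broken by lower index)."""
    partners = []
    for i, query in enumerate(vectors):
        scored = [(dot(query, v), j) for j, v in enumerate(vectors) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))
        partners.append(sorted(j for _, j in scored[:k]))
    return partners


# Two toy "texts" containing the same four token vectors in a different order.
text_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

print(top_k_partners(text_a, k=1))
print(top_k_partners(text_b, k=1))
```

A fixed pattern would pair the same positions in both texts. Here the selected positions differ because the selection follows the content.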
Testing, testing
The upshot is that for certain tasks, SubQ may be faster and cheaper to run than most other models. Appen evaluated SubQ on a handful of standard tests. In a straight-up speed test, which sets a baseline for how fast a model can operate in theory rather than assessing what a model can actually do, Appen found that SubQ was 56 times faster than models using FlashAttention, a widely used optimized implementation of standard (exact) attention.
On LiveCodeBench, a test that looks at how well models perform on competitive coding problems taken from real contests, SubQ scored 89.7%, putting it in the same ballpark as other top coding models. “This model continues to provide frontier-level performance in coding,” says Appen’s Sinanan-Singh.
Subquadratic’s claims about cost are harder to verify because SubQ is not yet widely available. According to Dangel, it costs $2,600 to run Anthropic’s LLM Opus 4.6 through RULER 128, a test developed by Nvidia to assess a model’s ability to retrieve information from large data sets. And SubQ? “It cost us eight dollars,” he says.
SubQ does seem to be able to handle very large data sets. The model has a context window (roughly akin to a working memory) up to 12 million tokens long. Most top models today have context windows one million tokens long. In a demo that Whedon ran for me, he asked SubQ to perform a task that required it to reason about information contained in 400 documents. It responded in seconds. When he gave Perplexity—a popular LLM-powered search engine—the same task, it failed to load all 400 documents.
Appen also ran the needle-in-a-haystack test, which assesses how well a model can retrieve specific information buried in a large amount of data. In its report, Appen states that SubQ scored 98% with context windows six million and 12 million tokens long, “sustaining near-perfect long-context retrieval at scales few models are tested at.”
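For readers unfamiliar with the format, a needle-in-a-haystack test can be sketched roughly as follows: hide one fact at varying depths in a long run of filler text, ask for it, and report the fraction of correct answers. This is a simplified illustration, not Appen's harness. The helper names, the filler sentence, and the secret-code needle are all invented, and `toy_answer_fn` is a plain string search standing in for a real model call.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Repeat `filler` n_sentences times and insert `needle` at a relative
    depth between 0.0 (start) and 1.0 (end)."""
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieval_accuracy(answer_fn, needle, expected, depths, n_sentences=1000):
    """Fraction of depths at which answer_fn's reply contains `expected`."""
    hits = 0
    for depth in depths:
        context = build_haystack("The sky was grey that day.", needle, n_sentences, depth)
        if expected in answer_fn(context, "What is the secret code?"):
            hits += 1
    return hits / len(depths)


def toy_answer_fn(context: str, question: str) -> str:
    """Stand-in for a model: return the sentence mentioning 'secret code'."""
    for sentence in context.split(". "):
        if "secret code" in sentence:
            return sentence
    return ""


score = retrieval_accuracy(
    toy_answer_fn, "The secret code is 7421.", "7421", depths=[0.0, 0.5, 1.0]
)
print(f"retrieval accuracy: {score:.0%}")
```

In a real evaluation the stand-in would be replaced by a call to the model under test, and the haystack would be scaled up to the context lengths being measured.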
Too good to be true?
Despite the high scores, benchmarks paint an incomplete picture of what a model can and cannot do. Testing under very specific conditions is not a substitute for running a model on a wide range of real tasks.
Subquadratic is offering SubQ as a model tailored to coding and to searching very large data sets. It says that tens of thousands of potential users have already signed up for early access, including more than 500 enterprise customers. But there’s a long waitlist, and the firm has given very few people access so far. Subquadratic’s response is that it is a new, small company with limited resources and cannot serve too many people at once.
Until more people get their hands on the model and try it out for themselves, some skepticism is justified. One nagging issue is that Subquadratic reused the weights (values set within a model during training that determine how it will behave) from a version of the Chinese open-source model Qwen to bootstrap SubQ, rather than training it from scratch. That’s a common thing for model makers to do, but it cuts across Subquadratic’s claim that it has fully reinvented how LLMs work.
“They may have built something real and useful,” says Depue. “But the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck.”
In the meantime, Subquadratic cofounder Whedon insists that making something different was his only option. If you want to build a competitive model, you have to have new ideas, he says: “We’re more up against it than OpenAI is.”
