使用线性弹性缓存优化云经济

内容来源:https://research.google/blog/optimizing-cloud-economics-with-linear-elastic-caching/
内容总结:
谷歌发布线性弹性缓存技术:动态调整内存大小,云服务成本最高降60%
2026年6月25日,谷歌云杰出工程师Todd Lipcon与研究科学家Manish Purohit联合宣布,其团队在顶级数据库会议CIDR上发表了一项名为“线性弹性缓存”的新型缓存管理技术。该技术通过将页面淘汰问题转化为“滑雪租赁问题”,并借助轻量级机器学习模型,在内存占用与缓存缺失之间实现最优权衡,从而大幅降低云服务总拥有成本。
痛点:传统固定缓存面临“黄金分割”困境
在现代高性能数据库与云服务中,内存缓存是加速数据访问的关键。然而,高速内存价格昂贵——部分无服务器云提供商每1GiB内存每日收费高达3美元。传统缓存管理采用固定资源分配模式,工程师需预先设定内存大小,并依赖LRU(最近最少使用)等淘汰策略。这种做法常陷入两难:缓存过小导致性能骤降,过大则浪费数千美元闲置内存成本。
创新:将缓存变“租用”,动态决策降低TCO
研究团队提出,将内存视为一种“线性成本”的租赁服务:其成本与数据占用空间及缓存时长成正比。每个数据页面临类似“滑雪租赁”的选择——要么以较小日费“租用”(保留在缓存),要么一次性“购买”(将数据写入更昂贵的持久化存储)。系统通过滑雪租赁算法,自动为每个页面计算最优“生存时间”(TTL),TTL到期且未被再次访问的页面自动淘汰。当物理缓存满时,再辅以LRU等传统策略进行空间管理。
实战验证:Spanner数据库节省数百万成本
该技术已在谷歌全球分布式数据库Spanner的生产服务器上运行数月。对比固定大小缓存方案,结果显示:
- 内存使用量减少高达60%
- 缓存缺失率仅增加2%-4%
- 实际I/O成本影响仅0.5%(因算法仅淘汰从存储层重取成本低廉的数据)
团队还利用公开缓存追踪数据进行了测试。相比业界常用的GDSF(贪心双尺寸频率)淘汰算法,弹性缓存策略在各种工作负载下均实现更低总成本,且内存越贵、缓存缺失成本越高,效果越显著。
核心技术:轻量级决策树实现毫秒级预测
考虑Spanner每秒处理数十亿次请求,团队采用仅需几行C++代码即可实现的浅层决策树模型。该模型综合考量数据大小、缓存缺失成本、操作类型等特征,为每个页面预测最优TTL,既保证了极低计算开销,又具备良好的可解释性。
行业意义:云基础设施从“静态预留”转向“动态成本感知”
研究团队表示,线性弹性缓存标志着云基础设施设计理念的转变。随着云环境提供越来越精细的按需定价,这种动态、成本感知的弹性策略将成为大规模服务优化全球部署的必备工具。“即便是最小的机器学习模型,应用于核心基础设施时也能产生巨大影响。”该成果与谷歌研究员Tamas Sarlos和Ravi Kumar共同完成,已在CIDR 2025上发表。
中文翻译:
2026年6月25日
谷歌云杰出工程师Todd Lipcon与谷歌研究院研究科学家Manish Purohit
线性弹性缓存通过将页面逐出问题转化为滑雪租赁问题,并借助轻量级机器学习优化内存占用与缓存未命中之间的权衡,从而将总缓存成本降至最低。
现代高性能数据库系统和云服务依赖内存缓存将频繁访问的数据保留在RAM中,以绕过缓慢的磁盘操作,实现用户期望的闪电般快速响应。但这种高性能是有代价的(字面意义上的):高速内存价格不菲,某些无服务器云提供商对仅1 GiB内存的收费高达每天3美元。
在过去,缓存管理被视为固定资源问题。在常规的固定大小缓存中,工程师会为缓存分配特定内存量,当空间用尽时,系统会使用最近最少使用(LRU)替换等逐出策略来决定保留哪些数据。这导致了一个经典的“金发姑娘”问题:缓存太小则性能骤降;为应对峰值需求设置过大,则会在空闲内存上浪费数千美元。
在发表于创新数据系统研究会议(CIDR)的一篇论文中,我们提出了线性弹性缓存,这是一种新方法,旨在通过根据实时工作负载动态调整缓存大小,将缓存管理的总拥有成本(TCO)降至最低。我们不将内存视为固定的预分配资源,而是将其视为一种效用,其成本与缓存数据的大小以及数据在缓存中保留的时长呈线性关系。通过将内存占用视为随时间累积的可变成本,我们证明可以在不牺牲系统性能的情况下显著降低成本。
为了解决动态缓存大小调整的挑战,让我们引入经典的滑雪租赁问题。想象你在进行一次未知长度的滑雪旅行。每天你都面临一个选择:以小额日费租用滑雪板,或是支付较高的前期费用购买它们,之后免费滑雪。如果你确切知道会滑多少天,选择会很简单。但若没有这一信息,你需要一种算法来最小化总支出。
类似地,在线性弹性缓存中,每条数据都面临类似的困境。当一条数据被访问时,系统必须在两个选项间做出决定:
同时,系统无法独立优化每一条数据,因为缓存有最大分配大小(想象滑雪场的一大群人,而该度假村提供的滑雪板数量有限)。我们的核心理论贡献在于证明,我们可以分别优化这两个因素——逐出策略和“租赁”时长。这种分离自然地导向了简洁实用的实现。我们可以使用滑雪租赁算法来确定页面的生存时间(TTL)(类比于租赁时长)。如果页面在TTL到期前未被再次访问,它将被自动逐出。但若缓存物理上已满,传统的逐出策略(如最近最少使用LRU)便会介入管理空间。
传统在线算法设计侧重于提供最坏情况下的性能保证。针对滑雪租赁问题,经典的“收支平衡”算法是持续租赁,直到累计租赁成本等于购买价格,然后购买滑雪板。虽然这种方法(及其随机化变体)提供了可靠的最坏情况保证,但生产工作负载大多可预测。在Spanner(我们的全球分布式数据库)等系统中,数据访问通常遵循可辨识的模式,这些模式可被利用来做出更好的租赁决策。
为确保我们的理论在现实世界中经得起考验,我们利用两大主要来源进行了广泛实验:
我们开发了一种实用算法,在每次页面请求时,根据页面的访问模式和成本为其分配生存时间(TTL)。由于Spanner每秒处理数十亿次请求,此TTL预测模型必须极其轻量。我们选择了一个浅层决策树,可转化为几行C++代码。生成的代码也易于解释,并能提供有关工作负载特征的宝贵见解。该模型考虑数据的体积、缓存未命中成本(当数据不在缓存中,系统需从其他较慢系统如磁盘检索时)以及所执行的数据库操作类型等特征,以预测每个页面的最优TTL。
我们在数月内将弹性缓存策略集成到了Spanner的生产服务器中。与标准固定大小缓存相比,结果显著:
关键的是,由于该算法具有“成本感知”能力,缓存未命中次数的少量增加主要集中在从存储中获取成本较低的数据上,这意味着对实际I/O成本的影响微乎其微,仅为0.5%。
我们还使用多个公开的缓存跟踪记录评估了弹性缓存方法。我们将贪婪双尺寸频率(GDSF)逐出算法(LRU策略的广义化,允许不同大小的页面)的优化实现作为固定缓存大小的基线策略。
根据使用的滑雪租赁算法以及是否使用机器学习模型,我们考虑了弹性缓存的四种变体。由于可用的公开跟踪记录不具备可用于训练的应用层特征,我们没有实现用于预测的决策树。相反,我们开发了一种简单的学习策略:将每条跟踪记录分为两半,前半部分用于训练。对于训练记录中的每个单独页面,我们计算在训练记录上使成本最小化的最佳TTL。
由于缓存的行为会因初始缓存内容而异,一种称为“预热”的常见做法是使用缓存跟踪记录的前缀来填充缓存,但不对其进行性能测量。我们用跟踪记录后半部分中相当于一天请求量的数据预热了所有缓存,其余部分用于测试和测量。在测试跟踪记录期间,如果遇到训练期间见过的页面,我们将TTL设置为该页面预先计算的最佳TTL。否则,我们使用收支平衡策略或随机化策略来设置TTL。
我们发现,在各种工作负载下,弹性方法始终优于固定大小缓存。随着内存成本相对于缓存未命中成本的增加,弹性缓存带来的节省变得更加显著。
如下图所示,弹性缓存策略通过动态调整缓存大小以适应工作负载,从而产生显著更低的总体成本。我们还观察到,在大小相当的情况下,与固定大小策略相比,弹性策略的缓存未命中率低得多。
线性弹性缓存代表了我们思考云基础设施方式的转变。通过从静态峰值负载配置转向动态、成本感知模型,我们可以构建既高性能又经济高效的系统。
我们在Spanner工作负载上对这些学习型滑雪租赁策略的评估表明,即使是小型轻量级机器学习模型,在应用于核心基础设施时也能产生巨大影响。随着云环境继续提供更精细、按需付费的资源定价模式,弹性策略对于任何寻求优化全球足迹的大规模服务来说都将变得不可或缺。
本工作是与谷歌的Tamas Sarlos和Ravi Kumar合作完成,并于2025年创新数据系统研究会议(CIDR)上发表。
英文来源:
June 25, 2026
Todd Lipcon, Distinguished Engineer, Google Cloud, and Manish Purohit, Research Scientist, Google Research
Linear elastic caching minimizes total cache cost by framing page eviction as a ski rental problem, using lightweight machine learning to optimize the trade-off between memory footprint and cache misses.
Modern high-performance database systems and cloud services rely on in-memory caching to keep frequently accessed data in RAM to bypass slow disk operations and deliver the lightning-fast response times users expect. But this performance comes with a cost (literally): high-speed memory is expensive, and some serverless cloud providers charge up to $3 per day for just 1 GiB of memory.
Historically, cache management has been treated as a fixed-resource problem. With regular, fixed-sized caching, engineers allocate a specific amount of memory for the cache and the system uses eviction policies like least recently used (LRU) replacement to decide which data to keep when that space runs out. This leads to a classic “Goldilocks” problem: size the cache too small and performance plummets; size it too large for peak demand, and you waste thousands of dollars on idle memory.
In a paper published at the Conference on Innovative Data Systems Research (CIDR), we introduced linear elastic caching, a new approach designed to minimize the total cost of ownership (TCO) of cache management by dynamically adjusting cache size in response to real-time workloads. Instead of treating memory as a fixed, pre-allocated resource, we treat it as an utility whose cost is linear in both the size of the cached data and the duration for which it is held in the cache. By treating memory footprint as a variable cost that integrates over time, we showed that we can significantly reduce expenses without compromising system performance.
To solve the challenge of dynamic cache sizing, let’s use the classic ski rental problem. Imagine you’re on a ski trip of unknown length. Each day, you face a choice: rent skis for a small daily fee or buy them for a larger upfront cost and ski for free thereafter. If you knew exactly how many days you would ski, the choice would be easy. But without that knowledge, you need an algorithm to minimize your total spend.
Similarly, in a linear elastic cache, every piece of data faces a comparable dilemma. When a piece of data is accessed, the system must decide between two alternatives:
At the same time, the system cannot optimize for each piece of data independently since the cache has a maximum allocated size (think of a large group of people at a ski resort, where the resort only has a limited number of skis to offer). Our core theoretical contribution proves that we can optimize these two factors — the eviction policy and the "rental" duration — separately. This separation lends itself nicely into a clean practical implementation. We can use a ski rental algorithm to determine the time-to-live (TTL) of a page (analogous to the rental duration). If a page isn’t accessed again before its TTL expires, it is automatically evicted. But if the cache ever becomes physically full, a traditional eviction policy like least recently used (LRU) steps in to manage the space.
Traditional online algorithm design focuses on providing worst-case performance guarantees. For the ski rental problem, the classic “break-even” algorithm is to rent until the accumulated cost equals the purchase price, and then buying the skis. While this approach (and its randomized counterpart) provide solid worst-case guarantees, production workloads are mostly predictable. Data access in systems like Spanner — our globally distributed database — often follows discernible patterns that can be exploited to make better renting decisions.
To ensure our theory holds up in the real world, we conducted extensive experiments using two primary sources:
We developed a practical algorithm that assigns a time-to-live (TTL) to the cached page on each page request based on the page’s access patterns and costs. Because Spanner handles billions of requests per second, this TTL prediction model has to be incredibly lightweight. We opted for a shallow decision tree that can be translated into a few lines of C++ code. The resulting code is also easily interpretable and provides valuable insights on the workload characteristics. This model considers features such as the size of the data, the cost of a cache miss (when data isn’t in the cache and the system needs to retrieve it from some other, slower system like a disk), and the type of database operation being performed to predict the optimal TTL for each page.
We integrated the elastic caching policy into Spanner's production servers over several months. Compared to a standard fixed-size cache, the results were substantial:
Crucially, because the algorithm is "cost-aware," the small increase in cache misses was concentrated on data that is cheap to fetch from storage, meaning the impact on actual I/O costs was a negligible 0.5%.
We also evaluated our elastic caching approach using several publicly available cache traces. We used an optimized implementation of the greedy dual size frequency (GDSF) eviction algorithm — a generalization of the well-known LRU policy that allows for pages of different sizes — as a fixed cache size baseline policy.
We considered four variants of elastic caching depending on which ski rental algorithm we used and whether or not we used a machine learned model. Since the available public traces don't have application-level features available for training, we didn’t implement decision trees for prediction. Instead, we developed a simple learning strategy that splits each trace in half and uses the first half for training. For each individual page in the training trace, we computed the best TTL for the page that minimizes the cost over the training trace.
Since the behavior of the cache changes depending on what's initially in the cache, a common practice, known as “warming up”, is to use some prefix of the cache trace to populate the cache but not actually measure performance on it. We warmed up all caches with one day’s worth of requests from the second half of the trace and used the rest for testing and measurements. During the test trace, if we encountered a page that was seen during training, we set the TTL to be the best precomputed TTL for that page. Otherwise, we set the TTL using either the breakeven or randomized policies.
We found that the elastic approach consistently outperformed fixed-size caches across diverse workloads. As the cost of memory increases relative to the cost of a cache miss, the savings provided by elastic caching become even more pronounced.
As the following figures demonstrate, elastic caching policies incur significantly lower total cost by dynamically adapting the cache size to the workload. We also observe that the elastic policies incur a much lower cache miss rate when compared to a fixed size policy at a comparable size.
Linear elastic caching represents a shift in how we think about cloud infrastructure. By moving away from static peak-load provisioning and toward a dynamic, cost-aware model, we can build systems that are both high-performing and economically efficient.
Our evaluation of these learned ski rental policies across Spanner workloads demonstrates that even small, lightweight ML models can have a massive impact when applied to core infrastructure. As cloud environments continue to offer more granular, pay-as-you-go pricing for resources, elastic strategies will become essential for any large-scale service looking to optimize its global footprint.
This represents joint work with Tamas Sarlos (Google) and Ravi Kumar (Google) and was presented at the Conference on Innovative Data Systems Research (CIDR) 2025.