Conversation

@koushiro
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Propose a Cache Layer design

What changes are included in this PR?

A new RFC

Are there any user-facing changes?

@koushiro koushiro requested a review from Xuanwo as a code owner June 16, 2025 13:07
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 16, 2025
@koushiro koushiro changed the title RFC: Cache Layer RFC-6297: Cache Layer Jun 16, 2025
@Xuanwo
Member

Xuanwo commented Jun 17, 2025

My current understanding of the cache layer is that it is a very thin component that simply wraps two (or more) OpenDAL operators. The cache policy and strategy should be defined by a separate trait. I haven't explored this idea in depth yet, but it might look something like the following:

let cache_layer = CacheLayer::builder()
    .route("*.hint", WholeCache::new(1024), memory_cache)
    .route("*.json", FixedSizeCache::new(64 << 20), memory_cache)
    .route("*.parquet", ParquetCache::new(), foyer_cache)
    .build()?;
let op = op.layer(cache_layer);

WholeCache, FixedSizeCache, and ParquetCache are all implementations of the CachePolicy trait and can be provided by users. OpenDAL can offer some default implementations as a starting point. In other words, the cache layer itself does not make any decisions for users.

@koushiro
Member Author

koushiro commented Jun 26, 2025

My current understanding of the cache layer is that it is a very thin component that simply wraps two (or more) OpenDAL operators.

I agree. I think it's best to simplify the cache layer design first, without considering a unified cache policy/strategy design; instead, let the corresponding OpenDAL service manage it itself. After collecting enough use cases and feedback, we can consider the policy/strategy design and provide some common policy/strategy implementations.

Copilot AI review requested due to automatic review settings December 27, 2025 07:53

Copilot AI left a comment


Pull request overview

This PR proposes RFC-6297, which introduces a Cache Layer to OpenDAL for transparent read-through and write-through caching capabilities. The RFC aims to improve performance by allowing users to cache data from slower storage services (like S3) in faster ones (like Memory or Redis).

Key changes:

  • Introduces a new CacheLayer that wraps existing storage with caching functionality
  • Defines a CacheStorage trait for flexible cache backend implementations
  • Proposes CacheReader and CacheWriter components for handling cached and uncached data flows


@koushiro koushiro marked this pull request as draft December 27, 2025 08:01
@koushiro koushiro marked this pull request as ready for review December 27, 2025 12:29
@koushiro koushiro force-pushed the add-cache-layer-rfc branch from 2bd38da to 6dbc7a7 Compare December 29, 2025 10:21
@Xuanwo
Member

Xuanwo commented Dec 29, 2025

Thank you for working on this! I will find some time to review this RFC again.

}

async fn delete(&self) -> Result<(RpDelete, Self::Deleter)> {
self.inner.delete().await
Contributor

Should we also delete the file in the cache service? We shouldn't depend on the eviction policy of the underlying service here, since we may keep reading the deleted file if we use memory as the cache service.

Contributor

@flaneur2020 flaneur2020 Jan 4, 2026

It'll be nice to invalidate the cache when delete() is called, IMO; in this manner, we could claim "best effort" cache consistency.
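For illustration, a minimal sketch of this best-effort invalidation (the `delete_path` helper and `cache_key` method are hypothetical; in the real `Access` trait, `delete` returns a `Deleter`, so the wrapping would live there):

```rust
// Hypothetical sketch: best-effort cache invalidation on delete.
async fn delete_path(&self, path: &str) -> Result<()> {
    // Delete from the underlying service first.
    self.inner.delete(path).await?;
    // Best-effort invalidation: a cache error must not fail the delete.
    let _ = self.cache.delete(&self.cache_key(path)).await;
    Ok(())
}
```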


1. **Buffer Accumulation**: All written data is accumulated in an internal `buffer`
2. **Primary Write**: Data is always written to the underlying service first via `inner.write()`
3. **Cache Write**: If `cache_write` is enabled and the write to the underlying service succeeds, the complete data is written to cache
Contributor

Should we invalidate the cached data after a successful write even if `cache_write` is disabled?

1. **Buffer Accumulation**: All written data is accumulated in an internal `buffer`
2. **Primary Write**: Data is always written to the underlying service first via `inner.write()`
3. **Cache Write**: If `cache_write` is enabled and the write to the underlying service succeeds, the complete data is written to cache
4. **Atomic Caching**: Cache operations happen only after successful completion to ensure cache consistency
Contributor

This is not atomic, as the cache operation could fail.
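For reference, a minimal sketch of the `close()` flow described by the list above, with its best-effort (non-atomic) nature made explicit; the field names (`buffer`, `cache`, `cache_key`, `options`) are assumptions, not the RFC's exact code:

```rust
// Hypothetical sketch of CacheWriter::close().
async fn close(&mut self) -> Result<Metadata> {
    // Step 2: primary write -- the underlying service always comes first.
    let meta = self.inner.close().await?;
    // Step 3: cache write -- only after the primary write succeeded, and
    // only best-effort: a cache failure must not fail the user's write.
    if self.options.write {
        let _ = self
            .cache
            .write(&self.cache_key, self.buffer.clone())
            .await;
    }
    Ok(meta)
}
```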


# Future possibilities

- Customizable Cache Key Generation:
Contributor

We may need some keys like `{path}-{version}`, `{path}-{range}`, etc., as sketched below.
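A minimal sketch of such a scheme (the `cache_key` function and its parameters are hypothetical):

```rust
// Hypothetical version- and range-aware cache key generation.
fn cache_key(path: &str, version: Option<&str>, range: Option<(u64, u64)>) -> String {
    let mut key = path.to_string();
    if let Some(v) = version {
        key.push_str(&format!("-{v}"));
    }
    if let Some((start, end)) = range {
        key.push_str(&format!("-{start}-{end}"));
    }
    key
}
```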


// Try cache first if read caching is enabled
if self.cache_options.read {
match self.cache_service.read(&cache_key).await {
Contributor

@flaneur2020 flaneur2020 Jan 4, 2026

When dealing with object storages, it's considered common practice to merge smaller writes into a bigger object, which often makes each object exceed 512 MB or even reach into gigabytes.

Also, when objects are accessed, they're often scanned within a range, and the range is often maintained in an outside index (like Iceberg's manifest). We'd get read amplification if we had to load the entire object from cache.

It'd be nice to have an abstraction here, perhaps called CacheMapper or CacheChunker, which handles the shaping of cache accesses, making the object-to-cache-entry relationship 1:M. This allows us to cache not the entire object but a subset of it by range, as sketched below.
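A minimal sketch of the chunking idea (`chunk_keys` is hypothetical; a real CacheChunker would also handle assembling the chunks back into the requested range):

```rust
// Hypothetical: map a ranged read onto fixed-size, individually
// cacheable chunks, making object-to-cache-entry 1:M.
fn chunk_keys(path: &str, offset: u64, len: u64, chunk_size: u64) -> Vec<String> {
    if len == 0 {
        return Vec::new();
    }
    let first = offset / chunk_size;
    let last = (offset + len - 1) / chunk_size;
    (first..=last).map(|i| format!("{path}-chunk-{i}")).collect()
}
```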


1. **Cache Key Strategy**: Should we provide options for custom cache key generation (e.g., hashing, prefixing)?

2. **Invalidation API**: Should we provide explicit cache invalidation methods, or rely entirely on the underlying cache storage?
Contributor

+1 for having no invalidation API; it should depend solely on the underlying eviction strategy.

}
}

async fn write(&self, key: &str, value: Vec<u8>) -> Result<()> {
Contributor

How about using a `Buffer` instead of `Vec<u8>` here?

fn stat(&self, key: &str) -> impl Future<Output = Result<Metadata>> + MaybeSend;

/// Check whether `key` exists in the cache.
fn exists(&self, key: &str) -> impl Future<Output = Result<bool>> + MaybeSend;
Contributor

Is this `exists` method considered a must? IMO we could already check existence with the `stat` method.

Or, IIUC, we could have a default impl of `exists` that calls the `stat` method, as sketched below.


Storage access performance varies greatly across different storage services.
Remote object stores like S3 or GCS have much higher latency than local storage or in-memory caches.
In many applications, particularly those with read-heavy workloads or repeated access to the same data, caching can significantly improve performance.
Contributor

Shall we assume the content of objects is immutable at any path when using CacheLayer?

IMO keeping content immutable is common practice for object store caching, since immutability removes the hardest part of cache consistency. However, it requires users to keep this constraint in mind.

// Enable cache promotion during read operations (default: true)
read_promotion: true,
// Enable write-through caching (default: true)
write: true,
Contributor

Is it possible to use an enum for the write option to separate write-through from write-back?

Sometimes write-back could be useful: for example, if the upstream S3 service becomes unavailable, a write-back cache could help keep our service available by buffering the writes. A possible shape is sketched below.
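For illustration, a hypothetical enum along those lines (names are assumptions):

```rust
// Hypothetical replacement for the boolean `write` option.
pub enum CacheWriteMode {
    /// Don't touch the cache on writes.
    Disabled,
    /// Write to the underlying service first, then populate the cache.
    WriteThrough,
    /// Acknowledge once the cache holds the data; flush to the underlying
    /// service asynchronously (can buffer writes through upstream outages).
    WriteBack,
}
```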

@Xuanwo
Member

Xuanwo commented Jan 4, 2026

Hello everyone, before we continue discussing the Cache Layer, I invite everyone to reach a consensus on the Route Layer's design. I believe this approach can greatly simplify our Cache Layer design and allow it to focus solely on caching.

RFC: #7130

cc @koushiro @flaneur2020 @meteorgan @killme2008, also @MrCroxx for work on foyer layer.

```rust
let op = s3.layer(
CacheLayer::new(memory)
.with_options(CacheOptions {
Member

Instead of providing CacheOptions, I prefer CacheLayer to do less work and delegate the different options to a new trait called CachePolicy.

In this way, every cache layer will be composed of an Operator and a CachePolicy.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CacheDirective {
    Bypass,
    Use { chunk_size: Option<u32>, fill: bool },
}

pub trait CachePolicy: Send + Sync + 'static {
    fn evaluate(&self, req: &CacheRequest<'_>) -> CacheDirective;
}

CachePolicy can decide everything about caching itself, but we can start simple:

  • bypass or use the cache
  • if the cache missed, whether we need to fill it
  • how to cache it, as a whole or in chunks

We can ship some widely used policies too:

  • WholeCachePolicy
  • ChunkedCachePolicy
  • MetadataOnlyCachePolicy

In this way, users can extend cache behavior based on their own needs.
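For example, a minimal policy sketch built on the trait above; the `content_length` hint on `CacheRequest` is an assumption for illustration:

```rust
// Hypothetical policy: cache small objects whole, chunk large ones,
// bypass when the size is unknown.
pub struct SizeThresholdPolicy {
    whole_limit: u64,
    chunk_size: u32,
}

impl CachePolicy for SizeThresholdPolicy {
    fn evaluate(&self, req: &CacheRequest<'_>) -> CacheDirective {
        match req.content_length {
            Some(len) if len <= self.whole_limit => CacheDirective::Use {
                chunk_size: None, // cache as a whole
                fill: true,
            },
            Some(_) => CacheDirective::Use {
                chunk_size: Some(self.chunk_size), // cache in chunks
                fill: true,
            },
            None => CacheDirective::Bypass,
        }
    }
}
```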

Member Author

The CacheDirective you mentioned appears to only cover read operations. Does it lack provisions for write or delete operations? Should we also include an Invalidate directive?

Member Author

For CacheRequest, I currently think it is composed as follows, but I'm not sure if I should directly use the existing ops types (OpStat/OpRead/OpWrite/OpDelete) in CacheOperation. What do you think?

pub struct CacheRequest<'a> {
    /// Path as seen by OpenDAL.
    pub path: &'a str,
    /// Operation kind (stat/read/write/delete)
    pub op: CacheOperation,
    // Additional fields can be added in the future.
}
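A sketch of what `CacheOperation` could look like if it reused the existing ops types directly (illustrative only):

```rust
// Hypothetical: wrap the existing ops types instead of redefining them.
pub enum CacheOperation {
    Stat(OpStat),
    Read(OpRead),
    Write(OpWrite),
    Delete(OpDelete),
}
```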

Member

Does it lack provisions for write or delete operations? Should we also include an Invalidate directive?

Yes, I do think we lack invalidation support here. However, I believe it's not a blocker and we can add it later. We can test CacheLayer with users who ensure that files are immutable.

I'm not sure if I should directly use the existing ops types (OpStat/OpRead/OpWrite/OpDelete) in CacheOperation

I think it's fine. I also think using a String for the path is fine unless we are sure this is a bottleneck.

## Architecture

The Cache Layer implements the `Layer` trait and wraps an underlying `Access` implementation with caching capabilities.
It introduces a `CacheService` trait that defines the interface for cache operations.
Member

I suggest we just accept Operator as the public API and convert it into an Accessor inside. Also, we should not add a new API surface like CacheService.

By simply reusing the Access trait, we can avoid a lot of repetitive work.


```rust
#[derive(Clone, Copy, Debug)]
pub struct CacheOptions {
Member

As discussed before, I think we should expose a policy trait instead.

cache_options: CacheOptions,
}

impl<A: Access, S: CacheService> LayeredAccess for CacheAccessor<A, S> {
Member

The proposal should not include detailed code, as it could divert our focus.
