@delamarch3
Contributor

@delamarch3 delamarch3 commented Dec 5, 2025

Which issue does this PR close?

Rationale for this change

Make the list files cache memory limit and TTL configurable via runtime config.

What changes are included in this PR?

  • Add ability to SET and RESET list_files_cache_limit and list_files_cache_ttl
  • list_files_cache_ttl expects a duration written as either 1m30s or 30 (I'm wondering if it would be simpler for it to accept just a single unit?)
  • Add update_cache_ttl() to the ListFilesCache trait so we can update it from RuntimeEnvBuilder::build()
  • Add config entries
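
For readers unfamiliar with the duration format, here is a minimal sketch of how a string such as 1m30s or 30 could be turned into a Duration. It is not the parser from this PR: the function name parse_ttl, the supported units (h, m, s), and the choice to treat a bare number as seconds are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Parses a TTL string such as "1m30s", or a bare number such as "30".
/// Hypothetical helper for illustration only; not the parser from this PR.
fn parse_ttl(input: &str) -> Result<Duration, String> {
    let s = input.trim();
    if s.is_empty() {
        return Err("empty duration".to_string());
    }
    // Assumption: a bare integer is interpreted as a number of seconds.
    if let Ok(secs) = s.parse::<u64>() {
        return Ok(Duration::from_secs(secs));
    }
    let mut total: u64 = 0;
    let mut digits = String::new();
    for c in s.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
            continue;
        }
        // Every unit character must be preceded by at least one digit.
        let value = digits
            .parse::<u64>()
            .map_err(|_| format!("expected a number before '{c}' in '{s}'"))?;
        digits.clear();
        // Assumption: only hours, minutes, and seconds are recognised.
        let factor: u64 = match c {
            'h' => 3600,
            'm' => 60,
            's' => 1,
            _ => return Err(format!("unknown unit '{c}' in '{s}'")),
        };
        total = value
            .checked_mul(factor)
            .and_then(|v| total.checked_add(v))
            .ok_or_else(|| format!("duration '{s}' is too large"))?;
    }
    // Trailing digits without a unit (for example "1m30") are rejected.
    if !digits.is_empty() {
        return Err(format!("missing unit after '{digits}' in '{s}'"));
    }
    Ok(Duration::from_secs(total))
}
```

Under these assumptions, parse_ttl("1m30s") yields 90 seconds and parse_ttl("30") yields 30 seconds, while a mixed form such as 1m30 is rejected. Accepting only a single unit, as floated above, would reduce this to one integer parse.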

Are these changes tested?

Yes

Are there any user-facing changes?

update_cache_ttl() has been added to the ListFilesCache trait
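
To give a rough idea of what this means for anyone implementing the trait themselves, here is a simplified sketch. The trait TtlCache and the struct SimpleCache are hypothetical stand-ins, and the exact signature of update_cache_ttl() shown here (taking &self and an Option<Duration>) is an assumption rather than a copy of the real ListFilesCache trait.

```rust
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical, trimmed-down stand-in for a cache trait with an updatable TTL.
/// The real ListFilesCache trait has more methods and may differ in signature.
trait TtlCache {
    /// Returns the current TTL; `None` means entries never expire.
    fn cache_ttl(&self) -> Option<Duration>;
    /// Replaces the TTL. Takes `&self` so it can be called through a shared
    /// handle such as `Arc<dyn TtlCache>`.
    fn update_cache_ttl(&self, ttl: Option<Duration>);
}

/// Hypothetical implementation that keeps the TTL behind a mutex.
struct SimpleCache {
    ttl: Mutex<Option<Duration>>,
}

impl SimpleCache {
    fn new() -> Self {
        Self { ttl: Mutex::new(None) }
    }
}

impl TtlCache for SimpleCache {
    fn cache_ttl(&self) -> Option<Duration> {
        *self.ttl.lock().unwrap()
    }

    fn update_cache_ttl(&self, ttl: Option<Duration>) {
        // Interior mutability lets the TTL change without `&mut self`.
        *self.ttl.lock().unwrap() = ttl;
    }
}
```

In this sketch, passing None to update_cache_ttl() models a TTL of infinity, and interior mutability is what allows a cache shared behind an Arc to be updated after construction.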

@github-actions github-actions bot added the core (Core DataFusion crate) and execution (Related to the execution crate) labels Dec 5, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 5, 2025
@delamarch3 delamarch3 changed the title from "add config options for list_files_cache_limit and list_files_cache_ttl" to "Add runtime config options for list_files_cache_limit and list_files_cache_ttl" Dec 5, 2025
@delamarch3 delamarch3 marked this pull request as ready for review December 5, 2025 16:49
@delamarch3 delamarch3 marked this pull request as draft December 5, 2025 16:58
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 6, 2025
@delamarch3 delamarch3 marked this pull request as ready for review December 8, 2025 12:11
@delamarch3
Contributor Author

Hi @BlakeOrth @alamb, this is ready for review

Contributor

@BlakeOrth BlakeOrth left a comment

This overall looks very nice to me, thanks! Perhaps I'm missing it, but is there any way to set the TTL back to infinity (None) after it's been set to Some(Duration)?

@delamarch3
Contributor Author

@BlakeOrth Yep, you can run reset datafusion.runtime.list_files_cache_ttl to set it back to None

Contributor

@BlakeOrth BlakeOrth left a comment

My approval doesn't have any real power here, but this all looks good to me. The CI failure seems like it's probably unrelated. I can't imagine this work having any effect on benchmarks.

@alamb
Contributor

alamb commented Dec 10, 2025

Thanks @delamarch3 and @BlakeOrth -- I'll try and check this out soon

Contributor

@alamb alamb left a comment

Thanks for this PR @delamarch3 and @BlakeOrth 🙏

I think this PR would be better with unit tests for time parsing, but perhaps we can add that as a follow-on PR. Otherwise it looks really nice 👌

#19108 (comment)

The values for these will not update after running SET, unless the runtime has been configured with a ListFilesCache (it's None by default, so there is nothing to update)
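
One way to picture why SET appears to do nothing without a configured cache is the sketch below. SketchCache, SketchRuntime, and set_ttl are hypothetical names that only model the "optional cache" shape described above; they are not DataFusion's actual types or code paths.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical cache that only stores its TTL.
struct SketchCache {
    ttl: Mutex<Option<Duration>>,
}

/// Hypothetical runtime in which the list files cache is optional,
/// mirroring the "None by default" behaviour described above.
struct SketchRuntime {
    list_files_cache: Option<Arc<SketchCache>>,
}

impl SketchRuntime {
    /// Applies a new TTL and reports whether any cache was actually updated.
    fn set_ttl(&self, ttl: Option<Duration>) -> bool {
        match &self.list_files_cache {
            Some(cache) => {
                *cache.ttl.lock().unwrap() = ttl;
                true
            }
            // No cache configured: there is nothing to update.
            None => false,
        }
    }
}
```

With list_files_cache left as None, set_ttl returns false and no state changes, which matches the observation that the setting has no visible effect until a cache is configured.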

This confused me for a while but it makes sense to me and will be fixed with

@delamarch3
Contributor Author

This confused me for a while but it makes sense to me and will be fixed with

Ah sorry, should have been more clear, I meant in the datafusion-cli. I configured it like this to test it out:

diff --git a/datafusion-cli/src/main.rs b/datafusion-cli/src/main.rs
index de666fced..abed30ea3 100644
--- a/datafusion-cli/src/main.rs
+++ b/datafusion-cli/src/main.rs
@@ -23,6 +23,8 @@ use std::process::ExitCode;
 use std::sync::{Arc, LazyLock};

 use datafusion::error::{DataFusionError, Result};
+use datafusion::execution::cache::cache_manager::CacheManagerConfig;
+use datafusion::execution::cache::DefaultListFilesCache;
 use datafusion::execution::context::SessionConfig;
 use datafusion::execution::memory_pool::{
     FairSpillPool, GreedyMemoryPool, MemoryPool, TrackConsumersPool,
@@ -222,6 +224,11 @@ async fn main_inner() -> Result<()> {
     );
     rt_builder = rt_builder.with_object_store_registry(instrumented_registry.clone());

+    rt_builder = rt_builder.with_cache_manager(
+        CacheManagerConfig::default()
+            .with_list_files_cache(Some(Arc::new(DefaultListFilesCache::default()))),
+    );
+
     let runtime_env = rt_builder.build_arc()?;

     // enable dynamic file query

@alamb
Contributor

alamb commented Dec 14, 2025

This confused me for a while but it makes sense to me and will be fixed with

Ah sorry, should have been more clear, I meant in the datafusion-cli. I configured it like this to test it out:

No worries -- I think it is a temporary situation so we should be good to go

@alamb alamb added this pull request to the merge queue Dec 16, 2025
@alamb
Contributor

alamb commented Dec 16, 2025

Thanks @delamarch3 and @BlakeOrth

Merged via the queue into apache:main with commit 79cfe8e Dec 16, 2025
28 checks passed

Labels

core: Core DataFusion crate
documentation: Improvements or additions to documentation
execution: Related to the execution crate
sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Add a way to dynamically configure / update the ListFilesCache settings via RuntimeConfiguration

3 participants