-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add runtime config options for list_files_cache_limit and list_files_cache_ttl
#19108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
list_files_cache_limit and list_files_cache_ttl
|
Hi @BlakeOrth @alamb, this is ready for review |
BlakeOrth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overall looks very nice to me, thanks! Perhaps I'm missing it, but is there anyway to set the TTL back to infinity (None) after it's been set to Some(Duration)?
|
@BlakeOrth Yep, you can run |
BlakeOrth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My approval doesn't have any real power here, but this all looks good to me. The CI failure seems like it's probably unrelated. I can't imagine this work having any effect on benchmarks.
|
Thanks @delamarch3 and @BlakeOrth -- I'll try and check this out soon |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @delamarch3 and @BlakeOrth 🙏
I think this PR would be better with unit tests for time parsing but perhaps we can add that as a follow on PR. Otherwise it looks really nice 👌
The values for these will not update after running SET, unless the runtime has been configured with a ListFilesCache (it's None by default, so there is nothing to update)
This confused me for a while but it makes sense to me and will be fixed with
Ah sorry, should have been more clear, I meant in the datafusion-cli. I configured it like this to test it out: diff --git a/datafusion-cli/src/main.rs b/datafusion-cli/src/main.rs
index de666fced..abed30ea3 100644
--- a/datafusion-cli/src/main.rs
+++ b/datafusion-cli/src/main.rs
@@ -23,6 +23,8 @@ use std::process::ExitCode;
use std::sync::{Arc, LazyLock};
use datafusion::error::{DataFusionError, Result};
+use datafusion::execution::cache::cache_manager::CacheManagerConfig;
+use datafusion::execution::cache::DefaultListFilesCache;
use datafusion::execution::context::SessionConfig;
use datafusion::execution::memory_pool::{
FairSpillPool, GreedyMemoryPool, MemoryPool, TrackConsumersPool,
@@ -222,6 +224,11 @@ async fn main_inner() -> Result<()> {
);
rt_builder = rt_builder.with_object_store_registry(instrumented_registry.clone());
+ rt_builder = rt_builder.with_cache_manager(
+ CacheManagerConfig::default()
+ .with_list_files_cache(Some(Arc::new(DefaultListFilesCache::default()))),
+ );
+
let runtime_env = rt_builder.build_arc()?;
// enable dynamic file query |
No worries -- I think it is a temporary situation so we should be good to go |
|
Thanks @delamarch3 and @BlakeOrth |
Which issue does this PR close?
Rationale for this change
Make the list file cache memory limit and TTL configurable via runtime config.
What changes are included in this PR?
SETandRESETlist_files_cache_limitandlist_files_cache_ttllist_files_cache_ttlwill expect the duration to look like either1m30sor30(I'm wondering if it would be simpler for it to just accept a single unit?)update_cache_ttl()to theListFilesCachetrait so we can update it fromRuntimeEnvBuilder::build()Are these changes tested?
Yes
Are there any user-facing changes?
update_cache_ttl()has been added to theListFilesCachetrait