Currently in DataFusion, CacheManagerConfig exposes the metadata cache as an optional field:
/// Cache of file-embedded metadata, used to avoid reading it multiple times when processing a
/// data file (e.g., Parquet footer and page metadata).
/// If not provided, the [`CacheManager`] will create a [`DefaultFilesMetadataCache`].
pub file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
However, as the doc comment above says, even if we set file_metadata_cache to None, a DefaultFilesMetadataCache is still created automatically.
I think this makes the API a bit confusing, and I would like it to at least honor the case where we set it to None, just like other config options.
To that end, I'm thinking we can initialize the metadata cache object in CacheManagerConfig's default method, and in any other case where the user overrides it, honor what has been set instead of creating a cache here in cache_manager:
.file_metadata_cache
.as_ref()
.map(Arc::clone)
.unwrap_or_else(|| {
    Arc::new(DefaultFilesMetadataCache::new(config.metadata_cache_limit))
});
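To make the proposal concrete, here is a rough sketch of the behavior I have in mind. It uses simplified stand-in types, not the real DataFusion definitions: the `limit` method on the trait, the `DEFAULT_METADATA_CACHE_LIMIT` constant and its value, and the shape of `CacheManager::new` are all assumptions made for illustration only.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real trait; `limit` exists only for this sketch.
trait FileMetadataCache {
    fn limit(&self) -> usize;
}

// Hypothetical stand-in for the real default cache.
struct DefaultFilesMetadataCache {
    limit: usize,
}

impl DefaultFilesMetadataCache {
    fn new(limit: usize) -> Self {
        Self { limit }
    }
}

impl FileMetadataCache for DefaultFilesMetadataCache {
    fn limit(&self) -> usize {
        self.limit
    }
}

// Assumed value, chosen only for the example.
const DEFAULT_METADATA_CACHE_LIMIT: usize = 50 * 1024 * 1024;

struct CacheManagerConfig {
    file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
    metadata_cache_limit: usize,
}

impl Default for CacheManagerConfig {
    // Proposed: the default config is the only place that creates the default cache.
    fn default() -> Self {
        let cache: Arc<dyn FileMetadataCache> =
            Arc::new(DefaultFilesMetadataCache::new(DEFAULT_METADATA_CACHE_LIMIT));
        Self {
            file_metadata_cache: Some(cache),
            metadata_cache_limit: DEFAULT_METADATA_CACHE_LIMIT,
        }
    }
}

struct CacheManager {
    file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
}

impl CacheManager {
    // Proposed: take the config at face value, with no `unwrap_or_else` fallback,
    // so an explicit `None` means "no metadata cache".
    fn new(config: &CacheManagerConfig) -> Self {
        Self {
            file_metadata_cache: config.file_metadata_cache.as_ref().map(Arc::clone),
        }
    }
}
```

With this shape, `CacheManager::new(&CacheManagerConfig::default())` still ends up with a default cache, while a config built with `file_metadata_cache: None` yields a manager without one.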
Tagging @nuno-faria @alamb who worked on the earlier PRs.