Allow support for switching off metadata cache #18362

@alchemist51

Currently in DataFusion, the CacheManagerConfig exposes the file metadata cache as an Option:

    /// Cache of file-embedded metadata, used to avoid reading it multiple times when processing a
    /// data file (e.g., Parquet footer and page metadata).
    /// If not provided, the [`CacheManager`] will create a [`DefaultFilesMetadataCache`].
    pub file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,

However, as the doc comment above says, even if we set file_metadata_cache to None, a DefaultFilesMetadataCache is still created automatically, so the cache cannot be switched off through this field.

I think this makes the API a bit confusing, and I would like it to at least honor the case where we set the field to None, just like other config options.

To address this, I'm thinking we could initialise the metadata cache object in the default method of CacheManagerConfig. In any other case, where the user overrides the field, we should honor what has been set instead of creating a cache here in cache_manager:

            .file_metadata_cache
            .as_ref()
            .map(Arc::clone)
            .unwrap_or_else(|| {
                Arc::new(DefaultFilesMetadataCache::new(config.metadata_cache_limit))
            });
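
As a rough illustration of the proposal, here is a minimal, self-contained sketch. The trait and structs below are simplified stand-ins that only reuse the names from this issue; they are not the real DataFusion definitions, and the cache_limit method, the file_metadata_cache accessor, and the 50 MiB default limit are assumptions made for the example. The idea is that Default fills in a DefaultFilesMetadataCache, while CacheManager copies whatever the config holds, including None.

```rust
use std::sync::Arc;

// Simplified stand-ins for the types named in this issue.
// These are NOT the real DataFusion definitions.
pub trait FileMetadataCache: Send + Sync {
    // Hypothetical method, present only so the trait has some behavior.
    fn cache_limit(&self) -> usize;
}

pub struct DefaultFilesMetadataCache {
    limit: usize,
}

impl DefaultFilesMetadataCache {
    pub fn new(limit: usize) -> Self {
        Self { limit }
    }
}

impl FileMetadataCache for DefaultFilesMetadataCache {
    fn cache_limit(&self) -> usize {
        self.limit
    }
}

pub struct CacheManagerConfig {
    pub file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
    pub metadata_cache_limit: usize,
}

impl Default for CacheManagerConfig {
    fn default() -> Self {
        // Assumed default limit, chosen only for this sketch.
        let metadata_cache_limit = 50 * 1024 * 1024;
        // Proposed change: the default cache is created here, in the config.
        let cache: Arc<dyn FileMetadataCache> =
            Arc::new(DefaultFilesMetadataCache::new(metadata_cache_limit));
        Self {
            file_metadata_cache: Some(cache),
            metadata_cache_limit,
        }
    }
}

pub struct CacheManager {
    file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
}

impl CacheManager {
    pub fn new(config: &CacheManagerConfig) -> Self {
        Self {
            // Honor the config as-is: None stays None, no fallback cache.
            file_metadata_cache: config.file_metadata_cache.as_ref().map(Arc::clone),
        }
    }

    // Hypothetical accessor returning the configured cache, if any.
    pub fn file_metadata_cache(&self) -> Option<Arc<dyn FileMetadataCache>> {
        self.file_metadata_cache.clone()
    }
}
```

With this shape, CacheManagerConfig::default() would still yield a working cache, while a config built with file_metadata_cache: None and ..Default::default() would produce a CacheManager with no metadata cache at all. One consequence is that the manager's accessor has to return an Option, so callers need to handle the disabled case.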

Tagging @nuno-faria @alamb who worked on the earlier PRs.
