@nuno-faria
Contributor

Which issue does this PR close?

Rationale for this change

Uses the metadata_size_hint when reading metadata in the CachedParquetFileReader, if available, so the behavior is consistent with the documentation.

What changes are included in this PR?

  • Added metadata_size_hint to CachedParquetFileReader.
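To illustrate why honoring the hint matters, here is a minimal sketch of footer reading with and without a prefetch hint. It relies only on the Parquet footer layout (a file ends with the serialized metadata, a 4-byte little-endian metadata length, and the magic bytes `PAR1`). `CountingFile`, `read_metadata_bytes`, and `toy_file` are hypothetical names invented for this example. They are not part of DataFusion or the `parquet` crate, and this is not the code changed in the PR.

```rust
/// Parquet footer size: 4-byte little-endian metadata length + 4-byte magic "PAR1".
const FOOTER_SIZE: usize = 8;

/// Hypothetical in-memory stand-in for a remote file that counts range requests.
struct CountingFile {
    bytes: Vec<u8>,
    requests: usize,
}

impl CountingFile {
    fn new(bytes: Vec<u8>) -> Self {
        Self { bytes, requests: 0 }
    }

    /// Fetch the last `n` bytes (clamped to the file size); counts as one request.
    fn fetch_suffix(&mut self, n: usize) -> Vec<u8> {
        self.requests += 1;
        let start = self.bytes.len().saturating_sub(n);
        self.bytes[start..].to_vec()
    }
}

/// Return the serialized metadata bytes. With a hint, the first request fetches
/// `hint` bytes from the end of the file instead of only the 8-byte footer.
fn read_metadata_bytes(file: &mut CountingFile, hint: Option<usize>) -> Result<Vec<u8>, String> {
    let first = hint.unwrap_or(FOOTER_SIZE).max(FOOTER_SIZE);
    let suffix = file.fetch_suffix(first);
    if suffix.len() < FOOTER_SIZE {
        return Err("file too small".to_string());
    }
    let footer = &suffix[suffix.len() - FOOTER_SIZE..];
    if footer[4..] != b"PAR1"[..] {
        return Err("bad magic".to_string());
    }
    let meta_len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    let needed = meta_len + FOOTER_SIZE;
    // A second request is only needed when the first fetch did not cover the metadata.
    let suffix = if needed <= suffix.len() {
        suffix
    } else {
        file.fetch_suffix(needed)
    };
    if suffix.len() < needed {
        return Err("metadata length exceeds file size".to_string());
    }
    let start = suffix.len() - needed;
    Ok(suffix[start..start + meta_len].to_vec())
}

/// Build a toy file: `data_len` zero bytes, then metadata, its length, and the magic.
fn toy_file(data_len: usize, metadata: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; data_len];
    out.extend_from_slice(metadata);
    out.extend_from_slice(&(metadata.len() as u32).to_le_bytes());
    out.extend_from_slice(b"PAR1");
    out
}
```

In this model, a hint that covers the metadata plus the footer turns two requests into one. A hint that is too small falls back to a second request, the same number as with no hint.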

Are these changes tested?

N/A.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the datasource Changes to the datasource crate label Aug 23, 2025
@shehabgamin
Contributor

Super small PR, looks like this one maybe slipped through the cracks? Probably should make it into DF 50 if possible.
cc @alamb

Contributor

@alamb alamb left a comment

Thanks @nuno-faria and @shehabgamin

Is there some way we can add a test for this behavior so we don't introduce a regression in the future?

@alamb
Contributor

alamb commented Sep 13, 2025

> Super small PR, looks like this one maybe slipped through the cracks? Probably should make it into DF 50 if possible. cc @alamb

I think we have already started voting on 50.0.0:

Perhaps you can suggest adding this to the 50.0.0 release on #16799?

@nuno-faria
Contributor Author

> Is there some way we can add a test for this behavior so we don't introduce a regression in the future?

I thought about adding one when creating the PR but don't know if the prefetch information is exposed to the outside.
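One possible way to observe prefetch behavior from outside, even if the reader does not expose it, is to wrap the underlying I/O in a decorator that records each requested byte range. The sketch below assumes a hypothetical `RangeFetch` trait and a stand-in `prefetch_tail` function. Neither is DataFusion's actual reader interface, and a real test would need to wrap whatever the reader under test uses for I/O.

```rust
use std::ops::Range;

/// Hypothetical minimal I/O interface; not DataFusion's actual reader trait.
trait RangeFetch {
    fn len(&self) -> usize;
    fn fetch(&mut self, range: Range<usize>) -> Vec<u8>;
}

/// In-memory implementation used as the wrapped reader.
struct InMemory(Vec<u8>);

impl RangeFetch for InMemory {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn fetch(&mut self, range: Range<usize>) -> Vec<u8> {
        self.0[range].to_vec()
    }
}

/// Decorator that records every requested range before delegating.
struct Recording<R> {
    inner: R,
    log: Vec<Range<usize>>,
}

impl<R: RangeFetch> RangeFetch for Recording<R> {
    fn len(&self) -> usize {
        self.inner.len()
    }
    fn fetch(&mut self, range: Range<usize>) -> Vec<u8> {
        self.log.push(range.clone());
        self.inner.fetch(range)
    }
}

/// Stand-in for the code under test: fetches the last `hint` bytes,
/// or the last 8 bytes when no hint is given (clamped to the file size).
fn prefetch_tail<R: RangeFetch>(reader: &mut R, hint: Option<usize>) -> Vec<u8> {
    let len = reader.len();
    let want = hint.unwrap_or(8).min(len);
    reader.fetch(len - want..len)
}
```

A test built this way asserts on the recorded ranges instead of on the reader's internals, so the reader would not need to expose the hint itself.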

@alamb alamb merged commit af7587b into apache:main Sep 13, 2025
27 checks passed
@alamb alamb mentioned this pull request Sep 16, 2025
18 tasks
shehabgamin pushed a commit to lakehq/datafusion that referenced this pull request Sep 17, 2025
@nuno-faria nuno-faria deleted the cached_parquet_reader_hint branch September 17, 2025 12:51
alamb pushed a commit that referenced this pull request Sep 17, 2025
Development

Successfully merging this pull request may close these issues.

CachedParquetFileReader should respect the metadata prefetch hint

3 participants