@eric-maynard eric-maynard commented Aug 21, 2025

#13450 added new tests for specific Parquet encodings that use pre-baked files. However, those tests currently read the files only with the non-vectorized reader. This PR updates them to check each file's data with both the vectorized and the non-vectorized readers. For the time being, only files supported by both readers are included.
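The check described above pairs every pre-baked file with every reader that supports it and compares the decoded rows against the expected data. Below is a minimal, self-contained Java sketch of that pairing. It does not touch Iceberg or Parquet: RowReader, checkAll, the file names, and the in-memory expected rows are hypothetical stand-ins, not the test's actual helpers.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: RowReader and checkAll are illustrative names, not Iceberg APIs.
public class DualReaderCheck {

  /** Stand-in for a Parquet reader: maps a file name to the rows it decodes. */
  public interface RowReader extends Function<String, List<String>> {}

  /**
   * Reads every file with every reader and compares the result to the expected rows.
   * Returns the number of (file, reader) pairs checked; throws on the first mismatch.
   */
  public static int checkAll(Map<String, List<String>> expectedByFile,
                             Map<String, RowReader> readersByName) {
    int checked = 0;
    for (Map.Entry<String, List<String>> file : expectedByFile.entrySet()) {
      for (Map.Entry<String, RowReader> reader : readersByName.entrySet()) {
        List<String> actual = reader.getValue().apply(file.getKey());
        if (!file.getValue().equals(actual)) {
          throw new AssertionError(
              reader.getKey() + " reader returned " + actual + " for " + file.getKey());
        }
        checked++;
      }
    }
    return checked;
  }

  public static void main(String[] args) {
    // Hypothetical expected contents for two pre-baked files.
    Map<String, List<String>> expected = Map.of(
        "plain.parquet", List.of("a", "b"),
        "dictionary.parquet", List.of("x", "x", "y"));
    // Toy readers that just return the expected rows; real readers would decode the files.
    RowReader nonVectorized = expected::get;
    RowReader vectorized = expected::get;
    int checked = checkAll(expected,
        Map.of("non-vectorized", nonVectorized, "vectorized", vectorized));
    System.out.println(checked); // prints 4
  }
}
```

Files that only one reader supports would simply be left out of the map passed to checkAll, which mirrors the "supported by both readers" restriction.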


@huaxingao huaxingao left a comment


LGTM


@kevinjqliu kevinjqliu left a comment


LGTM

The vectorized path uses assertRecordsMatch, which reads with VectorizedSparkParquetReaders 👍

VectorizedSparkParquetReaders.buildReader(schema, type, idToConstant, null));

@huaxingao huaxingao merged commit 5d5e0a3 into apache:main Aug 25, 2025
27 checks passed
@huaxingao

Merged. Thanks @eric-maynard for the PR! Thanks @kevinjqliu for the review!
