Test both vectorized and nonvectorized readers in Parquet golden file tests #13890
...c/test/java/org/apache/iceberg/spark/data/vectorized/parquet/TestParquetVectorizedReads.java
huaxingao
left a comment
LGTM
kevinjqliu
left a comment
LGTM
The vectorized path uses assertRecordsMatch, which reads with VectorizedSparkParquetReaders 👍
Line 268 in 4e71502
VectorizedSparkParquetReaders.buildReader(schema, type, idToConstant, null));
Merged. Thanks @eric-maynard for the PR! Thanks @kevinjqliu for the review!
#13450 added new tests for specific Parquet encodings that rely on pre-baked golden files. However, that test currently reads these files using only the nonvectorized reader. This PR updates the test to check each file's data using both the vectorized and nonvectorized readers. For the time being, only files supported by both readers are included.
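
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of checking one golden file against several readers. It is not the code in this PR: the class `GoldenFileReaderCheck`, the method `checkAllReaders`, and the lambda "readers" are hypothetical stand-ins for the real Spark Parquet readers, and rows are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Iceberg.
public class GoldenFileReaderCheck {

  // Runs every reader over the same golden file and collects a message for
  // each reader whose decoded rows differ from the expected rows.
  static List<String> checkAllReaders(
      String goldenFile,
      List<String> expectedRows,
      Map<String, Function<String, List<String>>> readers) {
    List<String> failures = new ArrayList<>();
    for (Map.Entry<String, Function<String, List<String>>> entry : readers.entrySet()) {
      List<String> actual = entry.getValue().apply(goldenFile);
      if (!expectedRows.equals(actual)) {
        failures.add(entry.getKey() + " reader mismatch for " + goldenFile);
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    List<String> expected = List.of("1", "2", "3");

    // Stand-in readers: each maps a file name to the rows it would decode.
    Map<String, Function<String, List<String>>> readers = new LinkedHashMap<>();
    readers.put("nonvectorized", file -> List.of("1", "2", "3"));
    readers.put("vectorized", file -> List.of("1", "2", "3"));

    List<String> failures = checkAllReaders("example.parquet", expected, readers);
    System.out.println(failures.isEmpty() ? "all readers match" : failures.toString());
  }
}
```

Keeping the readers in a map means a file supported by only one reader can be handled by passing a smaller map for that file, which mirrors the idea of including only files that both readers support.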