Test both vectorized and nonvectorized readers in Parquet golden file tests #13890

eric-maynard · 2025-08-21T17:58:33Z

#13450 added new tests for specific Parquet encodings that leverage pre-baked files. However, that test currently only reads these using the nonvectorized reader. This PR updates that test to check each file's data using both the vectorized and nonvectorized readers. Only files that are supported by both readers are included for the time being.
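As a rough illustration of the test shape described above (not the code in this PR), the sketch below runs a row-by-row path and a batched path over each in-memory "golden file" and skips files whose encoding is not in a supported-by-both set. Every name here (`GoldenFileBothReaders`, `GoldenFile`, `readRowByRow`, `readInBatches`, `checkAll`, and the placeholder encoding labels) is hypothetical, and the two read methods are simple stand-ins for the real Iceberg readers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class GoldenFileBothReaders {
  /** Hypothetical stand-in for a pre-baked file: an encoding label plus the values it should contain. */
  public static final class GoldenFile {
    final String encoding;
    final List<Integer> expected;

    public GoldenFile(String encoding, List<Integer> expected) {
      this.encoding = encoding;
      this.expected = expected;
    }
  }

  /** Stand-in for the nonvectorized path: produces one value at a time. */
  public static List<Integer> readRowByRow(GoldenFile file) {
    List<Integer> out = new ArrayList<>();
    for (Integer value : file.expected) {
      out.add(value);
    }
    return out;
  }

  /** Stand-in for the vectorized path: produces values in fixed-size batches. */
  public static List<Integer> readInBatches(GoldenFile file, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Integer> out = new ArrayList<>();
    for (int start = 0; start < file.expected.size(); start += batchSize) {
      int end = Math.min(start + batchSize, file.expected.size());
      out.addAll(file.expected.subList(start, end));
    }
    return out;
  }

  /** Checks every file supported by both paths and returns the encodings that were checked. */
  public static List<String> checkAll(List<GoldenFile> files, Set<String> supportedByBoth) {
    List<String> checked = new ArrayList<>();
    for (GoldenFile file : files) {
      if (!supportedByBoth.contains(file.encoding)) {
        continue; // skipped for now: only one of the two paths can read this encoding
      }
      if (!readRowByRow(file).equals(file.expected)) {
        throw new AssertionError("Row-by-row read mismatch for " + file.encoding);
      }
      if (!readInBatches(file, 2).equals(file.expected)) {
        throw new AssertionError("Batched read mismatch for " + file.encoding);
      }
      checked.add(file.encoding);
    }
    return checked;
  }

  public static void main(String[] args) {
    List<GoldenFile> files = List.of(
        new GoldenFile("encoding-a", List.of(1, 2, 3, 4, 5)),
        new GoldenFile("encoding-b", List.of(10, 20, 30)));
    // Placeholder labels; which encodings both real readers support is not stated here.
    System.out.println(checkAll(files, Set.of("encoding-a")));
  }
}
```

Keeping the supported set explicit means a file can be added back to the check by extending the set once both readers handle its encoding.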

...c/test/java/org/apache/iceberg/spark/data/vectorized/parquet/TestParquetVectorizedReads.java

huaxingao

LGTM

kevinjqliu

LGTM

vectorized uses assertRecordsMatch which reads with VectorizedSparkParquetReaders 👍

iceberg/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/data/vectorized/parquet/TestParquetVectorizedReads.java

Line 268 in 4e71502

VectorizedSparkParquetReaders.buildReader(schema, type, idToConstant, null));
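For readers unfamiliar with this kind of check, here is a minimal sketch of comparing batched output against expected rows position by position. It does not reproduce `assertRecordsMatch`; the class `BatchRowComparison`, the method `assertBatchesMatchRows`, and the use of `int[]` as a batch are assumptions made for illustration only.

```java
import java.util.Iterator;
import java.util.List;

public class BatchRowComparison {
  /**
   * Walks the batches in order and compares each value with the expected row at the same
   * overall position. Returns the number of rows compared.
   */
  public static int assertBatchesMatchRows(Iterator<int[]> batches, List<Integer> expectedRows) {
    int position = 0;
    while (batches.hasNext()) {
      int[] batch = batches.next();
      for (int value : batch) {
        if (position >= expectedRows.size()) {
          throw new AssertionError("Batches contain more rows than expected");
        }
        if (expectedRows.get(position).intValue() != value) {
          throw new AssertionError("Mismatch at row " + position);
        }
        position++;
      }
    }
    if (position != expectedRows.size()) {
      throw new AssertionError("Expected " + expectedRows.size() + " rows but read " + position);
    }
    return position;
  }

  public static void main(String[] args) {
    List<int[]> batches = List.of(new int[] {1, 2}, new int[] {3});
    System.out.println(assertBatchesMatchRows(batches.iterator(), List.of(1, 2, 3)));
  }
}
```

The final count check matters because a batched reader that stops early would otherwise pass a value-only comparison.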

huaxingao · 2025-08-25T20:02:09Z

Merged. Thanks @eric-maynard for the PR! Thanks @kevinjqliu for the review!

eric-maynard added 15 commits August 18, 2025 10:39

backport changes

72ebf5a

add vectorized & nonvectorized check

23f7747

lint

d115676

typofix

5e1b3be

stable using json

c955805

Add resources for spark 3.5

01e4549

semistable

bfd84a4

revert gradle.properties

825623f

try fixing lints

6366634

gradle.properties revert

52502df

fix assert message

2ed723f

change test logic

bb0a58f

do not hardcode size

00504ef

apply reverts

2d23ff1

yank rle

a2aca90

github-actions bot added the spark label Aug 21, 2025

eric-maynard mentioned this pull request Aug 21, 2025

Backport Parquet encoding tests for Spark 3.5 #13859

Merged

spotless

e927d94

huaxingao reviewed Aug 21, 2025

...c/test/java/org/apache/iceberg/spark/data/vectorized/parquet/TestParquetVectorizedReads.java (resolved)

huaxingao reviewed Aug 21, 2025

...c/test/java/org/apache/iceberg/spark/data/vectorized/parquet/TestParquetVectorizedReads.java (outdated, resolved)

eric-maynard added 2 commits August 22, 2025 10:41

comparison with SparkParquetReaders

0a1bb9b

assertEqualsUnsafe

33158d4

huaxingao approved these changes Aug 22, 2025

kevinjqliu approved these changes Aug 24, 2025

huaxingao merged commit 5d5e0a3 into apache:main Aug 25, 2025
27 checks passed

3 participants
