Parquet: Add variant array reader in Parquet #12512
core/src/main/java/org/apache/iceberg/variants/ShreddedArray.java
core/src/test/java/org/apache/iceberg/variants/TestShreddedArray.java
parquet/src/main/java/org/apache/iceberg/parquet/ParquetVariantReaders.java
parquet/src/main/java/org/apache/iceberg/parquet/VariantReaderBuilder.java
    }
  }

  private static class ListReader implements VariantValueReader {
Should be ArrayReader, right?
I changed it from ArrayReader to ListReader because, in the upcoming writer implementation, I need to implement a ListWriter to handle the list and then put an ArrayValueWriter on top of ListWriter to combine it with value.
I named it ListReader to be consistent, so that shredded(ListReader) will be the actual ArrayReader.
I don't think I quite follow the logic here. This produces a ValueArray so it makes more sense to me that it would be ArrayReader. But we can rename it later so this isn't a big deal.
Let me change it back to ArrayReader for now. When I share the array writer PR, it may be clearer what I mean.
parquet/src/test/java/org/apache/iceberg/parquet/TestVariantReaders.java
VariantValue actualValue = actualVariant.value();
assertThat(actualValue.type()).isEqualTo(PhysicalType.ARRAY);
assertThat(actualValue.asArray().numElements()).isEqualTo(1);
assertThat(actualValue.asArray().get(0)).isNull();
This isn't correct because it cannot be represented as a valid encoded variant. Variant arrays cannot hold missing values -- they can only hold Variant null values.
The spec states that when a value is missing but required, the reader should produce a variant null, so this should be equal to Variants.ofNull().
You are right. We do need to return Variant null.
If a Variant is missing in a context where a value is required, readers must return a Variant null (00): basic type 0 (primitive) and physical type 0 (null). For example, if a Variant is required (like measurement above) and both value and typed_value are null, the returned value must be 00 (Variant null).
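To make the "00" in the quoted spec text concrete, here is a minimal sketch of how a primitive value's header byte is composed. It assumes the Variant encoding layout in which the low 2 bits carry the basic type and the high 6 bits carry the primitive type ID. The class and method names are hypothetical and are not part of Iceberg.

```java
// Illustrative only: VariantNullHeader and primitiveHeader are hypothetical names.
class VariantNullHeader {
  // Basic type 0 means "primitive" in the Variant encoding.
  static final int BASIC_TYPE_PRIMITIVE = 0;

  // Assumed layout: high 6 bits = primitive type ID, low 2 bits = basic type.
  static byte primitiveHeader(int primitiveTypeId) {
    return (byte) ((primitiveTypeId << 2) | BASIC_TYPE_PRIMITIVE);
  }

  public static void main(String[] args) {
    // Variant null: basic type 0 (primitive), physical type 0 (null).
    byte nullHeader = primitiveHeader(0);
    System.out.printf("%02x%n", nullHeader); // prints "00"
  }
}
```

Under that layout, a Variant null is a single zero byte, which is why a reader can always produce one when a required value is missing.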
rdblue
left a comment
I think this is ready other than the issue here: https://github.com/apache/iceberg/pull/12512/files#r2060845248
Thanks @aihuaxu, this is great work!
ValueArray arr = Variants.array();
do {
  if (column.currentDefinitionLevel() > definitionLevel) {
    arr.add(reader.read(metadata));
I think this is where you need to fix the incorrect test case. This should be:
VariantValue value = reader.read(metadata);
arr.add(value != null ? value : Variants.ofNull());

assertThat(actualValue.type()).isEqualTo(PhysicalType.ARRAY);
assertThat(actualValue.asArray().numElements()).isEqualTo(1);
assertThat(actualValue.asArray().get(0)).isNull();
VariantTestUtil.assertEqual(Variants.ofNull(), actualValue.asArray().get(0));
I think this could be a simpler assertion by checking an expected array against the actual, but this is okay, too.
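As a rough illustration of both points in this thread (substituting a variant null for a missing element, then comparing a whole expected array against the actual one), here is a self-contained sketch. It deliberately uses plain Java lists and a string marker as stand-ins. `MissingElementSketch`, `readArray`, and `VARIANT_NULL` are hypothetical and do not reflect Iceberg's `ValueArray`, `Variants.ofNull()`, or `VariantTestUtil` APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain-Java stand-ins, not Iceberg classes.
class MissingElementSketch {
  // Hypothetical marker standing in for an encoded variant null value.
  static final String VARIANT_NULL = "variant-null";

  // Copies decoded elements into a new list, replacing missing (Java null)
  // entries with the variant null stand-in, as the review suggests.
  static List<String> readArray(List<String> decoded) {
    List<String> arr = new ArrayList<>();
    for (String value : decoded) {
      arr.add(value != null ? value : VARIANT_NULL);
    }
    return arr;
  }

  public static void main(String[] args) {
    List<String> actual = readArray(Arrays.asList("a", null));
    List<String> expected = Arrays.asList("a", VARIANT_NULL);
    // One whole-array comparison instead of separate per-element assertions.
    System.out.println(expected.equals(actual)); // prints "true"
  }
}
```

Comparing the whole array checks the element count and every element in one step, which is the simplification the comment above hints at.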
rdblue
left a comment
Looks good to me. I'll merge when tests are passing.
I'm re-running the failed test run because I think it may have just been a flaky test.
Thanks. Yes, this is a flaky test. It was passing locally, and it seems that it sometimes takes more retries.
Merged. Thanks, @aihuaxu! Nice work.
Add array reader support for Variant in the Parquet module.
Part of: #12472