Skip to content

Conversation

@aihuaxu
Copy link
Contributor

@aihuaxu aihuaxu commented Mar 13, 2025

Add the array reader support for Variant in Parquet module.

Part of: #12472

@aihuaxu aihuaxu force-pushed the parquet-variant-array-support branch 4 times, most recently from f1c0dca to cb575e3 Compare March 13, 2025 16:26
@aihuaxu aihuaxu force-pushed the parquet-variant-array-support branch from 588842f to a576eca Compare March 18, 2025 23:35
@aihuaxu aihuaxu requested a review from rdblue March 18, 2025 23:39
@aihuaxu aihuaxu force-pushed the parquet-variant-array-support branch from a576eca to ab877a1 Compare March 18, 2025 23:41
}
}

private static class ListReader implements VariantValueReader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ArrayReader right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed from ArrayReader to ListReader because in the following writer implementation, I need implement ListWriter to handle the list and then have a ArrayValueWriter on top of ListWriter to combine with value.

I name it as ListReader to be consistent so shredded(ListReader) will be actual ArrayReader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I quite follow the logic here. This produces a ValueArray so it makes more sense to me that it would be ArrayReader. But we can rename it later so this isn't a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me change back to ArrayReader for now. When I share the array writer PR, maybe it's clear what I mean.

@aihuaxu aihuaxu force-pushed the parquet-variant-array-support branch from 7f3344a to 315393d Compare April 25, 2025 04:38
@github-actions github-actions bot removed the API label Apr 25, 2025
@aihuaxu aihuaxu force-pushed the parquet-variant-array-support branch 4 times, most recently from 6b5628b to a764da7 Compare April 25, 2025 06:04
@aihuaxu aihuaxu force-pushed the parquet-variant-array-support branch from a764da7 to deff5f2 Compare April 25, 2025 16:42
VariantValue actualValue = actualVariant.value();
assertThat(actualValue.type()).isEqualTo(PhysicalType.ARRAY);
assertThat(actualValue.asArray().numElements()).isEqualTo(1);
assertThat(actualValue.asArray().get(0)).isNull();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct because it cannot be represented as a valid encoded variant. Variant arrays cannot hold missing values -- they can only hold Variant null values.

The spec states that when a value is missing but required, the reader should produce a variant null, so this should be equal to Variants.ofNull().

Copy link
Contributor Author

@aihuaxu aihuaxu Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We do need to return Variant null.

If a Variant is missing in a context where a value is required, readers must return a Variant null (00): basic type 0 (primitive) and physical type 0 (null). For example, if a Variant is required (like measurement above) and both value and typed_value are null, the returned value must be 00 (Variant null).

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready other than the issue here: https://github.com/apache/iceberg/pull/12512/files#r2060845248

Thanks @aihuaxu, this is great work!

ValueArray arr = Variants.array();
do {
if (column.currentDefinitionLevel() > definitionLevel) {
arr.add(reader.read(metadata));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is where you need to fix the incorrect test case. This should be:

VariantValue value = reader.read(metadata);
arr.add(value != null ? value : Variants.ofNull());

assertThat(actualValue.type()).isEqualTo(PhysicalType.ARRAY);
assertThat(actualValue.asArray().numElements()).isEqualTo(1);
assertThat(actualValue.asArray().get(0)).isNull();
VariantTestUtil.assertEqual(Variants.ofNull(), actualValue.asArray().get(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be a simpler assertion by checking an expected array against the actual, but this is okay, too.

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I'll merge when tests are passing.

@rdblue
Copy link
Contributor

rdblue commented Apr 25, 2025

I'm re-running the failed test run because I think it may have just been a flaky test.

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Apr 25, 2025

I'm re-running the failed test run because I think it may have just been a flaky test.

Thanks. Yeah. This is flaky test. It was passing locally and seems sometimes it may take more retries.

@rdblue rdblue merged commit 22d194f into apache:main Apr 25, 2025
42 checks passed
@rdblue
Copy link
Contributor

rdblue commented Apr 25, 2025

Merged. Thanks, @aihuaxu! Nice work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants