Skip to content

Use ID-based mapping for struct fields in Iceberg Parquet reader#9104

Closed
findepi wants to merge 4 commits intotrinodb:masterfrom
findepi:findepi/parquet
Closed

Use ID-based mapping for struct fields in Iceberg Parquet reader#9104
findepi wants to merge 4 commits intotrinodb:masterfrom
findepi:findepi/parquet

Conversation

@findepi
Copy link
Copy Markdown
Member

@findepi findepi commented Sep 3, 2021

Fixes #8750

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 3, 2021

Somewhat related to #6520 which covered ORC.
Conflicts with #8129.

Copy link
Copy Markdown
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple initial thoughts

Copy link
Copy Markdown
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 6, 2021

AC

@findepi findepi requested review from alexjo2144 and phd3 September 6, 2021 08:48
List<Type> types = type.getTypeParameters();
if (groupColumnIO.getChildrenCount() != 1) {
Optional<Field> field = getArrayElementField(context, groupColumnIO);
if (field.isEmpty()) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this refactor isn't absolutely side-effect free.
Previously, we would bail out when groupColumnIO has child count != 1.
And then if constructField returned empty, we would return new GroupField with single empty element.

After the change, we return empty in both cases.

The first attempt at the refactor (the one with more abstract methods, see #9104 (comment)) avoided this situation, but wasn't pretty. Alternatively, #9124 avoids this too, by splitting ParquetColumnIOConverter instead of reusing it.

@findepi findepi closed this Sep 7, 2021
@findepi findepi deleted the findepi/parquet branch September 7, 2021 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Unexpected results when reading Iceberg Parquet table after nested field schema evolved

4 participants