Skip to content

Conversation

@SinghAsDev
Copy link
Contributor

@SinghAsDev SinghAsDev commented Dec 19, 2021

Add support to read Parquet files written with old 2-level list structures. This should resolve #3759.

@SinghAsDev
Copy link
Contributor Author

@rdblue @RussellSpitzer can you help review this? Thanks

@jackye1995
Copy link
Contributor

3773 is merged, could you rebase?

@SinghAsDev
Copy link
Contributor Author

3773 is merged, could you rebase?

@jackye1995 done

@wizardxz
Copy link

wizardxz commented Jan 4, 2022

LGTM

@SinghAsDev
Copy link
Contributor Author

@jackye1995 @RussellSpitzer @kbendick @rdblue can I get a review on this, thanks

…et-thrift. Also rename ParquetSchemaUtil.isListElementType to ParquetSchemaUtil.isOldListElementType
Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I'm not a parquet expert so @rdblue could you also take a quick look before we merge?

T elementResult = null;
if (repeatedElement.getFieldCount() > 0) {
Type elementField = repeatedElement.getType(0);
if (repeatedElement.isPrimitive() || repeatedElement.asGroupType().getFieldCount() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this class are inconsistent with the changes to the TypeWithSchemaVisitor. Here, the repeated element is always visited (beforeRepeatedElement call above) and may be processed again as the element. The other avoids pushing the name on the stack. If beforeRepeatedElement were used to track names, I think it would get a duplicate name for the repeated group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed this. It would be nice to have some tests to check this behavior. But, I don't think we need to block on that, unless you disagree.

private static <T> T visitTwoLevelList(Types.ListType iListType, Types.NestedField iListElement, GroupType pListType,
Type pListElement, TypeWithSchemaVisitor<T> visitor) {
T elementResult = visitField(iListElement, pListElement, visitor);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: extra newline. Also, we either wrap argument lists at the same level or start all arguments on the next line at 2 indents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the style used in various parts of code are different. For example, IIUC ParquetReadSupport.prepareForRead is different than what you are saying. Earlier you also had mentioned Iceberg does not use new param at new line pattern. Updating this part to keep the same level (align params start with previous line) and wrap.

Let me know which style we should try to follow and I can try to update the intellij-style that we provide with Iceberg repo accordingly. I don't know if it is possible, but I can try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about where still is incorrect in other places. We'll eventually track those down and fix them, but we do want to keep style from diverging when it is caught by a review. So please do fix this.

Copy link
Contributor Author

@SinghAsDev SinghAsDev Jan 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should already be fixed. Does the update still have style issue?

@rdblue
Copy link
Contributor

rdblue commented Jan 28, 2022

@SinghAsDev, I did another round of review. This also still needs to update Iceberg generics, I think.

@SinghAsDev
Copy link
Contributor Author

@SinghAsDev, I did another round of review. This also still needs to update Iceberg generics, I think.

Thanks for thorough reviews @rdblue ! I thought I already covered the Iceberg generic with this change. Am I missing something?

@rdblue
Copy link
Contributor

rdblue commented Jan 30, 2022

@SinghAsDev, looks like it is missing a test for Iceberg generics. I see the update for it now. Thanks!

@github-actions github-actions bot added the data label Jan 31, 2022
…ment needs to be popped out after visiting list.
@rdblue rdblue merged commit 6644393 into apache:master Feb 1, 2022
@rdblue
Copy link
Contributor

rdblue commented Feb 1, 2022

Thanks, @SinghAsDev!

@SinghAsDev SinghAsDev deleted the pq_compat branch February 1, 2022 21:26
SinghAsDev added a commit to pinterest/iceberg that referenced this pull request Apr 8, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add backwards compatibility read support for Parquet list

6 participants