Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Aug 7, 2020

StructLikeWrapper is used to provide equals and hashCode methods that are consistent across all StructLike implementations. But, the implementation did not handle different implementations of CharSequence correctly. This fixes the implementations by adding a Comparator implementation for structs, and adding classes to implement Java's hashCode. These implementations depend on the Iceberg type of an object.

The type of a struct is not always passed to StructLikeWrapper, so this includes a temporary work-around to use the existing implementation for code paths that do not currently have a struct's type.

This also adds two helper classes, StructLikeSet and PartitionSet. The struct set hides StructLikeWrapper from callers and is based on CharSequenceSet. PartitionSet is a set implementation for partitions, which are identified by a partition spec id and a partition tuple. PartitionSet is needed because there are some cases where a partition tuple for different specs is identical. Updating existing classes to use these will be done in a follow-up PR.

}

static class PartitionSet {
static class PartitionMap {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was renamed to avoid a conflict with the new PartitionSet class, and updated to use StructLikeWrapper with correct equals/hashCode implementations.

@rdblue rdblue mentioned this pull request Aug 7, 2020
@aokolnychyi
Copy link
Contributor

Let me take a look right now.

Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me, just few questions for my understanding.

How do we plan to test this out?

if (structHash != null) {
this.hashCode = structHash.hash(struct);
} else {
int result = 97;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the work-around when we don't have a type? In which case does this happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is a work-around for when there is no type, so that the current code paths don't fail. The follow-up commit removes this after adding specId to DataFile.

@rdblue
Copy link
Contributor Author

rdblue commented Aug 10, 2020

Thanks for reviewing, @aokolnychyi! Tests are passing now, so I'll merge.

@rdblue rdblue merged commit 9e616a1 into apache:master Aug 10, 2020
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants