-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-6429] Implement hashCode and equals together #12157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d276daa to
aefff62
Compare
|
Test build #54897 has finished for PR 12157 at commit
|
aefff62 to
6681b0e
Compare
|
Test build #54899 has finished for PR 12157 at commit
|
6681b0e to
d867d13
Compare
|
Test build #54904 has finished for PR 12157 at commit
|
d867d13 to
8ca6d43
Compare
|
Test build #54911 has finished for PR 12157 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, super.equals() delegates to the default in Object which requires reference equality. I don't think we can have that. Although defining these in an abstract class is dicey, I agree it should go hand in hand with hashCode at least and should just define equality based on index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually is there any subclass that relies on this default implementation? If so, I think it also needs to check its own class vs the class of the argument. If not, we could remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, do you mean using the canEqual approach?
If not, do you mean removing both equals and hashCode then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all the subclasses override these methods (and some implement some custom logic), then this isn't used, and maybe it's simpler to omit it. If this stays, yes, you're right that it really has to check the class of itself vs the argument too.
|
This doesn't yet enable a style check for this right? |
|
Not yet. I wanted to have some thoughts first before I bother implementing the wrong way everywhere. |
|
@joan38 what do you think about moving forward with the style check, and at least the changes that are uncontroversial here? some of these are good fixes. |
8ca6d43 to
45e816a
Compare
|
Sure, I was busy with another PR. |
|
Test build #55649 has finished for PR 12157 at commit
|
45e816a to
87e3be0
Compare
|
Test build #55656 has finished for PR 12157 at commit
|
|
I'm wary of giving equals semantics to partitions, since the current semantics in this commit seem incorrect: partition 13 from one RDD is not equal to partition 13 from another. Since it's not technically wrong to implement |
87e3be0 to
9e8085d
Compare
|
Test build #55703 has finished for PR 12157 at commit
|
|
Test build #55704 has finished for PR 12157 at commit
|
|
Test build #55706 has finished for PR 12157 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really mind this, but I think this is overkill when there are 2-3 fields. This could be 31 * x.hashCode() + y.hashCode()
|
Jenkins retest this please |
|
Test build #55718 has finished for PR 12157 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test failure is due to this patch, and it's an NPE somewhere. It could be because a field you're using in a hashCode() is null, and it seems like can be the case here. Instead of using .hashCode(), use Objects.hashCode which handles null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I will fix that and rerun CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could have been left as an import but that alone isn't so worth changing
|
I think it's a good change because it lets us enforce the fairly important practice of defining equals/hashCode together. This actually forces it to be explicit in all cases where either of the two is defined, which is IMHO a good thing, as it's something that's easy to get subtly wrong. The new equals() methods don't change behavior; existing hashCode() methods have the same behavior; new hashCode() methods look consistent with equals(). And tests pass. That LGTM. My remaining comments are just a nit about implementation of the hash codes, and multiplying by a prime number. |
58b799e to
eb5615f
Compare
|
Test build #56104 has finished for PR 12157 at commit
|
|
Jenkins retest this please |
|
Test build #56152 has finished for PR 12157 at commit
|
|
Test build #2824 has finished for PR 12157 at commit
|
Jenkins retest this please |
|
Jenkins retest this please |
|
Test build #56217 has finished for PR 12157 at commit
|
|
@srowen Thanks. All good |
|
I think we've got just two more things to change: a) a rebase, and b) using prime numbers as multipliers everywhere. I can't see anything else then. |
eb5615f to
02b397e
Compare
|
Test build #56396 has finished for PR 12157 at commit
|
02b397e to
8ce5135
Compare
|
Test build #56451 has finished for PR 12157 at commit
|
| } | ||
|
|
||
| class MockSplitInfo(host: String) extends SplitInfo(null, host, null, 1, null) { | ||
| override def hashCode(): Int = Random.nextInt() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this so that it matches the equals behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong now though. hashCode has to be deterministic and always return the same value. There is nothing wrong with always returning 0. The problem is actually with the equals method, but, it won't matter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense
|
Test build #56525 has finished for PR 12157 at commit
|
|
|
||
| class MockSplitInfo(host: String) extends SplitInfo(null, host, null, 1, null) { | ||
| override def hashCode(): Int = Random.nextInt() | ||
| override def hashCode(): Int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you need to re-run the tests anyway -- also remove the unneeded import of scala.util.Random now
|
Test build #56538 has finished for PR 12157 at commit
|
|
LGTM. Thanks for sticking with it. If there are no more comments today I'll merge. |
|
Merged to master |
What changes were proposed in this pull request?
Implement some
hashCodeandequalstogether in order to enable the scalastyle.This is a first batch, I will continue to implement them but I wanted to know your thoughts.