[SPARK-6429] Implement hashCode and equals together #12157

joan38 · 2016-04-04T21:52:08Z

What changes were proposed in this pull request?

Implement hashCode and equals together in some classes, in order to enable the corresponding scalastyle check.
This is a first batch. I will continue implementing them, but I wanted to know your thoughts first.

SparkQA · 2016-04-04T21:59:00Z

Test build #54897 has finished for PR 12157 at commit aefff62.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-04T22:22:29Z

Test build #54899 has finished for PR 12157 at commit 6681b0e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-04T22:59:10Z

Test build #54904 has finished for PR 12157 at commit d867d13.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-05T00:04:18Z

Test build #54911 has finished for PR 12157 at commit 8ca6d43.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-05T01:28:42Z

core/src/main/scala/org/apache/spark/Partition.scala

Hm, super.equals() delegates to the default in Object, which requires reference equality. I don't think we can have that. Although defining these in an abstract class is dicey, I agree it should go hand in hand with hashCode at least, and should just define equality based on index.

Actually is there any subclass that relies on this default implementation? If so, I think it also needs to check its own class vs the class of the argument. If not, we could remove this.

If so, do you mean using the canEqual approach?
If not, do you mean removing both equals and hashCode then?

If all the subclasses override these methods (and some implement some custom logic), then this isn't used, and maybe it's simpler to omit it. If this stays, yes, you're right that it really has to check the class of itself vs the argument too.
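A minimal sketch of the class-checking variant described above, assuming a hypothetical `IndexedPart` base class (not Spark's `Partition` trait): equality requires the same runtime class and the same index, and `hashCode` uses only the index, so equal objects always hash alike.

```scala
// Hypothetical base class for illustration; not Spark's Partition trait.
abstract class IndexedPart(val index: Int) {
  // Equal only if both objects have the same runtime class and index,
  // so an instance never equals an instance of a sibling subclass.
  override def equals(other: Any): Boolean = other match {
    case that: IndexedPart => this.getClass == that.getClass && this.index == that.index
    case _ => false
  }

  // Uses only the field that equals() compares, which keeps the contract.
  override def hashCode(): Int = index
}

// Two hypothetical sibling subclasses.
final class PartA(i: Int) extends IndexedPart(i)
final class PartB(i: Int) extends IndexedPart(i)
```

With this, `new PartA(1) == new PartA(1)` holds while `new PartA(1) == new PartB(1)` does not. The `canEqual` pattern is the usual alternative when some subclasses should still be able to compare equal to their parent type.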

srowen · 2016-04-05T01:33:16Z

This doesn't enable a style check for this yet, right?

joan38 · 2016-04-05T10:02:19Z

Not yet. I wanted to get some feedback first, before implementing it the wrong way everywhere.
I will push a new version soon with your comments and more (if not the rest).
Once everything is done, I will push with the style check enabled.

srowen · 2016-04-12T08:05:02Z

@joan38 what do you think about moving forward with the style check, and at least the changes that are uncontroversial here? Some of these are good fixes.

joan38 · 2016-04-12T21:29:18Z

Sure, I was busy with another PR.
Do you also want to give up on all the Partition subtypes, or is this good as of commit 87e3be0?

SparkQA · 2016-04-12T21:33:48Z

Test build #55649 has finished for PR 12157 at commit 45e816a.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-12T22:41:14Z

Test build #55656 has finished for PR 12157 at commit 87e3be0.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-13T06:18:26Z

I'm wary of giving equals semantics to partitions, since the current semantics in this commit seem incorrect: partition 13 from one RDD is not equal to partition 13 from another. Since it's not technically wrong to implement hashCode without equals, it seems like we can be conservative and not make those changes. Adding hashCode is good, as is a style check if possible, as are the other changes.
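To see why overriding only hashCode is safe, here is a minimal sketch with a hypothetical `RefPart` class (not Spark code) that keeps `Object`'s reference equality and hashes by index. Two partitions with index 13 from different RDDs collide on hash but are not equal, which the contract allows; only the reverse (equal objects with different hashes) would be a bug.

```scala
// Hypothetical partition-like class for illustration; not Spark's Partition.
class RefPart(val rddId: Int, val index: Int) {
  // Keep the default reference equality inherited from Object.
  override def equals(other: Any): Boolean = super.equals(other)

  // A reference is only equal to itself, and it always has the same index,
  // so equal objects are guaranteed to produce equal hash codes.
  override def hashCode(): Int = index
}
```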

SparkQA · 2016-04-13T08:25:19Z

Test build #55703 has finished for PR 12157 at commit 9e8085d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-13T08:50:19Z

Test build #55704 has finished for PR 12157 at commit ebc512b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-13T09:25:15Z

Test build #55706 has finished for PR 12157 at commit 650ae02.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-13T15:09:29Z

sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala

I don't really mind this, but I think this is overkill when there are 2-3 fields. This could be 31 * x.hashCode() + y.hashCode()
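For a two-field type the suggested formula is enough. A minimal sketch, assuming a hypothetical `Point` class with `Double` fields (not Spark's `ExamplePoint`). It calls `java.lang.Double.hashCode` explicitly, which returns the same value as a boxed `Double`'s `hashCode()`, and compares with `java.lang.Double.compare` so that `equals` agrees with that hash even for `NaN` and `-0.0`.

```scala
// Hypothetical two-field point for illustration; not Spark's ExamplePoint.
class Point(val x: Double, val y: Double) {
  // Double.compare returns 0 exactly when the two values have the same bit
  // pattern (after NaN canonicalization), matching Double.hashCode below.
  override def equals(other: Any): Boolean = other match {
    case that: Point =>
      java.lang.Double.compare(x, that.x) == 0 && java.lang.Double.compare(y, that.y) == 0
    case _ => false
  }

  // 31 * h(x) + h(y): the prime multiplier makes the result order-sensitive,
  // so swapping the coordinates usually changes the hash.
  override def hashCode(): Int =
    31 * java.lang.Double.hashCode(x) + java.lang.Double.hashCode(y)
}
```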

srowen · 2016-04-13T15:09:54Z

Jenkins retest this please

SparkQA · 2016-04-13T15:30:29Z

Test build #55718 has finished for PR 12157 at commit 650ae02.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-13T15:36:25Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

I think the test failure is due to this patch, and it's an NPE somewhere. It could be because a field you're using in a hashCode() is null, which seems like it can be the case here. Instead of calling .hashCode(), use Objects.hashCode, which handles null.
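A minimal sketch of the null-safe version, assuming a hypothetical `LiteralLike` class whose `value` may be `null` (not Catalyst's `Literal`). `java.util.Objects.hashCode` returns 0 for `null` instead of throwing, and `Objects.equals` gives the matching null-safe comparison.

```scala
import java.util.Objects

// Hypothetical literal-like holder for illustration; not Catalyst's Literal.
class LiteralLike(val value: Any, val typeName: String) {
  // Objects.equals is null-safe and delegates to the boxed values' own
  // equals, which is consistent with their hashCode.
  override def equals(other: Any): Boolean = other match {
    case that: LiteralLike =>
      Objects.equals(value.asInstanceOf[AnyRef], that.value.asInstanceOf[AnyRef]) &&
        typeName == that.typeName
    case _ => false
  }

  // value.hashCode() would throw a NullPointerException when value is null;
  // Objects.hashCode(null) returns 0 instead.
  override def hashCode(): Int =
    31 * Objects.hashCode(value.asInstanceOf[AnyRef]) + Objects.hashCode(typeName)
}
```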

Good point, I will fix that and rerun CI.

srowen · 2016-04-18T16:48:13Z

core/src/test/scala/org/apache/spark/scheduler/CustomShuffledRDD.scala

This could have been left as an import, but that alone isn't worth changing.

srowen · 2016-04-18T16:58:52Z

I think it's a good change because it lets us enforce the fairly important practice of defining equals/hashCode together. This actually forces it to be explicit in all cases where either of the two is defined, which is IMHO a good thing, as it's something that's easy to get subtly wrong.

The new equals() methods don't change behavior; existing hashCode() methods have the same behavior; new hashCode() methods look consistent with equals(). And tests pass. That LGTM.

My remaining comments are just nits about the hash code implementations, namely multiplying by a prime number.
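One way to spot-check in a unit test that a new hashCode() is consistent with equals() is a small helper like the hypothetical `HashContract.consistentWithEquals` below (not part of Spark or ScalaTest): for every pair of sample objects that are equal, it requires equal hash codes.

```scala
// Hypothetical test helper for illustration; not a Spark or ScalaTest API.
object HashContract {
  // True when every pair of equal samples also has equal hash codes.
  // Unequal samples may share a hash code, so those pairs are not checked.
  def consistentWithEquals(samples: Seq[Any]): Boolean =
    samples.forall { a =>
      samples.forall { b => !a.equals(b) || a.hashCode == b.hashCode }
    }
}

// Consistent: hashCode uses only the field that equals() compares.
class Good(val id: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: Good => id == that.id
    case _ => false
  }
  override def hashCode(): Int = id
}

// Inconsistent: equals() ignores `tag`, but hashCode() depends on it.
class Bad(val id: Int, val tag: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: Bad => id == that.id
    case _ => false
  }
  override def hashCode(): Int = 31 * id + tag
}
```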

SparkQA · 2016-04-18T22:07:40Z

Test build #56104 has finished for PR 12157 at commit eb5615f.

This patch fails from timeout after a configured wait of 250m.
This patch merges cleanly.
This patch adds no public classes.

joan38 · 2016-04-18T22:50:58Z

Jenkins retest this please

SparkQA · 2016-04-19T00:30:10Z

Test build #56152 has finished for PR 12157 at commit eb5615f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-19T02:04:36Z

Test build #2824 has finished for PR 12157 at commit eb5615f.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

joan38 · 2016-04-19T08:44:49Z

[error] (streaming-flume-sink/*:mimaFindBinaryIssues) java.lang.ArrayIndexOutOfBoundsException: 1497

Jenkins retest this please

srowen · 2016-04-19T10:11:03Z

Jenkins retest this please

SparkQA · 2016-04-19T12:10:14Z

Test build #56217 has finished for PR 12157 at commit eb5615f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

joan38 · 2016-04-19T12:41:28Z

@srowen Thanks. All good

srowen · 2016-04-20T10:33:49Z

I think we've got just two more things to change: a) a rebase, and b) using prime numbers as multipliers everywhere. I can't see anything else then.

SparkQA · 2016-04-20T19:24:07Z

Test build #56396 has finished for PR 12157 at commit 02b397e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-21T03:27:13Z

Test build #56451 has finished for PR 12157 at commit 8ce5135.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

joan38 · 2016-04-21T08:50:50Z

yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala

  }

  class MockSplitInfo(host: String) extends SplitInfo(null, host, null, 1, null) {
+    override def hashCode(): Int = Random.nextInt()


I've added this so that it matches the equals behaviour.

This is wrong now, though. hashCode has to be deterministic and always return the same value for the same object. There is nothing wrong with always returning 0. The problem is actually with the equals method, but it won't matter here.
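A minimal sketch of the deterministic alternative, assuming a hypothetical `MockSplit` class (not YARN's `SplitInfo` or the test's `MockSplitInfo`). A constant hashCode of 0 is always consistent with equals, because every object gets the same hash; it only makes hash-based collections slower, since every entry lands in the same bucket. `Random.nextInt()`, by contrast, gives the same object a different hash on each call.

```scala
import java.util.HashSet

// Hypothetical mock for illustration; not the test's MockSplitInfo.
class MockSplit(val host: String) {
  override def equals(other: Any): Boolean = other match {
    case that: MockSplit => host == that.host
    case _ => false
  }

  // Deterministic and trivially consistent with equals(): every instance
  // hashes to 0, so equal instances always share a hash code.
  override def hashCode(): Int = 0
}
```

Adding two `new MockSplit("host1")` instances to a `java.util.HashSet` leaves a single entry, because the set finds the first one in the shared bucket and `equals` reports a match.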

SparkQA · 2016-04-21T12:04:00Z

Test build #56525 has finished for PR 12157 at commit f11b112.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-21T13:00:42Z

yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala


  class MockSplitInfo(host: String) extends SplitInfo(null, host, null, 1, null) {
-    override def hashCode(): Int = Random.nextInt()
+    override def hashCode(): Int = 0


Since you need to re-run the tests anyway -- also remove the unneeded import of scala.util.Random now

SparkQA · 2016-04-21T15:24:44Z

Test build #56538 has finished for PR 12157 at commit ba5633c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-21T15:28:32Z

LGTM. Thanks for sticking with it. If there are no more comments today I'll merge.

srowen · 2016-04-22T11:24:31Z

Merged to master

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from d276daa to aefff62 April 4, 2016 21:55

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from aefff62 to 6681b0e April 4, 2016 22:14

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from 6681b0e to d867d13 April 4, 2016 22:46

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from d867d13 to 8ca6d43 April 4, 2016 23:51

srowen reviewed Apr 5, 2016

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from 8ca6d43 to 45e816a April 12, 2016 21:27

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from 45e816a to 87e3be0 April 12, 2016 22:32

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from 87e3be0 to 9e8085d April 13, 2016 08:07

srowen reviewed Apr 13, 2016

srowen reviewed Apr 18, 2016

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from 58b799e to eb5615f April 18, 2016 17:52

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from eb5615f to 02b397e April 20, 2016 19:20

Enable scalastyle for implement hashCode and equals together

8ce5135

joan38 force-pushed the SPARK-6429-HashCode-Equals branch from 02b397e to 8ce5135 April 21, 2016 01:08

joan38 reviewed Apr 21, 2016

Back hashCode = 0 on MockSplitInfo

f11b112

srowen reviewed Apr 21, 2016

Remove Random object import

ba5633c

asfgit closed this in bf95b8d Apr 22, 2016
