Skip to content

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Jun 20, 2022

What changes were proposed in this pull request?

Left ExpressionSet-- Right ExpressionSet in Scala 2.13 will use default -- of scala.collection.Set as follows:

def -- (that: IterableOnce[A]): C = fromSpecific(coll.toSet.removedAll(that))

The Left ExpressionSet will convert to a immutable.Set and the ExpressionSet cannot override the removedAll method of immutable.Set due to it's a scala.collection.Set.

So this pr override -- method for ExpressionSet in Scala 2.13 to to take over the action to ensure it return the same result as Scala 2.12.

Why are the changes needed?

Bug fix for Scala 2.13

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA
  • Manual test
dev/change-scala-version.sh 2.13    
mvn clean install -DskipTests -pl sql/catalyst -am -Pscala-2.13
mvn test -pl  -pl sql/catalyst  -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.sql.catalyst.expressions.ExpressionSetSuite

Before

- remove multiple elements to set with non-deterministic expressions *** FAILED ***
  Set() had size 0 instead of expected size 1 (ExpressionSetSuite.scala:239)
Run completed in 4 seconds, 537 milliseconds.
Total number of tests run: 44
Suites: completed 2, aborted 0
Tests: succeeded 43, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***

After

Run completed in 4 seconds, 622 milliseconds.
Total number of tests run: 44
Suites: completed 2, aborted 0
Tests: succeeded 44, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

@github-actions github-actions bot added the SQL label Jun 20, 2022
@LuciferYang LuciferYang changed the title [SPARK-39520][SQL] Override -- method for Scala 2.13 ExpressionSet [SPARK-39520][SQL] Override -- method for ExpressionSet in Scala 2.13 Jun 20, 2022
@LuciferYang LuciferYang changed the title [SPARK-39520][SQL] Override -- method for ExpressionSet in Scala 2.13 [SPARK-39520][SQL] Override -- method to ExpressionSet in Scala 2.13 Jun 20, 2022
@LuciferYang LuciferYang changed the title [SPARK-39520][SQL] Override -- method to ExpressionSet in Scala 2.13 [SPARK-39520][SQL] Override -- method for ExpressionSet in Scala 2.13 Jun 20, 2022
@LuciferYang
Copy link
Contributor Author

cc @HyukjinKwon

@LuciferYang
Copy link
Contributor Author

run

dev/change-scala-version.sh 2.13
mvn clean install -DskipTests -pl sql/catalyst -am -Pscala-2.13
mvn test -pl sql/catalyst -Pscala-2.13 

All test passed

Run completed in 4 minutes, 31 seconds.
Total number of tests run: 6583
Suites: completed 285, aborted 0
Tests: succeeded 6583, failed 0, canceled 1, ignored 5, pending 0
All tests passed.

that.iterator.foreach(newSet.remove)
newSet
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ doesn't have this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ++ no such issue and ExpressionSetSuite already covered ++

@srowen srowen closed this in e500121 Jun 20, 2022
@srowen
Copy link
Member

srowen commented Jun 20, 2022

Merged to master. It looks like this wasn't a problem in 3.3.x; this change removed the overrides:

https://issues.apache.org/jira/browse/SPARK-38836
#36121

I wonder if we will have other similar issues -- sounds like ++ is OK. CCing @minyyy

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Jun 21, 2022

Merged to master. It looks like this wasn't a problem in 3.3.x; this change removed the overrides:

https://issues.apache.org/jira/browse/SPARK-38836 #36121

I wonder if we will have other similar issues -- sounds like ++ is OK. CCing @minyyy

The default implementation of ++ reuse concat:

def concat(that: collection.IterableOnce[A]): C = fromSpecific(that match {
    case that: collection.Iterable[A] => new View.Concat(this, that)
    case _ => iterator.concat(that.iterator)
  })

So the semantics of ++ should be ok, but if override concat method as follows, maybe a little quick than default implementation in Scala 2.13:

 override def concat(that: collection.IterableOnce[Expression]): ExpressionSet = {
    val newSet = clone()
    that.iterator.foreach(newSet.add)
    newSet
  }

for example:

val initialSet = ExpressionSet(aUpper + 1 :: Rand(0) :: Nil)
val setToAddWithSameDeterministicExpression = ExpressionSet(aUpper + 1 :: Rand(0) :: Nil)
benchmark.addCase("Test ++") { _: Int =>
    for (_ <- 0L until 100000) {
       setToAddWithSameDeterministicExpression ++ initialSet
     }
   }

default implementation:

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test ExpressionSet ++ :                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test ++                                              10             10           0         10.3          96.7       1.0X

override concat:

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
Test ExpressionSet ++ :                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test ++                                               7              7           0         15.0          66.5       1.0X

@srowen
Copy link
Member

srowen commented Jun 21, 2022

Other things were removed, like the override of union(). May be fine, just now seems to be worth another look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants