[SPARK-19746][ML] Faster indexing for logistic aggregator by sethah · Pull Request #17078 · apache/spark

sethah · 2017-02-27T05:27:47Z

What changes were proposed in this pull request?

JIRA: SPARK-19746

The following code is inefficient:

    val localCoefficients: Vector = bcCoefficients.value

    features.foreachActive { (index, value) =>
      val stdValue = value / localFeaturesStd(index)
      var j = 0
      while (j < numClasses) {
        margins(j) += localCoefficients(index * numClasses + j) * stdValue
        j += 1
      }
    }

localCoefficients(index * numClasses + j) calls Vector.apply which creates a new Breeze vector and indexes that. Even if it is not that slow to create the object, we will generate a lot of extra garbage that may result in longer GC pauses. This is a hot inner loop, so we should optimize wherever possible.

How was this patch tested?

I don't think there's a great way to test this patch. It's purely performance related, so unit tests should guarantee that we haven't made any unwanted changes. Empirically I observed between 10-40% speedups just running short local tests. I suspect the big differences will be seen when large data/coefficient sizes have to pause for GC more often. I welcome other ideas for testing.

sethah · 2017-02-27T05:27:57Z

ping @dbtsai @yanboliang

dbtsai

Couple comments.

dbtsai · 2017-02-27T08:30:36Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala


    val localFeaturesStd = bcFeaturesStd.value
-    val localCoefficients = bcCoefficients.value
+    val localCoefficients = bcCoefficients.value.toArray


In the first version of LOR, we have the following code which avoid this issue you pointed out.

private val weightsArray = weights match { case dv: DenseVector => dv.values case _ => throw new IllegalArgumentException( s"weights only supports dense vector but got type ${weights.getClass}.") }

I think order approach will be more efficient since toArray is only called once (you can add the case for sparse), and for sparse initial coefficients, we will not convert from sparse to dense again and again.

This can be a future work. With L1 applied, the coefficients can be very sparse, so we can compress the coefficients for each iteration, and have specialized implementation for UpdateInPlace.

The above check got us into trouble because if we don't add @transient lazy val then we'll serialize the coefficients. The call to toArray is really just a small bit of pointer indirection, and while I agree it is not great to call it every time, the extra function call should pale in comparison to the O(numClasses * numFeatures) ops we do in the method.

That said, I'm ok with either solution, just wanted to point out the pros/cons of each. Let me know what you think, thanks for reviewing!

Another thought is that by tagging the class member as @transient lazy val we are at least making this hidden problem more explicit. I think using the transient method makes it a bit less likely that someone will come along and make a change in the future that serializes the coefficients. I'll plan to update it here with the transient tag then.

My concern is that if coefficients is sparse, we are not just doing the pointer indirection but creating a new dense array from sparse matrix. I know we always pass in a dense matrix so this will not be an issue now, but being said that, in the following code, if we call compress in the coefficients, we may be able to broadcast a smaller object when L1 is applied or in the initial iteration that most of the elements in coefficients are zero.

https://github.com/sethah/spark/blob/3bea389f6780e1fd0385fbe26954fa4f59b69e37/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L1674

sethah · 2017-02-27T16:34:06Z

Jenkins test this please.

SparkQA · 2017-02-27T17:46:05Z

Test build #73503 has finished for PR 17078 at commit 61d22e7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dbtsai · 2017-02-27T23:41:11Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala


    val localFeaturesStd = bcFeaturesStd.value
-    val localCoefficients = bcCoefficients.value
+    val localCoefficients = bcCoefficients.value.toArray


My concern is that if coefficients is sparse, we are not just doing the pointer indirection but creating a new dense array from sparse matrix. I know we always pass in a dense matrix so this will not be an issue now, but being said that, in the following code, if we call compress in the coefficients, we may be able to broadcast a smaller object when L1 is applied or in the initial iteration that most of the elements in coefficients are zero.

https://github.com/sethah/spark/blob/3bea389f6780e1fd0385fbe26954fa4f59b69e37/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L1674

dbtsai · 2017-02-27T23:42:43Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

  private var lossSum = 0.0

-  private val gradientSumArray = Array.fill[Double](coefficientSize)(0.0D)
+  @transient private lazy val coefficientsArray = bcCoefficients.value match {


Can you have the type of coefficientsArray so people can clear know that it's a primitive array?

Yeah, I'll update it.

dbtsai · 2017-02-27T23:44:29Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

+    val bcFeaturesStd = spark.sparkContext.broadcast(Array(1.0))
+    val binaryAgg = new LogisticAggregator(bcCoefficientsBinary, bcFeaturesStd, 2,
+      fitIntercept = true, multinomial = false)
+    val thrownBinary = withClue("binary logistic aggregator cannot handle sparse coefficients") {


I think we should handle sparse coefficients for further performance improvement. But not in this PR.

dbtsai · 2017-02-28T00:42:48Z

Thanks. Merged into master.

SparkQA · 2017-02-28T00:48:40Z

Test build #73537 has finished for PR 17078 at commit 44ee113.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

better indexing for logistic agg

3bea389

srowen approved these changes Feb 27, 2017

View reviewed changes

dbtsai requested changes Feb 27, 2017

View reviewed changes

sethah added 2 commits February 27, 2017 08:16

transient coefficients and test

b194a27

check error msg

61d22e7

dbtsai approved these changes Feb 27, 2017

View reviewed changes

dbtsai reviewed Feb 27, 2017

View reviewed changes

add type

44ee113

asfgit closed this in 16d8472 Feb 28, 2017

Conversation

sethah commented Feb 27, 2017

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

sethah commented Feb 27, 2017

Uh oh!

dbtsai left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sethah commented Feb 27, 2017

Uh oh!

SparkQA commented Feb 27, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dbtsai commented Feb 28, 2017

Uh oh!

SparkQA commented Feb 28, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants