[SPARK-19745][ML] SVCAggregator captures coefficients in its closure #17076
Conversation
Note this eliminates the check for sparse coefficients, but I'm not sure it was ever necessary. We don't do it in other aggregators, and since this is a private class we don't need to worry about people misusing it. I'd appreciate other thoughts.
We used to check it, I think? At some point there was a BLAS operation that only worked for dense vectors. I think that was eliminated during the linear model refactor for 2.0/2.1.
I changed this up a bit. I think using the check but marking the array as `@transient lazy val` is slightly more explicit about what we are trying to avoid here.
Race condition with my review below...
Thanks for anticipating it, please ignore it then :-)
Jenkins test this please.
AnthonyTruchet left a comment:
Ok for me, just a request for a check.
I'm not very confident in my understanding of the `@transient` annotation when combined with `lazy`, but I would expect it here. Can someone more versed confirm or refute?
http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/
http://stackoverflow.com/questions/34769220/difference-when-serializing-a-lazy-val-with-or-without-transient (unanswered)
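For reference, a minimal sketch of the transient-lazy-val pattern those links discuss (the class and field names are illustrative, not from this PR):

```scala
class Summary(val data: Array[Double]) extends Serializable {
  // @transient: the cached value is dropped when the object is serialized.
  // lazy: it is recomputed from `data` on first access after deserialization.
  @transient lazy val norm: Double = math.sqrt(data.map(x => x * x).sum)
}
```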
Actually, this question is slightly different from what I was referring to above. We don't use `@transient` here because we do need to serialize this when we send the gradient updates back to the driver. The reason for making it lazy is that we don't need to serialize the array of zeros; we can just initialize it on the workers and avoid the communication cost.
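To make that concrete, here is a rough sketch of the effect being described (simplified names, not the actual aggregator code):

```scala
import org.apache.spark.ml.linalg.Vector

class GradientAggregatorSketch(numFeatures: Int) extends Serializable {
  // lazy but NOT @transient: the driver never touches this field, so the
  // zero-filled array is not shipped out with the task; once an executor has
  // populated it, the field serializes normally and the sums travel back.
  private lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](numFeatures)

  def add(features: Vector, weight: Double): this.type = {
    features.foreachActive { (index, value) =>
      gradientSumArray(index) += weight * value
    }
    this
  }
}
```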
Thanks for taking the time to explain :-)
Test build #73505 has finished for PR 17076 at commit
@sethah a quick glance at the screenshots seems to indicate the processing time went up, which seems a bit odd. Of course, it's a small test, so maybe it's just noise.
Those screenshots were only meant to show the shuffle write data. I'm not even sure the data sizes (number of rows) were the same, only the number of columns, which determines the shuffle write. I can do some performance testing if we think it's necessary.
Ah ok, makes sense.
So we are moving the dimension check here, yes? Seems better than at L446, in fact.
Looks fine to me - @sethah does the
@sethah This is a nice optimization! Small suggestion: for the shuffle write performance improvement, would it be possible to test this on a larger dataset to highlight the improvements and also to verify that other performance characteristics did not change (e.g. the duration, which @MLnick mentioned went up very slightly, although that seems to have a good reason: the number of rows was different)?
Should the extra require check be added:
require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
similar to LogisticRegression.scala L1587 (in the add method), since this is now more similar to the LogisticAggregator?
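For illustration, a sketch of where such a check would go, using a simplified add signature rather than the exact one in this file (`Vector` here is `org.apache.spark.ml.linalg.Vector`, and the method is assumed to sit inside the aggregator class):

```scala
def add(features: Vector, label: Double, weight: Double): this.type = {
  // Reject negative weights up front, mirroring the LogisticAggregator check.
  require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
  if (weight == 0.0) return this
  // ... accumulate this instance's loss and gradient contribution ...
  this
}
```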
Yes good catch - LoR and LinR both have this check.
@MLnick I checked LinearRegression.scala, LogisticRegression.scala, and AFTSurvivalRegression.scala, and all of them already do this. Are there any other aggregators with a similar code structure?
@sethah Thanks for the good catch. I verified this optimization and found it indeed reduces the size of the shuffle data. This looks good to me. BTW, as @MLnick suggested, could you add the lazy evaluation for the gradient array to all other aggregators in this PR? Since it's a small change, I'd prefer to handle it here. Thanks.
@imatiach-msft
+1 @MLnick. A three-line change; update it here?
@yanboliang yeah, I agree we can do it in this PR.
@MLnick sorry, you're right, I somehow missed that the "lazy" was missing on gradientSumArray. It would be great to update them as well.
Thanks all for the review. I'll make the updates today.
Updated. I was able to verify that adding the lazy val does make the task binary broadcast variable smaller, i.e. it does not vary with the size of the features. We could potentially test this by serializing the closure and checking its size, but that might be for another JIRA, or we can just leave it. We have not done things like this in the past (then again, we keep running into this issue). We could also follow the pattern used in
I took another brief look; the updates look great, thank you.
Test build #73794 has finished for PR 17076 at commit
Test build #73792 has finished for PR 17076 at commit
The test failure is spurious; I'll merge this into master. Thanks, all.
What changes were proposed in this pull request?
JIRA: SPARK-19745
Reorganize SVCAggregator to avoid serializing the coefficients. This patch also makes the gradient array a `lazy val`, which avoids materializing a large array on the driver before shipping the class to the executors. This improvement stems from #16037. Actually, probably all ML aggregators can benefit from this. We can either:
a) separate the gradient improvement into another patch,
b) keep what's here plus add the lazy evaluation to all other aggregators in this patch, or
c) keep it as is.
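A rough sketch of the idea, under the assumption that the coefficients arrive as a broadcast variable (the class name, fields, and add signature are simplified and illustrative, not the actual patch):

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.ml.linalg.Vector

class SVCAggregatorSketch(bcCoefficients: Broadcast[Vector]) extends Serializable {

  private val numFeatures: Int = bcCoefficients.value.size

  // @transient lazy: each executor re-reads the broadcast value locally; the
  // coefficient array itself is never captured in the serialized aggregator.
  @transient private lazy val coefficientsArray: Array[Double] = bcCoefficients.value.toArray

  // lazy (not @transient): the zero array is allocated on the executor instead of
  // being serialized out with the task, but the accumulated sums still serialize
  // normally when partial aggregators are combined on the driver.
  private lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](numFeatures)

  def add(features: Vector, label: Double, weight: Double): this.type = {
    require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
    if (weight == 0.0) return this
    // ... compute the hinge-loss gradient for this instance from coefficientsArray
    //     and accumulate it into gradientSumArray ...
    this
  }
}
```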
How was this patch tested?
This is an interesting question! I don't know of a reasonable way to test this right now. Ideally, we could perform an optimization, look at the shuffle write data for each task, and compare the size to what we know it should be: numCoefficients * 8 bytes. I'm not sure there is a good way to do that right now. We could discuss this here or in another JIRA, but I suspect it would be a significant undertaking.
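For what it's worth, one rough way to check a size like that is plain Java serialization of an aggregator instance; this is an illustrative helper sketch, not an existing Spark test utility:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializedSizeCheck {
  // Serialize an object with Java serialization and return its byte count, so the
  // result can be compared against the expected roughly numCoefficients * 8 bytes.
  def serializedSize(obj: AnyRef): Int = {
    val byteStream = new ByteArrayOutputStream()
    val objectStream = new ObjectOutputStream(byteStream)
    objectStream.writeObject(obj)
    objectStream.close()
    byteStream.size()
  }
}
```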