[SPARK-19745][ML] SVCAggregator captures coefficients in its closure #17076
Conversation
Note this eliminates the check for sparse coefficients, but I'm not sure it was ever necessary. We don't do it in other aggregators, and since this is a private class we don't need to worry about people misusing it. I'd appreciate other thoughts.
We used to check it, I think? At some point there was a BLAS operation that only worked for dense vectors. I think that was eliminated during the linear model refactor for 2.0/2.1.
I changed this up a bit. I think using the check but marking the array as `@transient lazy val` is slightly more explicit about what we are trying to avoid here.
Race condition with my review below...
Thanks for anticipating it, please ignore it then :-)
Jenkins test this please.
AnthonyTruchet left a comment:
Ok for me, just a request for a check.
I'm not very confident in my understanding of the `@transient` annotation when combined with `lazy`, but I would expect it here. Can someone more versed confirm or refute?
http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/
http://stackoverflow.com/questions/34769220/difference-when-serializing-a-lazy-val-with-or-without-transient (unanswered)
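For reference, a minimal sketch of the transient-lazy-val pattern those links discuss (the class and field names are illustrative, not from this PR):

```scala
class Summary(val data: Array[Double]) extends Serializable {
  // @transient: the cached value is dropped when the object is serialized.
  // lazy: it is recomputed from `data` on first access after deserialization.
  @transient lazy val norm: Double = math.sqrt(data.map(x => x * x).sum)
}
```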
Actually, this question is slightly different from what I was referring to above. We don't use `@transient` here because we do need to serialize this when we send the gradient updates back to the driver. The reason for making it lazy is that we don't need to serialize the array of zeros; we can just initialize it on the workers and avoid the communication cost.
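To make that concrete, here is a rough sketch of the effect being described (simplified names, not the actual aggregator code):

```scala
import org.apache.spark.ml.linalg.Vector

class GradientAggregatorSketch(numFeatures: Int) extends Serializable {
  // lazy but NOT @transient: the driver never touches this field, so the
  // zero-filled array is not shipped out with the task; once an executor has
  // populated it, the field serializes normally and the sums travel back.
  private lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](numFeatures)

  def add(features: Vector, weight: Double): this.type = {
    features.foreachActive { (index, value) =>
      gradientSumArray(index) += weight * value
    }
    this
  }
}
```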
Thanks for taking the time to explain :-)
Test build #73505 has finished for PR 17076 at commit
@sethah a quick glance at the screenshots seems to indicate the processing time went up, which seems a bit odd. Of course, it's a small test, so maybe it's just noise.
Those screenshots were only meant to show the shuffle write data. I'm not even sure the data sizes (number of rows) were the same, only the number of columns, which determines the shuffle write. I can do some performance testing if we think it's necessary.
Ah ok, makes sense.
So we are moving the dimension check here, yes? Seems better than at L446, in fact.
Looks fine to me - @sethah does the
@sethah This is a nice optimization! Small suggestion: for the shuffle write performance improvement, would it be possible to test this on a larger dataset to highlight the improvements and also to verify that other performance characteristics did not change (e.g. the duration, which @MLnick mentioned went up very slightly, although that seems to have a good reason: the number of rows was different)?
Should the extra require check be added:
require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
similar to LogisticRegression.scala L1587 (in the add method), since this is now more similar to the LogisticAggregator?
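For illustration, a sketch of where such a check would go, using a simplified add signature rather than the exact one in this file (`Vector` here is `org.apache.spark.ml.linalg.Vector`, and the method is assumed to sit inside the aggregator class):

```scala
def add(features: Vector, label: Double, weight: Double): this.type = {
  // Reject negative weights up front, mirroring the LogisticAggregator check.
  require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
  if (weight == 0.0) return this
  // ... accumulate this instance's loss and gradient contribution ...
  this
}
```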
Yes good catch - LoR and LinR both have this check.
@MLnick I checked LinearRegression.scala, LogisticRegression.scala, and AFTSurvivalRegression.scala, and all of them already do this. Are there any other aggregators with a similar code structure?
@sethah Thanks for the good catch. I verified this optimization and found it indeed reduces the size of the shuffle data. This looks good to me. BTW, as @MLnick suggested, could you add the lazy evaluation for the gradient array to all other aggregators in this PR? Since it's a small change, I'd prefer to handle it here. Thanks.
@imatiach-msft
+1 @MLnick. A three-line change; update it here?
@yanboliang yeah, I agree we can do it in this PR.
@MLnick sorry, you're right, I somehow missed that the "lazy" was missing on gradientSumArray. It would be great to update them as well.
Thanks all for the review. I'll make the updates today.
Updated. I was able to verify that adding the lazy val does make the task binary broadcast variable smaller, i.e. it does not vary with the size of the features. We could potentially test this by serializing the closure and checking its size, but that might be for another JIRA, or we can just leave it. We have not done things like this in the past (then again, we keep running into this issue). We could also follow the pattern used in
I took another brief look; the updates look great, thank you.
Test build #73794 has finished for PR 17076 at commit
Test build #73792 has finished for PR 17076 at commit
The test failure is spurious; I'll merge this into master. Thanks, all.
What changes were proposed in this pull request?
JIRA: SPARK-19745
Reorganize SVCAggregator to avoid serializing the coefficients. This patch also makes the gradient array a `lazy val`, which avoids materializing a large array on the driver before shipping the class to the executors. This improvement stems from #16037. Actually, probably all ML aggregators can benefit from this. We can either:
a) separate the gradient improvement into another patch,
b) keep what's here plus add the lazy evaluation to all other aggregators in this patch, or
c) keep it as is.
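A rough sketch of the idea, under the assumption that the coefficients arrive as a broadcast variable (the class name, fields, and add signature are simplified and illustrative, not the actual patch):

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.ml.linalg.Vector

class SVCAggregatorSketch(bcCoefficients: Broadcast[Vector]) extends Serializable {

  private val numFeatures: Int = bcCoefficients.value.size

  // @transient lazy: each executor re-reads the broadcast value locally; the
  // coefficient array itself is never captured in the serialized aggregator.
  @transient private lazy val coefficientsArray: Array[Double] = bcCoefficients.value.toArray

  // lazy (not @transient): the zero array is allocated on the executor instead of
  // being serialized out with the task, but the accumulated sums still serialize
  // normally when partial aggregators are combined on the driver.
  private lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](numFeatures)

  def add(features: Vector, label: Double, weight: Double): this.type = {
    require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
    if (weight == 0.0) return this
    // ... compute the hinge-loss gradient for this instance from coefficientsArray
    //     and accumulate it into gradientSumArray ...
    this
  }
}
```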
How was this patch tested?
This is an interesting question! I don't know of a reasonable way to test this right now. Ideally, we could perform an optimization, look at the shuffle write data for each task, and compare the size to what we know it should be: numCoefficients * 8 bytes. I'm not sure there is a good way to do that right now. We could discuss this here or in another JIRA, but I suspect it would be a significant undertaking.
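For what it's worth, one rough way to check a size like that is plain Java serialization of an aggregator instance; this is an illustrative helper sketch, not an existing Spark test utility:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializedSizeCheck {
  // Serialize an object with Java serialization and return its byte count, so the
  // result can be compared against the expected roughly numCoefficients * 8 bytes.
  def serializedSize(obj: AnyRef): Int = {
    val byteStream = new ByteArrayOutputStream()
    val objectStream = new ObjectOutputStream(byteStream)
    objectStream.writeObject(obj)
    objectStream.close()
    byteStream.size()
  }
}
```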