[SPARK-31803][ML] Make sure instance weight is not negative #28621
Test build #123027 has finished for PR 28621 at commit
retest this please
Test build #123030 has finished for PR 28621 at commit
srowen
left a comment
Looks generally fine if this consistently checks it in one place, and does so without generating more Spark jobs.
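To illustrate what "one place, no extra jobs" could look like, here is a minimal sketch in which a plain Scala `Iterator` stands in for the RDD. `SinglePassCheck`, `Instance`, and `toInstances` are hypothetical names for this example, and `Vector[Double]` is only a stand-in for the ML `Vector` type. The check runs inside the same lazy `map` that builds the instances, so there is no separate validation pass over the data.

```scala
object SinglePassCheck {
  // Hypothetical stand-in for a labeled, weighted training instance.
  final case class Instance(label: Double, weight: Double, features: Vector[Double])

  // Validates each weight inside the same lazy map that converts rows to
  // instances, so the data is traversed once, only when it is consumed.
  def toInstances(rows: Iterator[(Double, Double, Vector[Double])]): Iterator[Instance] =
    rows.map { case (label, weight, features) =>
      require(weight >= 0.0, s"illegal weight value: $weight. weight must be >= 0.0.")
      Instance(label, weight, features)
    }
}
```

Because the `map` is lazy, an invalid weight only surfaces as an `IllegalArgumentException` once the iterator is actually consumed, which mirrors how a check inside an RDD transformation would fail at action time.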
dataset.select(col($(labelCol)).cast(DoubleType), w, col($(featuresCol))).rdd.map {
  case Row(label: Double, weight: Double, features: Vector) =>
    require (weight >= 0.0, "illegal weight value: " + weight + " weight must be >= 0.0")
I was going to wonder if we should require weights to be positive, but I don't think we do now. I guess we tend to catch it when we check whether all weights are 0 (sum = 0).
BTW you can use string interpolation in these error messages.
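As a sketch of the string-interpolation suggestion, the concatenated message can be written with Scala's `s"..."` interpolator. `WeightMessageExample` and `checkWeight` are hypothetical names used only for this illustration, and the exact message wording is an assumption rather than the text that was merged.

```scala
object WeightMessageExample {
  // Hypothetical helper: fails on a negative (or NaN) weight, otherwise
  // returns the weight unchanged.
  def checkWeight(weight: Double): Double = {
    // The s-interpolator replaces the "..." + weight + "..." concatenation.
    require(weight >= 0.0, s"illegal weight value: $weight. weight must be >= 0.0.")
    weight
  }
}
```

The message argument of `require` is by-name, so the interpolated string is only built when the check fails.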
nit: What about adding a
Nit: add a common utility method to generate the error string? The strings look duplicated, and each of them would need updating after any future change :)
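One way this nit could be addressed is to keep the message in a single function that every call site shares. The sketch below is an assumption about the shape of such a helper, not the utility that was actually added; `WeightErrors`, `nonNegativeWeightMessage`, and `requireNonNegativeWeight` are hypothetical names.

```scala
object WeightErrors {
  // Single source of truth for the error text (hypothetical wording).
  def nonNegativeWeightMessage(weight: Double): String =
    s"illegal weight value: $weight. weight must be >= 0.0."

  // Shared check that every weighted algorithm could call instead of
  // repeating the string literal at each call site.
  def requireNonNegativeWeight(weight: Double): Unit =
    require(weight >= 0.0, nonNegativeWeightMessage(weight))
}
```

With this layout, a future change to the wording touches only `nonNegativeWeightMessage`.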
Test build #123078 has finished for PR 28621 at commit
Weird, errors like
Jenkins test this please
Test build #123088 has finished for PR 28621 at commit
Test build #123094 has finished for PR 28621 at commit
Merged to master
Thank you all!
What changes were proposed in this pull request?
In the algorithms that support instance weights, add checks to make sure that no instance weight is negative.
Why are the changes needed?
An instance weight has to be >= 0.0.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested