[SPARK-31734][ML][PySpark] Add weight support in ClusteringEvaluator #28553
Test build #122734 has finished for PR 28553 at commit
zhengruifeng
left a comment
LGTM
cc @srowen
srowen
left a comment
I may have missed it but does this check somewhere that the weights are positive?
Added the check to make sure weights are all positive.
Test build #122945 has finished for PR 28553 at commit
Yeah there are several points in the code that check weights with a |
I mean that the following algorithms, which have instance weight support, probably also need to make sure the weights are positive.
I will take a look and fix them all. Not in this PR, though.
    ) =>
    BLAS.axpy(1.0, features, featureSum)
    (featureSum, squaredNormSum + squaredNorm, numOfPoints + 1)
    require (weight >= 0.0, "illegal weight value: " + weight + " weight must be >= 0.0")
I think it's better to do the check here so it doesn't require an extra pass to get all the weights.
Looks good; it's consistent with your other change. Really minor: use string interpolation?
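The two review points above (validate each weight inside the aggregation so no extra pass over the data is needed, and build the message with string interpolation) can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: `WeightCheckSketch`, `weightedSum`, and the array-based accumulator are hypothetical stand-ins and are not the code in this PR, which aggregates Spark vectors with `BLAS.axpy`.

```scala
object WeightCheckSketch {
  // Hypothetical helper: folds (features, weight) pairs into
  // (weighted feature sum, total weight) in a single pass.
  def weightedSum(
      points: Seq[(Array[Double], Double)],
      dim: Int): (Array[Double], Double) =
    points.foldLeft((Array.fill(dim)(0.0), 0.0)) {
      case ((featureSum, weightSum), (features, weight)) =>
        // The check runs while aggregating, so the weights are never
        // scanned separately. The message uses string interpolation.
        require(weight >= 0.0, s"illegal weight value: $weight. weight must be >= 0.0.")
        var i = 0
        while (i < dim) {
          featureSum(i) += weight * features(i)
          i += 1
        }
        (featureSum, weightSum + weight)
    }
}
```

A negative weight makes `require` throw an `IllegalArgumentException` as soon as the offending row is reached, rather than after a separate validation pass.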
Test build #123031 has finished for PR 28553 at commit
    (normalizedFeaturesSum, numOfPoints + 1)
    case ((normalizedFeaturesSum: DenseVector, weightSum: Double),
    (normalizedFeatures, weight)) =>
    require (weight >= 0.0, "illegal weight value: " + weight + " weight must be >= 0.0")
Same here, and nit: remove the space after require.
4f78591 to
ebd8848
I didn't resolve conflicts correctly. I will fix the problem.
Test build #123070 has finished for PR 28553 at commit
Test build #123069 has finished for PR 28553 at commit
Test build #123073 has finished for PR 28553 at commit
Merged to master. You may want to further change the nonnegativity check in your other PR to use the new method you introduced there.
Thanks! @srowen @zhengruifeng
What changes were proposed in this pull request?
Add weight support in ClusteringEvaluator
Why are the changes needed?
Currently, BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator support instance weights, but ClusteringEvaluator does not. This PR adds instance weight support to ClusteringEvaluator.
Does this PR introduce any user-facing change?
Yes. This PR adds ClusteringEvaluator.setWeightCol.
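As a rough illustration of the new setter, the sketch below evaluates a small KMeans clustering with per-row weights. It assumes a Spark build that already includes this change; the object name, the column names `features` and `weight`, and the toy data are made up for the example and do not come from the PR.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object WeightedSilhouetteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("WeightedSilhouetteExample")
      .getOrCreate()

    // Toy data (made up): two well-separated groups, each row with a
    // non-negative instance weight.
    val data = spark.createDataFrame(Seq(
      (Vectors.dense(0.0, 0.0), 1.0),
      (Vectors.dense(0.1, 0.1), 2.0),
      (Vectors.dense(9.0, 9.0), 1.0),
      (Vectors.dense(9.1, 9.1), 0.5)
    )).toDF("features", "weight")

    // Cluster the points; transform() keeps the weight column and
    // appends a "prediction" column.
    val predictions = new KMeans().setK(2).setSeed(1L).fit(data).transform(data)

    // setWeightCol is the user-facing addition described above.
    val evaluator = new ClusteringEvaluator()
      .setFeaturesCol("features")
      .setPredictionCol("prediction")
      .setWeightCol("weight")

    val silhouette = evaluator.evaluate(predictions)
    println(s"Weighted silhouette = $silhouette")

    spark.stop()
  }
}
```

With the nonnegativity check discussed in the review, a negative value in the weight column would be expected to fail the evaluation with the "illegal weight value" message.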
How was this patch tested?
Added a new unit test.