[SPARK-38669][ML] Validate input dataset of ml.clustering and ml.recommendation #35983
srowen left a comment:
Basically the same change as last time? That's fine if this is the remainder.
@srowen ALS is also added here, so classification, regression, clustering, and recommendation now all support dataset validation. I think there will be another PR to clean up unused internal functions.
The approach seems OK; looks like some related test failures though.
retest this please
I'm still seeing ... But I don't think that's related? I wonder if we just need to try tomorrow.
I also think it should be irrelevant; let me rebase the PR.
0ebde3b to 38be19f
38be19f to 03e61cc
Merged to master.
Thanks for reviewing.
What changes were proposed in this pull request?
Validate the input dataset of ml.clustering and ml.recommendation.
Why are the changes needed?
When the input dataset contains invalid values, fail fast and output a relevant error message.
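To illustrate the fail-fast idea, here is a minimal standalone sketch in plain Python. It is not the code from this PR and does not use Spark. The function name `validate_rows`, the `(features, weight)` row shape, and the specific rules (no null features, only finite feature values, finite non-negative weight) are assumptions chosen for illustration; the thread does not spell out the actual checks.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical row shape for this sketch: (feature vector, instance weight).
Row = Tuple[Optional[Sequence[float]], Optional[float]]


def validate_rows(rows: Iterable[Row]) -> int:
    """Raise ValueError on the first invalid row; return the number of rows checked.

    The rules below are assumed for illustration, not taken from the PR.
    """
    checked = 0
    for i, (features, weight) in enumerate(rows):
        if features is None:
            raise ValueError(f"row {i}: features must not be null")
        for j, value in enumerate(features):
            # math.isfinite is False for NaN, +inf, and -inf.
            if not math.isfinite(value):
                raise ValueError(
                    f"row {i}: features[{j}] is {value}, expected a finite number"
                )
        if weight is None or not math.isfinite(weight) or weight < 0:
            raise ValueError(
                f"row {i}: weight is {weight}, expected a finite non-negative number"
            )
        checked += 1
    return checked
```

Calling `validate_rows` before training means a bad value is reported with its row and position right away, instead of surfacing later as an obscure numerical failure inside the algorithm.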
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added test suites.