[SPARK-17001] [ML] Enable standardScaler to standardize sparse vectors when withMean=True #14663
Test build #63840 has finished for PR 14663 at commit
going once, going twice. This would simply let an operation proceed where it errored before, at the cost of giving a user a little more rope to hang him/herself. I think it unblocks a legitimate and common set of use cases, so I think it's worth changing.
As mentioned on the JIRA discussion, I'm neutral on this, though I tend to lean towards allowing the user to do what they want even if it might be "dangerous". I guess +0? Though perhaps we may want to explicitly log a ...
Warning seems reasonable. I think you'd have to put in a flag to remember if the user has been warned in order to avoid spewing millions of them. Worth it, you think?
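To illustrate the warn-once flag suggested in the comment above, here is a minimal sketch in plain Python. It is not code from this PR or from Spark; the `WarnOnce` class, the logger name, and the message text are all hypothetical, and it assumes a single process (in a distributed job, each worker would keep its own flag).

```python
import logging

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("scaler_sketch")


class WarnOnce:
    """Emit a warning the first time it is called, then stay silent."""

    def __init__(self, message: str) -> None:
        self.message = message
        self.warned = False  # the "has the user been warned?" flag

    def __call__(self) -> bool:
        """Return True if a warning was emitted by this call."""
        if self.warned:
            return False
        self.warned = True
        logger.warning(self.message)
        return True


# Hypothetical message; one instance would be shared by all transform calls.
warn_dense_output = WarnOnce(
    "withMean=True on sparse input produces dense output"
)
```

Calling `warn_dense_output()` once per transformed vector would then log a single message however many vectors are processed.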
Ah right, good point. Actually I realised that the doc in ... as good a place as any to add a warning about using with sparse input (this can then show up in ...
If I understood you correctly, @MLnick, you favored just adding warnings in the docs? I added them to three more places that needed it.
Test build #64288 has finished for PR 14663 at commit
I'll go for this tomorrow if there are no other comments.
Merged to master
… when withMean=True

## What changes were proposed in this pull request?

Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.

## How was this patch tested?

Jenkins tests, including new cases to reflect the new behavior.

Author: Sean Owen <[email protected]>

Closes apache#14663 from srowen/SPARK-17001.
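The "rope" concern raised in the discussion comes from the fact that centering turns a sparse vector into a dense one. The sketch below shows this in plain Python. It is an illustration only, not Spark's implementation: the dict-based `SparseVec` representation and the helper names `column_means` and `center` are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical sparse representation: index -> stored nonzero value.
SparseVec = Dict[int, float]


def column_means(rows: List[SparseVec], size: int) -> List[float]:
    """Mean of each column, counting absent entries as 0.0."""
    totals = [0.0] * size
    for row in rows:
        for i, v in row.items():
            totals[i] += v
    return [t / len(rows) for t in totals]


def center(row: SparseVec, means: List[float]) -> List[float]:
    """Subtract the column means from one row.

    The result is a dense list: every implicit zero becomes -mean
    wherever the column mean is nonzero.
    """
    return [row.get(i, 0.0) - m for i, m in enumerate(means)]


rows = [{0: 2.0}, {0: 4.0, 2: 6.0}, {}]
means = column_means(rows, size=3)
centered = [center(r, means) for r in rows]

stored_before = sum(len(r) for r in rows)     # entries held in sparse form
stored_after = sum(len(c) for c in centered)  # entries held after centering
```

Here three stored values grow to nine after centering. With wide, mostly empty feature vectors the same effect can multiply memory use many times over, which is why the thread settled on documenting a warning.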