[SPARK-19337] [ML] [Doc] Documentation and examples for LinearSVC #16968
Conversation
Test build #73020 has finished for PR 16968 at commit
I see. I will drop the R example here; whichever PR goes in later can finish the document update.
docs/ml-classification-regression.md
Outdated
regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training-data point of any class (so-called functional margin),
since in general the larger the margin the lower the generalization error of the classifier. LinearSVC
in Spark ML supports binary calssification with linear SVM. Internally, it optimizes the
calssification -> classification
Thanks
docs/ml-classification-regression.md
Outdated
A [support vector machine](https://en.wikipedia.org/wiki/Support_vector_machine) constructs a hyperplane
or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification,
regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training-data point of any class (so-called functional margin),
"largest distance" -> "longest distance"? I think?
Thanks for the comment. I think both "large" and "long" can be used to describe distance, whereas "large" is more suitable for describing the numeric margin. Please let me know if you have a strong preference.
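To make the "largest distance to the nearest training-data point" wording concrete, here is a minimal sketch in plain Python. It is not the LinearSVC implementation and does not use Spark; the helper name `nearest_point_distance`, the toy points, and the two candidate hyperplanes are all made up for illustration.

```python
import math

def nearest_point_distance(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0.

    Hypothetical helper for illustration only; not part of Spark ML.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x in points
    )

# Toy data: the first two points belong to one class, the last two to the other.
points = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -1.5)]

# Two candidate hyperplanes that both separate the classes.
wide = nearest_point_distance((1.0, 1.0), 0.0, points)
narrow = nearest_point_distance((1.0, 0.0), 0.5, points)

# The hyperplane with the larger nearest-point distance is the preferred one.
print(wide > narrow)
```

Both candidates classify the toy points correctly, but the first keeps every point farther away, which is the property the quoted paragraph describes as a good separation.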
title should say
Thanks for the comment @felixcheung
Test build #73071 has finished for PR 16968 at commit
felixcheung
left a comment
LGTM.
I'll leave this out for a day in case anyone else can comment.
docs/ml-classification-regression.md
Outdated
regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training-data points of any class (so-called functional margin),
since in general the larger the margin the lower the generalization error of the classifier. LinearSVC
in Spark ML supports binomial classification with linear SVM. Internally, it optimizes the
actually, is there a reason you change this to binomial classification?
just to be consistent with LR. But I'm not sure if it's the common expression.
do you have a link? I think binary classification is more commonly used
FWIW I have never heard the term binomial classification and it doesn't show up in a Google search. I think it was a typo.
yes, let's fix that
felixcheung
left a comment
let's change binomial classification
Thanks for the review. Updated to binary.
Test build #73132 has finished for PR 16968 at commit
felixcheung
left a comment
LGTM @wangmiao1981
merged to master.
## What changes were proposed in this pull request?
Documentation and examples (Java, scala, python, R) for LinearSVC
## How was this patch tested?
local doc generation
Author: Yuhao Yang <[email protected]>
Closes apache#16968 from hhbyyh/mlsvmdoc.