@hhbyyh
Contributor

@hhbyyh hhbyyh commented Feb 16, 2017

What changes were proposed in this pull request?

Documentation and examples (Java, Scala, Python, R) for LinearSVC

How was this patch tested?

local doc generation

@SparkQA

SparkQA commented Feb 16, 2017

Test build #73020 has finished for PR 16968 at commit 7a0829f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class JavaLinearSVCExample

Contributor

@wangmiao1981 wangmiao1981 left a comment

@hhbyyh In R, LinearSVC is named spark.svmLinear. I just created PR #16969 for the R example and vignettes. Can you drop the R example? After the two PRs are merged, I will update the document section for R. Or you can copy the R example to this PR.

@hhbyyh
Contributor Author

hhbyyh commented Feb 17, 2017

I see. I will drop the R example here; whichever PR goes in later can finish the document update.

regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training-data point of any class (so-called functional margin),
since in general the larger the margin the lower the generalization error of the classifier. LinearSVC
in Spark ML supports binary calssification with linear SVM. Internally, it optimizes the
Member

calssification -> classification

Contributor Author

Thanks

A [support vector machine](https://en.wikipedia.org/wiki/Support_vector_machine) constructs a hyperplane
or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification,
regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training-data point of any class (so-called functional margin),
Member

"largest distance" -> "longest distance"? I think?

Contributor Author

Thanks for the comment. I think both "large" and "long" can be used to describe distance, whereas "large" is more suitable for describing the numeric margin. Please let me know if you have a strong preference.

@felixcheung
Member

title should say
[SPARK-19337] [ML] [Dcoc]
->
[SPARK-19337] [ML] [Doc]

@hhbyyh hhbyyh changed the title [SPARK-19337] [ML] [Dcoc] Documentation and examples for LinearSVC [SPARK-19337] [ML] [Doc] Documentation and examples for LinearSVC Feb 17, 2017
@hhbyyh
Contributor Author

hhbyyh commented Feb 17, 2017

Thanks for the comment @felixcheung

@SparkQA

SparkQA commented Feb 17, 2017

Test build #73071 has finished for PR 16968 at commit b888f35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung felixcheung left a comment

LGTM.
I'll leave this out for a day in case anyone else can comment.

regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training-data points of any class (so-called functional margin),
since in general the larger the margin the lower the generalization error of the classifier. LinearSVC
in Spark ML supports binomial classification with linear SVM. Internally, it optimizes the
Member

actually, is there a reason you changed this to binomial classification?

Contributor Author

just to be consistent with LR. But I'm not sure if it's the common expression.

Member

do you have a link? I think binary classification is more commonly used

Member

FWIW I have never heard the term binomial classification and it doesn't show up in a Google search. I think it was a typo.

Member

yes, let's fix that

Member

@felixcheung felixcheung left a comment

let's change binomial classification

@hhbyyh
Contributor Author

hhbyyh commented Feb 19, 2017

Thanks for the review. Updated to binary.
Also added the reference to the R example.

@SparkQA

SparkQA commented Feb 19, 2017

Test build #73132 has finished for PR 16968 at commit 165fbe4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung felixcheung left a comment

@felixcheung
Member

merged to master.

@asfgit asfgit closed this in 280afe0 Feb 21, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
## What changes were proposed in this pull request?

Documentation and examples (Java, Scala, Python, R) for LinearSVC

## How was this patch tested?
local doc generation

Author: Yuhao Yang

Closes apache#16968 from hhbyyh/mlsvmdoc.