@yanboliang
Contributor

Add RegressionMetrics.scala, which provides regression metrics for evaluation, and the corresponding test suite RegressionMetricsSuite.scala.

@AmplabJenkins

Can one of the admins verify this patch?

Member

Typo in name

@srowen
Member

srowen commented Oct 28, 2014

Update the title with SPARK-XXXX [MLLIB]

Member

I don't see that stats for pred are used?

Contributor Author

Yes, it's not used, and I have removed it in a new commit.

@yanboliang yanboliang changed the title add regression metrics SPARK-4111 [MLlib] add regression metrics Oct 28, 2014
@yanboliang
Contributor Author

Rename re_score() and remove unused column.

Member

I think this might be worth a comment to explain what sums of squares you are trying to compute in the numerator and denominator. A link to the definition might be good, here and for explained variance, since they are related.
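For readers following the thread, the sums of squares in question can be sketched in plain Scala. This is a minimal illustration of the standard textbook definitions (R² as one minus the residual sum of squares over the total sum of squares, and explained variance as one minus the variance of the residuals over the variance of the observations). It is not the code from this patch, which is not shown in the conversation and may compute these quantities differently. The object and method names are hypothetical, and a local Seq stands in for the RDD.

```scala
// Hypothetical sketch: textbook R^2 and explained variance over
// (prediction, observation) pairs. Not the patch's implementation.
object R2Sketch {
  // Population variance (divides by n, not n - 1).
  private def variance(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map { x => val d = x - mean; d * d }.sum / xs.size
  }

  // R^2 = 1 - SS_res / SS_tot
  def r2(pairs: Seq[(Double, Double)]): Double = {
    val meanObs = pairs.map(_._2).sum / pairs.size
    // Numerator: residual sum of squares, sum of (obs - pred)^2.
    val ssRes = pairs.map { case (pred, obs) => val e = obs - pred; e * e }.sum
    // Denominator: total sum of squares, sum of (obs - mean(obs))^2.
    val ssTot = pairs.map { case (_, obs) => val d = obs - meanObs; d * d }.sum
    1.0 - ssRes / ssTot
  }

  // Explained variance = 1 - Var(obs - pred) / Var(obs)
  def explainedVariance(pairs: Seq[(Double, Double)]): Double = {
    val residuals = pairs.map { case (pred, obs) => obs - pred }
    val observations = pairs.map(_._2)
    1.0 - variance(residuals) / variance(observations)
  }
}
```

The two quantities coincide when the residuals have zero mean, which is why the reviewer describes them as related.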

@srowen
Member

srowen commented Oct 28, 2014

This is picky now, but you might write out "meanAbsoluteError" instead of saying "mae". Is "r2_score" style-wise correct vs "r2Score"? (Sorry, should have thought of that.) Finally, consider using @return tags in your scaladoc to describe what's being returned instead of leaving it blank but writing docs in the body.
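As an illustration of the @return suggestion, a scaladoc comment along these lines would put the description of the result in the tag rather than only in the body. This is a sketch only: the object name, the Seq-based signature, and the NaN behavior for empty input are assumptions made for the example, not taken from the patch.

```scala
// Hypothetical sketch of the scaladoc style being suggested.
object ScaladocSketch {
  /**
   * Computes the mean absolute error over (prediction, observation) pairs.
   *
   * @param pairs (prediction, observation) pairs; a local Seq is assumed here
   *              in place of an RDD
   * @return the mean of |observation - prediction|, or NaN for empty input
   */
  def meanAbsoluteError(pairs: Seq[(Double, Double)]): Double =
    if (pairs.isEmpty) Double.NaN
    else pairs.map { case (pred, obs) => math.abs(obs - pred) }.sum / pairs.size
}
```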

Member

Also picky but you can avoid math.pow and avoid computing value - pred 3 times here with a local var. Might be cleaner. This LGTM for what it's worth.
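The code under review is not shown in this conversation, so the following is only a guess at its shape. It sketches the suggestion: bind the difference to a local value once, then square it by multiplication instead of calling math.pow. All names here are hypothetical.

```scala
// Hypothetical sketch of the refactoring suggested above.
object ErrorColumnsSketch {
  // Assumed "before" shape: value - pred is computed three times.
  def withPow(pred: Double, value: Double): (Double, Double, Double) =
    (value - pred, math.abs(value - pred), math.pow(value - pred, 2))

  // Suggested shape: compute the difference once and reuse it.
  def withLocal(pred: Double, value: Double): (Double, Double, Double) = {
    val err = value - pred
    (err, math.abs(err), err * err)
  }
}
```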

Contributor

The third and fourth columns are not necessary. You can use normL1 and normL2 on the second column:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala#L219
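To make the relationship concrete: if the second column holds the per-row error, its L1 norm divided by the row count gives the mean absolute error, and its squared L2 norm divided by the row count gives the mean squared error, so separate absolute-error and squared-error columns are redundant. The sketch below computes the norms by hand over a local Seq so it runs without Spark. The helper names mirror the summarizer's normL1 and normL2 but are defined locally, and everything else is hypothetical.

```scala
// Hypothetical sketch: deriving MAE, MSE, and RMSE from the L1 and L2
// norms of a single error column. Plain Scala, no Spark dependency.
object NormSketch {
  // L1 norm: sum of absolute values.
  def normL1(xs: Seq[Double]): Double = xs.map(x => math.abs(x)).sum

  // L2 norm: square root of the sum of squares.
  def normL2(xs: Seq[Double]): Double = math.sqrt(xs.map(x => x * x).sum)

  def meanAbsoluteError(errors: Seq[Double]): Double =
    normL1(errors) / errors.size

  def meanSquaredError(errors: Seq[Double]): Double = {
    val l2 = normL2(errors)
    l2 * l2 / errors.size
  }

  def rootMeanSquaredError(errors: Seq[Double]): Double =
    normL2(errors) / math.sqrt(errors.size.toDouble)
}
```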

@yanboliang
Contributor Author

Rename parameters and functions to be consistent with Spark naming rules.
Delete unused columns and set prediction as the first column.
Add explanations and references for r2Score and explained variance.
Other code style cleanup.

@mengxr
Contributor

mengxr commented Oct 29, 2014

ok to test

@mengxr
Contributor

mengxr commented Oct 29, 2014

test this please

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22457 has started for PR 2978 at commit a8ad3e3.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22457 timed out for PR 2978 at commit a8ad3e3 after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22457/

Contributor

space after ,

Contributor

space after ,

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22527 has started for PR 2978 at commit 3d0bec1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22528 has started for PR 2978 at commit 730d0a9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22527 has finished for PR 2978 at commit 3d0bec1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RegressionMetrics(predictionAndObservations: RDD[(Double, Double)]) extends Logging

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22527/

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22528 has finished for PR 2978 at commit 730d0a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RegressionMetrics(predictionAndObservations: RDD[(Double, Double)]) extends Logging

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22528/

@mengxr
Contributor

mengxr commented Oct 30, 2014

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in d932719 Oct 30, 2014
@yanboliang yanboliang deleted the regression_metrics branch February 19, 2015 14:21

6 participants