[SPARK-14615][ML][FOLLOWUP] Fix Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms #13393

HyukjinKwon · 2016-05-29T10:40:40Z

What changes were proposed in this pull request?

This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms.

I first ran the shell command `grep -r "from pyspark.mllib" .` and then ran every example it listed.
Some of the tests in `ml` failed with the error message below:

pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'

So I changed them to import from `pyspark.ml.linalg` instead, in the same way some Python tests were fixed in #12627.
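For illustration, the fix in each affected example amounts to a one-line import swap, as in the `aft_survival_regression.py` diff quoted later in this thread. The sketch below expresses that swap as a text rewrite. The helper name `use_ml_linalg` is hypothetical and not part of this PR, and the sketch assumes the file imports only names that exist in both packages (such as `Vectors`).

```python
import re

# Matches "from pyspark.mllib.linalg import " at the start of a line,
# keeping any leading indentation in group 1.
_OLD_IMPORT = re.compile(r"^([ \t]*)from pyspark\.mllib\.linalg import ", re.MULTILINE)


def use_ml_linalg(source: str) -> str:
    """Return `source` with pyspark.mllib.linalg imports pointed at pyspark.ml.linalg.

    Hypothetical helper: other pyspark.mllib imports are left untouched.
    """
    return _OLD_IMPORT.sub(r"\1from pyspark.ml.linalg import ", source)


example = (
    "from pyspark.ml.regression import AFTSurvivalRegression\n"
    "from pyspark.mllib.linalg import Vectors\n"
)
print(use_ml_linalg(example))
```

A rewrite like this would only make sense for examples under `ml`; examples that exercise the RDD-based `pyspark.mllib` algorithms are expected to keep the old import.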

How was this patch tested?

Manually tested all the examples listed by `grep -r "from pyspark.mllib" .`.
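The `grep` step above can be approximated in Python where a shell is not convenient. This is only a sketch: the function name `find_mllib_imports` is hypothetical, and unlike `grep -r` it looks only at `.py` files under the given root.

```python
from pathlib import Path


def find_mllib_imports(root):
    """Return a list of (path, line_number, line) for lines containing 'from pyspark.mllib'.

    Hypothetical helper approximating: grep -r "from pyspark.mllib" <root>
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        # Replace undecodable bytes so one odd file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if "from pyspark.mllib" in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_mllib_imports(".")` from the repository root would list candidate files to run by hand, which is how the examples in this PR were checked.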

HyukjinKwon · 2016-05-29T10:41:48Z

@viirya Thank you for your quick reply on #12627. Please feel free to take over this if there are a lot of things wrong here.

SparkQA · 2016-05-29T10:50:03Z

Test build #59588 has finished for PR 13393 at commit 0ddb1f4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2016-05-29T11:49:53Z

LGTM and cc @mengxr

HyukjinKwon · 2016-06-06T10:17:16Z

Hi @mengxr, could you please take a look?

HyukjinKwon · 2016-06-08T23:03:45Z

Please let me ping @mengxr again. Thanks!

HyukjinKwon · 2016-06-09T22:52:44Z

Hi @yanboliang, could you maybe take a quick look, please?

jkbradley · 2016-06-10T21:54:34Z

I'll take a look

jkbradley · 2016-06-10T22:09:47Z

examples/src/main/python/ml/aft_survival_regression.py

 # $example on$
 from pyspark.ml.regression import AFTSurvivalRegression
-from pyspark.mllib.linalg import Vectors
+from pyspark.ml.linalg import Vectors


Does this example run for you? It seems broken (not due to your PR though). Would you mind checking to identify the last time it worked?

Traceback (most recent call last):
  File "/Users/josephkb/spark/examples/src/main/python/ml/aft_survival_regression.py", line 49, in <module>
    model = aft.fit(training)
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 64, in fit
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 213, in _fit
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 210, in _fit_java
  File "/Users/josephkb/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'

jkbradley · 2016-06-10T22:24:32Z

LGTM except for the broken example, but I don't think that's from this PR. I'll rerun tests before merging it.

SparkQA · 2016-06-10T22:49:44Z

Test build #3076 has finished for PR 13393 at commit 0ddb1f4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-06-11T00:38:16Z

Well, that test works in 1.6 but fails in branch-2.0. I'll merge your PR. Thanks!

I created a JIRA for the bug. Would you have time to look into it? [https://issues.apache.org/jira/browse/SPARK-15892]

…tor and Matrix APIs in the ML pipeline based algorithms

## What changes were proposed in this pull request?

This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms.

I firstly executed this shell command, `grep -r "from pyspark.mllib" .` and then executed them all.
Some of tests in `ml` produced the error messages as below:

```
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'
```

So, I fixed them to use new ones just identically with some Python tests fixed in #12627

## How was this patch tested?

Manually tested for all the examples listed by `grep -r "from pyspark.mllib" .`.

Author: hyukjinkwon <[email protected]>

Closes #13393 from HyukjinKwon/SPARK-14615.

(cherry picked from commit 99f3c82)
Signed-off-by: Joseph K. Bradley <[email protected]>

HyukjinKwon · 2016-06-11T15:04:46Z

@jkbradley Sure, thanks!

Fix Python examples

0ddb1f4

jkbradley reviewed Jun 10, 2016
asfgit closed this in 99f3c82 Jun 11, 2016

HyukjinKwon deleted the SPARK-14615 branch January 2, 2018 03:42
