@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented May 29, 2016

What changes were proposed in this pull request?

This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms.

I first ran the shell command grep -r "from pyspark.mllib" . to find the examples that still import from pyspark.mllib, and then ran each of them.
Some of the examples under ml failed with the error message below:

pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'

So I changed them to use the new APIs, in the same way the Python tests were fixed in #12627.
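
The mismatch behind that error can be sketched as follows. This is an illustrative sketch, not code from the PR: the helper names `make_features_frame` and `convert_existing_frame` and the sample values are made up, and it assumes Spark 2.0 or later, where `pyspark.ml.linalg` and `MLUtils.convertVectorColumnsToML` are available. PySpark is imported inside the functions so the module can be loaded without Spark installed.

```python
def make_features_frame(spark):
    """Build a tiny DataFrame whose vector column uses the new ML vector type.

    `spark` is assumed to be an existing SparkSession.
    """
    # Import from pyspark.ml.linalg, not pyspark.mllib.linalg: ML pipeline
    # estimators and transformers expect the ML VectorUDT.
    from pyspark.ml.linalg import Vectors

    rows = [
        (1.0, Vectors.dense(0.5, -1.0)),       # illustrative values only
        (0.0, Vectors.sparse(2, [0], [2.0])),  # sparse vector of size 2
    ]
    return spark.createDataFrame(rows, ["label", "features"])


def convert_existing_frame(df):
    """Convert old mllib vector columns of an existing DataFrame to ML vectors."""
    from pyspark.mllib.util import MLUtils

    # With no column names given, all mllib vector columns are converted.
    return MLUtils.convertVectorColumnsToML(df)
```

With a running `SparkSession` named `spark`, `make_features_frame(spark)` yields a DataFrame whose `features` column should be accepted by `pyspark.ml` estimators. Building the same rows with `pyspark.mllib.linalg.Vectors` is what triggers the `VectorUDT` error quoted above.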

How was this patch tested?

Manually tested by running every example that grep -r "from pyspark.mllib" . lists.
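
For readers without `grep`, the search step can be approximated in Python. This is a rough sketch under stated assumptions, not the command used in the PR: the function name `find_mllib_imports` is hypothetical, and unlike `grep -r` it only looks at `.py` files and only matches lines that begin with the import.

```python
import os
import re

# Matches lines such as "from pyspark.mllib.linalg import Vectors".
MLLIB_IMPORT = re.compile(r"^\s*from\s+pyspark\.mllib\b")


def find_mllib_imports(root):
    """Return sorted (path, line_number, line) tuples for .py files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if MLLIB_IMPORT.search(line):
                        hits.append((path, number, line.strip()))
    return sorted(hits)
```

Calling `find_mllib_imports(".")` from the directory being checked returns the files that still import from `pyspark.mllib`, each of which can then be run by hand.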

@HyukjinKwon
Member Author

@viirya Thank you for your quick reply on #12627. Please feel free to take over this if there are a lot of things wrong here.

@SparkQA

SparkQA commented May 29, 2016

Test build #59588 has finished for PR 13393 at commit 0ddb1f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented May 29, 2016

LGTM and cc @mengxr

@HyukjinKwon
Member Author

Hi @mengxr , Could you please take a look?

@HyukjinKwon
Member Author

Please let me ping @mengxr again. Thanks!

@HyukjinKwon
Member Author

Hi @yanboliang , could you maybe take a quick look please?

@jkbradley
Member

I'll take a look

 # $example on$
 from pyspark.ml.regression import AFTSurvivalRegression
-from pyspark.mllib.linalg import Vectors
+from pyspark.ml.linalg import Vectors
Member

Does this example run for you? It seems broken (not due to your PR though). Would you mind checking to identify the last time it worked?

Traceback (most recent call last):                                              
  File "/Users/josephkb/spark/examples/src/main/python/ml/aft_survival_regression.py", line 49, in <module>
    model = aft.fit(training)
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 64, in fit
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 213, in _fit
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 210, in _fit_java
  File "/Users/josephkb/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/Users/josephkb/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'

@jkbradley
Member

LGTM except for the broken example, but I don't think that's from this PR. I'll rerun tests before merging it.

@SparkQA

SparkQA commented Jun 10, 2016

Test build #3076 has finished for PR 13393 at commit 0ddb1f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

Well, that test works in 1.6 but fails in branch-2.0. I'll merge your PR. Thanks!

I created a JIRA for the bug. Would you have time to look into it? [https://issues.apache.org/jira/browse/SPARK-15892]

@asfgit asfgit closed this in 99f3c82 Jun 11, 2016
asfgit pushed a commit that referenced this pull request Jun 11, 2016
…tor and Matrix APIs in the ML pipeline based algorithms

## What changes were proposed in this pull request?

This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms.

I first ran the shell command `grep -r "from pyspark.mllib" .` to find the examples that still import from `pyspark.mllib`, and then ran each of them.
Some of the examples under `ml` failed with the error message below:

```
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'
```

So I changed them to use the new APIs, in the same way the Python tests were fixed in #12627.

## How was this patch tested?

Manually tested all the examples listed by `grep -r "from pyspark.mllib" .`.

Author: hyukjinkwon <[email protected]>

Closes #13393 from HyukjinKwon/SPARK-14615.

(cherry picked from commit 99f3c82)
Signed-off-by: Joseph K. Bradley <[email protected]>

@HyukjinKwon
Member Author

@jkbradley Sure, thanks!

@HyukjinKwon HyukjinKwon deleted the SPARK-14615 branch January 2, 2018 03:42