[SPARK-38243][PYTHON][ML] Fix pyspark.ml.LogisticRegression.getThreshold error message logic by zero323 · Pull Request #35558 · apache/spark

zero323 · 2022-02-17T15:55:55Z

What changes were proposed in this pull request?

This PR replaces incorrect usage of str.join on a List[float] in LogisticRegression.getThreshold.

Why are the changes needed?

To avoid an unexpected failure when the method is called in the multi-class case, where thresholds does not have length 2.

After this change, the following code:

from pyspark.ml.classification import LogisticRegression

LogisticRegression(thresholds=[1.0, 2.0, 3.0]).getThreshold()

raises

Traceback (most recent call last):
  Input In [4] in <module>
    model.getThreshold()
  File /path/to/spark/python/pyspark/ml/classification.py:999 in getThreshold
    raise ValueError(
ValueError: Logistic Regression getThreshold only applies to binary classification, but thresholds has length != 2.  thresholds: [1.0, 2.0, 3.0]

instead of the current

Traceback (most recent call last):
  Input In [7] in <module>
    model.getThreshold()
  File /path/to/spark/python/pyspark/ml/classification.py:1003 in getThreshold
    + ",".join(ts)
TypeError: sequence item 0: expected str instance, float found
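The failure can be reproduced without Spark. The sketch below is a standalone illustration, not the code in pyspark/ml/classification.py: the helper names build_message_broken and build_message_fixed are hypothetical, and formatting the list directly is only one possible correction, chosen because it yields the "thresholds: [1.0, 2.0, 3.0]" suffix seen in the new error message.

```python
from typing import List


def build_message_broken(ts: List[float]) -> str:
    # str.join accepts only an iterable of str, so passing floats
    # raises TypeError before the intended message is ever built.
    return "thresholds: " + ",".join(ts)  # type: ignore[arg-type]


def build_message_fixed(ts: List[float]) -> str:
    # Let str.format render the list instead of joining its items.
    return "thresholds: {}".format(ts)


thresholds = [1.0, 2.0, 3.0]

try:
    build_message_broken(thresholds)
    broken_error = ""
except TypeError as exc:
    # Capture the TypeError text that masked the intended ValueError.
    broken_error = str(exc)

fixed_message = build_message_fixed(thresholds)
print(broken_error)
print(fixed_message)
```

If joining is preferred over the list representation, ",".join(map(str, ts)) would also avoid the TypeError.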

Does this PR introduce any user-facing change?

No. Bugfix.

How was this patch tested?

Manual testing.

srowen

Could you do a quick search to see if .join(ts) is called like this anywhere else?

zero323 · 2022-02-18T14:55:00Z

Thanks for the review @srowen.

Could you do a quick search to see if .join(ts) is called like this anywhere else?

I did a quick sweep, but nothing popped out.

In core, sql and most of ml, errors like this should already be detected thanks to inline type hints; ml.classification is just not there yet.
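As a rough illustration of the point about inline hints: once a parameter is annotated as List[float], a static type checker such as mypy would be expected to reject ",".join(ts), since str.join is typed to take an iterable of str. The function below is a hypothetical sketch, not code from pyspark.

```python
from typing import List


def join_thresholds(ts: List[float]) -> str:
    # With the annotation above, a type checker should flag
    # ",".join(ts); converting each item to str type-checks and runs.
    return ",".join(str(t) for t in ts)


joined = join_thresholds([0.5, 0.5])
print(joined)
```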

srowen · 2022-02-18T17:08:38Z

Merged to master

zero323 · 2022-02-18T17:32:08Z

Thanks!

Fix pyspark.ml.LogisticRegression.getThreshold error message logic

9bac2ab

zero323 changed the title ~~Fix pyspark.ml.LogisticRegression.getThreshold error message logic~~ [SPARK-38243][PYTHON][ML] Fix pyspark.ml.LogisticRegression.getThreshold error message logic Feb 17, 2022

github-actions bot added CORE ML PYTHON labels Feb 17, 2022

srowen approved these changes Feb 18, 2022

srowen closed this in 4399755 Feb 18, 2022

zero323 deleted the SPARK-38243 branch February 18, 2022 17:31

zero323 wants to merge 1 commit into apache:master from zero323:SPARK-38243
