Skip to content

[SPARK-38243][PYTHON][ML] Fix pyspark.ml.LogisticRegression.getThreshold error message logic#35558

Closed
zero323 wants to merge 1 commit intoapache:masterfrom
zero323:SPARK-38243
Closed

[SPARK-38243][PYTHON][ML] Fix pyspark.ml.LogisticRegression.getThreshold error message logic#35558
zero323 wants to merge 1 commit intoapache:masterfrom
zero323:SPARK-38243

Conversation

@zero323
Copy link
Member

@zero323 zero323 commented Feb 17, 2022

What changes were proposed in this pull request?

This PR replaces incorrect usage of str.join on a List[float] in LogisticRegression.getThreshold.

Why are the changes needed?

To avoid unexpected failure if method is used in case of multi-class classification.

After this change, the following code:

from pyspark.ml.classification import LogisticRegression

LogisticRegression(thresholds=[1.0, 2.0, 3.0]).getThreshold()

raises

Traceback (most recent call last):
  Input In [4] in <module>
    model.getThreshold()
  File /path/to/spark/python/pyspark/ml/classification.py:999 in getThreshold
    raise ValueError(
ValueError: Logistic Regression getThreshold only applies to binary classification, but thresholds has length != 2.  thresholds: [1.0, 2.0, 3.0]

instead of current

Traceback (most recent call last):
  Input In [7] in <module>
    model.getThreshold()
  File /path/to/spark/python/pyspark/ml/classification.py:1003 in getThreshold
    + ",".join(ts)
TypeError: sequence item 0: expected str instance, float found

Does this PR introduce any user-facing change?

No. Bugfix.

How was this patch tested?

Manual testing.

@zero323 zero323 changed the title Fix pyspark.ml.LogisticRegression.getThreshold error message logic [SPARK-38243][PYTHON][ML] Fix pyspark.ml.LogisticRegression.getThreshold error message logic Feb 17, 2022
Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do a quick search to see if .join(ts) is called like this anywhere else?

@zero323
Copy link
Member Author

zero323 commented Feb 18, 2022

Thanks for the review @srowen.

Could you do a quick search to see if .join(ts) is called like this anywhere else?

I did a quick sweep, but nothing popped out.

In core, sql and most of ml, errors like this should be already detected with inline hints, ml.classification is just not there yet.

@srowen
Copy link
Member

srowen commented Feb 18, 2022

Merged to master

@srowen srowen closed this in 4399755 Feb 18, 2022
@zero323 zero323 deleted the SPARK-38243 branch February 18, 2022 17:31
@zero323
Copy link
Member Author

zero323 commented Feb 18, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants