Skip to content

Conversation

@EnricoMi
Copy link
Contributor

What changes were proposed in this pull request?

The Pyspark Migration Guide needs to mention a breaking change of the Pyspark ML API.

Why are the changes needed?

In SPARK-29093, all setters have been removed from Params mixins in pyspark.ml.param.shared. Those setters had been part of the public pyspark ML API, hence this is a breaking change.

Does this PR introduce any user-facing change?

Only documentation.

How was this patch tested?

Visually.

@HyukjinKwon
Copy link
Member

@zhengruifeng FYI

@srowen
Copy link
Member

srowen commented May 28, 2020

Dumb question, but does that affect callers?

@EnricoMi
Copy link
Contributor Author

Code that used to call into the setters, e.g. HasOutputCols.setOutputCols(value) will not compile anymore.

@zhengruifeng
Copy link
Contributor

@EnricoMi The old mixins provided some wrong setters in the params, (e.g. OneVsRestModel.setClassifier), which were removed in the 3.0. The new params are keeped in line with the scala side.

For the callers like HasOutputCols.setOutputCols(value), I am not sure whether it is a common usage.

@huaxingao @zero323 How do think about this?

@huaxingao
Copy link
Contributor

I don't think the changes in SPARK-29093 affect users. Using outputCols as an example, the common usage is like this:

    ohe = OneHotEncoder()
    ohe.setInputCols(["input"])
    ohe.setOutputCols(["output"])
    model = ohe.fit(df)

It doesn't matter to the users if setOutputCols is in OneHotEncoder or in HasOutputCols. Seems to me that users will not call HasOutputCols.setOutputCols directly.

@EnricoMi
Copy link
Contributor Author

EnricoMi commented Jun 1, 2020

The HasOutputCols mixin is a public mixin that users can use with other implementations of Params, not only OneHotEncoder. User code breaks and users have to dive into Spark code to find the correct replacement. That single line in the migration guide helps users to save an hour or two when moving their code to pyspark 3.

In my opinion, it does not matter if this is common usage or not, it is a possible usage that breaks with this change. First place to look when your code breaks is the migration guide.

@srowen
Copy link
Member

srowen commented Jun 1, 2020

Yeah the only downside is that if we fill the migration guide with a bunch of niche stuff, it might be hard to figure out what the important common items are. I'm not sure how much user code uses the mixins directly, but if there is any code that does it's probably something like HasOutputCols, a common one. So yeah I think it's OK to mention this change.

@srowen
Copy link
Member

srowen commented Jun 3, 2020

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Jun 3, 2020

Test build #123492 has finished for PR 28663 at commit ae3bf76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 4bbe3c2 Jun 3, 2020
srowen pushed a commit that referenced this pull request Jun 3, 2020
…ion guide

### What changes were proposed in this pull request?
The Pyspark Migration Guide needs to mention a breaking change of the Pyspark ML API.

### Why are the changes needed?
In SPARK-29093, all setters have been removed from `Params` mixins in `pyspark.ml.param.shared`. Those setters had been part of the public pyspark ML API, hence this is a breaking change.

### Does this PR introduce _any_ user-facing change?
Only documentation.

### How was this patch tested?
Visually.

Closes #28663 from EnricoMi/branch-pyspark-migration-guide-setters.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 4bbe3c2)
Signed-off-by: Sean Owen <srowen@gmail.com>
@srowen
Copy link
Member

srowen commented Jun 3, 2020

Merged to master/3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants