[SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend#27245
[SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend#27245zero323 wants to merge 2 commits intoapache:masterfrom
Conversation
|
Test build #116878 has finished for PR 27245 at commit
|
04ed1f1 to
577d09c
Compare
|
Test build #116881 has finished for PR 27245 at commit
|
|
Test build #116883 has finished for PR 27245 at commit
|
|
Test build #116880 has finished for PR 27245 at commit
|
|
Test build #116885 has finished for PR 27245 at commit
|
3a6ab3b to
be45e3b
Compare
|
Test build #116886 has finished for PR 27245 at commit
|
|
Test build #116887 has finished for PR 27245 at commit
|
|
Test build #116969 has finished for PR 27245 at commit
|
|
Test build #116970 has finished for PR 27245 at commit
|
|
Test build #116984 has finished for PR 27245 at commit
|
|
CC @huaxingao @srowen Forwarding, since you participated in the discussion. |
|
Test build #117491 has finished for PR 27245 at commit
|
|
Test build #117493 has finished for PR 27245 at commit
|
|
Test build #117751 has finished for PR 27245 at commit
|
|
Test build #117754 has finished for PR 27245 at commit
|
|
Test build #118184 has finished for PR 27245 at commit
|
|
In general LGTM, pending update version info 3.0.0 to 3.1.0 |
|
The changes look good to me. |
Yeah, I was thinking about the right way to tackle this. While we introduce new objects in type hierarchy, the API of the existing classes doesn't change, compared to what we get in 3.0. So I felt that keeping |
|
Test build #118680 has finished for PR 27245 at commit
|
|
retest this please |
|
Test build #119274 has finished for PR 27245 at commit
|
|
Merged to master, thanks all! |
|
Thanks! |
### What changes were proposed in this pull request? Implement common base ML classes (`Predictor`, `PredictionModel`, `Classifier`, `ClasssificationModel` `ProbabilisticClassifier`, `ProbabilisticClasssificationModel`, `Regressor`, `RegrssionModel`) for non-Java backends. Note - `Predictor` and `JavaClassifier` should be abstract as `_fit` method is not implemented. - `PredictionModel` should be abstract as `_transform` is not implemented. ### Why are the changes needed? To provide extensions points for non-JVM algorithms, as well as a public (as opposed to `Java*` variants, which are commonly described in docstrings as private) hierarchy which can be used to distinguish between different classes of predictors. For longer discussion see [SPARK-29212](https://issues.apache.org/jira/browse/SPARK-29212) and / or apache#25776. ### Does this PR introduce any user-facing change? It adds new base classes as listed above, but effective interfaces (method resolution order notwithstanding) stay the same. Additionally "private" `Java*` classes in`ml.regression` and `ml.classification` have been renamed to follow PEP-8 conventions (added leading underscore). It is for discussion if the same should be done to equivalent classes from `ml.wrapper`. If we take `JavaClassifier` as an example, type hierarchy will change from  to  Similarly the old model  will become  ### How was this patch tested? Existing unit tests. Closes apache#27245 from zero323/SPARK-29212. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
What changes were proposed in this pull request?
Implement common base ML classes (
Predictor,PredictionModel,Classifier,ClasssificationModelProbabilisticClassifier,ProbabilisticClasssificationModel,Regressor,RegrssionModel) for non-Java backends.Note
PredictorandJavaClassifiershould be abstract as_fitmethod is not implemented.PredictionModelshould be abstract as_transformis not implemented.Why are the changes needed?
To provide extensions points for non-JVM algorithms, as well as a public (as opposed to
Java*variants, which are commonly described in docstrings as private) hierarchy which can be used to distinguish between different classes of predictors.For longer discussion see SPARK-29212 and / or #25776.
Does this PR introduce any user-facing change?
It adds new base classes as listed above, but effective interfaces (method resolution order notwithstanding) stay the same.
Additionally "private"
Java*classes inml.regressionandml.classificationhave been renamed to follow PEP-8 conventions (added leading underscore).It is for discussion if the same should be done to equivalent classes from
ml.wrapper.If we take
JavaClassifieras an example, type hierarchy will change fromto
Similarly the old model
will become
How was this patch tested?
Existing unit tests.