[SPARK-15643] [Doc] [ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0#13378
yanboliang wants to merge 7 commits into apache:master
|
Test build #59561 has finished for PR 13378 at commit
|
`spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of
the new name `coefficients`. This helps disambiguate from instance (row) "weights" given to
algorithms.
* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):
@yanboliang some deprecated methods were removed in https://issues.apache.org/jira/browse/SPARK-14089 and https://issues.apache.org/jira/browse/SPARK-14952. Those are breaking changes that we should highlight.
Though I'm happy to just do that in a follow up PR once I've made a final pass through for MiMa changes.
Good points. I forgot to record all removed deprecated methods. It's great that you can do that in a follow up PR. Thanks!
|
Test build #59611 has finished for PR 13378 at commit
|
|
How do we want to handle the new vectors (i.e. |
docs/mllib-guide.md
Outdated
## From 1.5 to 1.6
## From 1.6 to 2.0

There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are
|
There are also some changes from these 2 JIRAs/PRs which should be noted here:
For linear algebra, we should definitely discuss the change in the migration guide. @mengxr is also thinking about whether we can add a little functionality to make that transition easier. Documenting/improving this could happen in a follow-up PR. |
|
Test build #59734 has finished for PR 13378 at commit
|
|
@MLnick Would you like to update the corresponding migration docs for the changes in SPARK-14810 in a follow-up PR? I saw you left comments about doing that. If not, please let me know.
|
I'm happy to do the breaking changes in a separate PR (I still need to do a final pass through of those to confirm I've caught them all). |
docs/mllib-guide.md
Outdated
* [SPARK-13600](https://issues.apache.org/jira/browse/SPARK-13600):
`QuantileDiscretizer` now uses `spark.sql.DataFrameStatFunctions.approxQuantile` to find splits (previously used custom sampling logic).
The output buckets will differ for same input data and params.
* [SPARK-14814](https://issues.apache.org/jira/browse/SPARK-14814):
Just noticed that this is a breaking API change, not a change of behavior.
I've just added it to the list in SPARK-14810. We can either remove it here from this PR and I will include it when I do the one for breaking changes, or add it to a breaking changes section in this PR, which I will update with the others later.
I removed it in this PR. @MLnick Please add it in your follow up PR. Thanks!
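For readers wondering why the `QuantileDiscretizer` note in the hunk above warns that output buckets will differ: split points depend on how the quantiles are estimated. The sketch below is a plain-Python illustration, not Spark's implementation. The helpers `quantile_splits` and `bucketize` are hypothetical names, and "splits from the full data" versus "splits from a random sample" merely stand in for two different estimation strategies.

```python
import random


def quantile_splits(values, num_buckets):
    """Return num_buckets - 1 split points taken at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // num_buckets)] for i in range(1, num_buckets)]


def bucketize(value, splits):
    """Bucket index = number of split points less than or equal to value."""
    return sum(1 for s in splits if value >= s)


# Hypothetical data: 10,000 draws from a standard normal distribution.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Two estimation strategies applied to the same input and the same bucket count.
full_splits = quantile_splits(data, 4)
sample_splits = quantile_splits(rng.sample(data, 500), 4)

# Count the rows whose bucket assignment changes between the two sets of splits.
moved = sum(bucketize(x, full_splits) != bucketize(x, sample_splits) for x in data)
print(full_splits, sample_splits, moved)
```

Any row whose value falls between an old split and the corresponding new split changes bucket, which is why this kind of change belongs under behavior changes even though the API is the same.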
|
Separating the work SGTM too. |
|
Test build #59809 has finished for PR 13378 at commit
|
|
@yanboliang how is this coming along? I have a PR ready for the breaking changes. I can either do that separately or push a PR to your branch. We need to update this PR with a few items mentioned in the JIRA by @jkbradley & @mengxr. |
|
@MLnick What about merging this PR first and then sending your PR for breaking changes separately? If this is OK, please go ahead to get it in. Thanks!
|
@yanboliang I'm happy with that - we need to merge this one first so I can slot my changes in format-wise. Could you update for the new deprecations in the JIRA (https://issues.apache.org/jira/browse/SPARK-15643?focusedCommentId=15343059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15343059)? Also the vector conversion (https://issues.apache.org/jira/browse/SPARK-15643?focusedCommentId=15334729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15334729) |
|
@yanboliang I opened #13924 with my changes. If you prefer, I can incorporate the part about vector conversions into my section on the new linalg classes (since it perhaps fits best there?). |
|
Test build #61305 has finished for PR 13378 at commit
|
docs/mllib-guide.md
Outdated
* [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900):
In `spark.mllib.evaluation.MulticlassMetrics`, the parameters `precision`, `recall` and `fMeasure` have been deprecated in favor of `accuracy`.
* [SPARK-15644](https://issues.apache.org/jira/browse/SPARK-15644):
In `spark.ml.util.BaseReadWrite`, the `context` method has been deprecated in favor of `session`.
Could you please list this as MLReader and MLWriter instead of BaseReadWrite? Those are the public APIs.
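As background for the `MulticlassMetrics` entry in the hunk above: the sketch below assumes the deprecated no-argument `precision`, `recall` and `fMeasure` are the overall (micro-averaged) values, in which case all three coincide with accuracy for single-label multiclass predictions. It is plain Python rather than Spark code, and `micro_metrics` is a hypothetical helper written only for this illustration.

```python
from collections import Counter


def micro_metrics(labels, predictions):
    """Micro-averaged precision, recall, F-measure, plus plain accuracy."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for y, p in zip(labels, predictions):
        if y == p:
            tp[y] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[y] += 1  # true class y was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = total_tp / len(labels)
    return precision, recall, f_measure, accuracy


# Hypothetical labels and predictions for three classes.
labels = [0, 0, 1, 1, 2, 2, 2, 1]
predictions = [0, 1, 1, 1, 2, 0, 2, 2]
precision, recall, f_measure, accuracy = micro_metrics(labels, predictions)
print(precision, recall, f_measure, accuracy)
```

Every misclassified row counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the summed counts match and the four numbers agree. Per-label metrics are a different matter and are not covered by this sketch.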
|
Other than that, this looks good. |
|
Test build #61364 has finished for PR 13378 at commit
|
|
LGTM |
…e from 1.6 to 2.0

## What changes were proposed in this pull request?

Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0.

## How was this patch tested?

Docs update, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13378 from yanboliang/spark-13448.

(cherry picked from commit 26252f7)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
## What changes were proposed in this pull request?

Update `spark.ml` and `spark.mllib` migration guide from 1.6 to 2.0.

## How was this patch tested?

Docs update, no tests.