Commit 13d2406

[SPARK-5254][MLLIB] Update the user guide to position spark.ml better
The current statement in the user guide may send a confusing message to users. spark.ml contains high-level APIs for building ML pipelines, but that doesn't mean spark.mllib is being deprecated.

First of all, the pipeline API is in its alpha stage, and we need to see more use cases from the community to stabilize it, which may take several releases. Secondly, the components in spark.ml are simple wrappers over spark.mllib implementations. Neither the APIs nor the implementations from spark.mllib are being deprecated. We expect users to use spark.ml pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to spark.mllib. For example, there are many features in review at https://spark-prs.appspot.com/#mllib. So users should be comfortable using spark.mllib features and expect more to come. The user guide needs to be updated to make the message clear.

Author: Xiangrui Meng <[email protected]>

Closes #4052 from mengxr/SPARK-5254 and squashes the following commits:

6d5f1d3 [Xiangrui Meng] typo
0cc935b [Xiangrui Meng] update user guide to position spark.ml better
1 parent 76389c5 commit 13d2406

2 files changed: +21 -14 lines changed

docs/ml-guide.md

Lines changed: 10 additions & 7 deletions
@@ -3,13 +3,16 @@ layout: global
 title: Spark ML Programming Guide
 ---
 
-Spark ML is Spark's new machine learning package. It is currently an alpha component but is potentially a successor to [MLlib](mllib-guide.html). The `spark.ml` package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines.
-
-MLlib vs. Spark ML:
-
-* Users can use algorithms from either of the two packages, but APIs may differ. Currently, `spark.ml` offers a subset of the algorithms from `spark.mllib`. Since Spark ML is an alpha component, its API may change in future releases.
-* Developers should contribute new algorithms to `spark.mllib` and can optionally contribute to `spark.ml`. See below for more details.
-* Spark ML only has Scala and Java APIs, whereas MLlib also has a Python API.
+`spark.ml` is a new package introduced in Spark 1.2, which aims to provide a uniform set of
+high-level APIs that help users create and tune practical machine learning pipelines.
+It is currently an alpha component, and we would like to hear back from the community about
+how it fits real-world use cases and how it could be improved.
+
+Note that we will keep supporting and adding features to `spark.mllib` along with the
+development of `spark.ml`.
+Users should be comfortable using `spark.mllib` features and expect more features coming.
+Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
+to `spark.ml`.
 
 
 **Table of Contents**
 
docs/mllib-guide.md

Lines changed: 11 additions & 7 deletions
@@ -35,16 +35,20 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.
 
-# spark.ml: The New ML Package
+# spark.ml: high-level APIs for ML pipelines
 
-Spark 1.2 includes a new machine learning package called `spark.ml`, currently an alpha component but potentially a successor to `spark.mllib`. The `spark.ml` package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines.
+Spark 1.2 includes a new package called `spark.ml`, which aims to provide a uniform set of
+high-level APIs that help users create and tune practical machine learning pipelines.
+It is currently an alpha component, and we would like to hear back from the community about
+how it fits real-world use cases and how it could be improved.
 
-See the **[spark.ml programming guide](ml-guide.html)** for more information on this package.
-
-Users can use algorithms from either of the two packages, but APIs may differ. Currently, `spark.ml` offers a subset of the algorithms from `spark.mllib`.
+Note that we will keep supporting and adding features to `spark.mllib` along with the
+development of `spark.ml`.
+Users should be comfortable using `spark.mllib` features and expect more features coming.
+Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
+to `spark.ml`.
 
-Developers should contribute new algorithms to `spark.mllib` and can optionally contribute to `spark.ml`.
-See the `spark.ml` programming guide linked above for more details.
+See the **[spark.ml programming guide](ml-guide.html)** for more information on this package.
 
 # Dependencies
 