1 change: 1 addition & 0 deletions docs/_layouts/global.html
@@ -84,6 +84,7 @@
<a class="dropdown-item" href="ml-guide.html">MLlib (Machine Learning)</a>
<a class="dropdown-item" href="graphx-programming-guide.html">GraphX (Graph Processing)</a>
<a class="dropdown-item" href="sparkr.html">SparkR (R on Spark)</a>
<a class="dropdown-item" href="api/python/getting_started/index.html">PySpark (Python on Spark)</a>
Comment (Member Author): [screenshot: Screen Shot 2021-01-07 at 6 45 02 PM]

</div>
</li>

2 changes: 2 additions & 0 deletions docs/index.md
@@ -113,6 +113,8 @@ options for deployment:
* [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API)
* [MLlib](ml-guide.html): applying machine learning algorithms
* [GraphX](graphx-programming-guide.html): processing graphs
* [SparkR](sparkr.html): processing data with Spark in R
* [PySpark](api/python/getting_started/index.html): processing data with Spark in Python
Comment from @HyukjinKwon (Member Author), Jan 7, 2021: [screenshot: Screen Shot 2021-01-07 at 6 49 59 PM]


**API Docs:**

3 changes: 3 additions & 0 deletions python/docs/source/getting_started/index.rst
@@ -21,6 +21,9 @@ Getting Started
===============

This page summarizes the basic steps required to setup and get started with PySpark.
There are more guides shared with other languages such as
`Quick Start <http://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
Comment (Member): Does this need to be an absolute path? Can't we use a relative path?

Comment (Member Author): Yeah ... I tried hard to find a good way but failed. This is because the link target lives outside the PySpark documentation build, so it can't be resolved as a relative link when the PySpark documentation is built.
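The resolution problem described in this thread can be sketched with Python's standard `urllib.parse.urljoin`, which follows the same relative-URL resolution rules a browser uses: a relative href in `docs/index.md` resolves against the Spark docs root, but the PySpark docs are built as a separate Sphinx project under `api/python/`, so a relative link written there would resolve under the PySpark build itself rather than back up into the Spark docs. The URLs below are illustrative, not an exact model of the Sphinx build.

```python
from urllib.parse import urljoin

# In docs/index.md, a relative href resolves against the Spark docs root:
base = "http://spark.apache.org/docs/latest/index.html"
resolved = urljoin(base, "api/python/getting_started/index.html")
print(resolved)
# http://spark.apache.org/docs/latest/api/python/getting_started/index.html

# From inside the PySpark docs (hosted under api/python/), a relative link
# back toward the Spark docs would resolve under the PySpark pages instead
# of the Spark docs root -- hence the absolute URL in the .rst source:
pyspark_page = "http://spark.apache.org/docs/latest/api/python/getting_started/index.html"
misresolved = urljoin(pyspark_page, "quick-start.html")
print(misresolved)
# http://spark.apache.org/docs/latest/api/python/getting_started/quick-start.html
```

The second resolution lands inside the PySpark getting-started pages, not at the actual Quick Start guide under `docs/latest/`, which is why the PR keeps the absolute URL.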

at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.
Comment (Member Author): [screenshot: Screen Shot 2021-01-07 at 6 46 05 PM]


.. toctree::
:maxdepth: 2
12 changes: 10 additions & 2 deletions python/docs/source/migration_guide/index.rst
@@ -21,8 +21,6 @@ Migration Guide
===============

This page describes the migration guide specific to PySpark.
Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
Please also refer other migration guides such as `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_.

.. toctree::
:maxdepth: 2
@@ -33,3 +31,13 @@ Please also refer other migration guides such as `Migration Guide: SQL, Datasets
pyspark_2.2_to_2.3
pyspark_1.4_to_1.5
pyspark_1.0_1.2_to_1.3


Many items of other migration guides can also be applied when migrating PySpark to higher versions because PySpark internally shares other components.
Please also refer other migration guides:

- `Migration Guide: Spark Core <http://spark.apache.org/docs/latest/core-migration-guide.html>`_
- `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_
- `Migration Guide: Structured Streaming <http://spark.apache.org/docs/latest/ss-migration-guide.html>`_
- `Migration Guide: MLlib (Machine Learning) <http://spark.apache.org/docs/latest/ml-migration-guide.html>`_

Comment (Member Author): [screenshot: Screen Shot 2021-01-07 at 6 46 56 PM]

12 changes: 6 additions & 6 deletions python/docs/source/reference/pyspark.ml.rst
@@ -16,11 +16,11 @@
under the License.


ML
==
MLlib (DataFrame-based)
Comment (Member): I have mixed feelings about it. I am aware that MLlib is the official name applied to both, but the majority of users I have interacted with prefer "ML" when speaking about the DataFrame-based API.

Comment (Member Author): This name was actually feedback from @mengxr, who has made a lot of major contributions in ML :-). I think this name was picked based on what we documented at https://spark.apache.org/docs/latest/ml-guide.html, which I think makes sense:

  • What is "Spark ML"?

    "Spark ML" is not an official name but occasionally used to refer to the MLlib DataFrame-based API. This is majorly due to the org.apache.spark.ml Scala package name used by the DataFrame-based API, and the "Spark ML Pipelines" term we used initially to emphasize the pipeline concept.

=======================

ML Pipeline APIs
----------------
Pipeline APIs
-------------

.. currentmodule:: pyspark.ml

@@ -188,8 +188,8 @@ Clustering
PowerIterationClustering


ML Functions
----------------------------
Functions
---------

.. currentmodule:: pyspark.ml.functions

4 changes: 2 additions & 2 deletions python/docs/source/reference/pyspark.mllib.rst
@@ -16,8 +16,8 @@
under the License.


MLlib
=====
MLlib (RDD-based)
=================

Classification
--------------
12 changes: 12 additions & 0 deletions python/docs/source/user_guide/index.rst
@@ -20,9 +20,21 @@
User Guide
==========

This page is the guide for PySpark users which contains PySpark specific topics.

.. toctree::
:maxdepth: 2

arrow_pandas
python_packaging


There are more guides shared with other languages in Programming Guides
at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.

- `RDD Programming Guide <http://spark.apache.org/docs/latest/rdd-programming-guide.html>`_
- `Spark SQL, DataFrames and Datasets Guide <http://spark.apache.org/docs/latest/sql-programming-guide.html>`_
- `Structured Streaming Programming Guide <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>`_
- `Spark Streaming Programming Guide <http://spark.apache.org/docs/latest/streaming-programming-guide.html>`_
- `Machine Learning Library (MLlib) Guide <http://spark.apache.org/docs/latest/ml-guide.html>`_

Comment (Member Author): [screenshot: Screen Shot 2021-01-07 at 6 46 32 PM]