[SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new PySpark documentation #31082
003b61f to 0cacfd0
Test build #133785 has finished for PR 31082 at commit
<a class="dropdown-item" href="ml-guide.html">MLlib (Machine Learning)</a>
<a class="dropdown-item" href="graphx-programming-guide.html">GraphX (Graph Processing)</a>
<a class="dropdown-item" href="sparkr.html">SparkR (R on Spark)</a>
<a class="dropdown-item" href="api/python/getting_started/index.html">PySpark (Python on Spark)</a>
* [MLlib](ml-guide.html): applying machine learning algorithms
* [GraphX](graphx-programming-guide.html): processing graphs
* [SparkR](sparkr.html): processing data with Spark in R
* [PySpark](api/python/getting_started/index.html): processing data with Spark in Python
This page summarizes the basic steps required to setup and get started with PySpark.
There are more guides shared with other languages such as
`Quick Start <http://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
at `the Spark documentation <http://spark.apache.org/docs/latest/index.html#where-to-go-from-here>`_.
- `Structured Streaming Programming Guide <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>`_
- `Spark Streaming Programming Guide <http://spark.apache.org/docs/latest/streaming-programming-guide.html>`_
- `Machine Learning Library (MLlib) Guide <http://spark.apache.org/docs/latest/ml-guide.html>`_
- `Migration Guide: SQL, Datasets and DataFrame <http://spark.apache.org/docs/latest/sql-migration-guide.html>`_
- `Migration Guide: Structured Streaming <http://spark.apache.org/docs/latest/ss-migration-guide.html>`_
- `Migration Guide: MLlib (Machine Learning) <http://spark.apache.org/docs/latest/ml-migration-guide.html>`_
Kubernetes integration test starting
Kubernetes integration test status success
Test build #133789 has finished for PR 31082 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
ML
==
MLlib (DataFrame-based)
I have mixed feelings about it. I am aware that MLlib is the official name applied to both, but the majority of users I have interacted with prefer to say ML when talking about the DataFrame-based API.
This one is actually feedback from @mengxr, who has made a lot of major contributions to ML :-). I think this name was picked based on what we documented at https://spark.apache.org/docs/latest/ml-guide.html, which I think makes sense:
What is “Spark ML”?
“Spark ML” is not an official name but occasionally used to refer to the MLlib DataFrame-based API. This is majorly due to the org.apache.spark.ml Scala package name used by the DataFrame-based API, and the “Spark ML Pipelines” term we used initially to emphasize the pipeline concept.
This page summarizes the basic steps required to setup and get started with PySpark.
There are more guides shared with other languages such as
`Quick Start <http://spark.apache.org/docs/latest/quick-start.html>`_ in Programming Guides
Does this need to be an absolute path? Can't we use a relative path?
Yeah ... I tried hard to find a good way but failed. The link target here is outside the PySpark documentation build, so the link cannot be resolved when the PySpark documentation is built.
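For illustration only, and not part of this PR: if the links have to stay absolute, one way to at least keep the base URL in a single place might be Sphinx's `sphinx.ext.extlinks` extension. The sketch below assumes a Sphinx `conf.py`; the role name `sparkdocs` and the helper `expand` are hypothetical and exist only to show how the URL template is filled in.

```python
# Hypothetical fragment for a Sphinx conf.py. The role name "sparkdocs" is an
# illustrative assumption, not something this PR adds.
extensions = ["sphinx.ext.extlinks"]

# Maps a role name to (URL template, caption template). "%s" is replaced by
# the target text given to the role.
extlinks = {
    "sparkdocs": ("http://spark.apache.org/docs/latest/%s", "%s"),
}


def expand(role: str, target: str) -> str:
    """Mimic how the URL template is filled in (illustration only)."""
    url_template, _caption = extlinks[role]
    return url_template % target


print(expand("sparkdocs", "quick-start.html"))
# → http://spark.apache.org/docs/latest/quick-start.html
```

With such a role defined, a link could presumably be written as :sparkdocs:`Quick Start <quick-start.html>`. The generated URL is still absolute, so this would not make the link relative; it would only avoid repeating the base URL.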
viirya
left a comment
Looks good. Just one question about the absolute path.
Thanks guys. Let me merge this in.
…umentation

### What changes were proposed in this pull request?

This PR proposes to:

- Add a link to the PySpark quick start under "Programming Guides" in the Spark main docs
- Rename `ML` / `MLlib` to `MLlib (DataFrame-based)` / `MLlib (RDD-based)` in the API reference page
- Mention other user guides as well, because guides such as [ML](http://spark.apache.org/docs/latest/ml-guide.html) and [SQL](http://spark.apache.org/docs/latest/sql-programming-guide.html) are shared with other languages.
- Mention other migration guides as well, because PySpark can be affected by them.

### Why are the changes needed?

For better documentation.

### Does this PR introduce _any_ user-facing change?

It fixes user-facing docs. However, they have not been released yet.

### How was this patch tested?

Manually tested by running:

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```

Closes #31082 from HyukjinKwon/SPARK-34041.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit aa388cf)
Signed-off-by: HyukjinKwon <[email protected]>




