[SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base #29188
Conversation
python/.eggs/
python/deps
python/docs/_site/
python/docs/source/reference/api/
This is generated by the autosummary plugin in Sphinx when autosummary_generate is enabled in conf.py. Each API or class under an autosummary directive, for example DataFrame.alias, is generated by that plugin as an RST file.
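For reference, a minimal sketch of the conf.py settings involved (not the exact contents of Spark's conf.py, which has more settings):

```python
# conf.py -- minimal sketch, not Spark's actual configuration.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
]

# With this enabled, Sphinx generates one stub RST page per item listed
# under an autosummary directive (e.g. reference/api/*.rst).
autosummary_generate = True
```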
HyukjinKwon left a comment
The docs/img/spark-logo-reverse.png image is the "white logo" from http://spark.apache.org/faq.html.
{% endif %}
{% endblock %}
This is needed to let the autosummary plugin document the methods in a class. For example, when we use this template, it renders the method documentation at the bottom of the page. See pyspark.ml.Transformer as an example.
Without this template, it only lists the methods and attributes without showing their documentation in detail. See pyspark.sql.DataFrameNaFunctions as an example.
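As background, Sphinx picks up such custom templates through templates_path in conf.py; a minimal sketch, assuming the template lives at _templates/autosummary/class.rst (the conventional path for overriding autosummary's default class template):

```python
# conf.py -- sketch; assumes the custom template sits at
# _templates/autosummary/class.rst, which overrides autosummary's
# default stub template for classes.
templates_path = ["_templates"]
```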
import os
import shutil

# Remove previously generated RST files. Ignore errors so that a failure
# here does not stop the whole docs build.
shutil.rmtree(
    "%s/reference/api" % os.path.dirname(os.path.abspath(__file__)), ignore_errors=True)
autosummary generates RST files but does not remove them afterwards. Here we always remove the generated RST files so the leftovers don't cause any side effects.
@BryanCutler, @huaxingao, @ueshin, @viirya, @srowen, @dongjoon-hyun, @WeichenXu123, @zhengruifeng, @holdenk, @zero323, can you guys take a look when you are available?
Force-pushed from 54ffa45 to 1c6fe6c.
srowen left a comment
Looks nice! Yes, it will take a little more work to maintain the module/class lists, so whatever we can do to keep it simple is welcome.
Excited to see the site improve. I'll take some time to review it this week.
viirya left a comment
It looks great! Besides visual effects like colors, it looks more structured.
Looks nice. I miss direct access to docstrings a bit, but I guess that's a reasonable trade-off. I wonder if there is some non-hacky way to organize functions into logical groups, similar to what ScalaDoc does.
Looks really nice! It's more organized this way.
The demo website looks nice, although I didn't generate it manually from this PR.
I tried hard but it looked difficult to do. I will take another look.
By default, it follows casting rules to :class:`pyspark.sql.types.DateType` if the format
is omitted. Equivalent to ``col.cast("date")``.
.. _datetime pattern: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
This is needed because we now create a separate page for each API. For example, see https://hyukjin-spark.readthedocs.io/en/stable/reference/api/pyspark.sql.functions.to_date.html
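For illustration, a small usage sketch of what that page documents (the DataFrame here is hypothetical, not from the PR):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("1997-02-28",)], ["t"])

# With the format omitted, to_date follows the casting rules to DateType,
# equivalent to df.select(F.col("t").cast("date")).
df.select(F.to_date(F.col("t")).alias("d")).show()

# With an explicit datetime pattern.
df.select(F.to_date(F.col("t"), "yyyy-MM-dd").alias("d")).show()
```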
Force-pushed from 1f121d5 to 3c89dab.
I believe this is ready for a look or possibly ready to go.
Test build #126449 has finished for PR 29188 at commit
retest this please
I will go ahead and merge, given the multiple pieces of positive feedback here.
Merged to master.
Let me know if you have any concerns about this. I will be working on completing the other pages for a while.
Test build #126627 has finished for PR 29188 at commit
The demo website looks great!
Referenced commit (#29320, SPARK-32507):

What changes were proposed in this pull request?
This PR proposes to write the main page of the PySpark documentation. The base work was finished at #29188.

Why are the changes needed?
For better usability and readability in the PySpark documentation.

Does this PR introduce _any_ user-facing change?
Yes, it creates a new main page. [Screenshot of the new main page]

How was this patch tested?
Manually built the PySpark documentation.

```bash
cd python
make clean html
```

Closes #29320 from HyukjinKwon/SPARK-32507.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
This PR proposes to redesign the PySpark documentation.
I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html.
Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html.
In more detail, this PR proposes to:

- Use the pydata_sphinx_theme theme - pandas and Koalas use this theme. The CSS overwrite is ported from Koalas. The colours in the CSS were actually chosen by designers to be used for Spark. (A sketch of the theme configuration follows this list.)
- Use the Sphinx option to separate the `source` and `build` directories, as the documentation pages will likely grow.
- Port the current API documentation into the new style. It mimics Koalas and pandas to use the theme most effectively. One disadvantage of this approach is that you have to list the APIs or classes explicitly; however, I don't think this is a big issue in PySpark since we are conservative about adding APIs. I also intentionally listed only classes, instead of functions, in ML and MLlib to make them relatively easier to manage.
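For the first point, a minimal sketch of the theme-related settings; the CSS file name below is a placeholder, not necessarily what this PR uses:

```python
# conf.py -- theme sketch; the CSS file name is hypothetical.
html_theme = "pydata_sphinx_theme"

# Files under _static/ are copied into the built site; entries in
# html_css_files load after the theme's own CSS, so they can override it.
html_static_path = ["_static"]
html_css_files = ["pyspark.css"]  # placeholder name for the ported Koalas CSS
```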
Why are the changes needed?
I often hear complaints from users that the current PySpark documentation - https://spark.apache.org/docs/latest/api/python/index.html - is pretty messy to read compared to other projects such as pandas and Koalas.
It would be nicer if we could make it more organised, instead of just listing all classes, methods and attributes, so that it is easier to navigate.
Also, the documentation has been there since almost the very first version of PySpark. Maybe it's time to update it.
Does this PR introduce any user-facing change?
Yes, PySpark API documentation will be redesigned.
How was this patch tested?
Manually tested, and the demo site was made to show the result.