[SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation #44012
    .. autosummary::
        :toctree: api/
        :template: autosummary/accessor_method.rst
With the new version of Sphinx, the rules for naming packages in the rst files that are generated automatically during the documentation build have changed, so we must adjust the package path manually using these templates.
Pandas handles this the same way, so I followed its approach.
For example, the rst file was previously generated as follows:

    pyspark.sql.SparkSession.builder.appName
    ========================================

    .. currentmodule:: pyspark.sql.SparkSession

    .. automethod:: builder.appName

However, newer Sphinx versions generate it like this:

    pyspark.sql.SparkSession.builder.appName
    ========================================

    .. currentmodule:: pyspark.sql

    .. automethod:: SparkSession.builder.appName

For functions reached through internal classes or accessors like this one, the newly generated package paths cause the Sphinx build to fail. That is why we use the customized template to correct the module path.

See also sphinx-doc/sphinx#7551.
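
The difference between the two generated stubs comes down to where the dotted name is split between the `currentmodule` directive and the `automethod` target. The sketch below illustrates that split in plain Python. It is not the template from this PR and not a Sphinx API: `split_accessor_path`, `render_stub`, and the `module_depth` parameter are hypothetical names introduced only for illustration.

```python
from typing import Tuple


def split_accessor_path(fullname: str, module_depth: int) -> Tuple[str, str]:
    """Split a dotted name into (currentmodule, automethod target).

    `module_depth` (hypothetical parameter) is the number of leading
    components that are treated as the module path.
    """
    parts = fullname.split(".")
    if not 0 < module_depth < len(parts):
        raise ValueError("module_depth must leave at least one component on each side")
    return ".".join(parts[:module_depth]), ".".join(parts[module_depth:])


def render_stub(fullname: str, module_depth: int) -> str:
    """Render an rst stub shaped like the ones quoted in the comment above."""
    module, target = split_accessor_path(fullname, module_depth)
    return "\n".join(
        [
            fullname,
            "=" * len(fullname),
            "",
            f".. currentmodule:: {module}",
            "",
            f".. automethod:: {target}",
        ]
    )
```

With `fullname = "pyspark.sql.SparkSession.builder.appName"`, a depth of 3 reproduces the older stub (`currentmodule` is `pyspark.sql.SparkSession`) and a depth of 2 reproduces the newer one (`currentmodule` is `pyspark.sql`).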
    "navbar_end": ["version-switcher", "theme-switcher"],
    "logo": {
        "image_light": "_static/spark-logo-light.png",
        "image_dark": "_static/spark-logo-dark.png",
FYI: the default light/dark mode is auto, which picks a theme based on the user's system settings, but we can manually set either dark or light as the default if we want.
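
As a rough illustration of how these options fit together in a Sphinx `conf.py`, here is a minimal sketch. The `html_theme_options` entries are taken from the diff fragment above. The `html_context` entry is an assumption based on the `default_mode` setting that `pydata_sphinx_theme` documents for overriding the auto default; it is not part of this PR.

```python
# Sketch of the theme-related part of a Sphinx conf.py.
html_theme = "pydata_sphinx_theme"

html_theme_options = {
    # Widgets shown at the right end of the navigation bar.
    "navbar_end": ["version-switcher", "theme-switcher"],
    # Separate logo images for the light and dark themes.
    "logo": {
        "image_light": "_static/spark-logo-light.png",
        "image_dark": "_static/spark-logo-dark.png",
    },
}

# Assumption (not in this PR): pin the default mode instead of following
# the user's system setting. Leaving this out keeps the "auto" behavior.
html_context = {"default_mode": "light"}
```

Omitting `html_context` keeps the behavior described in the comment, where the theme follows the system setting.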
    .. autosummary::
    {% for item in attributes %}
    ~{{ name }}.{{ item }}
    {% if not (item == 'uid') %}
We should manually exclude uid from the documentation because it is an internal property. Our current documentation does not include it either, but for some reason newer Sphinx versions unexpectedly try to generate it.
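
The effect of the `uid` guard in the template can be mimicked in plain Python. This is only a sketch of the filtering idea, not the Jinja template itself: `autosummary_entries` and `INTERNAL_ATTRIBUTES` are hypothetical names, and the class name in the usage note is a placeholder.

```python
from typing import Iterable, List

# Attributes treated as internal and kept out of the generated summary.
INTERNAL_ATTRIBUTES = {"uid"}


def autosummary_entries(name: str, attributes: Iterable[str]) -> List[str]:
    """Build `~Class.attribute` entries, skipping internal attributes."""
    return [
        f"~{name}.{item}"
        for item in attributes
        if item not in INTERNAL_ATTRIBUTES
    ]
```

For a placeholder class `Example` with attributes `params` and `uid`, only `~Example.params` is emitted.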
        :toctree: api/
        :template: autosummary/accessor_method.rst

        DataFrame.plot
In newer versions of Sphinx, the build fails because DataFrame.plot and Series.plot are treated as duplicates of the functions listed below them, such as DataFrame.plot.area, DataFrame.plot.barh, and DataFrame.plot.bar.
This behavior seems reasonable, since .plot is just an accessor keyword and not a function, so I believe we can simply leave it out of the documentation.
    - jinja2<3.0.0
    - sphinx<3.1.0
    + jinja2
    + sphinx==4.2.0
Other Sphinx versions do not generate the documentation properly for some reason. I tested as many combinations with Jinja2 and pydata_sphinx_theme as I could, and Sphinx 4.2.0 currently gives the best rendering. I will investigate further to support the latest Sphinx if necessary.
    # Documentation (Python)
    - pydata_sphinx_theme
    + pydata_sphinx_theme==0.13
I followed the version used in Pandas, and after testing several versions I believe this one gives the best rendering.
This is a nice improvement!

Documentation build passed: https://github.com/itholic/spark/actions/runs/6991575840/job/19023836477
The other failures seem unrelated to this PR, but let me re-trigger the CI just to be sure.

Merged to master.
…nd `Jinja2`

### What changes were proposed in this pull request?
Delete comments on `Sphinx` and `Jinja2`

### Why are the changes needed?
they had been upgraded in #44012

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44046 from zhengruifeng/infra_Sphinx_nit.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

### What changes were proposed in this pull request?
#44012 unpinned jinja2 in doc build, this PR unpin it in Python linter. this pr is only for master branch, and won't affect branch-3.x daily build

### Why are the changes needed?
to be consistent with requirements.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44051 from zhengruifeng/infra_linter_jinja.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>

### What changes were proposed in this pull request?
1. After pr #44012, the output format of some 'ipynb' tables displayed in HTML format has been disrupted. The pr aims to fix table format error in ipynb docs.
   - Before: <img width="792" alt="image" src="https://github.com/apache/spark/assets/15246973/2095a2ac-f0b5-44bd-a3c2-ce742d041243">
   - After: <img width="739" alt="image" src="https://github.com/apache/spark/assets/15246973/ec0be72d-4dc0-44f4-ab75-d9668e32fc51">
2. Fix some minor errors.

### Why are the changes needed?
Fix bug.

### Does this PR introduce _any_ user-facing change?
Yes, only for docs.

### How was this patch tested?
Manually test. Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44049 from panbingkun/SPARK-46135.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

### What changes were proposed in this pull request?
This PR proposes to enhance the PySpark documentation by leveraging modern Sphinx features. The primary objective is to improve the overall user experience and readability of the documentation. To achieve this, the PR upgrades `Sphinx` and `Jinja2` to newer versions, enabling us to use recent `pydata_sphinx_theme` features such as light/dark mode toggling.

### Why are the changes needed?
Currently, the PySpark documentation cannot use many of the features available in recent `Sphinx` versions because it is built with older package versions. This limits the documentation's visual appeal and usability, particularly compared to projects like Pandas that have already adopted these enhancements. For example:

- Pandas API reference (better layout; light/dark mode switching available), shown in dark mode and light mode screenshots
- PySpark API reference (less readable compared to Pandas; no light/dark mode)

By updating the `Sphinx` and `Jinja2` versions, we can significantly improve the documentation's layout, design, and interactive features, thereby enhancing the end-user experience.

### Does this PR introduce any user-facing change?
No API changes, but users will notice a more modern and user-friendly interface in the PySpark documentation. New features such as light/dark mode and improved page layouts will be available, as shown in the before and after screenshots (after: dark mode and light mode).

### How was this patch tested?
Manually built the docs in a local environment, and also tested combinations of various `Jinja2`, `Sphinx`, and `pydata_sphinx_theme` versions for the best document rendering.

### Was this patch authored or co-authored using generative AI tooling?
No.