Skip to content

Conversation

@itholic
Copy link
Contributor

@itholic itholic commented Nov 25, 2023

What changes were proposed in this pull request?

This PR proposes to enhance the PySpark documentation by leveraging modern Sphinx features and functionalities. The primary objective is to improve the overall user experience and readability of the documentation. To achieve this, the PR includes an upgrade of Sphinx and Jinja2 to their newer/latest versions, enabling us to use the latest pydata_sphinx_theme features such as light/dark mode toggling.

Why are the changes needed?

Currently, the PySpark documentation is unable to utilize many of the advanced features available in recent Sphinx versions due to older package versions. This limitation hinders the documentation's visual appeal and usability, particularly when compared to other projects like Pandas which have already adopted these enhancements. For example:

Pandas API reference (better layout / switching light & dark mode available)

Dark mode

Screenshot 2023-11-26 at 5 43 29 AM

Light mode

Screenshot 2023-11-26 at 5 45 01 AM

PySpark API reference (less readable compare to pandas / no light & dark mode)

Screenshot 2023-11-26 at 5 43 48 AM

By updating the Sphinx and Jinja2 versions, we can significantly improve the documentation's layout, design, and interactive features, thereby enhancing the end-user experience.

Does this PR introduce any user-facing change?

No API changes, but users will notice a more modern and user-friendly interface in the PySpark documentation. New features like light/dark mode and improved page layouts will be available as below:

Before

Screenshot 2023-11-26 at 5 43 48 AM

After

Dark mode

Screenshot 2023-11-26 at 6 17 13 AM

Light mode

Screenshot 2023-11-26 at 6 16 47 AM

How was this patch tested?

Manually built docs from local environment, and also tested combinations between various Jinja2, Sphinx and pydata_sphinx_theme versions for best document rendering.

Was this patch authored or co-authored using generative AI tooling?

No.


.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new version of Sphinx, the package name creation rules for rst files that are automatically created when building documents have changed, so we must manually adjust the package path using these templates.

This behavior is used in the same way in Pandas, so I referred to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, previously the rst file was created as follows:

pyspark.sql.SparkSession.builder.appName
========================================

.. currentmodule:: pyspark.sql.SparkSession

.. automethod:: builder.appName

However, in newer Sphinx versions it is generated like this:

pyspark.sql.SparkSession.builder.appName
========================================

.. currentmodule:: pyspark.sql

.. automethod:: SparkSession.builder.appName

In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail. That's why we should use the customized template to correct the module path.

See also sphinx-doc/sphinx#7551.

"navbar_end": ["version-switcher", "theme-switcher"],
"logo": {
"image_light": "_static/spark-logo-light.png",
"image_dark": "_static/spark-logo-dark.png",
Copy link
Contributor Author

@itholic itholic Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: The default mode for light/dark is auto, which will choose a theme based on the system settings from user, but we can specify one of dark or light as default manually if we want.

.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
{% if not (item == 'uid') %}
Copy link
Contributor Author

@itholic itholic Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should manually exclude uid from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property unexpectedly.

:toctree: api/
:template: autosummary/accessor_method.rst

DataFrame.plot
Copy link
Contributor Author

@itholic itholic Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In newer versions of Sphinx, the build will fail because DataFrame.plot and Series.plot are determined to be duplicates of the list of functions described below such as DataFrame.plot.area, DataFrame.plot.barh, DataFrame.plot.bar, etc.

In fact, this behavior seems reasonable since .plot is simply an accessor keyword and not a function, so I believe we can just simply leave it out of the document.

jinja2<3.0.0
sphinx<3.1.0
jinja2
sphinx==4.2.0
Copy link
Contributor Author

@itholic itholic Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that other Sphinx versions do not generate documentation properly for some reason. I have tested as many combinations as possible with Jinja2 and pydata_sphinx_theme, but I have confirmed that Sphinx version 4.2.0 currently renders documents in the most optimal form. Will investigate further in the future to support the latest Sphinx if necessary.


# Documentation (Python)
pydata_sphinx_theme
pydata_sphinx_theme==0.13
Copy link
Contributor Author

@itholic itholic Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the version used in Pandas, and actually I believe this version render the document in the most optimal form after doing several version testing.

@github-actions github-actions bot added the INFRA label Nov 25, 2023
@itholic itholic changed the title [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation Nov 25, 2023
@HyukjinKwon
Copy link
Member

This is a nice improvement!

@itholic
Copy link
Contributor Author

itholic commented Nov 26, 2023

Documentation build passed: https://github.com/itholic/spark/actions/runs/6991575840/job/19023836477

Screenshot 2023-11-26 at 1 27 29 PM

Other failures seems not related to this PR, but let me re-trigger the CI just to be sure.

@HyukjinKwon
Copy link
Member

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Nov 28, 2023
…nd `Jinja2`

### What changes were proposed in this pull request?
Delete comments on `Sphinx` and `Jinja2`

### Why are the changes needed?
they had been upgraded in #44012

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44046 from zhengruifeng/infra_Sphinx_nit.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
zhengruifeng added a commit that referenced this pull request Nov 28, 2023
### What changes were proposed in this pull request?
#44012 unpinned jinja2 in doc build, this PR unpin it in Python linter.

this pr is only for master branch, and won't affect branch-3.x daily build

### Why are the changes needed?
to be consistent with requirements.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44051 from zhengruifeng/infra_linter_jinja.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Dec 1, 2023
### What changes were proposed in this pull request?

1.After pr #44012, the output format of some 'ipynb' tables displayed in HTML format has been disrupted. The pr aims to fix table format error in ipynb docs.

- Before:
   <img width="792" alt="image" src="https://github.com/apache/spark/assets/15246973/2095a2ac-f0b5-44bd-a3c2-ce742d041243">

- After:
   <img width="739" alt="image" src="https://github.com/apache/spark/assets/15246973/ec0be72d-4dc0-44f4-ab75-d9668e32fc51">

2.Fix some minor errors.
### Why are the changes needed?
Fix bug.

### Does this PR introduce _any_ user-facing change?
Yes, only for docs.

### How was this patch tested?
Manually test.
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44049 from panbingkun/SPARK-46135.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
### What changes were proposed in this pull request?

1.After pr apache#44012, the output format of some 'ipynb' tables displayed in HTML format has been disrupted. The pr aims to fix table format error in ipynb docs.

- Before:
   <img width="792" alt="image" src="https://github.com/apache/spark/assets/15246973/2095a2ac-f0b5-44bd-a3c2-ce742d041243">

- After:
   <img width="739" alt="image" src="https://github.com/apache/spark/assets/15246973/ec0be72d-4dc0-44f4-ab75-d9668e32fc51">

2.Fix some minor errors.
### Why are the changes needed?
Fix bug.

### Does this PR introduce _any_ user-facing change?
Yes, only for docs.

### How was this patch tested?
Manually test.
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#44049 from panbingkun/SPARK-46135.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants