Closed
2 changes: 1 addition & 1 deletion .github/workflows/build_and_test.yml
Expand Up @@ -751,7 +751,7 @@ jobs:
# See also https://issues.apache.org/jira/browse/SPARK-35375.
# Pin the MarkupSafe to 2.0.1 to resolve the CI error.
# See also https://issues.apache.org/jira/browse/SPARK-38279.
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme sphinx-copybutton nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0'
python3.9 -m pip install 'sphinx==4.2.0' mkdocs 'pydata_sphinx_theme==0.13' sphinx-copybutton nbsphinx numpydoc jinja2 'markupsafe==2.0.1' 'pyzmq<24.0.0'
python3.9 -m pip install ipython_genutils # See SPARK-38517
python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8'
python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
6 changes: 3 additions & 3 deletions dev/requirements.txt
Expand Up @@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
mkdocs

# Documentation (Python)
pydata_sphinx_theme
pydata_sphinx_theme==0.13
@itholic (Contributor Author) commented on Nov 25, 2023:

I followed the version used in pandas. After testing several versions, I believe this one renders the documentation best.

ipython
nbsphinx
numpydoc
jinja2<3.0.0
sphinx<3.1.0
jinja2
sphinx==4.2.0
@itholic (Contributor Author) commented on Nov 25, 2023:

Other Sphinx versions do not generate the documentation properly, for reasons I have not yet identified. I tested as many combinations with Jinja2 and pydata_sphinx_theme as I could, and Sphinx 4.2.0 currently renders the documentation best. I will investigate further to support the latest Sphinx if necessary.

sphinx-plotly-directive
sphinx-copybutton
docutils<0.18.0
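
The exact pins discussed in the comments above can be checked against a local environment before building the docs. The following is a minimal sketch and not part of this PR: `PINS`, `normalize`, and `check_pins` are hypothetical names, and the comparison only approximates pip's `==` matching for simple release versions such as `0.13` versus `0.13.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins taken from the dev/requirements.txt hunk above.
PINS = {"sphinx": "4.2.0", "pydata_sphinx_theme": "0.13"}


def normalize(v):
    """Drop trailing ".0" segments so that "0.13" and "0.13.0" compare equal."""
    parts = v.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def check_pins(pins, lookup=version):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # the package is not installed in this environment
        ok = found is not None and normalize(found) == normalize(wanted)
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins(PINS).items():
        print(f"{name}: installed={found} pinned={PINS[name]} ok={ok}")
```

The `lookup` parameter exists only so the check can be exercised without installing Sphinx; by default it queries the active environment.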
Binary file added python/docs/source/_static/spark-logo-dark.png
Binary file added python/docs/source/_static/spark-logo-light.png
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{{ fullname }}
{{ underline }}

.. currentmodule:: {{ module + "." + objname.split(".")[0] }}

.. autoattribute:: {{ ".".join(objname.split(".")[1:]) }}
6 changes: 6 additions & 0 deletions python/docs/source/_templates/autosummary/accessor_method.rst
@@ -0,0 +1,6 @@
{{ fullname }}
{{ underline }}

.. currentmodule:: {{ module + "." + objname.split(".")[0] }}

.. automethod:: {{ ".".join(objname.split(".")[1:]) }}
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,9 @@

.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
{% if not (item == 'uid') %}
@itholic (Contributor Author) commented on Nov 25, 2023:

We have to exclude uid from the documentation manually because it is an internal property. Our current documentation does not include it either, but for some reason newer Sphinx versions try to generate a page for it.

~{{ name }}.{{ item }}
{% endif %}
{%- endfor %}

{% endif %}
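
The effect of the `uid` check in the hunk above can be illustrated in plain Python. This is a rough sketch of what the Jinja loop emits, not the template itself; `summary_entries` and the class name `Estimator` are hypothetical and used only for illustration.

```python
def summary_entries(name, attributes, excluded=("uid",)):
    """Mimic the template loop: one "~Class.attr" entry per non-excluded attribute."""
    return [f"~{name}.{item}" for item in attributes if item not in excluded]


# "Estimator" is a placeholder class name, not taken from the PR.
entries = summary_entries("Estimator", ["params", "uid"])
print(entries)  # ['~Estimator.params']
```

Only `uid` is filtered; every other attribute still gets an autosummary entry, which matches the intent stated in the review comment.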
53 changes: 53 additions & 0 deletions python/docs/source/_templates/autosummary/plot_class.rst
@@ -0,0 +1,53 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

{{ fullname }}
{{ underline }}

.. currentmodule:: {{ module + "." + objname.split(".")[0] }}

.. automethod:: {{ ".".join(objname.split(".")[1:]) }}

{% if '__init__' in methods %}
{% set caught_result = methods.remove('__init__') %}
{% endif %}

{% block methods %}
{% if methods %}

.. rubric:: Methods

.. autosummary::
{% for item in methods %}
~{{ name.split(".")[1] }}.{{ item }}
{%- endfor %}

{% endif %}
{% endblock %}

{% block attributes_summary %}
{% if attributes %}

.. rubric:: Attributes

.. autosummary::
{% for item in attributes %}
~{{ name.split(".")[1] }}.{{ item }}
{%- endfor %}

{% endif %}
{% endblock %}
6 changes: 5 additions & 1 deletion python/docs/source/conf.py
Expand Up @@ -194,7 +194,11 @@
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
"navbar_end": ["version-switcher"]
"navbar_end": ["version-switcher", "theme-switcher"],
"logo": {
"image_light": "_static/spark-logo-light.png",
"image_dark": "_static/spark-logo-dark.png",
@itholic (Contributor Author) commented on Nov 25, 2023:

FYI: the default light/dark mode is auto, which picks a theme based on the user's system settings. We can set dark or light as the default manually if we want.

}
}
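
If a fixed default were preferred over auto, as the comment above mentions, it could be set in `conf.py` roughly as follows. This is a sketch that assumes pydata_sphinx_theme reads a `default_mode` key from `html_context`; it is not part of this PR and should be verified against the theme documentation for the pinned version.

```python
# conf.py (fragment). Assumption: the theme honors html_context["default_mode"].
html_context = {
    # One of "auto" (follow the system setting), "light", or "dark".
    "default_mode": "light",
}
```

Readers could still switch modes with the `theme-switcher` button added to `navbar_end`; the setting would only change the initial mode.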

# Add any paths that contain custom themes here, relative to this directory.
8 changes: 7 additions & 1 deletion python/docs/source/reference/pyspark.pandas/frame.rst
Expand Up @@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst
Contributor Author commented:

With the new Sphinx version, the rules for naming the package in the rst files that are generated automatically during the docs build have changed, so we have to adjust the package path manually with these templates.

pandas uses the same approach, which I used as a reference.

Contributor Author commented:

For example, previously the rst file was created as follows:

pyspark.sql.SparkSession.builder.appName
========================================

.. currentmodule:: pyspark.sql.SparkSession

.. automethod:: builder.appName

However, in newer Sphinx versions it is generated like this:

pyspark.sql.SparkSession.builder.appName
========================================

.. currentmodule:: pyspark.sql

.. automethod:: SparkSession.builder.appName

For functions accessed through an inner class or an accessor like this, the newly generated package path causes the Sphinx build to fail. That is why we use a customized template to correct the module path.

See also sphinx-doc/sphinx#7551.
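
The path correction performed by `accessor_method.rst` can be reproduced in plain Python. This is a small sketch of the two Jinja expressions from the template, assuming autosummary passes `module="pyspark.sql"` and `objname="SparkSession.builder.appName"` as in the newer-Sphinx output shown above; `accessor_directives` is a hypothetical helper name.

```python
def accessor_directives(module, objname):
    """Mirror the two Jinja expressions used in accessor_method.rst."""
    head, *rest = objname.split(".")
    current_module = module + "." + head  # {{ module + "." + objname.split(".")[0] }}
    target = ".".join(rest)               # {{ ".".join(objname.split(".")[1:]) }}
    return current_module, target


print(accessor_directives("pyspark.sql", "SparkSession.builder.appName"))
# ('pyspark.sql.SparkSession', 'builder.appName')
```

The result matches the directives from the earlier output: `.. currentmodule:: pyspark.sql.SparkSession` followed by `.. automethod:: builder.appName`.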


DataFrame.spark.frame
DataFrame.spark.cache
Expand All @@ -319,8 +320,8 @@ specific plotting methods of the form ``DataFrame.plot.<kind>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

DataFrame.plot
@itholic (Contributor Author) commented on Nov 25, 2023:

In newer versions of Sphinx, the build fails because DataFrame.plot and Series.plot are treated as duplicates of the functions listed below, such as DataFrame.plot.area, DataFrame.plot.barh, and DataFrame.plot.bar.

This behavior seems reasonable, since .plot is just an accessor and not a function, so I believe we can simply leave it out of the documentation.

DataFrame.plot.area
DataFrame.plot.barh
DataFrame.plot.bar
Expand All @@ -330,6 +331,10 @@ specific plotting methods of the form ``DataFrame.plot.<kind>``.
DataFrame.plot.pie
DataFrame.plot.scatter
DataFrame.plot.density

.. autosummary::
:toctree: api/

DataFrame.hist
DataFrame.boxplot
DataFrame.kde
Expand All @@ -341,6 +346,7 @@ These can be accessed by ``DataFrame.pandas_on_spark.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

DataFrame.pandas_on_spark.apply_batch
DataFrame.pandas_on_spark.transform_batch
12 changes: 12 additions & 0 deletions python/docs/source/reference/pyspark.pandas/indexing.rst
Expand Up @@ -129,8 +129,14 @@ in Spark. These can be accessed by ``Index.spark.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_attribute.rst

Index.spark.column

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Index.spark.transform

Sorting
Expand Down Expand Up @@ -308,9 +314,15 @@ in Spark. These can be accessed by ``MultiIndex.spark.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_attribute.rst

MultiIndex.spark.data_type
MultiIndex.spark.column

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

MultiIndex.spark.transform

MultiIndex Sorting
5 changes: 5 additions & 0 deletions python/docs/source/reference/pyspark.pandas/io.rst
Expand Up @@ -69,6 +69,11 @@ Generic Spark I/O
:toctree: api/

read_spark_io

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

DataFrame.spark.to_spark_io

Flat File / CSV
22 changes: 21 additions & 1 deletion python/docs/source/reference/pyspark.pandas/series.rst
Expand Up @@ -270,8 +270,14 @@ in Spark. These can be accessed by ``Series.spark.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_attribute.rst

Series.spark.column

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Series.spark.transform
Series.spark.apply

Expand Down Expand Up @@ -304,6 +310,7 @@ Datetime Properties

.. autosummary::
:toctree: api/
:template: autosummary/accessor_attribute.rst

Series.dt.date
Series.dt.year
Expand Down Expand Up @@ -333,6 +340,7 @@ Datetime Methods

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Series.dt.normalize
Series.dt.strftime
Expand All @@ -353,6 +361,7 @@ like ``Series.str.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Series.str.capitalize
Series.str.cat
Expand Down Expand Up @@ -416,10 +425,16 @@ the ``Series.cat`` accessor.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_attribute.rst

Series.cat.categories
Series.cat.ordered
Series.cat.codes

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Series.cat.rename_categories
Series.cat.reorder_categories
Series.cat.add_categories
Expand All @@ -438,8 +453,8 @@ specific plotting methods of the form ``Series.plot.<kind>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Series.plot
Series.plot.area
Series.plot.bar
Series.plot.barh
Expand All @@ -449,6 +464,10 @@ specific plotting methods of the form ``Series.plot.<kind>``.
Series.plot.line
Series.plot.pie
Series.plot.kde

.. autosummary::
:toctree: api/

Series.hist

Serialization / IO / Conversion
Expand Down Expand Up @@ -476,6 +495,7 @@ These can be accessed by ``Series.pandas_on_spark.<function/property>``.

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

Series.pandas_on_spark.transform_batch

14 changes: 14 additions & 0 deletions python/docs/source/reference/pyspark.sql/spark_session.rst
Expand Up @@ -29,12 +29,21 @@ See also :class:`SparkSession`.
:toctree: api/

SparkSession.active

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

SparkSession.builder.appName
SparkSession.builder.config
SparkSession.builder.enableHiveSupport
SparkSession.builder.getOrCreate
SparkSession.builder.master
SparkSession.builder.remote

.. autosummary::
:toctree: api/

SparkSession.catalog
SparkSession.conf
SparkSession.createDataFrame
Expand All @@ -58,8 +67,13 @@ Spark Connect Only

.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst

SparkSession.builder.create

.. autosummary::
:toctree: api/

SparkSession.addArtifact
SparkSession.addArtifacts
SparkSession.copyFromLocalToFs