Conversation

@nchammas (Contributor) commented Jan 11, 2024

What changes were proposed in this pull request?

Upgrade to Sphinx 4.5.0, which is the latest in the 4.x line and includes the fix for parallel builds on macOS.

Enable parallel Sphinx workers to build the Python API docs.

I experimented with a few different values, and `auto` seems to work best. Configuring 4 workers yields roughly the same improvement as `auto`, suggesting that parallelization beyond that point is limited by some sort of resource contention. But I left it as `auto` since that's more dynamic and should work better across varied environments.
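For reference, `-j` is Sphinx's own parallel-build option; a minimal sketch of passing it directly to `sphinx-build` (the paths here are illustrative):

```
# -j auto spawns one worker per CPU core; -j 4 would cap workers at 4
sphinx-build -b html -j auto source build/html
```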

On my 16-core Intel workstation, the runtime of make html was cut by ~60%.

```
# `make html` @ master
real    43m51.167s
user    41m43.526s
sys     0m39.651s

# `make html` with parallel workers
real    17m8.424s
user    174m42.051s
sys     5m8.824s
```

Here on CI, the "Run documentation build" step also seems to have improved, from the usual ~50 minutes down to ~31 minutes.

Why are the changes needed?

This saves developer time and CI time.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I manually built and reviewed the docs using:

```
SKIP_SCALADOC=1 SKIP_SQLDOC=1 SKIP_RDOC=1 time bundle exec jekyll build
```
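The Python API docs can also be built on their own, via the same entry point that appears later in this thread:

```
cd python/docs
make html
```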

Was this patch authored or co-authored using generative AI tooling?

No.

@nchammas (Contributor, Author)

cc @itholic and @HyukjinKwon. This PR builds on #44012.

@itholic (Contributor) left a comment

LGTM once CI passes.

Nice fix.

@HyukjinKwon (Member)

Merged to master.

@nchammas nchammas deleted the SPARK-46668-parallel-sphinx branch January 11, 2024 17:05

```diff
 # You can set these variables from the command line.
-SPHINXOPTS ?= "-W"
+SPHINXOPTS ?= "-W" "-j" "auto"
```
@dongjoon-hyun (Member) commented Sep 16, 2024

Unfortunately, this breaks Python API doc generation on many-core machines, because the worker count becomes the number of parallel `SparkSubmit` invocations of PySpark. In addition, given that each PySpark session is currently launched with `local[*]`, this ends up spawning `N * N` `pyspark.daemon` processes.

As of today, this setting only seems to work on low-core machines like GitHub Actions runners (4 cores). It breaks the Python documentation build even on an M3 Max, and the problem is worse on large EC2 machines (c7i.24xlarge). You can reproduce the failure locally like this:

```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```
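A rough sketch of the blow-up, assuming `-j auto` resolves to the machine's core count `N`:

```
# N Sphinx workers, each launching PySpark with local[*] (N task slots),
# yields on the order of N * N pyspark.daemon processes.
# e.g. on a 96-vCPU c7i.24xlarge:
echo $((96 * 96))   # => 9216
```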

@nchammas (Contributor, Author)

> As of today, this setting only seems to work on low-core machines like GitHub Actions runners (4 cores).

Something about the build seems to have changed since this PR was merged ~8 months ago. As stated in the PR description, I tested `auto` on a 16-core Intel workstation and it worked fine. And I remember running the build many times to get a representative runtime to report.

> this ends up spawning `N * N` `pyspark.daemon` processes

Perhaps this was not the case back in January, for some reason. 🤷‍♂️

@dongjoon-hyun (Member)

Here is a PR with a new default setting that keeps the speed of the GitHub Actions CI. Users can still override it via SPHINXOPTS.
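A minimal sketch of the kind of default described (the exact change is in the commit below; the value `4` matches the default it names):

```
SPHINXOPTS ?= "-W" "-j" "4"
```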

dongjoon-hyun added a commit that referenced this pull request Sep 17, 2024
### What changes were proposed in this pull request?

This PR aims to limit `Sphinx` build parallelism to 4 by default, with the following goals.
- This preserves the same speed in the GitHub Actions environment.
- This prevents excessive `SparkSubmit` invocations on large machines like `c6i.24xlarge`.
- Users can still override the default by providing `SPHINXOPTS`.

### Why are the changes needed?

The `Sphinx` parallelism feature was added on 2024-01-10 via the following PR.

- #44680

However, this unfortunately breaks Python API doc generation on large machines, because the worker count becomes the number of parallel `SparkSubmit` invocations of PySpark. In addition, given that each PySpark session is launched with `local[*]`, this ends up spawning `N * N` `pyspark.daemon` processes.

In other words, as of today, the default setting of `auto` only seems to work on low-core machines like GitHub Actions runners (4 cores). It breaks the Python documentation build even on an M3 Max, and the problem is worse on large EC2 machines (c7i.24xlarge). You can reproduce the failure locally like this.

```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

### Does this PR introduce _any_ user-facing change?

No, this is a dev-only change.

### How was this patch tested?

Pass the CIs and run manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48129 from dongjoon-hyun/SPARK-49680.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>