[SPARK-49680][PYTHON] Limit Sphinx build parallelism to 4 by default
#48129
What changes were proposed in this pull request?
This PR aims to limit the Sphinx build parallelism to 4 by default, for the following goals:
- To prevent too many parallel `SparkSubmit` invocations on large machines like `c6i.24xlarge`.
- Users can still override this limit via `SPHINXOPTS`.

Why are the changes needed?
The Sphinx parallelism feature was added via the following on 2024-01-10. Unfortunately, however, this breaks Python API doc generation on large machines, because the Sphinx parallelism level is also the number of parallel `SparkSubmit` invocations of PySpark. In addition, given that each `PySpark` session is currently launched with `local[*]`, this ends up with `N * N` `pyspark.daemon` processes.
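As a rough back-of-the-envelope sketch of this quadratic blow-up (the core counts below are illustrative assumptions, not measurements from the PR):

```python
def estimated_daemons(cores, sphinx_jobs=None):
    """Each Sphinx worker launches one SparkSubmit; each local[*] PySpark
    session then spawns roughly one pyspark.daemon worker per core."""
    jobs = cores if sphinx_jobs is None else sphinx_jobs  # '-j auto' ~ one job per core
    return jobs * cores

print(estimated_daemons(4))      # 4-core GitHub Actions runner -> 16
print(estimated_daemons(96))     # 96-vCPU instance with '-j auto' -> 9216
print(estimated_daemons(96, 4))  # same machine with the new default of 4 -> 384
```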
In other words, the current default setting, `auto`, only seems to work on low-core machines like GitHub Actions runners (4 cores). For example, this breaks the Python documentation build even on an M3 Max environment, and it is worse on large EC2 machines (`c7i.24xlarge`). You can see the failure locally like this.

Does this PR introduce any user-facing change?
No, this is a dev-only change.
How was this patch tested?
Pass the CIs and do manual tests.
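For local manual testing, the cap can be overridden through the standard Sphinx `SPHINXOPTS` Makefile variable (the exact directory and target below are assumptions based on the usual layout of the PySpark docs build, not taken from the PR):

```shell
# Build the PySpark docs with an explicit parallelism override.
cd python/docs
SPHINXOPTS="-j 8" make html
```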
Was this patch authored or co-authored using generative AI tooling?
No.