@dongjoon-hyun (Member) commented Sep 16, 2024

What changes were proposed in this pull request?

This PR aims to limit Sphinx build parallelism to 4 by default for the following goals.

  • This preserves the same build speed in the GitHub Actions environment.
  • This prevents excessive SparkSubmit invocations on large machines like c6i.24xlarge.
  • Users can still override the default by providing SPHINXOPTS.
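The interaction between the default and a user-supplied SPHINXOPTS can be modeled with a small sketch. It is an illustration only, not the actual Makefile logic: the default string `-j 4` and the helper name `effective_jobs` are assumptions, and only the space-separated `-j N` form is parsed. It relies on two documented `sphinx-build` behaviors: `-j auto` uses the number of CPUs, and builds are serial when `-j` is absent.

```python
import shlex

# Assumed default options; the PR only states that parallelism is capped at 4.
DEFAULT_SPHINXOPTS = "-j 4"


def effective_jobs(env, cpu_count):
    """Return the Sphinx worker count implied by SPHINXOPTS (hypothetical helper)."""
    # A user-provided SPHINXOPTS replaces the default entirely.
    opts = shlex.split(env.get("SPHINXOPTS", DEFAULT_SPHINXOPTS))
    jobs = 1  # sphinx-build runs serially when -j is not given
    for i, opt in enumerate(opts):
        if opt == "-j" and i + 1 < len(opts):
            value = opts[i + 1]
            # "auto" means one worker per CPU core.
            jobs = cpu_count if value == "auto" else int(value)
    return jobs


# On a hypothetical 96-core machine:
print(effective_jobs({}, 96))                          # default cap
print(effective_jobs({"SPHINXOPTS": "-j auto"}, 96))   # previous behavior
```

Under these assumptions the first call yields 4 regardless of core count, while the second scales with the machine, which is the behavior this PR moves away from by default.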

Why are the changes needed?

The Sphinx parallelism feature was added via the following on 2024-01-10.

Unfortunately, this breaks Python API doc generation on large machines because the Sphinx parallelism determines the number of parallel SparkSubmit invocations of PySpark. In addition, since each PySpark instance is currently launched with local[*], this ends up with N * N pyspark.daemons.
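The N * N growth can be made concrete with a rough back-of-the-envelope sketch. It follows the reasoning above as an upper-bound estimate, not a measurement: the function name is hypothetical, and it assumes each Sphinx job launches one PySpark instance whose local[*] master uses one slot per core.

```python
def estimated_python_daemons(sphinx_jobs, cores):
    """Upper-bound estimate: one PySpark per Sphinx job, one slot per core each."""
    return sphinx_jobs * cores


# With "-j auto", sphinx_jobs equals the core count, so the total is N * N.
print(estimated_python_daemons(4, 4))    # 4-core runner with auto: 16
print(estimated_python_daemons(96, 96))  # hypothetical 96-core machine with auto: 9216
print(estimated_python_daemons(4, 96))   # same machine capped at 4 jobs: 384
```

Capping the Sphinx job count turns the quadratic growth into linear growth in the number of cores, which is why a fixed default of 4 behaves the same on a 4-core runner but far better on a large machine.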

In other words, as of today, the default setting, auto, seems to work only on low-core machines like GitHub Actions runners (4 cores). For example, it breaks the Python documentation build even in an M3 Max environment, and the problem is worse on large EC2 machines (c7i.24xlarge). You can reproduce the failure locally like this.

$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2

Does this PR introduce any user-facing change?

No, this is a dev-only change.

How was this patch tested?

Pass the CIs and do manual tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member Author)

I found this issue during the 4.0.0-preview2 RC preparations. Since the original PR arrived in January, it seems that there have been many changes on the Python side. Please consider this an adjustment to the as-is status.
cc @nchammas , @HyukjinKwon , @itholic

@dongjoon-hyun (Member Author)

Thank you so much for reviewing and approving during the holidays, Hyukjin!
Merged to master.
