[SPARK-53907][K8S] Support spark.kubernetes.allocation.maximum #52615
Conversation
Force-pushed from b4b7b86 to 0a79172
kazuyukitanimura left a comment:
Thank you @dongjoon-hyun. LGTM
Thank you, @kazuyukitanimura. cc @peter-toth

+1, LGTM

Thank you, @vrozov.

Thank you, @HyukjinKwon. Merged to master for Apache Spark 4.1.0-preview3.
A follow-up commit referenced this pull request:

### What changes were proposed in this pull request?

This PR aims to document the newly added K8s configurations as a part of Apache Spark 4.1.0 preparation.

### Why are the changes needed?

To sync the documentation with the K8s `Config.scala`. For now, three PRs added four configurations.

- #51522
- #51811
- #52615

### Does this PR introduce _any_ user-facing change?

No behavior change. This only adds documentation for the new configurations.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52618 from dongjoon-hyun/SPARK-53913.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
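As an aside, here is a hedged sketch of how one of the documented settings could be applied from application code. The configuration key comes from this PR's title; the value `1000000` is an arbitrary illustration, not a documented default.

```scala
import org.apache.spark.SparkConf

// Illustrative only: cap the total number of executor pod allocations.
// The value below is an arbitrary example, not a documented default.
val conf = new SparkConf()
  .setAppName("allocation-maximum-example")
  .set("spark.kubernetes.allocation.maximum", "1000000")
```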
peter-toth left a comment:
Late LGTM.
### What changes were proposed in this pull request?

This PR aims to support `spark.kubernetes.allocation.maximum`.

### Why are the changes needed?

Since we use an `AtomicInteger` ID generator, we hit an overflow at `Int.MaxValue`. We had better throw an exception explicitly in this case, because it is almost certainly a malfunctioning situation when a Spark driver tries to create `2147483647` (`Int.MaxValue`) executor pods.

https://github.com/apache/spark/blob/e05c75e105d5bc9947fb6142f40695ecd5e817e6/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L49

The wrap-around is easy to reproduce:

```
$ jshell
|  Welcome to JShell -- Version 21.0.8
|  For an introduction type: /help intro

jshell> var x = new java.util.concurrent.atomic.AtomicInteger(Integer.MAX_VALUE)
x ==> 2147483647

jshell> x.incrementAndGet()
$2 ==> -2147483648

jshell> x.incrementAndGet()
$3 ==> -2147483647

jshell> x.incrementAndGet()
$4 ==> -2147483646

jshell> var x = new java.util.concurrent.atomic.AtomicInteger(-1)
x ==> -1

jshell> x.incrementAndGet()
$6 ==> 0

jshell> x.incrementAndGet()
$7 ==> 1
```

### Does this PR introduce _any_ user-facing change?

Practically no, because a normal Spark job and K8s cluster cannot handle the creation of `2147483647` executor pods. If a user hits this case, it means the ID overflow has happened: the ID will be rotated to `0` and the executor IDs will be reused. It is already a bug situation.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52615 from dongjoon-hyun/SPARK-53907.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
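For readers who want the shape of the fix, here is a minimal sketch of the fail-fast idea described above. The class and method names are hypothetical, not the actual `ExecutorPodsAllocator` code; only the configuration key and the overflow behavior come from this PR.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Minimal sketch (hypothetical names): generate executor IDs and fail fast
// once the configured maximum is exceeded, instead of silently wrapping
// past Int.MaxValue to negative values.
class BoundedIdGenerator(maxAllocation: Int) {
  private val counter = new AtomicInteger(0)

  def next(): Int = {
    val id = counter.incrementAndGet()
    // id < 0 catches the Int.MaxValue -> Int.MinValue wrap shown in the
    // jshell session above; id > maxAllocation enforces the configured cap.
    if (id < 0 || id > maxAllocation) {
      throw new IllegalStateException(
        s"Executor ID $id exceeds spark.kubernetes.allocation.maximum=$maxAllocation")
    }
    id
  }
}
```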