[SPARK-53907][K8S] Support spark.kubernetes.allocation.maximum #52615
Conversation
Force-pushed from b4b7b86 to 0a79172
kazuyukitanimura left a comment:
Thank you @dongjoon-hyun. LGTM
Thank you, @kazuyukitanimura. cc @peter-toth

+1, LGTM

Thank you, @vrozov.

Thank you, @HyukjinKwon. Merged to master for Apache Spark 4.1.0-preview3.
A follow-up commit referenced this pull request:

### What changes were proposed in this pull request?

This PR aims to document the newly added K8s configurations as a part of Apache Spark 4.1.0 preparation.

### Why are the changes needed?

To sync the documentation with the K8s `Config.scala`. For now, three PRs added four configurations.

- #51522
- #51811
- #52615

### Does this PR introduce _any_ user-facing change?

No behavior change. This only adds documentation for the new configurations.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52618 from dongjoon-hyun/SPARK-53913.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
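As an aside, here is a hedged sketch of how one of the documented settings could be applied from application code. The configuration key comes from this PR's title; the value `1000000` is an arbitrary illustration, not a documented default.

```scala
import org.apache.spark.SparkConf

// Illustrative only: cap the total number of executor pod allocations.
// The value below is an arbitrary example, not a documented default.
val conf = new SparkConf()
  .setAppName("allocation-maximum-example")
  .set("spark.kubernetes.allocation.maximum", "1000000")
```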
peter-toth left a comment:
Late LGTM.
### What changes were proposed in this pull request?

This PR aims to support `spark.kubernetes.allocation.maximum`.

### Why are the changes needed?

Since we use an `AtomicInteger` ID generator, we hit an overflow at `Int.MaxValue`. We had better throw an exception explicitly in this case, because it is almost certainly a malfunctioning situation when a Spark driver tries to create `2147483647` (`Int.MaxValue`) executor pods.

https://github.com/apache/spark/blob/e05c75e105d5bc9947fb6142f40695ecd5e817e6/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L49

The wrap-around is easy to reproduce:

```
$ jshell
|  Welcome to JShell -- Version 21.0.8
|  For an introduction type: /help intro

jshell> var x = new java.util.concurrent.atomic.AtomicInteger(Integer.MAX_VALUE)
x ==> 2147483647

jshell> x.incrementAndGet()
$2 ==> -2147483648

jshell> x.incrementAndGet()
$3 ==> -2147483647

jshell> x.incrementAndGet()
$4 ==> -2147483646

jshell> var x = new java.util.concurrent.atomic.AtomicInteger(-1)
x ==> -1

jshell> x.incrementAndGet()
$6 ==> 0

jshell> x.incrementAndGet()
$7 ==> 1
```

### Does this PR introduce _any_ user-facing change?

Practically no, because a normal Spark job and K8s cluster cannot handle the creation of `2147483647` executor pods. If a user hits this case, it means the ID overflow has happened: the ID will be rotated to `0` and the executor IDs will be reused. It is already a bug situation.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52615 from dongjoon-hyun/SPARK-53907.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
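For readers who want the shape of the fix, here is a minimal sketch of the fail-fast idea described above. The class and method names are hypothetical, not the actual `ExecutorPodsAllocator` code; only the configuration key and the overflow behavior come from this PR.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Minimal sketch (hypothetical names): generate executor IDs and fail fast
// once the configured maximum is exceeded, instead of silently wrapping
// past Int.MaxValue to negative values.
class BoundedIdGenerator(maxAllocation: Int) {
  private val counter = new AtomicInteger(0)

  def next(): Int = {
    val id = counter.incrementAndGet()
    // id < 0 catches the Int.MaxValue -> Int.MinValue wrap shown in the
    // jshell session above; id > maxAllocation enforces the configured cap.
    if (id < 0 || id > maxAllocation) {
      throw new IllegalStateException(
        s"Executor ID $id exceeds spark.kubernetes.allocation.maximum=$maxAllocation")
    }
    id
  }
}
```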