[SPARK-52933][K8S] Verify if the executor cpu request exceeds limit #51678
Conversation
Could you review this when you have some time, @peter-toth?

Could you review this PR when you have some time, @HyukjinKwon?
```scala
val executorCpuLimitQuantity = new Quantity(limitCores)
if (executorCpuLimitQuantity.compareTo(executorCpuQuantity) < 0) {
  throw new SparkException(
    "The executor cpu request should be less than or equal to cpu limit")
}
```
nit: Should the request value and limit value be included in the error message?
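A later commit reworded this exception (see the merge note below); the merged string is not quoted in this thread, but a hypothetical sketch of a message carrying both values might look like:

```scala
// Hypothetical wording only; the actual merged message is not quoted here.
throw new SparkException(
  s"The executor cpu request ($executorCpuQuantity) should be less than or " +
  s"equal to the cpu limit ($executorCpuLimitQuantity)")
```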
Thank you, @LuciferYang and @HyukjinKwon.
+1, LGTM
Thank you @dongjoon-hyun
All comments are addressed, and I verified the test result manually because the last commit changes only the exception message string. Merged to master for Apache Spark 4.1.0.
peter-toth left a comment:
Late LGTM.
Thanks a lot for pushing this PR! @dongjoon-hyun
What changes were proposed in this pull request?
This PR aims to verify whether the executor pod's cpu request exceeds its cpu limit, in order to fail fast on an invalid configuration.
Why are the changes needed?
Since Spark creates many executor pods, we had better fail fast on invalid settings before submitting an invalid pod spec to the K8s cluster; otherwise it wastes lots of K8s resources.
Note that the newly added validation check only happens when `spark.kubernetes.executor.limit.cores` is given explicitly, as the sketch below illustrates.
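A minimal sketch (not from the PR) of the kind of misconfiguration the new check now rejects at submission time; the configuration keys are real Spark settings, and the comments describe the intended fail-fast behavior rather than an exact error message:

```scala
import org.apache.spark.SparkConf

// The executor cpu request (2) exceeds the explicit cpu limit (1).
// Before this change, Spark would submit executor pod specs that the
// K8s cluster can never satisfy; with this change, building the
// executor pod spec throws a SparkException up front.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.request.cores", "2")
  .set("spark.kubernetes.executor.limit.cores", "1")
```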
Does this PR introduce any user-facing change?
No behavior change eventually, because a misconfigured `spark.kubernetes.executor.limit.cores` already means the Spark driver cannot get any executor pods, so the job will hang or fail eventually.
How was this patch tested?
Pass the CIs with the newly added test case.
Was this patch authored or co-authored using generative AI tooling?
No.