-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1 and fix Volcano weight to be positive integer and use cpu capability instead
#35819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the Volcano versioning? I don't believe this is sustainable to depend on latest. Please be specific about
- which Volcano version has these enforcements
- which Volcano version can delete the
Openqueue.
Without a specific volcano version, everything is meaningless because it's fragile.
|
Sorry for the trouble.
Always, but as I mentioned before, there was a compatiable problem in v1.5.0 webhook feature (admissionconrol, input parameter validation) are not work in v1.5.0 with K8S 1.22+, so we didn't see this enforcements. Epecially in kubernetes v1.22+ because of admissionregistration.k8s.io/v1beta1 drop, alread be fixed in volcano master volcano-sh/volcano#2063 and will be backported to v1.5.x soon.
TLDR:
|
|
Thank you for the detail, @Yikun ! |
weight to be positive integer and use capability instead
weight to be positive integer and use capability insteadweight to be positive integer and use cpu capability instead
|
Hi, @Yikun . Is there a pre-defined release cadence in Volcano community? |
|
@dongjoon-hyun Thanks for notification, the beta image will be published in 1-2 hours. I also do a pre-validation on x86 and arm based on volcano latest branch, all IT passed as expected with this PR. I will ping you once it's ready. @Thor-wl @william-wang is doing final check and then release the beta image, thank for help and efforts at the weekend.
major version no certain time, minor releases about 3-6 months, patch release on demand. |
|
Thank you for the confirmation. Please note that Apache Spark cannot depend on |
|
@dongjoon-hyun Yes for sure, the target version would be As we dicussed before, the release time would be a time after Spark 3.3 branch cut down (15. March) before Spark 3.3 release RC (April). |
|
All test passed with volcano release-1.5 (v1.5.1-beta.0 images) on x86 and arm64. test on arm64 detailstest on x86 detailsReady to go! @dongjoon-hyun |
Yes, For the test PR, it sounds good because March 15 is a deadline for |
| weight: 1 | ||
| capability: | ||
| cpu: "1" | ||
| cpu: "0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpu: 0 is also invalid by definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cap 0 is valid, if capacity <= 0 represents no limit in Volcano queue.
This is different with Weight, weight is soft constraint to calculate some proportions according to weight value, so 0 bring some unexpected behavior, so it's invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. If then, could you use more smaller value like 0.0001 than 0.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, unit of least precision is 0.001 as kuberentes. So we'd better just set to 0.001.
PR updated.
|
|
It seems that |
|
I'm not sure what is going on |
|
It was my bad. I made a follow-up to fix it. |
|
@dongjoon-hyun See also: volcano-sh/volcano#2079, note that how to cleanup volcano completely |
|
Then after above clean operations, you can try just use |
|
@Yikun . I cannot verify your PR because surprisingly |
|
I'm trying to rebuilt the cluster. I'll let you know if I succeed again. |
As we meet before in #35422 (comment) (which is fixed in master and backport to v1.5.x in volcano-sh/volcano@42fd488) Try this, as I mentioned before, there are some bugs fixed after volcano v1.5.0 release. such as:
or you mean we have to first bump volcano to v1.5.1, then merge this PR? |
|
Ya, I found that I was hitting the original issue of lack of ARM64 support of |
@dongjoon-hyun So my suggestion is first we do validation on |
|
I don't think that's Apache Spark's role. Apache Spark community will mark this is a known issue in the release note if Volcano doesn't provide any release until our release date, @Yikun . |
|
@dongjoon-hyun OK, so now we have only one way to solve this issue: volcano release v1.5.1 before cut down? If we find any other issues before Spark 3.3 RC, Volcano could release a demand patch version like v1.5.2. |
|
Unfortunately, I don't know what is inside Volcano code, @Yikun . It's a role of Volcano community to provide a reliable tested release. Given the current circumstance I observed so far during helping this area, I don't know We will test the stability again when it's available publicly. |
|
@dongjoon-hyun OK, thanks for your suggestion, I will discuss with Volcano community! Thanks! |
weight to be positive integer and use cpu capability insteadVolcano to v1.5.1 and fix Volcano weight to be positive integer and use cpu capability instead
|
@dongjoon-hyun Volcano are considering to release v1.5.1 (same code base with v1.5.1.beta.0). The release will ready soon. volcano-sh/volcano#2090 I also updated this PR to bump v1.5.1 and fix the test issue, I couldn't separate version bump and test case, so just squash them in to this PR. |
|
Ready for review. |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. It's great, @Yikun .
Merged to master.
What changes were proposed in this pull request?
v1.5.1volcano-sh/volcano#2090Why are the changes needed?
In Volcano, weight should be a positive integer, so weight 0 is a wrong usage. As description for queue
We better to use capability to limit disable queue. This also fix the error
requestBody.spec.weight: Invalid value: 0: queue weight must be a positive integerwhen running latest volcano image.Does this PR introduce any user-facing change?
No
How was this patch tested?