[SPARK-36061][K8S] Add volcano module and feature step
#35422
Conversation
also cc @dongjoon-hyun @attilapiros @holdenk
dongjoon-hyun left a comment:
Thank you for pinging me, @Yikun .
Could you provide us some integration tests for now, @Yikun? At least, some procedure for how to verify your contributions?
dongjoon-hyun left a comment:
In addition, please propose the full config namespace design. Otherwise, the current proposal doesn't conform to our policy.
spark.kubernetes.job.min.cpu
spark.kubernetes.job.min.memory
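For illustration, the two keys quoted above would eventually be declared in `Config.scala`. The sketch below only mimics the shape of such declarations with a tiny self-contained stand-in: the `ConfigEntry` case class and the doc strings are hypothetical, not Spark's real `ConfigBuilder` API, and the semantics are assumed.

```scala
// Hypothetical, self-contained stand-in for declaring the two keys quoted
// in the review comment above; Spark's real Config.scala uses ConfigBuilder.
case class ConfigEntry(key: String, doc: String, defaultValue: Option[String])

object VolcanoConfigSketch {
  val JOB_MIN_CPU: ConfigEntry = ConfigEntry(
    "spark.kubernetes.job.min.cpu",
    "Assumed meaning: minimum CPU the job's PodGroup reserves before scheduling.",
    None)

  val JOB_MIN_MEMORY: ConfigEntry = ConfigEntry(
    "spark.kubernetes.job.min.memory",
    "Assumed meaning: minimum memory the job's PodGroup reserves before scheduling.",
    None)
}
```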
@dongjoon-hyun Thanks for the reminder; there are 5 configurations that will be introduced. Should I put them all in this PR, or make a separate PR first?
Thanks. You can make separate PRs, but you cannot make an intermediate namespace
@dongjoon-hyun Thanks for your comments. It's reasonable to change
It works with
Configuration-related code has moved to #35436.
How about this:

```diff
diff --git project/SparkBuild.scala project/SparkBuild.scala
index 3d3a65f3d2..63ce6cbbf5 100644
--- project/SparkBuild.scala
+++ project/SparkBuild.scala
@@ -429,6 +429,12 @@ object SparkBuild extends PomBuild {
     enable(SparkR.settings)(core)
   }

+  if (!profiles.contains("volcano")) {
+    enable(Seq(
+      Compile / unmanagedSources / excludeFilter := HiddenFileFilter || "VolcanoFeatureStep.scala"
+    ))(kubernetes)
+  }
+
```
In addition, I think it would be good to add:

```diff
diff --git .github/workflows/build_and_test.yml .github/workflows/build_and_test.yml
index 4529cd9ba4..9edf5efd35 100644
--- .github/workflows/build_and_test.yml
+++ .github/workflows/build_and_test.yml
@@ -614,7 +614,7 @@ jobs:
         export MAVEN_CLI_OPTS="--no-transfer-progress"
         export JAVA_VERSION=${{ matrix.java }}
         # It uses Maven's 'install' intentionally, see https://github.com/apache/spark/pull/26414.
-        ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=${JAVA_VERSION/-ea} install
+        ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=${JAVA_VERSION/-ea} install
         rm -rf ~/.m2/repository/org/apache/spark
   scala-213:
@@ -660,7 +660,7 @@ jobs:
     - name: Build with SBT
       run: |
         ./dev/change-scala-version.sh 2.13
-        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile test:compile
+        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile test:compile
```
@martin-g Thanks! I added the sbt-related code, and I will add `-Pvolcano` to the GitHub Actions workflow after I make sure nothing breaks.
To @Yikun .
- Thank you for adding the profile.
- No, we do not accept empty-configuration PRs. All code should be working and have test coverage.
To @william-wang, please don't use member.
And, +1 for @martin-g comment about adding new profiles to CIs.
@dongjoon-hyun lol, I just misunderstood before; I felt a little weird too, but it has no effect on this PR. This PR will only add the `VolcanoFeatureStep` and introduce the volcano module (enabling PodGroup). Then the next PR will introduce all configurations and the Volcano implementations.
Sure, will address soon. Thanks for the help!
Thanks for the review, will update soon.
The original thought was: @dongjoon-hyun IIUC, you meant putting them together into a module like
No, I didn't mean anything yet at this stage~ I must be clear on that. Sorry.
@dongjoon-hyun Fine, thanks for clarifying.
Thank you for the updates, @Yikun.
Test result:
1. `build/sbt -Pvolcano -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=minikube,r "kubernetes-integration-tests/test"`
dongjoon-hyun left a comment:
It seems that the new tests verify the custom scheduler feature.
Do you think we can verify further Volcano scheduler features like the following?
- Gang scheduling
- Fair-share scheduling
- Queue scheduling
- Preemption scheduling
- Topology-based scheduling
- Reclaim
- Backfill
- Resource reservation
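For context on the first item: gang scheduling in Volcano is driven by a PodGroup's `minMember`/`minResources` fields. A minimal manifest sketch follows; the field names are per Volcano's `scheduling.volcano.sh/v1beta1` API, while the name and values are illustrative, not taken from this PR.

```yaml
# Illustrative PodGroup: the job is not scheduled until 3 pods
# and the listed minimum resources can all be placed at once.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-job-podgroup   # hypothetical name
spec:
  minMember: 3               # e.g. driver + 2 executors, scheduled as a gang
  minResources:
    cpu: "4"
    memory: "8Gi"
  queue: default             # queue scheduling hooks in here
```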
It seems that the following is not working on
@dongjoon-hyun Try this: And I will also do validations on arm64. :)
Thanks! Please add that into the PR description too.
Ur, it seems to fail.
@dongjoon-hyun yep, there are some failures in the Volcano arm-deploy YAML, but they will be addressed soon; cc @william-wang volcano-sh/volcano#2010
Yep, most of them are configuration-introduction and testing work. The cases below will be added to the integration tests in Spark 3.3; I'm working on these (it is on my next week's roadmap):
(later: I also updated the corresponding JIRA)
The failed tests are unrelated:
Confirmed! Works here!
@dongjoon-hyun @william-wang @martin-g Thanks for all the help; it also works and passed all integration tests on my arm64 env.
dongjoon-hyun left a comment:
Thank you for the update, @Yikun. The plan looks reasonable to me.
To @william-wang, could you publish the images in multi-arch format? Since we need to document this officially in the Apache Spark 3.3 documentation, it would be great if we could drop complexity like the `-arm64` postfix.
```
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development-arm64.yaml
```
@dongjoon-hyun That's a good idea :) Multi-arch support is already on the Volcano community roadmap. Specifically, we are going to support it before the Apache Spark 3.3 release, and there will be a separate PR to tell users how to deploy Volcano for Spark on K8S.
Got it. Thank you for the confirmation.
dongjoon-hyun left a comment:
+1, LGTM. Thank you everyone, @Yikun , @HyukjinKwon , @martin-g , and @william-wang .
Although there are many things we need to do in the future, I believe we can merge this PR to make the upcoming PRs easier. I'm on your side to help this effort.
@dongjoon-hyun Many thanks for your efforts and key input. And thanks for all the help, @HyukjinKwon , @martin-g and @william-wang !
… to manage the code for `volcano`
### What changes were proposed in this pull request?
The main changes in this PR are as follows:
1. In `resource-managers/kubernetes/core/pom.xml` and `resource-managers/kubernetes/integration-tests/pom.xml`, the `build-helper-maven-plugin` configuration has been added for the `volcano` profile to ensure that when the profile is activated with `-Pvolcano`, the `volcano/src/main/scala` directory is treated as an additional source path and the `volcano/src/test/scala` directory as an additional test source path.
- `resource-managers/kubernetes/core/pom.xml`
```xml
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<executions>
<execution>
<id>add-volcano-source</id>
<phase>generate-sources</phase>
<goals>
<goal>add-source</goal>
</goals>
<configuration>
<sources>
<source>volcano/src/main/scala</source>
</sources>
</configuration>
</execution>
<execution>
<id>add-volcano-test-sources</id>
<phase>generate-test-sources</phase>
<goals>
<goal>add-test-source</goal>
</goals>
<configuration>
<sources>
<source>volcano/src/test/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
```
- `resource-managers/kubernetes/integration-tests/pom.xml`
```xml
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<executions>
<execution>
<id>add-volcano-test-sources</id>
<phase>generate-test-sources</phase>
<goals>
<goal>add-test-source</goal>
</goals>
<configuration>
<sources>
<source>volcano/src/test/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
```
2. Removed the management configuration for `volcano`-related source/test code in SPARK-36061 | #35422.
Since Spark uses the `sbt-pom-reader` plugin in its sbt configuration, the behavior of the `build-helper-maven-plugin` also propagates to the sbt build. Therefore, no additional configuration is required in `SparkBuild.scala` after this PR.
### Why are the changes needed?
The previous configuration approach was not very friendly to IntelliJ developers: when debugging code in IntelliJ, the `volcano` profile had to be activated regardless of whether it was needed; otherwise, compilation errors would occur when running tests that depend on the `kubernetes` module's source code, for example `org.apache.spark.shuffle.ShuffleChecksumUtilsSuite`:
<img width="1465" alt="image" src="https://github.com/user-attachments/assets/e16e3eba-d85e-45ad-bbae-533bd2f8ce0b">
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. Pass GitHub Actions
- https://github.com/LuciferYang/spark/actions/runs/10714343021/job/29707907464
<img width="1109" alt="image" src="https://github.com/user-attachments/assets/d893accb-508f-47f5-b19e-e178f6eff128">
- https://github.com/LuciferYang/spark/actions/runs/10714343021/job/29707906573
<img width="1183" alt="image" src="https://github.com/user-attachments/assets/735e0dc7-7d2c-418f-8fcd-200ee10eda0d">
It can be seen that the test cases `VolcanoFeatureStepSuite` and `VolcanoSuite` were successfully executed.
2. Manual Testing Using sbt
- Run `build/sbt clean "kubernetes/testOnly *VolcanoFeatureStepSuite" -Pkubernetes`, and without `-Pvolcano`, no tests will be executed:
```
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for kubernetes / Test / testOnly
```
- Run `build/sbt clean "kubernetes/testOnly *VolcanoFeatureStepSuite" -Pkubernetes -Pvolcano`, and with `-Pvolcano`, `VolcanoFeatureStepSuite` will pass the tests:
```
[info] VolcanoFeatureStepSuite:
[info] - SPARK-36061: Driver Pod with Volcano PodGroup (74 milliseconds)
[info] - SPARK-36061: Executor Pod with Volcano PodGroup (8 milliseconds)
[info] - SPARK-38455: Support driver podgroup template (224 milliseconds)
[info] - SPARK-38503: return empty for executor pre resource (1 millisecond)
[info] Run completed in 1 second, 268 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
- Run `build/sbt clean "kubernetes/package" -Pkubernetes -Pvolcano`; with `-Pvolcano`, confirm that `spark-kubernetes_2.13-4.0.0-SNAPSHOT.jar` contains `VolcanoFeatureStep.class`.
- Run `build/sbt clean "kubernetes/package" -Pkubernetes`; without `-Pvolcano`, confirm that `spark-kubernetes_2.13-4.0.0-SNAPSHOT.jar` does not contain `VolcanoFeatureStep.class`.
3. Manual Testing Using Maven
- Run `build/mvn clean test -pl resource-managers/kubernetes/core -am -Dtest=none -DwildcardSuites=org.apache.spark.deploy.k8s.features.VolcanoFeatureStepSuite -Pkubernetes`, and without `-Pvolcano`, no tests will be executed:
```
Discovery starting.
Discovery completed in 80 milliseconds.
Run starting. Expected test count is: 0
DiscoverySuite:
Run completed in 99 milliseconds.
Total number of tests run: 0
Suites: completed 1, aborted 0
Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
No tests were executed.
```
- Run `build/mvn clean test -pl resource-managers/kubernetes/core -am -Dtest=none -DwildcardSuites=org.apache.spark.deploy.k8s.features.VolcanoFeatureStepSuite -Pkubernetes -Pvolcano`, and with `-Pvolcano`, `VolcanoFeatureStepSuite` will pass the tests:
```
Discovery starting.
Discovery completed in 263 milliseconds.
Run starting. Expected test count is: 4
VolcanoFeatureStepSuite:
- SPARK-36061: Driver Pod with Volcano PodGroup
- SPARK-36061: Executor Pod with Volcano PodGroup
- SPARK-38455: Support driver podgroup template
- SPARK-38503: return empty for executor pre resource
Run completed in 624 milliseconds.
Total number of tests run: 4
Suites: completed 2, aborted 0
Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
- Run `build/mvn clean package -pl resource-managers/kubernetes/core -am -DskipTests -Pkubernetes -Pvolcano`; with `-Pvolcano`, confirm that `spark-kubernetes_2.13-4.0.0-SNAPSHOT.jar` contains `VolcanoFeatureStep.class`.
- Run `build/mvn clean package -pl resource-managers/kubernetes/core -am -DskipTests -Pkubernetes`; without `-Pvolcano`, confirm that `spark-kubernetes_2.13-4.0.0-SNAPSHOT.jar` does not contain `VolcanoFeatureStep.class`.
4. Testing in IntelliJ (both imported as a Maven project and as an sbt project):
- By default, do not activate `volcano`, and confirm that `volcano`-related code is not recognized as source/test code, and does not affect the compilation and testing of other code.
- Manually activate `volcano`, and confirm that `volcano`-related code is recognized as source/test code, and can be compiled and tested normally.
5. Similar tests were conducted on `kubernetes-integration-tests` module to confirm the validity of the `volcano` profile.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #47997 from LuciferYang/refactor-volcano.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This patch adds a Volcano feature step to help users integrate Spark with the Volcano scheduler.
After this patch, users can enable this feature step by submitting a job using
A PodGroup will be created before the driver starts, and annotations will be set on the driver pod to add it to this PodGroup. Then, the Volcano scheduler will schedule the driver pod instead of the default Kubernetes scheduler.
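The behavior described above can be sketched conceptually. This is a simplified, self-contained model with a hypothetical `Pod` case class and helper names; the real step is implemented against Spark's Kubernetes feature-step API and the fabric8 client, and the PodGroup naming here is illustrative.

```scala
// Simplified model of what the Volcano feature step does to the driver pod:
// point it at the volcano scheduler and annotate it with its PodGroup name.
case class Pod(schedulerName: Option[String], annotations: Map[String, String])

object VolcanoStepSketch {
  // Annotation key Volcano recognizes for pod-to-PodGroup membership.
  val GroupNameAnnotation = "scheduling.k8s.io/group-name"

  def configureDriverPod(pod: Pod, appId: String): Pod = {
    val podGroupName = s"$appId-podgroup" // naming scheme is an assumption
    pod.copy(
      schedulerName = Some("volcano"),
      annotations = pod.annotations + (GroupNameAnnotation -> podGroupName))
  }
}
```

With this shape, the default scheduler never sees the driver pod: setting `schedulerName` hands placement to Volcano, and the annotation ties the pod to the pre-created PodGroup.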
Why are the changes needed?
This PR helps users integrate Spark with the Volcano scheduler.
See also: SPARK-36057
Does this PR introduce any user-facing change?
Yes, introduced a user feature step.
These are used by `VolcanoFeatureStep`, and will also be used by `YunikornFeatureStep` in the future.
How was this patch tested?