[SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile` (#35783)
Conversation
(Resolved review thread on ...kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala)
yaooqinn left a comment:
LGTM
@viirya @martin-g @yaooqinn Thanks for your review, and sorry for the late reply. Frankly, I was a little concerned about flexibility before, but now I'm +1 on this.
If needed, we can still carefully select some configurations to overwrite in the future.
I also took some time to gather more feedback from our internal and local users/developers (@yaooqinn @aidaizyy @william-wang @k82cn) who are using Kubernetes or using Spark with Volcano. They also think it's a good approach.
Thanks @dongjoon-hyun for your help! LGTM!
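For context, with this change a Volcano queue would be specified through the pod group template rather than through a dedicated Spark configuration. A minimal, illustrative template might look like the sketch below; the queue name `default` and the `minMember` value are assumptions for illustration, not taken from this PR:

```
# driver-podgroup-template.yaml -- illustrative sketch, not from this PR
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  # Scheduler-specific settings live in the template instead of
  # dedicated spark.kubernetes.* configurations.
  queue: default
  minMember: 1
```

This keeps all Volcano-specific knobs (queue, priority, gang-scheduling minimums) in one scheduler-owned file, which is the isolation argument made in the PR description.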
Did you run into a problem that made you change `val` to `var`?
All related tests pass with the changes below, so perhaps `val` is enough here.
```
[info] VolcanoSuite:
[info] - Run SparkPi with volcano scheduler (12 seconds, 439 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (14 seconds, 336 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (25 seconds, 422 milliseconds)
[info] - SPARK-38423: Run SparkPi Jobs with priorityClassName (18 seconds, 373 milliseconds)
[info] - SPARK-38423: Run driver job to validate priority order (16 seconds, 409 milliseconds)
```
Of course, it's fine for me to address this in a followup.
(Two outdated review threads on ...ion-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala were resolved.)
Overall, that's OK to me :) But it would be better to have related parameters to make it easier.
```scala
val KUBERNETES_JOB_QUEUE = ConfigBuilder("spark.kubernetes.job.queue")
  // ...
  .stringConf
  .createOptional
```
And also remove this in https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md
Thanks
No, @k82cn. That's not better because this is

The last commit updates documentation and changes
### What changes were proposed in this pull request?

This PR aims to remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile` for Apache Spark 3.3.

### Why are the changes needed?

There are several batch execution scheduler options, including custom schedulers, in the K8s environment. We had better isolate scheduler-specific settings instead of introducing a new configuration.

### Does this PR introduce _any_ user-facing change?

No, the previous configuration is not released yet.

### How was this patch tested?

Pass the CIs and K8s IT.

```
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (8 seconds, 548 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (8 seconds, 419 milliseconds)
[info] - Run SparkPi with a very long application name. (8 seconds, 360 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (8 seconds, 386 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (8 seconds, 589 milliseconds)
[info] - Run SparkPi with an argument. (8 seconds, 361 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (8 seconds, 363 milliseconds)
[info] - All pods have the same service account by default (8 seconds, 332 milliseconds)
[info] - Run extraJVMOptions check on driver (4 seconds, 331 milliseconds)
[info] - Run SparkRemoteFileTest using a remote data file (8 seconds, 392 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (13 seconds, 915 milliseconds)
[info] - Run SparkPi with env and mount secrets. (18 seconds, 172 milliseconds)
[info] - Run PySpark on simple pi.py example (9 seconds, 368 milliseconds)
[info] - Run PySpark to test a pyfiles example (11 seconds, 489 milliseconds)
[info] - Run PySpark with memory customization (9 seconds, 378 milliseconds)
[info] - Run in client mode. (6 seconds, 296 milliseconds)
[info] - Start pod creation from template (8 seconds, 465 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (9 seconds, 460 milliseconds)
[info] - Test basic decommissioning (40 seconds, 795 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (41 seconds, 16 milliseconds)
[info] *** Test still running after 2 minutes, 19 seconds: suite name: KubernetesSuite, test name: Test decommissioning with dynamic allocation & shuffle cleanups.
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 40 seconds)
[info] - Test decommissioning timeouts (40 seconds, 446 milliseconds)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 5 seconds)
[info] - Run SparkR on simple dataframe.R example (12 seconds, 562 milliseconds)
[info] VolcanoSuite:
[info] - Run SparkPi with no resources (10 seconds, 339 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (9 seconds, 346 milliseconds)
[info] - Run SparkPi with a very long application name. (9 seconds, 306 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (9 seconds, 361 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (9 seconds, 344 milliseconds)
[info] - Run SparkPi with an argument. (9 seconds, 421 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (9 seconds, 365 milliseconds)
[info] - All pods have the same service account by default (9 seconds, 337 milliseconds)
[info] - Run extraJVMOptions check on driver (5 seconds, 348 milliseconds)
[info] - Run SparkRemoteFileTest using a remote data file (8 seconds, 310 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (15 seconds, 13 milliseconds)
[info] - Run SparkPi with env and mount secrets. (18 seconds, 466 milliseconds)
[info] - Run PySpark on simple pi.py example (10 seconds, 558 milliseconds)
[info] - Run PySpark to test a pyfiles example (11 seconds, 445 milliseconds)
[info] - Run PySpark with memory customization (10 seconds, 395 milliseconds)
[info] - Run in client mode. (6 seconds, 239 milliseconds)
[info] - Start pod creation from template (10 seconds, 415 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (9 seconds, 440 milliseconds)
[info] - Test basic decommissioning (42 seconds, 799 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (42 seconds, 836 milliseconds)
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 41 seconds)
[info] - Test decommissioning timeouts (42 seconds, 375 milliseconds)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 7 seconds)
[info] - Run SparkR on simple dataframe.R example (12 seconds, 441 milliseconds)
[info] - Run SparkPi with volcano scheduler (10 seconds, 421 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (13 seconds, 256 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (16 seconds, 216 milliseconds)
[info] - SPARK-38423: Run SparkPi Jobs with priorityClassName (14 seconds, 264 milliseconds)
[info] - SPARK-38423: Run driver job to validate priority order (16 seconds, 325 milliseconds)
[info] Run completed in 28 minutes, 9 seconds.
[info] Total number of tests run: 53
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 53, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 1785 s (29:45), completed Mar 8, 2022 11:15:23 PM
```

Closes apache#35783 from dongjoon-hyun/SPARK-38480.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
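As a usage sketch, after this change the surviving configuration could be wired up roughly as below. The file paths and the `spark.kubernetes.scheduler.name` pairing are illustrative assumptions, not prescribed by this PR:

```
# spark-defaults.conf -- illustrative sketch
spark.kubernetes.scheduler.name                  volcano
spark.kubernetes.driver.podGroupTemplateFile     /path/to/driver-podgroup-template.yaml
```

With this layout, switching batch schedulers means swapping the scheduler name and template file, without any scheduler-specific `spark.kubernetes.*` keys to deprecate later.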