@ulysses-you
Contributor

@ulysses-you ulysses-you commented Mar 21, 2023

Why are the changes needed?

This PR adds a new rule, FinalStageResourceManager, to inject a custom resource profile for the final write stage.

It now provides two features:

  • Kill redundant executors
    We first get the partition count of the final stage, which is the number of cores actually required, then kill the redundant executors. Executors are killed in the following priority:

    1. Kill younger executors first (the older an executor is, the better warmed up its JIT is).
    2. Kill executors that produced less shuffle data first.
  • Custom resource profile
    We can specify custom executor resources for the final write stage. It now supports:

    • executor cores
    • executor memory
    • executor memory overhead
    • executor off-heap memory
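
To make the kill priority above concrete, here is a minimal sketch in plain Python. It is not the code in this PR: `ExecutorInfo`, `executors_to_kill`, and their fields are hypothetical names. It also assumes that the number of executors to keep is the final stage partition count divided by the cores per executor (rounded up), and that the two priorities are applied lexicographically, with shuffle data used only to break ties between equally young executors.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorInfo:
    """Hypothetical view of a live executor (not a Spark or Kyuubi class)."""
    executor_id: str
    start_time_ms: int          # larger value means a younger executor
    shuffle_bytes_written: int  # shuffle data produced so far


def executors_to_kill(executors, final_stage_partitions, cores_per_executor):
    """Return the ids of redundant executors, in the order they would be killed."""
    # Assumption: one core per final stage task, so the partition count
    # is the number of cores actually required.
    required_executors = math.ceil(final_stage_partitions / cores_per_executor)
    surplus = max(0, len(executors) - required_executors)

    # Priority 1: younger executors first (largest start time first).
    # Priority 2: among equally young executors, less shuffle data first.
    ranked = sorted(
        executors,
        key=lambda e: (-e.start_time_ms, e.shuffle_bytes_written),
    )
    return [e.executor_id for e in ranked[:surplus]]


if __name__ == "__main__":
    live = [
        ExecutorInfo("1", 1000, 500),
        ExecutorInfo("2", 2000, 100),
        ExecutorInfo("3", 2000, 50),
        ExecutorInfo("4", 3000, 900),
    ]
    # 4 partitions at 2 cores per executor need 2 executors, so 2 are redundant.
    print(executors_to_kill(live, final_stage_partitions=4, cores_per_executor=2))
```

With the sample data, the youngest executor is chosen first, and the tie between the two executors started at the same time is broken by the smaller shuffle output.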
    

The reason for adding this feature is that if the previous stage uses many executors but the final stage needs fewer, the final stage tasks are scheduled randomly across all existing executors, which may waste resources. For example, each executor may run only 1 or 2 tasks while holding 4 or 5 cores.
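
As a rough illustration of the waste described above, the sketch below counts how many cores sit idle when the final stage has fewer tasks than the cluster holds cores. The function name and the numbers are made up for the example, and it assumes one core per task.

```python
def idle_cores(num_executors, cores_per_executor, final_stage_tasks):
    """Cores held by executors but not used by any final stage task.

    Assumes each task occupies exactly one core.
    """
    held = num_executors * cores_per_executor
    busy = min(held, final_stage_tasks)
    return held - busy


if __name__ == "__main__":
    # Hypothetical numbers: 50 executors with 4 cores each, 75 final stage tasks.
    print(idle_cores(num_executors=50, cores_per_executor=4, final_stage_tasks=75))
```

In this made-up case the executors hold 200 cores while only 75 are busy, which matches the "1 or 2 tasks on 4 cores" situation described above.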

How was this patch tested?

Tested manually:

  • test for the kill executor

image

  • test for custom resource profile

image

image

@ulysses-you ulysses-you force-pushed the stage-level-schedule branch from 213101b to f5bfd6b Compare March 22, 2023 03:19
@codecov-commenter

codecov-commenter commented Mar 22, 2023

Codecov Report

Merging #4574 (5ce2c5d) into master (465e23a) will increase coverage by 0.03%.
The diff coverage is 96.87%.

❗ Current head 5ce2c5d differs from pull request most recent head 54afa36. Consider uploading reports for the commit 54afa36 to get more accurate results

@@             Coverage Diff              @@
##             master    #4574      +/-   ##
============================================
+ Coverage     53.32%   53.36%   +0.03%     
  Complexity       13       13              
============================================
  Files           573      577       +4     
  Lines         31501    31589      +88     
  Branches       4239     4245       +6     
============================================
+ Hits          16797    16856      +59     
- Misses        13128    13147      +19     
- Partials       1576     1586      +10     
Impacted Files Coverage Δ
...in/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala 98.61% <96.87%> (-0.50%) ⬇️

... and 22 files with indirect coverage changes

@ulysses-you ulysses-you marked this pull request as draft March 22, 2023 07:23
@ulysses-you ulysses-you force-pushed the stage-level-schedule branch from 2f20ed4 to 93de3c5 Compare March 22, 2023 12:29
@ulysses-you ulysses-you force-pushed the stage-level-schedule branch 2 times, most recently from ec17f63 to 5ce2c5d Compare March 23, 2023 12:29
@ulysses-you ulysses-you marked this pull request as ready for review March 23, 2023 12:29
@ulysses-you
Contributor Author

It is really hard to add tests for this, so I tested all of it manually. cc @yaooqinn @pan3793 @bowenliang123 @cfmcgrady in case you have other ideas about this.

@bowenliang123
Contributor

Is there any description of the manual test results other than the screenshots? I don't quite understand how the screenshots relate to the proposed features.

@ulysses-you
Contributor Author

@bowenliang123 I have added some tags. The screenshots show:

  1. kill executor
  2. inject custom resource profile

@bowenliang123
Contributor

LGTM. Thx for the supplement.

@ulysses-you ulysses-you force-pushed the stage-level-schedule branch from 5ce2c5d to 2225bec Compare March 24, 2023 02:02
@ulysses-you ulysses-you force-pushed the stage-level-schedule branch from 66751d6 to 0b57d58 Compare March 24, 2023 02:22
@yaooqinn
Member

Please split this PR into smaller ones.

@ulysses-you
Contributor Author

I separated this PR into two parts: 1. kill executors, 2. inject custom resource profile.

Here is PR #4592 for killing executors.

@ulysses-you ulysses-you deleted the stage-level-schedule branch March 30, 2023 05:19