Configure DB Resource Group reload frequency#13931
Configure DB Resource Group reload frequency#13931swapsmagic merged 1 commit intoprestodb:masterfrom
Conversation
c6bd696 to
768e366
Compare
|
What is the problem that we are trying to solve here? How expensive is configuration reloading? |
768e366 to
f821cae
Compare
|
In general it makes sense to make this configurable, though I also wonder what's the motivation behind this. While we do not strictly require resource group change to be real time, anything more than minutes would make the experience unpleasant (whoever making the change needs to apply the changes to database, then check back in a few minutes to see whether it's applied). It's also annoying when users have SEV recovery and asked us to adjust resource group but it won't be effective until many minutes later. That's said, 1 second sounds arbitrary and might not be necessary so it does make sense to change it if there's any real concerns with resource utilization, etc. |
|
@rongrong The motivation behind this is:
@mayankgarg1990 One of the problem i see here is, we are unnecessary wasting cpu cycles on coordinator when it's not adding much value. I will get the numbers on how much cpu it's utilizing so we know for sure how much benefit we are going to get out of it. |
|
@viczhang861 no i don't have number as how long it's taking for us to load resource group. I will see if can get that number from somewhere. |
|
Let's get some numbers before we conclude that it's wasting enough CPU cycle to worth fixing for. Though even if it's worth fixing for, would change it to 10s good enough? 15s? I'm not convinced that different clusters need different configurations. Having different clusters with different configurations would actually complicate cluster management so they need to be well justified. I think change the default to 30s might just solve the problem. I'm fine with adding this as a config property though. It doesn't hurt. Though you probably want to have an upper bound. Anything more than 1 hour is not reasonable (I'd say more than 10 min is already very unreasonable). |
|
Echoing @rongrong's point - changing the default to |
|
For 1 hour of CPU profiling on a shadow T1 cluster which is taking traffic from vll1_batch*, it shows 1.03% of coordinator CPU being used by the DbResourceGroupConfigurationManager.load method. Data from the cpu profiling file: <title>com/facebook/presto/resourceGroups/db/DbResourceGroupConfigurationManager.load (22,582 samples, 1.03%)</title>Trying to get much longer period of data (~1 day) to make sure its being utilized consistently 1% CPU or not. |
f821cae to
071f117
Compare
|
So if we change it to 10s by default that would be 0.1% of total CPU, which would be quite acceptable I assume? 30s as @mayankgarg1990 suggested would make it 0.03% and I think it's still interactive enough for people who are actually materializing the changes. What do you guys think? Let's wait for some more profiling confirmation. |
|
Data from vll1_batch1 for ~18-20 hrs of CPU, the % has dropped to 0.21. com/facebook/presto/resourceGroups/db/DbResourceGroupConfigurationManager.load (378,415 samples, 0.21%). So it seems not too much of CPU utilization as assumed before, but still we can reduce this with the config. So making it 10s, reduces it to 0.02% which seems reasonable to me. @mayankgarg1990 any thoughts on the value of 10s vs 30s? |
mayankgarg1990
left a comment
There was a problem hiding this comment.
Frankly both of them are fine for me. Lets go for 10s since it is a smaller change from the existing value :) I don't think either of two have a significant impact on CPU/user experience.
071f117 to
e3a6619
Compare
|
If we are only changing this to 10s, do we still need to introduce a configuration setting? |
e3a6619 to
b8ff4a6
Compare
There was a problem hiding this comment.
no it won't. Fixing it.
b8ff4a6 to
00178aa
Compare
00178aa to
8f203c0
Compare
Making database resource group reload frequency configurable with name
resource-groups.reload-refresh-interval. This helps reduce some load on the coordinator at the same time give flexibility on changing the frequency given we don't make resource group changes often.Fixing issue: #13926