Skip to content

Configure DB Resource Group reload frequency#13931

Merged
swapsmagic merged 1 commit intoprestodb:masterfrom
swapsmagic:configured_resource_group_reload
Jan 14, 2020
Merged

Configure DB Resource Group reload frequency#13931
swapsmagic merged 1 commit intoprestodb:masterfrom
swapsmagic:configured_resource_group_reload

Conversation

@swapsmagic
Copy link
Contributor

@swapsmagic swapsmagic commented Jan 6, 2020

Making database resource group reload frequency configurable with name resource-groups.reload-refresh-interval. This helps reduce some load on the coordinator at the same time give flexibility on changing the frequency given we don't make resource group changes often.
Fixing issue: #13926

== RELEASE NOTES ==

General Changes
* Making database resource group reload frequency configurable
* ...

@mayankgarg1990
Copy link

What is the problem that we are trying to solve here? How expensive is configuration reloading?

@swapsmagic swapsmagic force-pushed the configured_resource_group_reload branch from 768e366 to f821cae Compare January 6, 2020 23:51
Copy link
Contributor

@viczhang861 viczhang861 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, do you know what is the current loading time? It may be possible to take more than 1 second to load.

@rongrong
Copy link
Contributor

rongrong commented Jan 8, 2020

In general it makes sense to make this configurable, though I also wonder what's the motivation behind this. While we do not strictly require resource group change to be real time, anything more than minutes would make the experience unpleasant (whoever making the change needs to apply the changes to database, then check back in a few minutes to see whether it's applied). It's also annoying when users have SEV recovery and asked us to adjust resource group but it won't be effective until many minutes later. That's said, 1 second sounds arbitrary and might not be necessary so it does make sense to change it if there's any real concerns with resource utilization, etc.

@swapsmagic
Copy link
Contributor Author

@rongrong The motivation behind this is:

  1. Reduce unencessary processing of loading this every second.
  2. With configurable property, we can easily change the value. And the idea is give flexibility, so based on use case it can configured instead same value for all clusters.
    Also for our use case, i think setting this to a 1 minute is a reasonable approach as if the new resource group changes are getting loaded in a minute that should be good to resolve any sevs. Correct me if that's too long and could be a problem for us.

@mayankgarg1990 One of the problem i see here is, we are unnecessary wasting cpu cycles on coordinator when it's not adding much value. I will get the numbers on how much cpu it's utilizing so we know for sure how much benefit we are going to get out of it.

@swapsmagic
Copy link
Contributor Author

@viczhang861 no i don't have number as how long it's taking for us to load resource group. I will see if can get that number from somewhere.

@rongrong
Copy link
Contributor

rongrong commented Jan 8, 2020

Let's get some numbers before we conclude that it's wasting enough CPU cycle to worth fixing for. Though even if it's worth fixing for, would change it to 10s good enough? 15s? I'm not convinced that different clusters need different configurations. Having different clusters with different configurations would actually complicate cluster management so they need to be well justified. I think change the default to 30s might just solve the problem. I'm fine with adding this as a config property though. It doesn't hurt. Though you probably want to have an upper bound. Anything more than 1 hour is not reasonable (I'd say more than 10 min is already very unreasonable).

@mayankgarg1990
Copy link

Echoing @rongrong's point - changing the default to 30s seems reasonable - it should enable quick enough reloads while not bloating the coordinator with unnecessary requests.

@swapsmagic
Copy link
Contributor Author

For 1 hour of CPU profiling on a shadow T1 cluster which is taking traffic from vll1_batch*, it shows 1.03% of coordinator CPU being used by the DbResourceGroupConfigurationManager.load method.

Data from the cpu profiling file:

<title>com/facebook/presto/resourceGroups/db/DbResourceGroupConfigurationManager.load (22,582 samples, 1.03%)</title>

Trying to get much longer period of data (~1 day) to make sure its being utilized consistently 1% CPU or not.

@swapsmagic swapsmagic force-pushed the configured_resource_group_reload branch from f821cae to 071f117 Compare January 14, 2020 00:02
@rongrong
Copy link
Contributor

So if we change it to 10s by default that would be 0.1% of total CPU, which would be quite acceptable I assume? 30s as @mayankgarg1990 suggested would make it 0.03% and I think it's still interactive enough for people who are actually materializing the changes. What do you guys think? Let's wait for some more profiling confirmation.

@swapsmagic
Copy link
Contributor Author

Data from vll1_batch1 for ~18-20 hrs of CPU, the % has dropped to 0.21.

com/facebook/presto/resourceGroups/db/DbResourceGroupConfigurationManager.load (378,415 samples, 0.21%).

So it seems not too much of CPU utilization as assumed before, but still we can reduce this with the config. So making it 10s, reduces it to 0.02% which seems reasonable to me. @mayankgarg1990 any thoughts on the value of 10s vs 30s?

Copy link

@mayankgarg1990 mayankgarg1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly both of them are fine for me. Lets go for 10s since it is a smaller change from the existing value :) I don't think either of two have a significant impact on CPU/user experience.

@swapsmagic swapsmagic force-pushed the configured_resource_group_reload branch from 071f117 to e3a6619 Compare January 14, 2020 19:18
@rongrong
Copy link
Contributor

If we are only changing this to 10s, do we still need to introduce a configuration setting?

@swapsmagic swapsmagic force-pushed the configured_resource_group_reload branch from e3a6619 to b8ff4a6 Compare January 14, 2020 20:03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this compile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it won't. Fixing it.

@swapsmagic swapsmagic force-pushed the configured_resource_group_reload branch from b8ff4a6 to 00178aa Compare January 14, 2020 20:45
@swapsmagic swapsmagic force-pushed the configured_resource_group_reload branch from 00178aa to 8f203c0 Compare January 14, 2020 21:33
@swapsmagic swapsmagic merged commit 32ca539 into prestodb:master Jan 14, 2020
@swapsmagic swapsmagic deleted the configured_resource_group_reload branch January 14, 2020 22:45
@caithagoras caithagoras mentioned this pull request Feb 20, 2020
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants