Dynamic tablet throttler config: enable/disable, set metrics query/threshold #11604
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ards compatibility) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
… STATUS Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
mattlord
left a comment
LGTM! My last concern has been addressed. Made one minor comment about the vtctld help output. I'll let the other reviewers double check their respective areas. Thank you for working on this! I think this greatly improves the throttling feature! ❤️
// UpdateThrottlerConfig makes a UpdateThrottlerConfig gRPC call to a vtctld.
UpdateThrottlerConfig = &cobra.Command{
	Use: "UpdateThrottlerConfig [--enable|--disable] [--threshold=<float64>] [--custom-query=<query>] [--check-as-check-self|--check-as-check-shard] <keyspace>",
	Short: "Rebuilds the cell-specific SrvVSchema from the global VSchema objects in the provided cells (or all cells if none provided).",
This implies that you can specify cells but currently you cannot. It also doesn't rebuild/refresh so much as update the config in the topo which is then picked up by the watchers. Looks like we can mostly copy the help output from vtctl: "Update the table throttler configuration for all cells and tablets of a given keyspace"
Good catch. This is just an overlooked copy+paste. Updated the comment.
@ajm188 is this looking good on your side?
@shlomi-noach can you
Rest LGTM
done
Added release notes
ya, approving formally for completeness! great stuff
huh! I used to have the superpower to force merge a PR, but I no longer seem to have it. Will solicit more approvals
Description
A different implementation for dynamic throttler config from the one described in #11316
We have decided to implement dynamic throttler config in the following way:
- A vtctldclient UpdateThrottlerConfig command.
- Config is stored in topo, not in a backend _vt table.
- Control via vtgate is postponed and to be re-evaluated if we want such control.
- A show vitess_throttler status query returns per-tablet throttler state.
- A --throttler-config-via-topo flag for backwards compatibility.
Discussion & details.
The main deviation from #11316 is that we do not use a _vt backend table to store the throttler's config, and instead store it in topo. There are multiple reasons for that:
- _vt means replica tablets are susceptible to replication lag, which introduces a dependency loop
- topo seems a logical place to set this kind of configuration
- topo listeners/callback to simplify the propagation of information to the tablets.
We utilize local topos; changes to configuration will apply to all cell-topos of a keyspace.
We do require global topo to be available if you want to make a change to the throttler. This is because we need global topo to tell us where to find per-cell topos.
vtctldclient UpdateThrottlerConfig
You indicate the specific configuration changes you make to the throttler, like so:
Examples:
Any changes made are sent to all tablets of the given keyspace, in all cells and all shards.
… 10sec at current configuration).
Configuration and backwards compatibility
Today, the throttler is controlled per-tablet via vttablet command line flags:
- enable_lag_throttler
- throttle_threshold
- throttle_metrics_threshold (used when metrics query is defined, overrides the above, and that's confusing)
- throttle_metrics_query
- throttle_check_as_check_self
The above five flags are consolidated into four in the new ThrottlerConfig proto:
For backwards compatibility, the existing vttablet flags are still accepted, but will be deprecated in the future. A new vttablet flag, --throttler-config-via-topo, indicates that a SrvKeyspace_ThrottlerConfig configuration (i.e. configuration stored in topo) overrides the above flags. The way to transition into the new setup is to first run your vitess cluster with existing configuration, untouched. Then, populate topo with the new config via vtctldclient UpdateThrottlerConfig as described above. Then, add --throttler-config-via-topo and restart tablets.
show vitess_throttler status
The command show vitess_throttler status retrieves throttler status from tablets in all cells and shards. To clarify, the command does not read anything from topo. The command represents how the tablets are actually running the throttler. Is it enabled? Disabled? What's the threshold?
At this time the query is only sent to PRIMARY tablets, but we will follow up and make it run on all tablets.
Tests
The main test in this PR is the new go/test/endtoend/tabletmanager/throttler_topo/throttler_test.go, which is tested in a new CI workflow/shard called tabletmanager_throttler_topo.
This new test runs the throttler and changes configuration dynamically: it enables, disables, changes the threshold, changes the metrics query, etc.
This PR also has a minor effect on on-demand heartbeats to ensure they get an initial "kick" upon startup, and both tabletmanager_throttler and tabletmanager_throttler_custom_config are adjusted accordingly. In the future, we will delete those two tests/workflows/shards, and keep only tabletmanager_throttler_topo.
Related Issue(s)
#11316
Checklist
Deployment Notes