Add stats for shards watched by VTOrc, purge stale shards #17815
deepthi merged 12 commits into vitessio:main
refreshAllShards does a save + read + delete, per keyspace. Let's make sure that doesn't happen concurrently.
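One way to address the concern above is to guard the whole save + read + delete cycle with a mutex. The sketch below is only an illustration of that idea, not the change made in this PR: `shardRefresher`, `refresh`, and `maxConcurrentCycles` are hypothetical names, and the cycle body is a placeholder for the real backend work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shardRefresher is a hypothetical wrapper (not VTOrc's type) that
// serializes refresh cycles with a single mutex.
type shardRefresher struct {
	mu sync.Mutex
}

// refresh runs one cycle while holding the mutex, so overlapping
// callers queue up instead of interleaving save, read, and delete.
func (r *shardRefresher) refresh(cycle func()) {
	r.mu.Lock()
	defer r.mu.Unlock()
	cycle()
}

// maxConcurrentCycles starts n refreshes at once and reports the highest
// number of cycles that were ever observed running at the same time.
func maxConcurrentCycles(n int) int32 {
	var (
		r        shardRefresher
		wg       sync.WaitGroup
		inFlight int32
		maxSeen  int32
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.refresh(func() {
				cur := atomic.AddInt32(&inFlight, 1)
				// maxSeen is only touched while the mutex is held.
				if cur > maxSeen {
					maxSeen = cur
				}
				atomic.AddInt32(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return maxSeen
}

func main() {
	fmt.Println("max concurrent refresh cycles:", maxConcurrentCycles(8))
}
```

A single lock is the simplest option; a lock per keyspace would allow more parallelism at the cost of more bookkeeping.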
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #17815 +/- ##
==========================================
- Coverage 67.94% 67.44% -0.50%
==========================================
Files 1586 1594 +8
Lines 255224 259058 +3834
==========================================
+ Hits 173420 174731 +1311
- Misses 81804 84327 +2523
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
This is working nicely on our fork:
vitess@REDACTED:/vt$ curl -s localhost:15000/metrics | grep vtorc_shards_watched | head -10
# HELP vtorc_shards_watched Keyspace/shards currently watched
# TYPE vtorc_shards_watched gauge
vtorc_shards_watched{keyspace="REDACTED",shard="-0040"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="0040-0080"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="0080-00c0"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="00c0-0100"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="0100-0140"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="0140-0180"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="0180-01c0"} 1
vtorc_shards_watched{keyspace="REDACTED",shard="01c0-0200"} 1
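For anyone who wants to check the new gauge from a script, here is a small sketch that counts watched shards per keyspace from output shaped like the sample above. It assumes the label order (`keyspace`, then `shard`) and the integer value `1` seen in that sample; `countWatchedShards` is a hypothetical helper, not part of Vitess.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// shardsWatchedRe matches one sample line of the vtorc_shards_watched gauge,
// assuming the label order and the value 1 shown in the comment above.
var shardsWatchedRe = regexp.MustCompile(
	`^vtorc_shards_watched\{keyspace="([^"]*)",shard="([^"]*)"\} 1$`)

// countWatchedShards returns the number of watched shards per keyspace
// found in a block of metrics text. Non-matching lines are ignored.
func countWatchedShards(metrics string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(metrics))
	for scanner.Scan() {
		m := shardsWatchedRe.FindStringSubmatch(strings.TrimSpace(scanner.Text()))
		if m == nil {
			continue // HELP/TYPE comments and other metrics
		}
		counts[m[1]]++
	}
	return counts
}

func main() {
	sample := `# HELP vtorc_shards_watched Keyspace/shards currently watched
# TYPE vtorc_shards_watched gauge
vtorc_shards_watched{keyspace="commerce",shard="-80"} 1
vtorc_shards_watched{keyspace="commerce",shard="80-"} 1
`
	fmt.Println(countWatchedShards(sample))
}
```

The keyspace name `commerce` in the sample is made up for illustration.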
After testing internally, this PR isn't quite ready. It seems
@GuptaManan100 fixed this issue above. Will test on real hosts tomorrow. Re-requested review 🙇
Description
This PR adds stats for which keyspace/shards are being watched by VTOrc, populated using the backend database.
While adding this support I noticed records are never deleted from the backend `vitess_shard` table, so I added support for stale records to be deleted in a similar way to tablets: by recording what was successfully "saved" in the latest polling and removing anything that was not in that list.
Finally, shards that do not match the `--clusters_to_watch` key-ranges will not be saved to the backend. Currently all shards of a keyspace are stored.
Related Issue(s)
Closes #17816
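The purge and filter behaviour outlined in the Description can be sketched roughly as follows. This is a simplified in-memory illustration under stated assumptions, not the PR's code: `backend` stands in for the backend shard table, `refreshShards` is a hypothetical function, and the `watch` predicate is a placeholder for the real `--clusters_to_watch` key-range matching.

```go
package main

import (
	"fmt"
	"sort"
)

// backend is a hypothetical in-memory stand-in for the backend shard table:
// keyspace -> shard -> present.
type backend map[string]map[string]bool

// refreshShards saves the shards that pass the watch predicate, then purges
// any stored shard of the keyspace that was not saved in this polling round.
// It returns the purged shard names in sorted order.
func refreshShards(db backend, keyspace string, topoShards []string, watch func(shard string) bool) []string {
	if db[keyspace] == nil {
		db[keyspace] = make(map[string]bool)
	}
	saved := make(map[string]bool, len(topoShards))
	for _, shard := range topoShards {
		if !watch(shard) {
			continue // outside the watched key ranges: never saved
		}
		db[keyspace][shard] = true
		saved[shard] = true
	}
	var purged []string
	for shard := range db[keyspace] {
		if !saved[shard] {
			delete(db[keyspace], shard) // stale: not saved in the latest poll
			purged = append(purged, shard)
		}
	}
	sort.Strings(purged)
	return purged
}

func main() {
	db := backend{}
	watchAll := func(string) bool { return true }

	// First poll: both shards are saved, nothing is stale.
	fmt.Println(refreshShards(db, "ks", []string{"-80", "80-"}, watchAll))

	// Second poll after a hypothetical reshard: "-80" no longer exists.
	fmt.Println(refreshShards(db, "ks", []string{"-40", "40-80", "80-"}, watchAll))
}
```

Tracking what was saved during the poll, rather than diffing against the topo list directly, means a shard that still exists but is no longer watched is purged as well.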
Checklist
Deployment Notes