add option for warming reads to mirror primary read queries onto replicas from vtgates to warm bufferpools #13206
Conversation
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
- If a new flag is being introduced:
- If a workflow is added or modified:
- Bug fixes
- Non-trivial changes
- New/Existing features
- Backward compatibility
@olyazavr this is really cool! 🎉 I wonder if this could be more "observable". Is it possible to add … Assuming …
@timvaillancourt: metrics in the vtgate about warm-up might generate a large metric volume, which makes me think the vtgate is not the best place for this observability. What would you think of putting these metrics on the vttablet?
@jfg956 that's a good idea, and it has given me a few more ideas 👍 I think a single … I think …
Last, a deferrable new question: is there an upper limit to how many queries …
@timvaillancourt those are all really good ideas! I'm waiting to get some sort of approval or comment from the Vitess team that this is even a feature they would consider merging before I go and implement more parts of this.
@olyazavr makes sense! They're all deferrable ideas, don't let them block things 👍. Let me know if I can help with anything.
maxenglander
left a comment
Hey @olyazavr, this is awesome, and something I am excited to see land. I work at PlanetScale, so my review does not carry the power of approval. I think you can continue to hold off on making changes until someone from the Vitess eng. team chimes in. That said, I left some questions and small suggestions.
At a higher-level, had some bigger questions:
- From your experience using this in production, what CPU/memory impact have you observed on VTGates? What about overall performance?
- I think it might be nice to have the option to change this value at runtime. If setting 1-2% for warming has a noticeable impact on overall performance, this might be something we only want to set during a maintenance window prior to PRS. No idea how much work it would be to incorporate structural support for this in your PR. If it's a lot I imagine it could be added later on.
I know that the Vitess eng. team will eventually request an addition to changelog/ and a website PR, so noting that as well. I also think adding or modifying existing E2E tests would be a good idea and something they will likely request.
go/vt/vtgate/engine/route.go
+1 to the idea in the main thread to have a warming pool or something else that caps the inflight warming requests.
go/vt/vtgate/executor_select_test.go
Looks like there's a missing mysqlCtx argument between the context and the method.
go/vt/vtgate/vcursor_impl.go
Might be nice to pull this 5*time.Second up into a constant, make it configurable by flag, or maybe default to --query-timeout.
Also, not sure if this is something we could/would want to use:
vitess/go/vt/vtgate/engine/route.go
Line 186 in 344280a
Or this?
vitess/go/vt/vtgate/safe_session.go
Line 206 in aab7c89
go/vt/vtgate/executor_select_test.go
Would be nice to modify this test so that it runs through a bunch of different cases which exercise the various case statements you have in executeWarmingReplicaRead, and also some tests for negatives like insert, update, and queries that compose select with insert and update.
Would also be good to add tests with trailing comments to see how that lands on the replicas.
go/vt/vtgate/executor_select_test.go
The query serving team may disagree with me, but it might be good to set this to a number greater than 0 to help shake out any issues over the next release cycle.
This would create nondeterministic replica queries, which can make for flaky tests (we encountered this problem).
go/vt/vtgate/executor_select_test.go
Same thought here and everywhere else.
go/vt/vtgate/engine/primitive.go
Why return interface{} here instead of VCursor?
go/vt/vtgate/engine/route.go
Might be good to add some stats to add visibility into this.
Ah I see in the main conversation now that there's talk about getting this at the tablet level. FWIW I think it could be useful here in case for whatever reason the queries fail to reach the tablets.
Good point. I think the viper work is actually ready, if you want to use that to make this dynamically configurable.
GuptaManan100
left a comment
I really like the idea of warming reads!
go/vt/vtgate/engine/route.go
MultiEqual might also be something to consider allowing in the opcodes.
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed.
This PR was closed because it has been stale for 7 days with no activity.
Force-pushed d56d434 to c8576ca
GuptaManan100
left a comment
I was able to track down one of the unit test failures to this. Because we aren't passing in the context that we cancel later, and instead pass in context.Background, the topo watchers don't shut down properly, which eventually shows up as leaked goroutines:
noleak.go:56: found unexpected goroutines:
[Goroutine 50 in state chan receive, with vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1 on top of the stack:
goroutine 50 [chan receive]:
vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1()
/Users/manangupta/vitess/go/vt/topo/memorytopo/watch.go:56 +0x5c
created by vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch in goroutine 15
/Users/manangupta/vitess/go/vt/topo/memorytopo/watch.go:55 +0x250
Goroutine 51 in state chan receive, with vitess.io/vitess/go/vt/topo.(*Server).WatchSrvVSchema.func1 on top of the stack:
goroutine 51 [chan receive]:
vitess.io/vitess/go/vt/topo.(*Server).WatchSrvVSchema.func1()
/Users/manangupta/vitess/go/vt/topo/srv_vschema.go:74 +0x9c
created by vitess.io/vitess/go/vt/topo.(*Server).WatchSrvVSchema in goroutine 15
/Users/manangupta/vitess/go/vt/topo/srv_vschema.go:70 +0x14c
Goroutine 52 in state select, with vitess.io/vitess/go/vt/vtgate.(*sandboxTopo).WatchSrvVSchema.func1 on top of the stack:
goroutine 52 [select]:
vitess.io/vitess/go/vt/vtgate.(*sandboxTopo).WatchSrvVSchema.func1()
/Users/manangupta/vitess/go/vt/vtgate/sandbox_test.go:323 +0x94
created by vitess.io/vitess/go/vt/vtgate.(*sandboxTopo).WatchSrvVSchema in goroutine 15
/Users/manangupta/vitess/go/vt/vtgate/sandbox_test.go:321 +0x16c
Goroutine 59 in state chan receive, with vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1 on top of the stack:
goroutine 59 [chan receive]:
vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch.func1()
/Users/manangupta/vitess/go/vt/topo/memorytopo/watch.go:56 +0x5c
created by vitess.io/vitess/go/vt/topo/memorytopo.(*Conn).Watch in goroutine 15
/Users/manangupta/vitess/go/vt/topo/memorytopo/watch.go:55 +0x250
Goroutine 60 in state chan receive, with vitess.io/vitess/go/vt/topo.(*Server).WatchSrvVSchema.func1 on top of the stack:
goroutine 60 [chan receive]:
vitess.io/vitess/go/vt/topo.(*Server).WatchSrvVSchema.func1()
/Users/manangupta/vitess/go/vt/topo/srv_vschema.go:74 +0x9c
created by vitess.io/vitess/go/vt/topo.(*Server).WatchSrvVSchema in goroutine 15
/Users/manangupta/vitess/go/vt/topo/srv_vschema.go:70 +0x14c
Goroutine 61 in state select, with vitess.io/vitess/go/vt/vtgate.(*sandboxTopo).WatchSrvVSchema.func1 on top of the stack:
goroutine 61 [select]:
vitess.io/vitess/go/vt/vtgate.(*sandboxTopo).WatchSrvVSchema.func1()
/Users/manangupta/vitess/go/vt/vtgate/sandbox_test.go:323 +0x94
created by vitess.io/vitess/go/vt/vtgate.(*sandboxTopo).WatchSrvVSchema in goroutine 15
/Users/manangupta/vitess/go/vt/vtgate/sandbox_test.go:321 +0x16c
]
I would have committed these changes directly, but unfortunately I don't have access to do that 😢
utils.MustMatch(t, wantQueriesReplica, replica.Queries)
replica.Queries = nil
_, err = executor.Execute(ctx, nil, "TestSelect", session, "insert into user (age, city) values (5, 'Boston')", map[string]*querypb.BindVariable{})
Suggested change:
- _, err = executor.Execute(ctx, nil, "TestSelect", session, "insert into user (age, city) values (5, 'Boston')", map[string]*querypb.BindVariable{})
+ _, err = executor.Execute(ctx, nil, "TestWarmingReads", session, "insert into user (age, city) values (5, 'Boston')", map[string]*querypb.BindVariable{})
GuptaManan100
left a comment
TestHelpOutput is also failing:
--- FAIL: TestHelpOutput (0.70s)
--- FAIL: TestHelpOutput/vtcombo (0.06s)
flags_test.go:142: []: (-want +got)
(
"""
... // 421 identical lines
--vttablet_skip_buildinfo_tags string comma-separated list of buildinfo tags to skip from merging with --init_tags. each tag is either an exact match or a regular expression of the form '/regexp/'. (default "/.*/")
--wait_for_backup_interval duration (init restore parameter) if this is greater than 0, instead of starting up empty when no backups are found, keep checking at this interval for a backup to appear
+ --warming-reads-percent int Percentage of reads on the primary to forward to replicas. Useful for keeping buffer pools warm (default 0)
+ --warming-reads-pool-size int Size of goroutine pool for warming reads (default 500) (default 500)
+ --warming-reads-query-timeout duration Timeout of warming read queries (default 5s) (default 5s)
--warn_memory_rows int Warning threshold for in-memory results. A row count higher than this amount will cause the VtGateWarnings.ResultsExceeded counter to be incremented. (default 30000)
--warn_payload_size int The warning threshold for query payloads in bytes. A payload greater than this threshold will cause the VtGateWarnings.WarnPayloadSizeExceeded counter to be incremented.
... // 11 identical lines
"""
)
--- FAIL: TestHelpOutput/vtgate (0.10s)
flags_test.go:142: []: (-want +got)
strings.Join({
... // 32472 identical bytes
"\n --warming-reads-pool-size int ",
" Size of goroutine pool for warming reads (default 500)",
+ " (default 500)",
"\n --warming-reads-query-timeout duration ",
" Timeout of warming read queri",
+ "es (d",
"e",
+ "fault 5",
"s",
+ ")",
" (default 5s)\n --warn_memory_rows int ",
" Warning threshold for in-memory results. ",
... // 563 identical bytes
}, "")
FAIL
This requires fixing the vtgate.txt and vtcombo.txt file in the flags/endtoend directory to match the new expectation.
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com> Signed-off-by: Olga Shestopalova <olgash@mit.edu>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
go/flags/endtoend/vtcombo.txt
--vttablet_skip_buildinfo_tags string comma-separated list of buildinfo tags to skip from merging with --init_tags. each tag is either an exact match or a regular expression of the form '/regexp/'. (default "/.*/")
--wait_for_backup_interval duration (init restore parameter) if this is greater than 0, instead of starting up empty when no backups are found, keep checking at this interval for a backup to appear
--warming-reads-percent int Percentage of reads on the primary to forward to replicas. Useful for keeping buffer pools warm (default 0)
--warming-reads-pool-size int Size of goroutine pool for warming reads (default 500)
In Vitess, pools usually mean connection pools. When I read the description I thought it was a WaitGroup or something like that. It actually turns out to be a chan bool whose capacity is set by this flag.
The flag name and description are misleading; they need to be changed to reflect the actual usage. It should be something like --warming-reads-concurrency, documented as "Number of concurrent warming reads allowed".
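The buffered-channel pattern the reviewer is describing can be sketched as follows; `warmingReadsConcurrency`, `tryAcquire`, and `release` are illustrative names, not the PR's actual identifiers:

```go
package main

import "fmt"

func main() {
	// A buffered channel used as a semaphore: its capacity is the number of
	// concurrent warming reads allowed (the suggested
	// --warming-reads-concurrency), not a pool of connections or goroutines.
	const warmingReadsConcurrency = 2
	sem := make(chan bool, warmingReadsConcurrency)

	tryAcquire := func() bool {
		select {
		case sem <- true:
			return true // slot acquired; a warming read may proceed
		default:
			return false // at capacity: drop the warming read, never block the caller
		}
	}
	release := func() { <-sem } // free the slot when the warming read finishes

	fmt.Println(tryAcquire()) // true
	fmt.Println(tryAcquire()) // true
	fmt.Println(tryAcquire()) // false: both slots in use
	release()
	fmt.Println(tryAcquire()) // true again after a release
}
```

The non-blocking `select` with a `default` branch is what makes warming reads best-effort: when the limit is reached, the extra read is simply skipped rather than queued behind the caller's real query.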
I edited the PR description at the top to list all 3 flags. Once these changes are made, that needs to change again.
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
So close. flags_test is still failing. You can fix that after addressing the feedback, and then this will be …
@ajm188 given https://vitess.slack.com/archives/CMDJ2KFEZ/p1695982611699689, can we say that this PR does not need a separate website docs PR to update the flags?
Yep, that's correct!
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Force-pushed 8d4030b to 498d1dc
What else is needed for approval here?
deepthi
left a comment
Nice work! Thank you for the contribution!
Description
When reparenting to a replica, if that replica has recently been restarted, it will have a cold bufferpool; for bufferpool-reliant workloads this means a performance hit for a few minutes until the new primary's bufferpool warms up.
As such, it would be great to have the ability to mirror a certain percentage of SELECTs from the current primary to the replicas at the vtgate level, so that when time comes to reparent, the replicas will have a warmer bufferpool than before and not suffer this consequence.
This adds three vtgate flags that can be used to enable and control mirroring a percentage of SELECTs from the current primary to replicas.
We've been running with this feature at HubSpot for years now and it has significantly improved performance during reparents/rolling restarts. Previously, rolling servers to release a fix or feature was risky and could impact running apps and customers; now it's invisible, largely because we are no longer reparenting to a replica with a cold bufferpool.
Related Issue(s)
Fixes #13205
Checklist
Deployment Notes