Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
79a26cd to
4cfab07
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18777 +/- ##
==========================================
+ Coverage 69.73% 69.78% +0.04%
==========================================
Files 1608 1608
Lines 214776 214861 +85
==========================================
+ Hits 149781 149942 +161
+ Misses 64995 64919 -76 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
a note on valid values for |
mattlord
left a comment
There was a problem hiding this comment.
Thanks, @sbaker617 ! I do think that this logical behavior makes sense and we may end up making this behavior the default at some point in the future. But I also think that we should be very conservative here and start with it being experimental.
Let me know what you think. ❤️
| @@ -105,7 +106,8 @@ func registerInitFlags(fs *pflag.FlagSet) { | |||
| utils.SetFlagStringVar(fs, &tabletHostname, "tablet-hostname", tabletHostname, "if not empty, this hostname will be assumed instead of trying to resolve it") | |||
| utils.SetFlagStringVar(fs, &initKeyspace, "init-keyspace", initKeyspace, "(init parameter) keyspace to use for this tablet") | |||
| utils.SetFlagStringVar(fs, &initShard, "init-shard", initShard, "(init parameter) shard to use for this tablet") | |||
| utils.SetFlagStringVar(fs, &initTabletType, "init-tablet-type", initTabletType, "(init parameter) tablet type to use for this tablet. Valid values are: PRIMARY, REPLICA, SPARE, and RDONLY. The default is REPLICA.") | |||
| utils.SetFlagStringVar(fs, &initTabletType, "init-tablet-type", initTabletType, "(init parameter) the tablet type to use for this tablet. Can be REPLICA, RDONLY, or SPARE. The default is REPLICA.") | |||
| fs.BoolVar(&initTabletTypeLookup, "init-tablet-type-lookup", initTabletTypeLookup, "(optional, init parameter) if enabled, look up the tablet type from the existing topology record on restart and use that instead of init-tablet-type. This allows tablets to maintain their changed roles (e.g., RDONLY/DRAINED) across restarts. If disabled or if no topology record exists, init-tablet-type will be used.") | |||
There was a problem hiding this comment.
- I think that we need to mark this as
Experimental:in the flag description. This has the potential to cause a lot of unexpected edge cases. So this should be opt-in and communicate to the user that is interested in this behavior that there may be unexpected and undesirable side effects. Then that gives users that are still interested the opportunity to try it out and report any issues for follow-up work. - We should be more clear about what we're looking up here. We're looking up the tablet alias, which consists of:
a. The cell the vttablet is starting in
b. The--tablet-uidvalue passed to it
In your deployment environment (AFAIK) both of those are static, but in many deployments (e.g. k8s with the operator) they are not. Only a running process has live state, so persisting more of that (and such a key part) across restarts for the next process is, I don't think, quite so cut and dry as it may initially seem. And the tablet type of a process is changed for a very wide variety of reasons and in various scenarios by various entities so this is why I think that both of these points are important. For example, "DRAINED" is documented as:A tablet that has been reserved by a Vitess background process (such as rdonly tablets for resharding).Should that state persist to a new process for the same logical tablet? I'm not sure. What if the same logical tablet of e.g.zone1-100comes back up but with a different volume? RDONLY is certainly a simpler case. For most modern orchestrated deployments using the operator e.g., you define how many RDONLY tablets you want though and the system ensures you have N of them. So it could potentially start trying to add a new one as the old one is still coming back up or something. Anyway, lots of potential unexpected side effects which comes back to point 1. 🙂 Eventually we may well want to make this behavior the default, but I expect to have some follow-up work to address some edge/corner cases.
There was a problem hiding this comment.
Makes sense, I'm good with marking as Experimental, def more familiar with a setup where the aliases are tied to a specific instance.
There was a problem hiding this comment.
marked as Experimental and noted use of the tablet alias as the means of lookup. I kept the reference to RDONLY/DRAINED to highlight this would impact a range of types and might have unintended consequences... so it's a warning of sorts: cffa4e1
There was a problem hiding this comment.
In your deployment environment (AFAIK) both of those are static, but in many deployments (e.g. k8s with the operator) they are not.
Agreed, this worries me a little. I'm assuming dynamic/changing tablet aliases is the most common style of Vitess deployment out there (in raw numbers, etc), so this feature could be a risk to the most common deployment style
We can document somewhere that it's only safe for static tablet aliases but I think that can easily be missed and I can't think of another way to warn/etc those users 🤔. Thoughts appreciated
There was a problem hiding this comment.
We can document somewhere that it's only safe for static tablet aliases
I think marking as Experimental sends a strong signal that the flag shouldn't be enabled without proper testing/understanding of its functionality.
The current help message calls out that 'tablet alias' is used for the lookup:
(Experimental, init parameter) if enabled, uses tablet alias to look up the tablet type from the existing topology record on restart and use that instead of init-tablet-type. This allows tablets to maintain their changed roles (e.g., RDONLY/DRAINED) across restarts. If disabled or if no topology record exists, init-tablet-type will be used.
Hopefully that would give enough info for an env operator to know if this flag would be compatible with their deployment and would require testing before choosing to enable.
mattlord
left a comment
There was a problem hiding this comment.
LGTM! I only had minor comments and nits that you can address as you prefer.
Thanks, @sbaker617 !
Can you also please open a docs PR in the website repo? https://github.com/vitessio/website
We'll need to document this new flag in the 24.0 docs. Let me know if you need any help with that.
I'll then remove the "Needs Website Docs" label.
added doc PR here 👍🏻 |
| @@ -173,7 +173,8 @@ Flags: | |||
| --init-db-name-override string (init parameter) override the name of the db used by vttablet. Without this flag, the db name defaults to vt_<keyspacename> | |||
| --init-keyspace string (init parameter) keyspace to use for this tablet | |||
| --init-shard string (init parameter) shard to use for this tablet | |||
| --init-tablet-type string (init parameter) tablet type to use for this tablet. Valid values are: PRIMARY, REPLICA, SPARE, and RDONLY. The default is REPLICA. | |||
| --init-tablet-type string (init parameter) tablet type to use for this tablet. Valid values are: REPLICA, RDONLY, and SPARE. The default is REPLICA. | |||
| --init-tablet-type-lookup (Experimental, init parameter) if enabled, uses tablet alias to look up the tablet type from the existing topology record on restart and use that instead of init-tablet-type. This allows tablets to maintain their changed roles (e.g., RDONLY/DRAINED) across restarts. If disabled or if no topology record exists, init-tablet-type will be used. | |||
There was a problem hiding this comment.
@sbaker617 I feel this comment should be updated so Vitess users understand this feature won't work on Vitess operator deployments, where tablet records are not left around in the topo
I think this current description may lead to those users trying this and believing there is a bug
There was a problem hiding this comment.
@timvaillancourt what would you like to see included in the message to best guide Vitess operator users ?
There was a problem hiding this comment.
I'd rather not get into specifics like that. The operator's pruning behavior is configurable. There's a good chance that the replacement has a different alias (e.g. in another AZ) though too. But these are implementation details and examples of where the record would not be found and thus init-tablet-type is used.
|
@sbaker617 I feel this change needs an explanation in the v24 release notes as well: https://github.com/vitessio/vitess/blob/main/changelog/24.0/24.0.0/summary.md This change, in my opinion, is kind of awkward for the project because it will only work in environments where topo records are left behind after tablets are removed, which is not the common case, for example in the Vitess operator So a concern I have is this feature sounds like it will work for everyone, when in reality it only functions in a specific deployment scenario. Another general concern I have is how future changes/improvements to the logic may be hindered by having to support this split behaviour, but that is not a blocker per se |
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
Signed-off-by: Stephen Baker <s.baker@slack-corp.com>
b2d925e to
bc3087d
Compare
@timvaillancourt , added note to changelog here. Docs PR also updated with note. |
timvaillancourt
left a comment
There was a problem hiding this comment.
LGTM, thanks @sbaker617
|
|
||
| When enabled, the tablet uses its alias to look up the tablet type from the existing topology record on restart. This allows tablets to maintain their changed roles (e.g., RDONLY/DRAINED) across restarts without manual reconfiguration. If disabled or if no topology record exists, the standard `--init-tablet-type` value will be used instead. | ||
|
|
||
| **Note**: Vitess Operator–managed deployments generally do not keep tablet records in the topo between restarts, so this feature will not take effect in those environments. |
There was a problem hiding this comment.
I'm not sure how accurate or helpful this is in the end. I'm fine leaving it though.
Signed-off-by: Stephen Baker <s.baker@slack-corp.com> Signed-off-by: siddharth16396 <siddharth16396@gmail.com>
Description
This adds an optional
--init-tablet-type-lookupstartup flag forvttabletto implement functionality described in this issue.When present,
--init-tablet-type-lookupwill result in searching the topology for the current tablet being started. If found, will use the previous tablet type instead of the one passed with--init-tablet-type.Related Issue(s)
Checklist
Deployment Notes
AI Disclosure