Tablet throttler: post 18 refactoring, race condition fixes, unit & race testing, deprecation of HTTP checks#14181
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…s); exhaust Operate() channels (mostly for unit tests) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…sts); 'select' on ctx.Done() when writing to channels Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed.
Request for review 🙏
mattlord left a comment
Thanks! ❤️ I only had minor comments/questions/nits so will leave it up to you how to best resolve.
```diff
-// Probes maps instances to probe(s)
-type Probes map[InstanceKey](*Probe)
+// Probes maps tablet aliases to probe(s)
```
Not sure if it's worth it or not, but we could use the *topodatapb.TabletAlias type as the key instead of string. String is fine too, IMO.
Yeah it's a good point. It will be yet another somewhat significant change what with all the tests. I'm happy to give this a go in a followup PR, as this PR is already bloated as it is.
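On the key-type question, one thing to keep in mind: a pointer key such as `*topodatapb.TabletAlias` compares by address, not by value, so two equal aliases held in different pointers land in different map entries. Generated protobuf structs typically also carry unexported fields that make them unusable as value keys, which is one argument for a string key. A minimal sketch, using a hypothetical local `TabletAlias` struct in place of the real protobuf type and a hypothetical `aliasString` formatter (the `cell-uid` format here is an assumption):

```go
package main

import "fmt"

// TabletAlias is a hypothetical stand-in for topodatapb.TabletAlias,
// reduced to the two fields that identify a tablet.
type TabletAlias struct {
	Cell string
	Uid  uint32
}

// Probe is a hypothetical, stripped-down probe.
type Probe struct {
	Alias *TabletAlias
}

// aliasString (hypothetical) renders an alias as "cell-uid" with a
// zero-padded uid; equal aliases always yield equal strings.
func aliasString(a *TabletAlias) string {
	return fmt.Sprintf("%s-%010d", a.Cell, a.Uid)
}

func main() {
	a1 := &TabletAlias{Cell: "zone1", Uid: 100}
	a2 := &TabletAlias{Cell: "zone1", Uid: 100} // same tablet, different pointer

	// Pointer keys compare by address: the same tablet gets two entries.
	byPtr := map[*TabletAlias]*Probe{}
	byPtr[a1] = &Probe{Alias: a1}
	byPtr[a2] = &Probe{Alias: a2}
	fmt.Println(len(byPtr)) // 2

	// String keys compare by value: the second write replaces the first.
	byString := map[string]*Probe{}
	byString[aliasString(a1)] = &Probe{Alias: a1}
	byString[aliasString(a2)] = &Probe{Alias: a2}
	fmt.Println(len(byString))   // 1
	fmt.Println(aliasString(a1)) // zone1-0000000100
}
```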
```go
defer requestCancel()
throttlerConfig, err := throttler.readThrottlerConfig(requestCtx)
if err == nil {
	log.Errorf("Throttler.retryReadAndApplyThrottlerConfig(): success reading throttler config: %+v", throttlerConfig)
```

```go
for {
	if !throttler.IsOpen() {
		// Throttler is not open so no need to keep retrying.
		log.Errorf("Throttler.retryReadAndApplyThrottlerConfig(): throttler no longer seems to be open, exiting")
```
Shouldn't this be INFO or maybe WARNING?
```go
select {
case <-ctx.Done():
	// Throttler is not open so no need to keep retrying.
	log.Errorf("Throttler.retryReadAndApplyThrottlerConfig(): throttler no longer seems to be open, exiting")
```
This feels like WARNING to me.
```go
if throttler.IsOpen() {
	if throttler.tabletTypeFunc() == topodatapb.TabletType_PRIMARY {
```
I think it would be slightly more readable as `if throttler.IsOpen() && throttler.tabletTypeFunc() == topodatapb.TabletType_PRIMARY {`.
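The combined condition reads naturally when `IsOpen()` is cheap and race-free, and `&&` short-circuits so the tablet type is not even queried on a closed throttler. A minimal sketch, assuming `IsOpen()` is backed by a `sync/atomic` `atomic.Bool` (Go 1.19+) in the spirit of the PR's move to atomic types; the `Throttler` struct, `TabletType` constants, and `shouldRunPrimaryChecks` here are hypothetical:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// TabletType is a hypothetical stand-in for topodatapb.TabletType.
type TabletType int

const (
	TabletTypeReplica TabletType = iota
	TabletTypePrimary
)

// Throttler is a hypothetical, stripped-down throttler.
type Throttler struct {
	isOpen         atomic.Bool
	tabletTypeFunc func() TabletType
}

// Open returns true only for the call that actually flips closed->open,
// so repeated or concurrent Open() calls do the setup work at most once.
func (t *Throttler) Open() bool { return t.isOpen.CompareAndSwap(false, true) }

// Close is the mirror image of Open.
func (t *Throttler) Close() bool { return t.isOpen.CompareAndSwap(true, false) }

// IsOpen is safe to call from any goroutine.
func (t *Throttler) IsOpen() bool { return t.isOpen.Load() }

// shouldRunPrimaryChecks folds both conditions into one expression.
func (t *Throttler) shouldRunPrimaryChecks() bool {
	return t.IsOpen() && t.tabletTypeFunc() == TabletTypePrimary
}

func main() {
	t := &Throttler{tabletTypeFunc: func() TabletType { return TabletTypePrimary }}
	fmt.Println(t.shouldRunPrimaryChecks()) // false: not open yet
	fmt.Println(t.Open(), t.Open())         // true false
	fmt.Println(t.shouldRunPrimaryChecks()) // true
	fmt.Println(t.Close(), t.Close())       // true false
}
```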
```go
probe := probe
go func() {
```
IMO it's nicer to pass the value into the goroutine function, but this is also fine. Personal pref I suppose; I just find it more clear/deliberate/obvious. Maybe we can just add a comment about the closure issue instead, so future readers don't delete the line as "unneeded". :-)
Rewritten. The one thing I don't like about the pass-the-value-to-goroutine paradigm is that the value is passed 20 lines of code below. Otherwise I recognize it's the dominant paradigm.
| ) | ||
|
|
||
```go
// TestProbesPostDisable runs the throttler for some time, and then investigates the internal throttler maps and values.
// While the therottle is disabled, it is technically safe to iterate those structures. However, `go test -race` disagrees,
```
Fixed. I'm now working with a new spellchecker that is not as intrusive as others I've worked with; let's see how it improves my typo rate.
…ace testing, deprecation of HTTP checks (vitessio#14181) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Description
Follow-up to #13341
Addresses and fixes #14180. This PR is quite large and incorporates many changes; some planned in advance, some found while investigating #14164, and some on the go as race detections were added.
To make sense of the changes, most commits in this PR can be considered in isolation:
- `InstanceKey` type; all probes are now identified by a tablet alias, and all access to tablets is done by tablet alias or tablet reference. Any `hostname:port` access (or information) is gone.
- `Open()` & `Close()`. Things that don't need to operate when the throttler is `Close()`d now won't.
- golang `atomic` types.

Related Issue(s)
Fixes #14180
It also fixes #14178 for post-v18. However, a dedicated PR, #14179, fixes #14178 for post-v18 as well as for the v18, v17, and v16 backports.
Checklist