-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add jitter to periodic cert rotation and fix spurious writes #6263
Conversation
09c9d7a
to
afe9110
Compare
42176fe
to
0607360
Compare
I'm not 100% satisfied with the changes to |
} | ||
mu.Lock() | ||
defer mu.Unlock() | ||
return (6 * d / 7) + time.Duration(rng.Int63n(int64(d))/7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do these constants represent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of what the math is doing, its randomizing up to one seventh of the supplied duration (i.e. creating a jitter on the range of [6n/7,n)
).
In terms of why I chose that specific constant, its essentially arbitrary. We needed a smaller jitter than 1/2
(what the default jitter uses) for periodic operations since a jitter of 1/2
could worst-case double the amount of work. I went with a 1/7
jitter because most of our periodic operations happen on a 10 minute interval, and a 1/7
jitter will worst-case cause 1 additional operation per hour when dealing with 10-minute intervals. This is itself a pretty arbitrary goal, but it seemed reasonable. It means that if instances did sync up coincidentally, you'd expect them to be reasonably "desynced" within an hour. They won't be fully desynced, since repeated jitters tend to scatter on a normal distribution, but desynced enough to have a standard deviation of 1 minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push choosing these fractions up to the caller?
Something like func NewJitter(frac float64) Jitter // returns values between [(1-frac) * n : n)
, which can then be caller like NewJitter(0.5)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried a few different iterations of having caller supply fraction manually. Everything I came up with felt a bit too brittle. Instead, I've renamed the DefaultJitter
to HalfJitter
and NewSmallJitter
to NewSeventhJitter
so that the caller now explicitly commits to the desired fraction by selecting the appropriate function. Ex:
interval.New(interval.Config{
Duration: defaults.HighResPollingPeriod,
FirstDuration: utils.HalfJitter(defaults.HighResPollingPeriod),
Jitter: utils.NewSeventhJitter(),
})
68e30fd
to
b7ac8ab
Compare
lib/utils/interval.go
Outdated
// interval durations (equivalent to time.NewTicker). | ||
func NewInterval(cfg IntervalConfig) *Interval { | ||
if cfg.Duration <= 0 { | ||
panic(errors.New("non-positive interval for NewInterval")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we want to panic in this case, why not just return an error as usual. Or at least rename function to MustNewInterval
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is deliberate. The Interval
is meant to have the same contract as time.NewTicker
, which has this condition:
if d <= 0 {
panic(errors.New("non-positive interval for NewTicker"))
}
By keeping the panic conditions identical, we ensure that tickers and intervals can be swapped back and forth without changing validation requirements. If an invalid duration returned an error, then switching from an interval to a ticker would silently introduce a new panic condition.
lib/utils/interval.go
Outdated
// per-tick jitter. This will ensure that operations started at similar times will | ||
// have varying initial interval states, while minimizing the amount of extra work | ||
// introduced by the per-tick jitter. | ||
type Interval struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need exponential backoff here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code that needs backoff should use utils.Retry
for that (we already use a combination of time.Ticker
and utils.Retry
in various parts of the codebase where we have periodics have a backoff component). I'd prefer to keep time.Ticker
and interval.Interval
interchangeable as much as possible so that its easy to swap back to time.Ticker
if the extra features of the interval are no longer needed.
lib/utils/interval.go
Outdated
} | ||
|
||
func (i *Interval) run(timer *time.Timer) { | ||
defer timer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer timer.Stop() | |
defer timer.Stop() | |
defer close(i.ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not closing the output channel is deliberate and consistent with the behavior of time.Ticker
. From the docs of time.Ticker
:
Stop does not close the channel, to prevent a concurrent goroutine reading from the channel from seeing an erroneous "tick".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, can you leave a comment about that in Interval.Stop
too?
} | ||
mu.Lock() | ||
defer mu.Unlock() | ||
return (6 * d / 7) + time.Duration(rng.Int63n(int64(d))/7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push choosing these fractions up to the caller?
Something like func NewJitter(frac float64) Jitter // returns values between [(1-frac) * n : n)
, which can then be caller like NewJitter(0.5)
b7ac8ab
to
599d044
Compare
* Eliminates spurious leaf cluster CA writes. * Adds jitters to various periodic operations.
599d044
to
b9dadad
Compare
Adds randomized interval helper, adds jitter to certificate rotation periodics, and fixes an issue where spurious write events were being generated due to unnecessary calls to
UpsertCertAuthority
.