Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add jitter to periodic cert rotation and fix spurious writes #6263

Merged
merged 1 commit into from
Apr 7, 2021

Conversation

fspmarshall
Copy link
Contributor

@fspmarshall fspmarshall commented Mar 31, 2021

Adds randomized interval helper, adds jitter to certificate rotation periodics, and fixes an issue where spurious write events were being generated due to unnecessary calls to UpsertCertAuthority.

@fspmarshall fspmarshall force-pushed the fspmarshall/jitter-periodic-rotation branch from 09c9d7a to afe9110 Compare March 31, 2021 21:45
lib/utils/retry.go Outdated Show resolved Hide resolved
lib/utils/retry.go Outdated Show resolved Hide resolved
@fspmarshall fspmarshall force-pushed the fspmarshall/jitter-periodic-rotation branch from 42176fe to 0607360 Compare April 1, 2021 21:11
@fspmarshall fspmarshall changed the title add jitter to periodic cert rotation add jitter to periodic cert rotation and fix spurious writes Apr 1, 2021
@fspmarshall
Copy link
Contributor Author

I'm not 100% satisfied with the changes to Upsert behavior, but I think its an overall improvement compared to what it was before. In the long run, I think this could be improved by having the auth server perform stringent checks under optimistic lock to ensure that the new CA state is one that should come after the current CA state. I'm not certain what the best plan of attack for this is yet (resource ID isn't meaningful across cluster boundaries, so the usual method doesn't work here).

lib/reversetunnel/remotesite.go Show resolved Hide resolved
lib/utils/interval.go Outdated Show resolved Hide resolved
}
mu.Lock()
defer mu.Unlock()
return (6 * d / 7) + time.Duration(rng.Int63n(int64(d))/7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these constants represent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of what the math is doing, its randomizing up to one seventh of the supplied duration (i.e. creating a jitter on the range of [6n/7,n)).

In terms of why I chose that specific constant, its essentially arbitrary. We needed a smaller jitter than 1/2 (what the default jitter uses) for periodic operations since a jitter of 1/2 could worst-case double the amount of work. I went with a 1/7 jitter because most of our periodic operations happen on a 10 minute interval, and a 1/7 jitter will worst-case cause 1 additional operation per hour when dealing with 10-minute intervals. This is itself a pretty arbitrary goal, but it seemed reasonable. It means that if instances did sync up coincidentally, you'd expect them to be reasonably "desynced" within an hour. They won't be fully desynced, since repeated jitters tend to scatter on a normal distribution, but desynced enough to have a standard deviation of 1 minute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you push choosing these fractions up to the caller?
Something like func NewJitter(frac float64) Jitter // returns values between [(1-frac) * n : n), which can then be caller like NewJitter(0.5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried a few different iterations of having caller supply fraction manually. Everything I came up with felt a bit too brittle. Instead, I've renamed the DefaultJitter to HalfJitter and NewSmallJitter to NewSeventhJitter so that the caller now explicitly commits to the desired fraction by selecting the appropriate function. Ex:

interval.New(interval.Config{
    Duration:      defaults.HighResPollingPeriod,
    FirstDuration: utils.HalfJitter(defaults.HighResPollingPeriod),
    Jitter:        utils.NewSeventhJitter(),
})

@awly awly self-requested a review April 2, 2021 20:50
@fspmarshall fspmarshall force-pushed the fspmarshall/jitter-periodic-rotation branch 3 times, most recently from 68e30fd to b7ac8ab Compare April 5, 2021 16:17
// interval durations (equivalent to time.NewTicker).
func NewInterval(cfg IntervalConfig) *Interval {
if cfg.Duration <= 0 {
panic(errors.New("non-positive interval for NewInterval"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we want to panic in this case, why not just return an error as usual. Or at least rename function to MustNewInterval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deliberate. The Interval is meant to have the same contract as time.NewTicker, which has this condition:

if d <= 0 {
	panic(errors.New("non-positive interval for NewTicker"))
}

By keeping the panic conditions identical, we ensure that tickers and intervals can be swapped back and forth without changing validation requirements. If an invalid duration returned an error, then switching from an interval to a ticker would silently introduce a new panic condition.

lib/services/authority.go Outdated Show resolved Hide resolved
lib/utils/interval.go Outdated Show resolved Hide resolved
lib/utils/interval.go Outdated Show resolved Hide resolved
// per-tick jitter. This will ensure that operations started at similar times will
// have varying initial interval states, while minimizing the amount of extra work
// introduced by the per-tick jitter.
type Interval struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need exponential backoff here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code that needs backoff should use utils.Retry for that (we already use a combination of time.Ticker and utils.Retry in various parts of the codebase where we have periodics have a backoff component). I'd prefer to keep time.Ticker and interval.Interval interchangeable as much as possible so that its easy to swap back to time.Ticker if the extra features of the interval are no longer needed.

}

func (i *Interval) run(timer *time.Timer) {
defer timer.Stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer timer.Stop()
defer timer.Stop()
defer close(i.ch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not closing the output channel is deliberate and consistent with the behavior of time.Ticker. From the docs of time.Ticker:

Stop does not close the channel, to prevent a concurrent goroutine reading from the channel from seeing an erroneous "tick".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, can you leave a comment about that in Interval.Stop too?

}
mu.Lock()
defer mu.Unlock()
return (6 * d / 7) + time.Duration(rng.Int63n(int64(d))/7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you push choosing these fractions up to the caller?
Something like func NewJitter(frac float64) Jitter // returns values between [(1-frac) * n : n), which can then be caller like NewJitter(0.5)

@fspmarshall fspmarshall force-pushed the fspmarshall/jitter-periodic-rotation branch from b7ac8ab to 599d044 Compare April 7, 2021 21:03
@fspmarshall fspmarshall requested a review from awly April 7, 2021 21:19
* Eliminates spurious leaf cluster CA writes.
* Adds jitters to various periodic operations.
@fspmarshall fspmarshall force-pushed the fspmarshall/jitter-periodic-rotation branch from 599d044 to b9dadad Compare April 7, 2021 21:48
@fspmarshall fspmarshall merged commit e118629 into master Apr 7, 2021
@fspmarshall fspmarshall deleted the fspmarshall/jitter-periodic-rotation branch April 7, 2021 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants