Add a periodic test of the autoseal to detect loss of connectivity. #13078

Merged 17 commits on Nov 10, 2021
Changes from 5 commits
3 changes: 3 additions & 0 deletions changelog/13078.txt
@@ -0,0 +1,3 @@
```release-note:improvement
core: Periodically test the health of connectivity to auto-seal backends
```
8 changes: 8 additions & 0 deletions vault/core.go
@@ -2140,6 +2140,10 @@ func (c *Core) postUnseal(ctx context.Context, ctxCancelFunc context.CancelFunc,
		if err := seal.UpgradeKeys(c.activeContext); err != nil {
			c.logger.Warn("post-unseal upgrade seal keys failed", "error", err)
		}

		// Start a periodic but infrequent heartbeat to detect auto-seal backend outages at runtime rather than being
		// surprised by this at the next need to unseal.
		seal.StartHealthCheck()
	}

	c.metricsCh = make(chan struct{})
@@ -2226,6 +2230,10 @@ func (c *Core) preSeal() error {
		c.autoRotateCancel = nil
	}

	if seal, ok := c.seal.(*autoSeal); ok {
		seal.StopHealthCheck()
	}

	preSealPhysical(c)

	c.logger.Info("pre-seal teardown complete")
64 changes: 60 additions & 4 deletions vault/seal_autoseal.go
@@ -1,11 +1,14 @@
package vault

import (
	"bytes"
	"context"
	"crypto/subtle"
	"encoding/json"
	"fmt"
	mathrand "math/rand"
	"sync/atomic"
	"time"

	proto "github.com/golang/protobuf/proto"
	log "github.com/hashicorp/go-hclog"
@@ -18,16 +21,20 @@ import (
// applicable in the OSS side
var barrierTypeUpgradeCheck = func(_ string, _ *SealConfig) {}

const sealHeathTestInterval = 1 * time.Minute
Contributor

Didn't we want to check this hourly?

Collaborator Author

Yeah, I've gone back and forth on that. Since it's a cheap encrypt/decrypt, I coded it to be more frequent. Open to thoughts.

Contributor

I'm assuming this is being done on all auto-unseal implementations, not just the pkcs11 ones. If that is the case, could the costs of using the various cloud KMS providers start adding up? I don't have a good sense of those costs across the various implementations.

Contributor

sealHeathTestInterval: should this be sealHealthTestInterval?

Collaborator Author

It sure should.


// autoSeal is a Seal implementation that contains logic for encrypting and
// decrypting stored keys via an underlying AutoSealAccess implementation, as
// well as logic related to recovery keys and barrier config.
type autoSeal struct {
	*seal.Access

-	barrierConfig  atomic.Value
-	recoveryConfig atomic.Value
-	core           *Core
-	logger         log.Logger
+	barrierConfig   atomic.Value
+	recoveryConfig  atomic.Value
+	core            *Core
+	logger          log.Logger
+	healthCheck     *time.Ticker
+	healthCheckStop chan struct{}
}

// Ensure we are implementing the Seal interface
@@ -499,3 +506,52 @@ func (d *autoSeal) migrateRecoveryConfig(ctx context.Context) error {

	return nil
}

func (d *autoSeal) StartHealthCheck() {
	d.healthCheck = time.NewTicker(sealHeathTestInterval)
	d.healthCheckStop = make(chan struct{})
	go func() {
		lastTestOk := true
		lastSeenOk := time.Now()
		for {
			select {
			case <-d.healthCheckStop:
				d.healthCheck.Stop()
Contributor

I'm probably being paranoid here, but if we get multiple healthCheckStop signals we will get a nil dereference error. Is it worth adding a quick check and returning if d.healthCheck is nil?

Contributor

Never mind, the case statement will never fire.

Collaborator Author

Yes it is, thanks.

				close(d.healthCheckStop)
				d.healthCheck = nil
				d.healthCheckStop = nil
				return
			case t := <-d.healthCheck.C:
				testVal := fmt.Sprintf("Heartbeat %d", mathrand.Intn(1000))
				ciphertext, err := d.Wrapper.Encrypt(d.core.activeContext, []byte(testVal), nil)
				if err != nil {
					lastTestOk = false
					d.logger.Warn("failed to encrypt seal health test value, seal backend may be unreachable", "error", err)
				} else {
					plaintext, err := d.Wrapper.Decrypt(d.core.activeContext, ciphertext, nil)
					if err != nil {
						lastTestOk = false
						d.logger.Warn("failed to decrypt seal health test value, seal backend may be unreachable", "error", err)
					}
					if !bytes.Equal([]byte(testVal), plaintext) {
						lastTestOk = false
						d.logger.Warn("seal health test value failed to decrypt to expected value")
					} else {
						d.logger.Debug("seal health test passed")
						if !lastTestOk {
							d.logger.Info("seal backend is now healthy again", "downtime", t.Sub(lastSeenOk).String())
						}
						lastTestOk = true
						lastSeenOk = t
					}
				}
			}
		}
	}()
}

func (d *autoSeal) StopHealthCheck() {
	if d.healthCheckStop != nil {
		d.healthCheckStop <- struct{}{}
	}
}