Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 16 additions & 8 deletions pkg/webhook/health_check.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,30 +4,38 @@ import (
"bytes"
"crypto/tls"
"fmt"
"io"
"net/http"
"path/filepath"

logf "sigs.k8s.io/controller-runtime/pkg/log"
)

// disabling gosec linting here as the http client used in this checking is intended to skip CA verification
//
//nolint:gosec
var tr = &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
var insecureClient = &http.Client{Transport: tr}

var tlsCheckerLog = logf.Log.WithName("webhook-tls-checker")

func NewTLSChecker(certDir string, port int) func(*http.Request) error {
//nolint:forcetypeassert
tr := http.DefaultTransport.(*http.Transport).Clone()
// disabling gosec linting here as the http client used in this checking is intended to skip CA verification
//
//nolint:gosec
tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
// disable keep alives to ensure that http connection aren't reused, otherwise the check may
// fail if the cert was rotated in between
tr.DisableKeepAlives = true
Comment on lines +23 to +25
Copy link
Copy Markdown
Contributor

@acpana acpana Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would keep alives here just cause some network flakiness? iow, if checks retry once the certs have been rotated, would this still be a problem?

Copy link
Copy Markdown
Contributor Author

@dethi dethi Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be just flakiness. I didn't get the time to validate my theory, but here is the scenario that I'm thinking (with keepalive enabled):

  • First health check pass, with certificate A. Because we drained the body before closing it, the connection is kept open (that's the default behaviour of the Go HTTP client, as long as the server also allow keepalive)
  • Then, the certificate is renewed. Certificate A is replaced by certificate B on file, and the transport layer of the HTTP server now use certificate B for new connections.
  • A new health check is started. The previous connection is open, in the pool of connections. The Go HTTP client select it and make a new request. We get a response. When we check the response, we compare what we have on disk (certificate B) with the peer cert associated to the connection (certificate A). The check fail.
  • This repeat indefinitely until the connection is closed, which may never happen (or after X minutes when the maximum connection lifetime is exceeded), because we are always properly draining and closing the body, so the HTTP client should always keep the connection open.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems likely. IIRC certs are only used for negotiating a session key, so rotating a cert wouldn't necessarily break a pre-existing connection.

insecureClient := &http.Client{Transport: tr}

returnFunc := func(_ *http.Request) error {
resp, err := insecureClient.Get(fmt.Sprintf("https://127.0.0.1:%d", port))
if err != nil {
newErr := fmt.Errorf("unable to connect to server: %w", err)
tlsCheckerLog.Error(newErr, "error in connecting to webhook server with https")
return newErr
}
defer resp.Body.Close()
// explicitly discard the body to avoid any memory leak
_, _ = io.Copy(io.Discard, resp.Body)

if len(resp.TLS.PeerCertificates) == 0 {
newErr := fmt.Errorf("webhook does not serve TLS certificate")
tlsCheckerLog.Error(newErr, "error in connecting to webhook server with https")
Expand Down