Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 7 additions & 5 deletions pkg/operator/controller/crl/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -286,12 +286,14 @@ func (r *reconciler) Reconcile(ctx context.Context, request reconcile.Request) (

// TODO Consider letting ensureCRLConfigmap get the deployment and build
// the owner reference as we don't know yet whether we need it.
if _, _, err := r.ensureCRLConfigmap(ctx, ic, deployment.Namespace, ownerRef, haveCAConfigmap, clientCAConfigmap); err != nil {
if _, _, ctx, err := r.ensureCRLConfigmap(ctx, ic, deployment.Namespace, ownerRef, haveCAConfigmap, clientCAConfigmap); err != nil {
return reconcile.Result{}, fmt.Errorf("failed to ensure client CA CRL configmap for ingresscontroller %s: %w", request.NamespacedName, err)
} else {
if nextCRLUpdate, ok := ctx.Value("nextCRLUpdate").(time.Time); ok && !nextCRLUpdate.IsZero() {
log.Info("Requeueing when next CRL expires", "requeue time", nextCRLUpdate.String(), "time until requeue", time.Until(nextCRLUpdate))
//Re-reconcile when any of the CRLs expire
return reconcile.Result{RequeueAfter: time.Until(nextCRLUpdate)}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk of this requeueing multiple times for the same next CRL update? I.e. on every reconcile, do we queue up this request? Is that okay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts?

Copy link
Contributor

@gcs278 gcs278 Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts @rfredette?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me a while to get back to you on this one. I think that's a valid concern. I'm not sure at first glance what's the best way to solve it, but let me look into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcs278 I've pushed a change that seems to fix this from my manual testing; please take a look when you get a chance and let me know what you think.

I added a global variable to track when the next CRL update is expected to be, and now the reconcile is only triggered when the next update as computed in desiredCRLConfigMap will be sooner than any already pending requeues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk of this requeueing multiple times for the same next CRL update? I.e. on every reconcile, do we queue up this request? Is that okay?

Are you worried that the queue may never be empty? I think it is all right. Is there some other potential problem with the logic as it was when you asked about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried that the queue may never be empty?

I don't think that's what I meant. I just noticed we queue up a request on every reconcile for the same next CRL update. That's not a big problem, just seemed a bit extraneous. We might have 10 or 20 requests that will all trigger at the same time for the same (or immediately after each other) nextCRLUpdate, depending on how many times the CRL was reconciled.

I don't know if there is an easy solution to say "this request is already in the queue, don't queue again". Just pointing it out more than anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to do a bit more testing on this, and I got the operator to requeue a reconcile for the same time 10 times, but when the time came to do the reconcile, it seems to execute the reconcile twice then stop retrying (at least for a bit; when the new CRL expired 5 minutes later it still re-reconciled). I'm not sure why it's twice; maybe there's a rate limit implemented somewhere, but it doesn't look like this will flood the operator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we're all right here. I did a little more investigation:

  • controller-runtime's reconcileHandler calls our Reconcile method (technically, it calls the controller's Reconcile method, but that ultimately results in ours' being called) and requeues the request using AddAfter if the result has nil error and non-zero RequeueAfter:
    result, err := c.Reconcile(ctx, req)
    switch {
    case err != nil:
    c.Queue.AddRateLimited(req)
    ctrlmetrics.ReconcileErrors.WithLabelValues(c.Name).Inc()
    ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, labelError).Inc()
    log.Error(err, "Reconciler error")
    case result.RequeueAfter > 0:
    // The result.RequeueAfter request will be lost, if it is returned
    // along with a non-nil error. But this is intended as
    // We need to drive to stable reconcile loops before queuing due
    // to result.RequestAfter
    c.Queue.Forget(obj)
    c.Queue.AddAfter(req, result.RequeueAfter)
  • AddAfter adds the item to the queue using Add or sends a waitFor to a channel:
    // AddAfter adds the given item to the work queue after the given delay
    func (q *delayingType) AddAfter(item interface{}, duration time.Duration) {
    // don't add if we're already shutting down
    if q.ShuttingDown() {
    return
    }
    q.metrics.retry()
    // immediately add things with no delay
    if duration <= 0 {
    q.Add(item)
    return
    }
    select {
    case <-q.stopCh:
    // unblock if ShutDown() is called
    case q.waitingForAddCh <- &waitFor{data: item, readyAt: q.clock.Now().Add(duration)}:
  • The queue has a goroutine that reads the waitFor off the channel and adds it to its internal waitingForQueue queue using insert or adds it to the queue using Add:
    case waitEntry := <-q.waitingForAddCh:
    if waitEntry.readyAt.After(q.clock.Now()) {
    insert(waitingForQueue, waitingEntryByData, waitEntry)
    } else {
    q.Add(waitEntry.data)
  • insert checks whether there is an existing entry for the reconcile request; if there is one, it updates the existing entry in the queue instead of pushing a duplicate entry onto the queue:
    // insert adds the entry to the priority queue, or updates the readyAt if it already exists in the queue
    func insert(q *waitForPriorityQueue, knownEntries map[t]*waitFor, entry *waitFor) {
    // if the entry already exists, update the time only if it would cause the item to be queued sooner
    existing, exists := knownEntries[entry.data]
    if exists {
    if existing.readyAt.After(entry.readyAt) {
    existing.readyAt = entry.readyAt
    heap.Fix(q, existing.index)
    }
    return
  • The queue itself defines Add to prevent adding duplicates:
    // Add marks item as needing processing.
    func (q *Type) Add(item interface{}) {
    q.cond.L.Lock()
    defer q.cond.L.Unlock()
    if q.shuttingDown {
    return
    }
    if q.dirty.has(item) {
    return
    }
    q.metrics.add(item)
    q.dirty.insert(item)
    if q.processing.has(item) {
    return
    }
    q.queue = append(q.queue, item)

Between Ryan's testing and my understanding of the queue and delaying_queue logic, I believe that there should be no issue with duplicate items in the queue or the delaying_queue's internal channel or queue.

}
}

// TODO Maybe parse the CRLs and set RequeueAfter based on the earliest
// expiration date.

return reconcile.Result{}, nil
}
57 changes: 34 additions & 23 deletions pkg/operator/controller/crl/crl_configmap.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,10 +38,10 @@ var authorityKeyIdentifierOID = asn1.ObjectIdentifier{2, 5, 29, 35}
// specifies a client CA certificate bundle in which any certificates specify
// any CRL distribution points. Returns a Boolean indicating whether the
// configmap exists, the configmap if it does exist, and an error value.
func (r *reconciler) ensureCRLConfigmap(ctx context.Context, ic *operatorv1.IngressController, namespace string, ownerRef metav1.OwnerReference, haveClientCA bool, clientCAConfigmap *corev1.ConfigMap) (bool, *corev1.ConfigMap, error) {
func (r *reconciler) ensureCRLConfigmap(ctx context.Context, ic *operatorv1.IngressController, namespace string, ownerRef metav1.OwnerReference, haveClientCA bool, clientCAConfigmap *corev1.ConfigMap) (bool, *corev1.ConfigMap, context.Context, error) {
haveCM, current, err := r.currentCRLConfigMap(ctx, ic)
if err != nil {
return false, nil, err
return false, nil, ctx, err
}

var oldCRLs map[string]*pkix.CertificateList
Expand All @@ -60,45 +60,47 @@ func (r *reconciler) ensureCRLConfigmap(ctx context.Context, ic *operatorv1.Ingr
if haveClientCA {
clientCABundleFilename := "ca-bundle.pem"
if data, ok := clientCAConfigmap.Data[clientCABundleFilename]; !ok {
return haveCM, current, fmt.Errorf("client CA configmap %s/%s is missing %q", clientCAConfigmap.Namespace, clientCAConfigmap.Name, clientCABundleFilename)
return haveCM, current, ctx, fmt.Errorf("client CA configmap %s/%s is missing %q", clientCAConfigmap.Namespace, clientCAConfigmap.Name, clientCABundleFilename)
} else {
clientCAData = []byte(data)
}
}

wantCM, desired, err := desiredCRLConfigMap(ic, ownerRef, clientCAData, oldCRLs)
wantCM, desired, ctx, err := desiredCRLConfigMap(ctx, ic, ownerRef, clientCAData, oldCRLs)
if err != nil {
return false, nil, fmt.Errorf("failed to build configmap: %w", err)
return false, nil, ctx, fmt.Errorf("failed to build configmap: %w", err)
}

switch {
case !wantCM && !haveCM:
return false, nil, nil
return false, nil, ctx, nil
case !wantCM && haveCM:
if err := r.client.Delete(ctx, current); err != nil {
if !errors.IsNotFound(err) {
return true, current, fmt.Errorf("failed to delete configmap: %w", err)
return true, current, ctx, fmt.Errorf("failed to delete configmap: %w", err)
}
} else {
log.Info("deleted configmap", "namespace", current.Namespace, "name", current.Name)
}
return false, nil, nil
return false, nil, ctx, nil
case wantCM && !haveCM:
if err := r.client.Create(ctx, desired); err != nil {
return false, nil, fmt.Errorf("failed to create configmap: %w", err)
return false, nil, ctx, fmt.Errorf("failed to create configmap: %w", err)
}
log.Info("created configmap", "namespace", desired.Namespace, "name", desired.Name)
return r.currentCRLConfigMap(ctx, ic)
exists, current, err := r.currentCRLConfigMap(ctx, ic)
return exists, current, ctx, err
case wantCM && haveCM:
if updated, err := r.updateCRLConfigMap(ctx, current, desired); err != nil {
return true, current, fmt.Errorf("failed to update configmap: %w", err)
return true, current, ctx, fmt.Errorf("failed to update configmap: %w", err)
} else if updated {
log.Info("updated configmap", "namespace", desired.Namespace, "name", desired.Name)
return r.currentCRLConfigMap(ctx, ic)
exists, current, err := r.currentCRLConfigMap(ctx, ic)
return exists, current, ctx, err
}
}

return true, current, nil
return true, current, ctx, nil
}

// buildCRLMap builds a map of key identifier to certificate list using the
Expand Down Expand Up @@ -130,18 +132,20 @@ func buildCRLMap(crlData []byte) (map[string]*pkix.CertificateList, error) {
}

// desiredCRLConfigMap returns the desired CRL configmap. Returns a Boolean
// indicating whether a configmap is desired, as well as the configmap if one is
// desired.
func desiredCRLConfigMap(ic *operatorv1.IngressController, ownerRef metav1.OwnerReference, clientCAData []byte, crls map[string]*pkix.CertificateList) (bool, *corev1.ConfigMap, error) {
// indicating whether a configmap is desired, the configmap if one is desired,
// the context (containing the next CRL update time as "nextCRLUpdate"), and an
// error if one occurred
func desiredCRLConfigMap(ctx context.Context, ic *operatorv1.IngressController, ownerRef metav1.OwnerReference, clientCAData []byte, crls map[string]*pkix.CertificateList) (bool, *corev1.ConfigMap, context.Context, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we don't have unit testing for this function. Should we have a unit test or E2E test to verify this CRL update logic? Can we have static CA's in our test code that can trigger some of logical paths?

As far as expiration, I saw in CoreDNS they pass in now as an argument, that way in unit testing they can change now (aka teleport) to the future, to test logical code paths for expiration. This is probably not trivial though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw in CoreDNS they pass in now as an argument, that way in unit testing they can change now (aka teleport) to the future, to test logical code paths for expiration.

I haven't included tests because I haven't had a good idea on how to automate testing it, since the certificates are time sensitive. My plan is to follow up later with an e2e test that generates certificates and CRLs at test run time, but this is a great idea for unit testing.

if len(ic.Spec.ClientTLS.ClientCertificatePolicy) == 0 || len(ic.Spec.ClientTLS.ClientCA.Name) == 0 {
return false, nil, nil
return false, nil, ctx, nil
}

if crls == nil {
crls = make(map[string]*pkix.CertificateList)
}

var subjectKeyIds []string
var nextCRLUpdate time.Time
now := time.Now()
for len(clientCAData) > 0 {
block, data := pem.Decode(clientCAData)
Expand All @@ -151,7 +155,7 @@ func desiredCRLConfigMap(ic *operatorv1.IngressController, ownerRef metav1.Owner
clientCAData = data
cert, err := x509.ParseCertificate(block.Bytes)
if err != nil {
return false, nil, fmt.Errorf("client CA configmap has an invalid certificate: %w", err)
return false, nil, ctx, fmt.Errorf("client CA configmap has an invalid certificate: %w", err)
}
subjectKeyId := hex.EncodeToString(cert.SubjectKeyId)
if len(cert.CRLDistributionPoints) == 0 {
Expand All @@ -162,6 +166,9 @@ func desiredCRLConfigMap(ic *operatorv1.IngressController, ownerRef metav1.Owner
log.Info("certificate revocation list has expired", "subject key identifier", subjectKeyId)
} else {
subjectKeyIds = append(subjectKeyIds, subjectKeyId)
if (nextCRLUpdate.IsZero() || crl.TBSCertList.NextUpdate.Before(nextCRLUpdate)) && crl.TBSCertList.NextUpdate.After(now) {
nextCRLUpdate = crl.TBSCertList.NextUpdate
}
continue
}
}
Expand All @@ -170,29 +177,33 @@ func desiredCRLConfigMap(ic *operatorv1.IngressController, ownerRef metav1.Owner
// Creating or updating the configmap with incomplete
// data would compromise security by potentially
// permitting revoked certificates.
return false, nil, fmt.Errorf("failed to get certificate revocation list for certificate key %s: %w", subjectKeyId, err)
return false, nil, ctx, fmt.Errorf("failed to get certificate revocation list for certificate key %s: %w", subjectKeyId, err)
} else {
crls[subjectKeyId] = crl
subjectKeyIds = append(subjectKeyIds, subjectKeyId)
log.Info("new certificate revocation list", "subject key identifier", subjectKeyId, "next update", crl.TBSCertList.NextUpdate.String())
if (nextCRLUpdate.IsZero() || crl.TBSCertList.NextUpdate.Before(nextCRLUpdate)) && crl.TBSCertList.NextUpdate.After(now) {
nextCRLUpdate = crl.TBSCertList.NextUpdate
}
}
}

if len(subjectKeyIds) == 0 {
return false, nil, nil
return false, nil, ctx, nil
}

buf := &bytes.Buffer{}
for _, subjectKeyId := range subjectKeyIds {
asn1Data, err := asn1.Marshal(*crls[subjectKeyId])
if err != nil {
return false, nil, fmt.Errorf("failed to encode ASN.1 for CRL for certificate key %s: %w", subjectKeyId, err)
return false, nil, ctx, fmt.Errorf("failed to encode ASN.1 for CRL for certificate key %s: %w", subjectKeyId, err)
}
block := &pem.Block{
Type: "X509 CRL",
Bytes: asn1Data,
}
if err := pem.Encode(buf, block); err != nil {
return false, nil, fmt.Errorf("failed to encode PEM for CRL for certificate key %s: %w", subjectKeyId, err)
return false, nil, ctx, fmt.Errorf("failed to encode PEM for CRL for certificate key %s: %w", subjectKeyId, err)
}
}
crlData := buf.String()
Expand All @@ -209,7 +220,7 @@ func desiredCRLConfigMap(ic *operatorv1.IngressController, ownerRef metav1.Owner
}
crlConfigmap.SetOwnerReferences([]metav1.OwnerReference{ownerRef})

return true, &crlConfigmap, nil
return true, &crlConfigmap, context.WithValue(ctx, "nextCRLUpdate", nextCRLUpdate), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen context used like this, but that could just because we've never needed to. What's the reason behind using context vs. returning a nextCRLUpdatevariable? Is it because it's time/deadline related? Just curious, I don't have an opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth for a while on how to return nextCRLUpdate because it's kind of tangental to the main purpose of desiredCRLConfigMap, and adding that kind of extra return value for callers to manage feels like a bad design choice.

Ideally, nextCRLUpdate would be calculated in its own function to keep the code more clear, but doing that would require all the CRLs to be parsed a second time, and I understand that in certain scenarios, users are running up against the max size of a configmap (1MiB) for the CRL configmap, so parsing again could be pretty wasteful.

I think this is a reasonable compromise, where nextCRLUpdate is available for Reconcile to deal with it, but intermediate functions don't need to worry about it other than to pass the context back up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can buy that, sounds like you just want to decouple nextCRLUpdate from the output of desiredCRLConfigMap.

}

// getCRL gets a certificate revocation list using the provided distribution
Expand Down