Skip to content

OCPBUGS-6661: Handle mTLS CRLs in the router#891

Closed
rfredette wants to merge 2 commits intoopenshift:masterfrom
rfredette:ocpbugs-6661-crl-limit
Closed

OCPBUGS-6661: Handle mTLS CRLs in the router#891
rfredette wants to merge 2 commits intoopenshift:masterfrom
rfredette:ocpbugs-6661-crl-limit

Conversation

@rfredette
Copy link
Contributor

CRL lifecycle management is being moved to the router pod to avoid hitting the configmap max size (1MB).

This relies on openshift/router#447 to function.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 15, 2023
@openshift-ci openshift-ci bot requested review from candita and knobunc February 15, 2023 18:43
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 15, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rfredette. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rfredette rfredette changed the title WIP: Handle mTLS CRLs in the router WIP: OCPBUGS-6661: Handle mTLS CRLs in the router Feb 17, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Feb 17, 2023
@openshift-ci-robot
Copy link
Contributor

@rfredette: This pull request references Jira Issue OCPBUGS-6661, which is invalid:

  • expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

CRL lifecycle management is being moved to the router pod to avoid hitting the configmap max size (1MB).

This relies on openshift/router#447 to function.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Feb 17, 2023
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from cbb9efe to 5d33116 Compare February 27, 2023 19:34
Copy link
Contributor

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major issues from me.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2023
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from 37f4516 to 6d756cf Compare March 27, 2023 19:14
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2023
@rfredette
Copy link
Contributor Author

I've pushed a temporary commit that makes the router image the latest commit in openshift/router#447 so that the new tests work. Once that router PR merges, I will revert the router image change.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 3, 2023
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from e0bf8e9 to 7242c1c Compare April 3, 2023 21:02
@rfredette rfredette changed the title WIP: OCPBUGS-6661: Handle mTLS CRLs in the router OCPBUGS-6661: Handle mTLS CRLs in the router Apr 3, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 3, 2023
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from 7242c1c to c5fb8b4 Compare April 4, 2023 21:27
Copy link
Contributor

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reviewing, but adding a couple of comments

// service. Returns the HTTP status code returned from curl, and an error either if there is an HTTP error, or if
// there's another error in running the command. If the error was not an HTTP error, the HTTP status code returned will
// be -1.
func curl(t *testing.T, clientPod *corev1.Pod, certName, endpoint, ingressControllerIP string, verbose bool) (int64, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit should the name of this function reflect that it's getting a status code? i.e. curlHttpStatus or getHttpStatusCode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh sorry I see you didn't write this, but adapted it. Nevermind this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still reasonable to try to convey what's being returned. I'm not sure I love either of those names, but I'll see if I can come up with something that will work

{
// This test case has CA certificates including a CRL distribution point (CDP) for the CRL that they
// generate and sign. This is the default way to distribute CRLs according to RFC-5280
Name: "subject-crl",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[from meeting] - better name.

i'm not so sure I understand why "subject" is in the name here, but I could see something like: crls-included-in-ca-certificates.

// This test case has certificates including the CRL distribution point of their signer (i.e. intermediate
// CA is signed by root CA, and includes the URL for root's CRL). In this case, neither of the certificates
// in the CA bundle include the intermediate CRL, so connections that rely on it will be rejected.
Name: "signer-crl",
Copy link
Contributor

@gcs278 gcs278 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the word signer better than subject, but could be more explicit crls-included-in-signer-certs or something.

},
},
{
// This test case has certificates including the CRL distribution point of their signer. In this case, a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth referencing the bug in this comment? For additional supporting info.

t.Fatalf("Failed to create client certificate: %v", err)
}

_, rootCRLPem, err := CreateCRL(nil, rootCA, time.Now(), time.Now().Add(10*time.Minute), RevokeCertificates(time.Now(), revokedByRoot))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything significant about this 10 min next update value? Is there any chance that these tests could take 10 mins and something would break? Should it be like an hour to be safe? Does it matter?

// TLS/SSL verification failures result in a 0 http status code (no connection is made to the backend, so no http status code is returned).
continue
}
t.Errorf("Unexpected error from curl for cert %q: %v", certName, err)
Copy link
Contributor

@gcs278 gcs278 Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be beneficial to add logging of the httpStatusCode value for future debugging?

t.Fatalf("timed out waiting for pod %q to become ready: %v", clientPodName, err)
}

// We need an IP address to which to send requests. The test client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I struggled a bit on this comment. But I realized it's because the curl function doesn't directly use the DNS name. Is there a better way we can describe this?

Suggested change
// We need an IP address to which to send requests. The test client
// We need an IP address to which to send requests (in lieu of using a DNS domain name). The test client

if err := waitForIngressControllerCondition(t, kclient, 5*time.Minute, icName, availableConditionsForPrivateIngressController...); err != nil {
t.Fatalf("failed to observe expected conditions: %v", err)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit it's a nit pick, but probably worth a block comment here describing clientCerts are going to do, I tend to try to read comments first when deciphering code, feels more effcient:

Suggested change
// Create client certificate bundle as a configmap which we will mount into our test client pod

Copy link
Contributor

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I agree with these changes. Most comments are documentation related, but I'm being a bit more nit-picky than usual, since this is a complex E2E.

@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from ad3b9b1 to 8eb9601 Compare April 20, 2023 21:27
@rfredette
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 20, 2023
@openshift-ci-robot
Copy link
Contributor

@rfredette: This pull request references Jira Issue OCPBUGS-6661, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from lihongan April 20, 2023 21:34
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from 8eb9601 to e28f320 Compare May 8, 2023 17:30
Leave a stub of the CRL controller to clean up any existing configmaps.
The stub controller will need to be removed in a future release

Use cluster-wide proxy for CRL downloads when available

Add a test with several test cases to test CRL management
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from e28f320 to 8bc1171 Compare May 10, 2023 17:04
@rfredette rfredette force-pushed the ocpbugs-6661-crl-limit branch from 8bc1171 to 46b92e6 Compare May 10, 2023 17:04
@rfredette
Copy link
Contributor Author

/retest

@rfredette
Copy link
Contributor Author

Test failures seem unrelated; Even the operator suite failure had TestMTLSWithCRLs pass.

/retest

@rfredette
Copy link
Contributor Author

more unrelated failures.

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 11, 2023

@rfredette: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azure-ovn 46b92e6 link false /test e2e-azure-ovn
ci/prow/e2e-aws-ovn 46b92e6 link true /test e2e-aws-ovn
ci/prow/e2e-aws-ovn-single-node 46b92e6 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-gcp-ovn 46b92e6 link false /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

return false, nil, ctx, fmt.Errorf("failed to build configmap: %w", err)
}
// The CRL management code has been moved into the router, so the CRL configmap is no longer necessary.
// TODO: Remove this whole controller after 4.13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: Remove this whole controller after 4.13
// TODO: Remove this whole controller after 4.14.

@rfredette
Copy link
Contributor Author

Closing this PR in favor of the combined fix in #930

@rfredette rfredette closed this May 17, 2023
@openshift-ci-robot
Copy link
Contributor

@rfredette: This pull request references Jira Issue OCPBUGS-6661. The bug has been updated to no longer refer to the pull request using the external bug tracker.

Details

In response to this:

CRL lifecycle management is being moved to the router pod to avoid hitting the configmap max size (1MB).

This relies on openshift/router#447 to function.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants