Skip to content

Conversation

@gcs278
Copy link
Contributor

@gcs278 gcs278 commented Jan 11, 2024

This PR adds support for setting a new status condition type, UnservableInFutureVersions, when a SHA1 cert is detected on a route. It accomplishes this by adding a new plugin, the upgrade validation plugin. This plugin is simple: when a route configuration that is unsupported in the future is detected, it adds a UnservableInFutureVersions status condition to the route. When the problematic configuration is no longer detected, it clears/removes the UnservableInFutureVersions condition.

In order to support updating a new route status condition, the routeRejectionRecorder and associated methods were refactored to be more generic. The writerlease logic used to update route status had to be refactored because the old code made assumptions that only one type of status condition would ever be updated.

It also updates certificates for existing unit tests, as some certificates were SHA1 and incorrectly labeled as expired.

This PR is intended to be introduced in 4.16, but to be backported to 4.15 as UnservableInFutureVersions only makes sense in 4.15 because 4.16 will reject SHA1 routes. In PR #552, there will be a partial revert, as it will no longer add UnservableInFutureVersions to SHA1 routes due to them now being rejection.

Dependent on openshift/api#1722
Related Ingress Operator PR openshift/cluster-ingress-operator#1014

@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. label Jan 11, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 11, 2024
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 11, 2024
@openshift-ci-robot
Copy link
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-26498, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

If a route has a SHA1 certificate, set a new route ingress condition status type of Upgradeable to false.

Dependent on openshift/api#1722

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from Miciah and rfredette January 11, 2024 00:01
@gcs278 gcs278 force-pushed the OCPBUGS-26498-upgradeable-status branch 3 times, most recently from c6a9ad2 to 5eb0acb Compare January 18, 2024 20:27
@gcs278 gcs278 force-pushed the OCPBUGS-26498-upgradeable-status branch 12 times, most recently from b88197d to a378d74 Compare January 30, 2024 16:53
@gcs278 gcs278 changed the title [WIP] OCPBUGS-26498: Route Upgradeable status for SHA1 [WIP] OCPBUGS-26498: Route Deprecated plugin for SHA1 certs Jan 30, 2024
@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 30, 2024
@openshift-ci-robot
Copy link
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-26498, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This PR adds support for setting a new status condition type, Deprecated, when a SHA1 cert is detected on a route. It accomplishes this by adding a new plugin, the deprecation validation plugin. This plugin is simple: when a specific deprecated configuration on a route is detected, it adds a Deprecated status condition to the route. When the deprecated configuration is no longer detected, it clears/removes the Deprecated condition.

In order to support updating a new route status condition, the routeRejectionRecorder and associated methods were refactored to be more generic. The writerlease logic used to update route status had to be refactored because the old code made assumptions that only one type of status condition would ever be updated.

It also updates certificates for existing unit tests, as some certificates were SHA1 and incorrectly labeled as expired.

Dependent on openshift/api#1722
Related Ingress Operator PR openshift/cluster-ingress-operator#1014

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from ShudiLi January 30, 2024 17:09
@gcs278
Copy link
Contributor Author

gcs278 commented Jan 30, 2024

/test verify

@gcs278 gcs278 changed the title [WIP] OCPBUGS-26498: Route Deprecated plugin for SHA1 certs [WIP] OCPBUGS-26498: Deprecation Validation plugin for SHA1 certs Jan 30, 2024
@gcs278
Copy link
Contributor Author

gcs278 commented Apr 17, 2024

/test e2e-upgrade

Copy link
Contributor

@Miciah Miciah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! I made some comments, but the one thing I want to clarify is whether UpgradeRouteValidation should be checking tlsConfig.CACertificate or tlsConfig.DestinationCACertificate in addition to tlsConfig.Certificate (see #555 (comment)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes in this PR causing scanners to report issues with these files, or is this change logically separate from the SHA1 validation change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the PR commit Regenerate certs for unit tests is causes the scanner to report repo leaks due to certs changing in the test files.

Regenerate certs for unit tests is required because we need to differentiate between SHA1 and SHA256 certs in our testing for the Upgrade Validation plugin.

Comment on lines 1232 to 1243
name: "Edge termination cert key mismatch",
route: &routev1.Route{
Spec: routev1.RouteSpec{
Host: "www.example.com",
TLS: &routev1.TLSConfig{
Termination: routev1.TLSTerminationEdge,
Certificate: testExpiredCAUnknownCertificate,
Certificate: testCAUnknownCertificate,
Key: testPrivateKey,
CACertificate: testCACertificate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case seems to have both a mismatch between server certificate and key and a mismatch between server certificate and CA certificate. The test case fails with the expected error (namely the mismatch between server certificate and key), so it passes as is. However, there are a few oddities here:

  • The test-case name only mentions the one mismatch.
  • The test case expects only one error—perhaps for the same reason that "Edge termination with expired cert, ignoring CA mismatch" passes, in which case it might be helpful to add a comment saying as much.
  • I suspect the test case would pass when it shouldn't if the validation erroneously checked for a match between the server certificate and CA certificate and not between the server certificate and key.
  • There is already a test, "Edge termination mismatched key and cert", which seems to cover the case of a mismatch between server certificate and key:
    {
    name: "Edge termination cert with unknown CA",
    route: &routev1.Route{
    Spec: routev1.RouteSpec{
    Host: "www.example.com",
    TLS: &routev1.TLSConfig{
    Termination: routev1.TLSTerminationEdge,
    Certificate: testCAUnknownCertificate,
    Key: testCAUnknownCertificateKey,
    },
    },
    },
    expectedErrors: 0,
    },

    Is this a duplicate test case, or should it be named something like "Edge termination mismatched key and cert with unrelated CA"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test-case name only mentions the one mismatch.

Updated the name to "Edge termination mismatched key and cert with unrelated CA".

The test case expects only one error—perhaps for the same reason that "Edge termination with expired cert, ignoring CA mismatch" passes, in which case it might be helpful to add a comment saying as much.

Sorry, this is where I get confused. I thought you were taking about previously named "Edge termination cert key mismatch", which returns 2 expectedErrors.

Is this a duplicate test case, or should it be named something like "Edge termination mismatched key and cert with unrelated CA"?

I don't think it's a duplicate. It returns a different number of errors, so I've taken your suggested name change.

Copy link
Contributor

@Miciah Miciah Apr 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case expects only one error—perhaps for the same reason that "Edge termination with expired cert, ignoring CA mismatch" passes, in which case it might be helpful to add a comment saying as much.

Sorry, this is where I get confused. I thought you were taking about previously named "Edge termination cert key mismatch", which returns 2 expectedErrors.

Sorry, I must have been mixing up test cases or misreading the diff.

gcs278 added 6 commits April 19, 2024 15:16
SHA1 certs will be unsupported in the future. Clean up unit tests to
organize clearly between SHA1 and SHA256 certs. The unit tests had SHA1
certs that weren't tested separately. Also, the unit tests claimed to
be using an expired certificate, but it was NOT expired. Document all
certs by adding openssl comands for the regenerated certs to help future
developers recreate certs if needed.
Bumping openshift/api requires bumping golang to 1.21 otherwise the
ci/prow/verify job will fail on go tidy.
Bump openshift/api to get new route UnservableInFutureVersions
status condition type and update K8S packages to 0.29.1.
When an route ingress already exists, but an expected condition isn't
found, recordIngressCondition would previously not change/update
anything. This wasn't an issue for when only had Admitted because the
route ingress was updated synchronously with the Admitted condition.
However, adding another route status condition exposes the scenario in
which a route ingress will exist, but the desired condition won't.

This also adds a unit test for recordIngressCondition which explicitly
tests this case along with other edge cases.
Previously, recordIngressCondition sometimes clears the original return value
conditions depending on whether the condition was found and changed. This
doesn't seem consistent and makes it hard to unit test.

The original return value only affects logging. Logging the original conditions
is advantageous because modifying conditions can also lead to contention,
and having visibility into the modified conditions is beneficial.

Therefore, it seems reasonable to remove condition clearing as it
benefits logging and it's inconsistent.

Update unit testing to now verify original return value from
recordIngressCondition.
Add the Upgrade Validation plugin which provides a framework for adding
a status to a route to indicate future upgrade issues. For this
initial implementation, if a route has a SHA1 certificate, the upgrade
validation plugin will mark the route with a UnservableInFutureVersions
status. If the route no longer has a SHA1 certificate, remove the status
instead of setting UnservableInFutureVersions=false.

This is specific to OpenShift 4.15 and is expected to be removed
from 4.16 once backported to avoid confusion. It's specific to 4.15
because 4.15 accepts SHA1 certs, but 4.16 does not. Therefore,
validating the upgrade should be only done in 4.15.

Update the RejectionRecorder interface to be the RouteStatusRecorder to
support more generic status-related use cases as we have more than just
Admitted status now.

Add UPGRADE_VALIDATION env variable, but default to true, so that we can
disable in the future. Follow existing pattern similar to existing
EXTENDED_VALIDATION env variable.

Key the writerlease logic off of route UID AND condition type. This is
now essential because if only route UID, then the lease logic will only
update one condition as it doesn't differentiate from the work items for
the different conditions.

Adjust findMostRecentIngress to use the latest condition instead of just
Admitted as we can no longer assume that Admitted is the only condition.

Add a type for writelease WorkKey to ensure strings aren't used anymore
as the previous logic was only using route.UID. Add a type for
ContentionTracker key to ensure it is different than writelease WorkKey.

Update ingressEqual function to detect contention if any condition is
not equal. Previously, it hardcoded the "Admitted" status. Add a unit
test, TestRouterContentionUpdateCondition to test these changes.
@gcs278 gcs278 force-pushed the OCPBUGS-26498-upgradeable-status branch from d5ea141 to c22b003 Compare April 19, 2024 19:40
Copy link
Contributor Author

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the PR commit Regenerate certs for unit tests is causes the scanner to report repo leaks due to certs changing in the test files.

Regenerate certs for unit tests is required because we need to differentiate between SHA1 and SHA256 certs in our testing for the Upgrade Validation plugin.

Comment on lines +1094 to +1515
if diff := cmp.Diff(baselineIngressState, original, cmpOpts...); len(diff) > 0 {
t.Errorf("expected original to match baseline ingress from route.Status (-want +got):\n%s", diff)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I believe removing the original.Conditions = nil in recordIngressCondition is a net positive .

  • First, it had no impact other than logging. original was only used for logging
  • Besides being confusing to modify a variable called original, removing the conditions removes half the benefit of logging here.
    • I found myself trying to debug a contention issue, and the fact it doesn't log the conditions makes that much harder.
    • Adding the new UnservableInFutureVersions condition and condition removal as contention points makes this even more desirable (more potential contention will be between conditions)
  • It only clears conditions on the original return value when the ingress has changed...that feels confusing and inconsistent.

Comment on lines 199 to 200
// preserve a deep copy of the original ingress
original := existing.DeepCopy()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a pessimization. Doing a deep copy has significant time and memory impact.

Yes, it's slower, I agree.

Is the intention only to make writing the unit test easier, or is it the intention also to improve logging or something else?

Well neither, I wrote it as a "future proofing" or "avoiding fragility" fix.

Reviewing how shallow copies work in Go again, I see now that we technically don't need the deepcopy, as the RouteIngress object is made up of values, not pointers, except for []conditions, which I now understand each item in a slice is a pointer, but the slice header is a value (which includes slice length).

Since the logic below only ever appends to the shallow copy slice of existing.Conditions (different slice header), that means that original will be unaffected by the current logic below.

I'll switch back to a shallow copy for now since it's provides the same result.


return handleRouteStatusUpdate(context.TODO(), action, oc, route, latest, tracker)
})
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I like that. I had the idea to do a callback, but got stuck following through with it, but this was I wanted.

Maybe better left as a follow-up as it would be nontrivial to check for correctness.

Could you elaborate? It appear "correct" for our two use-cases (performIngressConditionUpdate, performIngressConditionRemoval), but are you saying we would need to explore "correctness" for other (future) use cases?

I ask because I prefer this logic, and don't exactly see how any more "non-trivial" it is than what I currently have.

Comment on lines 1232 to 1243
name: "Edge termination cert key mismatch",
route: &routev1.Route{
Spec: routev1.RouteSpec{
Host: "www.example.com",
TLS: &routev1.TLSConfig{
Termination: routev1.TLSTerminationEdge,
Certificate: testExpiredCAUnknownCertificate,
Certificate: testCAUnknownCertificate,
Key: testPrivateKey,
CACertificate: testCACertificate,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test-case name only mentions the one mismatch.

Updated the name to "Edge termination mismatched key and cert with unrelated CA".

The test case expects only one error—perhaps for the same reason that "Edge termination with expired cert, ignoring CA mismatch" passes, in which case it might be helpful to add a comment saying as much.

Sorry, this is where I get confused. I thought you were taking about previously named "Edge termination cert key mismatch", which returns 2 expectedErrors.

Is this a duplicate test case, or should it be named something like "Edge termination mismatched key and cert with unrelated CA"?

I don't think it's a duplicate. It returns a different number of errors, so I've taken your suggested name change.

Comment on lines -993 to +1171
name: "Edge termination expired cert unknown CA",
name: "Edge termination cert with unknown CA",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry this is painful to review. Thanks for writing this out.

  • "Edge termination expired cert unknown CA" is deleted and replaced by these two tests:
    • "Edge termination expired cert with unknown CA", which uses an actually expired certificate so it does what the name says.
    • "Edge termination cert with unknown CA", which does what "Edge termination expired cert unknown CA" actually did (despite the latter's misleading name).

Yes, that seems right.

"Edge termination expired cert key mismatch" is renamed to "Edge termination cert key mismatch" to reflect what it actually tests. We already have "Expired cert mismatched key", which actually covers what the name implies (though unlike "Edge termination expired cert key mismatch", "Expired cert mismatched key" doesn't specify CACertificate).

Yes, this one just had an incorrect name.

  • "Edge termination expired cert mismatched CA" is deleted and replaced by these two tests:
    • "Edge termination with expired cert, ignoring CA mismatch", which uses an actually expired certificate so it does what the name says. (Using an actually expired certificate also means that expectedErrors changes from 0 to 1.)
    • "Edge termination cert mismatched CA", which does what "Edge termination expired cert mismatched CA" actually did (despite the misleading name).

Correct, the testExpiredCAUnknownCertificate it was using was NOT expired. Super confusing.

"Expired cert" is renamed to "expired cert edge" and fixed to use an actually expired certificate; what it actually tested is already tested by "Edge termination OK with certs".

Correct.

The rest of the changes are changes to individual tests, the test runner logic, or net-new tests.

I think so.

@Miciah
Copy link
Contributor

Miciah commented Apr 20, 2024

Thanks!
/approve
/lgtm
Do you want to implement #555 (comment), or should we go with what you have?
/hold
for the above question, and in case Andy wants to have another look.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Apr 20, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 20, 2024
@gcs278
Copy link
Contributor Author

gcs278 commented Apr 20, 2024

Thanks @Miciah! For the sake of time, I'm going to save #555 (comment) as a follow up. I do plan a follow up change in a hopeful fix for https://issues.redhat.com/browse/OCPBUGS-1689 as this PR has laid the groundwork from removing conditions.

I'll wait for to hear from @frobware to remove the hold.

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 20, 2024

Unrelated pod IP failure:
/test e2e-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 20, 2024

@gcs278: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@frobware
Copy link
Contributor

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 22, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 7ad7ec1 into openshift:master Apr 22, 2024
@openshift-ci-robot
Copy link
Contributor

@gcs278: Jira Issue OCPBUGS-26498: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-26498 has not been moved to the MODIFIED state.

Details

In response to this:

This PR adds support for setting a new status condition type, UnservableInFutureVersions, when a SHA1 cert is detected on a route. It accomplishes this by adding a new plugin, the upgrade validation plugin. This plugin is simple: when a route configuration that is unsupported in the future is detected, it adds a UnservableInFutureVersions status condition to the route. When the problematic configuration is no longer detected, it clears/removes the UnservableInFutureVersions condition.

In order to support updating a new route status condition, the routeRejectionRecorder and associated methods were refactored to be more generic. The writerlease logic used to update route status had to be refactored because the old code made assumptions that only one type of status condition would ever be updated.

It also updates certificates for existing unit tests, as some certificates were SHA1 and incorrectly labeled as expired.

Dependent on openshift/api#1722
Related Ingress Operator PR openshift/cluster-ingress-operator#1014

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-haproxy-router-base-container-v4.16.0-202404220845.p0.g7ad7ec1.assembly.stream.el9 for distgit ose-haproxy-router-base.
All builds following this will include this PR.

@openshift-ci-robot
Copy link
Contributor

@gcs278: Jira Issue OCPBUGS-26498: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-26498 has not been moved to the MODIFIED state.

Details

In response to this:

This PR adds support for setting a new status condition type, UnservableInFutureVersions, when a SHA1 cert is detected on a route. It accomplishes this by adding a new plugin, the upgrade validation plugin. This plugin is simple: when a route configuration that is unsupported in the future is detected, it adds a UnservableInFutureVersions status condition to the route. When the problematic configuration is no longer detected, it clears/removes the UnservableInFutureVersions condition.

In order to support updating a new route status condition, the routeRejectionRecorder and associated methods were refactored to be more generic. The writerlease logic used to update route status had to be refactored because the old code made assumptions that only one type of status condition would ever be updated.

It also updates certificates for existing unit tests, as some certificates were SHA1 and incorrectly labeled as expired.

This PR is intended to be introduced in 4.16, but to be backported to 4.15 as UnservableInFutureVersions only makes sense in 4.15 because 4.16 will reject SHA1 routes. In PR #552, there will be a partial revert, as it will no longer add UnservableInFutureVersions to SHA1 routes due to them now being rejection.

Dependent on openshift/api#1722
Related Ingress Operator PR openshift/cluster-ingress-operator#1014

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 22, 2024

/cherry-pick 4.15

@openshift-cherrypick-robot

@gcs278: cannot checkout 4.15: error checking out "4.15": exit status 1 error: pathspec '4.15' did not match any file(s) known to git

Details

In response to this:

/cherry-pick 4.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gcs278
Copy link
Contributor Author

gcs278 commented Apr 22, 2024

/cherry-pick release-4.15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants