CCO-366 Add ability to detect AWS STS and behave accordingly #542
Conversation
Skipping CI for Draft Pull Request.
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #542 +/- ##
==========================================
- Coverage 48.06% 47.84% -0.23%
==========================================
Files 93 93
Lines 11305 11488 +183
==========================================
+ Hits 5434 5496 +62
- Misses 5251 5359 +108
- Partials 620 633 +13
pkg/aws/actuator/actuator.go
Outdated
ErrReason: minterv1.InsufficientCloudCredentials,
Message: msg,
stsDetected, err := utils.IsTimedTokenCluster(a.Client, logger)
if stsDetected {
After this change e2e started working. I am kind of unsure about it though.
Okay, here is logging from a Clusterbot cluster launched with workflow-launch openshift-e2e-aws-manual-oidc-sts https://github.com/openshift/cloud-credential-operator/pull/542, and then running make test-e2e-sts manually:
time="2023-06-12T13:40:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:40:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="adding finalizer: cloudcredential.openshift.io/deprovision" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret
time="2023-06-12T13:40:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="timed token access cluster detected: true, so not trying to provision with root secret" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret
time="2023-06-12T13:40:45Z" level=info msg="actuator detected STS enabled cluster making secret" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="creating secret" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="clusteroperator status updated" controller=status
time="2023-06-12T13:40:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:40:45Z" level=info msg="secret created successfully" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:40:56Z" level=info msg="reconcile complete" controller=metrics elapsed=4.220597ms
time="2023-06-12T13:41:56Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:42:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:42:56Z" level=info msg="reconcile complete" controller=metrics elapsed=4.133929ms
time="2023-06-12T13:44:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:44:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=warning msg="no user name set on credentials being deleted, most likely were never provisioned or using passthrough creds" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="target secret deleted successfully" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret targetSecret=default/test-sts-secret
time="2023-06-12T13:44:45Z" level=info msg="actuator deletion complete, removing finalizer" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret
time="2023-06-12T13:44:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:44:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="clusteroperator status updated" controller=status
time="2023-06-12T13:44:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:44:56Z" level=info msg="reconcile complete" controller=metrics elapsed=3.554297ms
time="2023-06-12T13:46:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:46:56Z" level=info msg="reconcile complete" controller=metrics elapsed=4.054413ms
time="2023-06-12T13:46:56Z" level=info msg="reconciling clusteroperator status"
So I feel more confident the test is really passing.
func onAdd(t *testing.T, cfg envconf.Config, ctx context.Context) func(obj interface{}) {
    time.Sleep(2 * time.Minute)
    if err := cfg.Client().Resources().Get(ctx, secretName, namespace, secret); err != nil {
This is because I was unsure the Secret really exists at this point, despite this being a callback from the watch on the Secret.
The obj interface{} passed to this func is the object that was just added; perhaps use that? A 2-minute wait should not be needed here.
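A minimal sketch of that suggestion, dropping the cfg and ctx parameters since the delivered object is enough for the check (the simplified signature is an assumption, not the PR's final code):

```go
package e2e

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
)

// onAdd uses the object delivered by the watch event directly, avoiding the
// fixed two-minute sleep and the extra Get against the API server.
func onAdd(t *testing.T) func(obj interface{}) {
	return func(obj interface{}) {
		// The informer hands us the added object; assert its type instead
		// of re-fetching it by name.
		secret, ok := obj.(*corev1.Secret)
		if !ok {
			t.Errorf("expected *corev1.Secret in add event, got %T", obj)
			return
		}
		t.Logf("observed added secret %s/%s", secret.Namespace, secret.Name)
	}
}
```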
/assign @abutcher
A couple of small things. I'm wondering whether some of the new Info logs should be moved to Debug given their potential noise level, but those can be moved or followed up on.
func computeClusterOperatorVersions() string {
    currentVersion := os.Getenv("RELEASE_VERSION")
What if this is not set?
Then you don't get to use the feature gate? featuregates.NewFeatureGateAccess likely doesn't do much with nil as the desired version.
A string cannot be nil in Go; it will instead be an empty string. So in such a case it would be looking for the cluster version to be an empty string, and I expect a lot of other things would also be broken. The good news is that RELEASE_VERSION is defined in the 03-deployment.yaml and should always be non-empty.
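For illustration, a minimal sketch of the kind of guard being discussed; the error-returning signature and message are assumptions, not the function as it exists in the PR:

```go
package operator

import (
	"fmt"
	"os"
)

// computeClusterOperatorVersions reads the operator version injected by the
// deployment manifest. This sketch fails loudly when RELEASE_VERSION is
// unset instead of silently comparing against the empty string.
func computeClusterOperatorVersions() (string, error) {
	currentVersion := os.Getenv("RELEASE_VERSION")
	if currentVersion == "" {
		return "", fmt.Errorf("RELEASE_VERSION is not set; it should be injected by 03-deployment.yaml")
	}
	return currentVersion, nil
}
```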
pkg/operator/utils/utils.go
Outdated
logger.WithError(err).Error("error loading CCO configuration to determine mode")
return false, err
}
if credentialsMode != "Manual" {
super-nit: is manual not a constant somewhere?
Yikes! Investigating, I find that mode referred to as both "manual" and "Manual" in YAMLs, docs, and code. Yes, there is a constant that has it as "manual" (1b384cc#diff-f6d93feef6d59ca8ced14b54fe0058d41f14de44450baf6b723cc1c2ec280c54R25), but it doesn't seem to be used everywhere now....
So I'm a little leery of starting to use the constant version here until the CCO team can review a bit more. I'd say it can wait until another day to get better?
GetOperatorConfiguration() returns an operatorv1.CloudCredentialsMode, which has a constant CloudCredentialsModeManual = "Manual".
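A sketch of the comparison using the typed constant from openshift/api instead of the string literal (the helper name is illustrative):

```go
package utils

import (
	operatorv1 "github.com/openshift/api/operator/v1"
)

// isManualMode compares against the typed constant rather than the string
// literal "Manual", sidestepping the manual/Manual casing confusion.
func isManualMode(mode operatorv1.CloudCredentialsMode) bool {
	return mode == operatorv1.CloudCredentialsModeManual
}
```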
pkg/operator/credentialsrequest/credentialsrequest_controller.go
Outdated
Changed all 3 instances to Debug in d6bb916
return reconcile.Result{}, err
if !stsDetected {
    logger.Infof("operator detects STS enabled cluster")
    return reconcile.Result{}, err
This return short circuits continuing through the controller to create the secret for the CredentialsRequest.
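A sketch of the flow being asked for, with the surrounding plumbing assumed rather than taken from the PR (IsTimedTokenCluster and its signature come from the diff above; the helper name and logger type are illustrative): detect the STS cluster, log it, and keep going so the Secret still gets created, returning early only on error.

```go
package credentialsrequest

import (
	log "github.com/sirupsen/logrus"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	"github.com/openshift/cloud-credential-operator/pkg/operator/utils"
)

// detectSTS reports detection without short-circuiting the reconcile, so the
// caller continues on to create the Secret for the CredentialsRequest.
func detectSTS(c client.Client, logger log.FieldLogger) (reconcile.Result, bool, error) {
	stsDetected, err := utils.IsTimedTokenCluster(c, logger)
	if err != nil {
		return reconcile.Result{}, false, err
	}
	if stsDetected {
		logger.Info("operator detects STS enabled cluster")
		// Note: no early return here; secret creation still runs.
	}
	return reconcile.Result{}, stsDetected, nil
}
```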
Worth adding a test to the credentials request controller tests to ensure we go through the motions.
It's the end of the day and it's been a long one for me; again, I'm not seeing a good way to create a test on the controller with STS enabled. There might be one, I'm just kind of lost at the moment. Maybe we should add a card to Jira for "Streamline Credentials Request Controller unit tests and add STS enabled and feature gate flags"?
// The presence of an STSRoleARN within the AWSProviderSpec initiates creation of a secret containing IAM
// Role details necessary for assuming the IAM Role via Amazon's Secure Token Service.
// +optional
// +kubebuilder:validation:Pattern:="^arn:(?P<Partition>[^:\n]*):(?P<Service>[^:\n]*):(?P<Region>[^:\n]*):(?P<AccountID>[^:\n]*):(?P<Ignore>(?P<ResourceType>[^:\/\n]*)[:\/])?(?P<Resource>.*)$"
The role ARN validation is causing CRD generation to fail, which for some reason fails silently when run via make update. The silently-failing part is possibly related to the openshift/api workaround.
'_output/tools/bin/controller-gen-v0.2.5' schemapatch:manifests="./manifests" paths="./pkg/apis/cloudcredential/v1" 'output:dir="./manifests"'
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: unable to parse string: invalid syntax (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: unable to parse string: invalid syntax (at <input>:1:1)
Error: not all generators ran successfully
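One possible repair, offered as an assumption rather than the PR's eventual fix: controller-gen parses the marker value as a Go-quoted string, in which `\/` is not a valid escape. Dropping the needless slash escapes and doubling the backslash before "n" keeps the regex meaning while making the string parseable:

```go
// The slash needs no escape inside a regex character class, and `\\n` yields
// the regex token \n instead of tripping Go string parsing. (The field name
// is taken from the doc comment above; this exact fix is an assumption.)
// +kubebuilder:validation:Pattern:="^arn:(?P<Partition>[^:\\n]*):(?P<Service>[^:\\n]*):(?P<Region>[^:\\n]*):(?P<AccountID>[^:\\n]*):(?P<Ignore>(?P<ResourceType>[^:/\\n]*)[:/])?(?P<Resource>.*)$"
STSRoleARN string `json:"stsRoleARN,omitempty"`
```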
…s for" This reverts commit 6561e65. Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
`$ make update` Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
For whatever reason, when the upstream target calls controller-gen, the tool reads and writes *every* YAML file in the `./manifests` directory, even files that the tool is not writing to during the generator step. We copy our `00-config-custresdef.yaml` from the OpenShift API repo, which uses a bespoke generator with different formatting on the output. Therefore, when `controller-gen` is run on the YAML we copy from the API repo, it re-formats it and causes the `git diff` check to fail. In this override, we simply copy back in the YAMLs from the API repo after generating, overwriting anything `controller-gen` may have done, so we don't have this spurious failure. Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
In reply to #542 (comment)
/test e2e-aws-manual-oidc
Responding to #542 (comment) I agree, the constants file is confusing. I had to dig in to understand how it is used. It looks like they are used solely for keys (string) in a mapping in the metrics operator. The mapping appears to be more of a status use case than a desired configuration. There also appear to be additional status options compared to
@bentito: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
We spent some time in openshift#511 and again in openshift#542 trying to reconcile CRDs. The problem is that we want to *generate* the CredentialsRequest CRD from code in this repo, but *use* (copy) the CloudCredential CRD from openshift/api, which we vendor. But we invoke controller-gen through build-machinery-go, and it does unexpected things to the latter, which breaks validation. With this commit, we move the CredentialsRequest CRD to a `generated` subdirectory and the CloudCredential CRD to an `imported` subdirectory. This lets us go back to the simpler invocation of bmg's tooling while keeping everything in the shape we expect. One more quirk: Because build-machinery-go starts defining dependency chains for targets like `update`, we need to start defining that dependency chain *before* we import the bmg libs to ensure that we copy/generate CRDs *before* we include them in bindata.
/approve proxy for @abutcher
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: 2uasimojo, bentito, stevekuznetsov. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cherry-pick release-4.13
@stevekuznetsov: #542 failed to apply on top of branch "release-4.13". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
e9f9cc6 (Add & logic - new token CredReq.spec.cred* fields, 2023-06-27, openshift#542) created the STS-specific branch here, and shifted the pre-existing hasRecentlySynced check to the non-STS branch. But that's leading to hot update loops, as the reconciler bangs away bumping status.lastSyncTimestamp (which we've had since the initial cloud-cred operator pull request [1]). For example in this recent CI run [2]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
--
time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
--
time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

With this commit, we'll perform the same back-off in the STS case too, to avoid flooding the Kube API server with status.lastSyncTimestamp updates.

[1]: openshift@a6d385a#diff-69794ca0db76a04660e3355ba9b824f34e7af1030d0a8114903d11847201c410R46
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848
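For reference, a sketch of the back-off described in that commit message; the field and helper names come from the message itself, while the window duration and logger type are assumptions:

```go
package credentialsrequest

import (
	"time"

	log "github.com/sirupsen/logrus"

	minterv1 "github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1"
)

// syncBackoff is the back-off window; the exact duration the operator uses
// is assumed here for illustration.
const syncBackoff = 10 * time.Minute

// hasRecentlySynced reports whether the CredentialsRequest was synced within
// the back-off window, so the STS branch can skip bumping
// status.lastSyncTimestamp on every reconcile.
func hasRecentlySynced(cr *minterv1.CredentialsRequest, logger log.FieldLogger) bool {
	if cr.Status.LastSyncTimestamp == nil {
		return false
	}
	recent := cr.Status.LastSyncTimestamp.Time.After(time.Now().Add(-syncBackoff))
	if recent {
		logger.Debug("credentials synced recently, backing off to avoid hot update loops")
	}
	return recent
}
```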
…condition Adding a "NOT" to the logged cloudCredsSecretUpdated field, because the following 'if' condition is !cloudCredsSecretUpdated. The lack of "NOT" seems to have been an accidental oversight when the logging fields were added in 0a0d849 (Changes to address PR comments from Steve ~3d ago, 2023-06-27, openshift#542). I'm also adding isInfrastructureUpdated logging to catch up with cea55c6 (Added implementation for AWS Day2 Tag reconcilation Support, 2024-09-24, openshift#759), when it was added to the 'if' condition but overlooked in field logging.
CCO-366 Add ability to detect AWS STS and behave accordingly
Adds capability to detect AWS STS enabled clusters and behave a bit like Mint mode when detected.
There is now a small unit test suite, and a single e2e test that uses an STS workflow.
Notes:
must-gather made me unsure: ah, it seems to be working; see the inline comment with log lines showing Secret creation/deletion on CredentialsRequest addition in the e2e test.