Contributor

@bentito bentito commented May 19, 2023

Adds capability to detect AWS STS enabled clusters and behave a bit like Mint mode when detected.

There is now a small unit test suite and a single e2e test that uses an STS workflow.

Notes:

  • Will need changes to allow for move of the TAT data to AWSProviderSpec;
  • Is the e2e really working? Lack of "STS detection" related logging and lack of created Secret in the must-gather make me unsure:
$  yq '.items[].metadata.name' /tmp/artifacts/registry*/namespaces/default/core/secrets.yaml
builder-dockercfg-l6mps
builder-token-6skc2
default-dockercfg-28hgj
default-token-pr62d
deployer-dockercfg-p248v
deployer-token-djlzm

Ah, it seems to be working: see the inline comment with log lines showing Secret creation/deletion on CredentialsRequest addition in the e2e test.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 19, 2023
Contributor

openshift-ci bot commented May 19, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

codecov bot commented May 19, 2023

Codecov Report

Merging #542 (a3b71ca) into master (ee67cc6) will decrease coverage by 0.23%.
The diff coverage is 44.72%.

@@            Coverage Diff             @@
##           master     #542      +/-   ##
==========================================
- Coverage   48.06%   47.84%   -0.23%     
==========================================
  Files          93       93              
  Lines       11305    11488     +183     
==========================================
+ Hits         5434     5496      +62     
- Misses       5251     5359     +108     
- Partials      620      633      +13     
Impacted Files Coverage Δ
pkg/assets/bootstrap/bindata.go 23.85% <ø> (ø)
pkg/kubevirt/actuator.go 67.60% <0.00%> (-0.97%) ⬇️
pkg/openstack/actuator.go 0.00% <0.00%> (ø)
...awspodidentity/awspodidentitywebhook_controller.go 32.21% <0.00%> (-0.22%) ⬇️
pkg/operator/cleanup/cleanup_controller.go 52.32% <0.00%> (ø)
...g/operator/credentialsrequest/actuator/actuator.go 39.13% <0.00%> (-3.73%) ⬇️
pkg/operator/loglevel/controller.go 49.15% <0.00%> (ø)
pkg/operator/platform/platform.go 11.11% <0.00%> (-19.66%) ⬇️
pkg/operator/secretannotator/aws/reconciler.go 45.29% <0.00%> (ø)
pkg/operator/secretannotator/azure/reconciler.go 35.52% <0.00%> (ø)
... and 12 more

ErrReason: minterv1.InsufficientCloudCredentials,
Message: msg,
stsDetected, err := utils.IsTimedTokenCluster(a.Client, logger)
if stsDetected {
Contributor Author

After this change e2e started working. I am kind of unsure about it though.

Contributor Author

@bentito bentito Jun 12, 2023

Okay, here is logging from a Clusterbot cluster launched with workflow-launch openshift-e2e-aws-manual-oidc-sts https://github.com/openshift/cloud-credential-operator/pull/542 and then having run make test-e2e-sts manually:

time="2023-06-12T13:40:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:40:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="adding finalizer: cloudcredential.openshift.io/deprovision" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret
time="2023-06-12T13:40:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="timed token access cluster detected: true, so not trying to provision with root secret" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret
time="2023-06-12T13:40:45Z" level=info msg="actuator detected STS enabled cluster making secret" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="creating secret" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:45Z" level=info msg="clusteroperator status updated" controller=status
time="2023-06-12T13:40:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:40:45Z" level=info msg="secret created successfully" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:40:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:40:56Z" level=info msg="reconcile complete" controller=metrics elapsed=4.220597ms
time="2023-06-12T13:41:56Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:42:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:42:56Z" level=info msg="reconcile complete" controller=metrics elapsed=4.133929ms
time="2023-06-12T13:44:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:44:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=warning msg="no user name set on credentials being deleted, most likely were never provisioned or using passthrough creds" actuator=aws cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="target secret deleted successfully" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret targetSecret=default/test-sts-secret
time="2023-06-12T13:44:45Z" level=info msg="actuator deletion complete, removing finalizer" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req secret=default/test-sts-secret
time="2023-06-12T13:44:45Z" level=info msg="reconciling clusteroperator status"
time="2023-06-12T13:44:45Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-sts-creds-req
time="2023-06-12T13:44:45Z" level=info msg="clusteroperator status updated" controller=status
time="2023-06-12T13:44:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:44:56Z" level=info msg="reconcile complete" controller=metrics elapsed=3.554297ms
time="2023-06-12T13:46:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-06-12T13:46:56Z" level=info msg="reconcile complete" controller=metrics elapsed=4.054413ms
time="2023-06-12T13:46:56Z" level=info msg="reconciling clusteroperator status"

So I feel more confident the test is really passing.


func onAdd(t *testing.T, cfg envconf.Config, ctx context.Context) func(obj interface{}) {
time.Sleep(2 * time.Minute)
if err := cfg.Client().Resources().Get(ctx, secretName, namespace, secret); err != nil {
Contributor Author

@bentito bentito Jun 11, 2023

This is because I was unsure whether the Secret really exists at this point, despite this being a callback from the watch on the Secret.

Contributor

The obj interface{} passed to this func is the object that was just added - perhaps use that? 2min wait should not be needed here
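
The suggestion above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from the PR: `Secret` is a hypothetical local stand-in for the real Kubernetes Secret type, `checkAddedSecret` is an invented helper name, and the `credentials` data key is a placeholder. The point is only that an add callback can type-assert the delivered object and inspect it directly, so neither a sleep nor a second `Get` is needed.

```go
package main

import "fmt"

// Secret is a hypothetical stand-in for the real Secret type so that this
// sketch has no external dependencies.
type Secret struct {
	Namespace string
	Name      string
	Data      map[string][]byte
}

// checkAddedSecret inspects the object delivered to an add callback directly,
// instead of sleeping and then re-fetching the Secret from the API server.
func checkAddedSecret(obj interface{}, wantNamespace, wantName string) error {
	secret, ok := obj.(*Secret)
	if !ok {
		return fmt.Errorf("unexpected object type %T", obj)
	}
	if secret.Namespace != wantNamespace || secret.Name != wantName {
		return fmt.Errorf("unexpected secret %s/%s", secret.Namespace, secret.Name)
	}
	if len(secret.Data) == 0 {
		return fmt.Errorf("secret %s/%s has no data", secret.Namespace, secret.Name)
	}
	return nil
}

func main() {
	// The namespace and name match the e2e log lines; the data is a placeholder.
	added := &Secret{
		Namespace: "default",
		Name:      "test-sts-secret",
		Data:      map[string][]byte{"credentials": []byte("placeholder")},
	}
	if err := checkAddedSecret(added, "default", "test-sts-secret"); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("secret delivered by the add event looks as expected")
}
```

In the real test the type assertion would target whatever type the watch actually delivers; that detail is not shown in this thread.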

@bentito bentito changed the title WIP: CCO-366 Add ability to detect AWS STS and behave accordingly CCO-366 Add ability to detect AWS STS and behave accordingly Jun 11, 2023
@bentito bentito marked this pull request as ready for review June 12, 2023 14:01
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2023
Contributor Author

bentito commented Jun 12, 2023

/assign @abutcher

@openshift-ci openshift-ci bot requested review from dlom and lleshchi June 12, 2023 14:07
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2023
Member

@abutcher abutcher left a comment

Couple small things. Wondering if some of the new Info logs should be moved to Debug based on their potential noise level but those can be moved or followed up on.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2023
}

func computeClusterOperatorVersions() string {
currentVersion := os.Getenv("RELEASE_VERSION")
Contributor

What if this is not set?

Contributor Author

You don't get to use the feature gate? As featuregates.NewFeatureGateAccess likely doesn't do much for nil as the desired version.

Contributor

A string cannot be nil in Go; it will instead be an empty string. In that case the code will be looking for the cluster version to be an empty string, and I expect a lot of other things will also be broken. The good news is that RELEASE_VERSION is defined in 03-deployment.yaml and should always be non-empty.
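
To make the point about unset variables concrete, here is a small sketch using only the standard library. `os.Getenv` returns `""` for an unset variable, while `os.LookupEnv` also reports whether the variable was present. The helper `releaseVersion` and its error text are illustrative assumptions, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// releaseVersion reads RELEASE_VERSION through the supplied lookup function
// (os.LookupEnv in production) and rejects both "unset" and "set but empty".
func releaseVersion(lookup func(string) (string, bool)) (string, error) {
	v, ok := lookup("RELEASE_VERSION")
	if !ok || v == "" {
		return "", errors.New("RELEASE_VERSION is unset or empty")
	}
	return v, nil
}

func main() {
	// A Go string has no nil value; its zero value is the empty string.
	var unset string
	fmt.Printf("zero value of string: %q\n", unset)

	// os.Getenv cannot distinguish an unset variable from an empty one.
	fmt.Printf("os.Getenv result: %q\n", os.Getenv("RELEASE_VERSION"))

	if v, err := releaseVersion(os.LookupEnv); err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Println("release version:", v)
	}
}
```

Failing fast like this is one option; as noted above, the deployment manifest is expected to always set the variable, so the check may be unnecessary in practice.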

logger.WithError(err).Error("error loading CCO configuration to determine mode")
return false, err
}
if credentialsMode != "Manual" {
Contributor

super-nit: is manual not a constant somewhere?

Contributor Author

@bentito bentito Jun 26, 2023

Yikes! Investigating, I find the mode referred to as both "manual" and "Manual" in YAMLs, docs, and code. Yes, there is a constant that has it as "manual" (1b384cc#diff-f6d93feef6d59ca8ced14b54fe0058d41f14de44450baf6b723cc1c2ec280c54R25), but it doesn't seem to be used everywhere now....

So I'm a little leery of using the constant here until the CCO team can review a bit more. I'd say this can wait for another day?

Contributor

GetOperatorConfiguration() returns an operatorv1.CloudCredentialsMode which has a constant CloudCredentialsModeManual = "Manual".
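
As a rough illustration of comparing against the typed constant instead of the literal `"Manual"`, the sketch below declares local stand-ins that mirror the names mentioned in the comment above. In the real code these would come from the `operatorv1` package rather than being redeclared; `isManualMode` is a hypothetical helper.

```go
package main

import "fmt"

// CloudCredentialsMode and CloudCredentialsModeManual are local stand-ins
// mirroring the operatorv1 names from the review comment. They are
// redeclared here only to keep the sketch self-contained.
type CloudCredentialsMode string

const CloudCredentialsModeManual CloudCredentialsMode = "Manual"

// isManualMode compares against the typed constant rather than a string
// literal, so a casing mistake such as "manual" cannot creep in at call sites.
func isManualMode(mode CloudCredentialsMode) bool {
	return mode == CloudCredentialsModeManual
}

func main() {
	fmt.Println(isManualMode(CloudCredentialsModeManual)) // prints true
	fmt.Println(isManualMode("manual"))                   // prints false: the lowercase status key is a different value
}
```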


Contributor Author

bentito commented Jun 23, 2023

Couple small things. Wondering if some of the new Info logs should be moved to Debug based on their potential noise level but those can be moved or followed up on.

Changed all 3 instances to Debug in d6bb916

return reconcile.Result{}, err
if !stsDetected {
logger.Infof("operator detects STS enabled cluster")
return reconcile.Result{}, err
Member

This return short circuits continuing through the controller to create the secret for the CredentialsRequest.

Member

Worth adding a test to the credentials request controller tests to ensure we go through the motions.
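
The concern in the two comments above can be sketched in a few lines. This is a deliberately simplified, hypothetical model of the flow and not the controller's code: `reconcileSketch`, `returnOnDetect`, and `secretsCreated` are invented names. It shows that returning right after the detection log means the secret-creation step is never reached, which is the kind of behavior a controller test could assert on.

```go
package main

import "fmt"

// reconcileSketch is a hypothetical, heavily simplified reconcile step.
// returnOnDetect models the early return under review; createSecret stands in
// for the remainder of the controller flow that creates the Secret.
func reconcileSketch(stsDetected, returnOnDetect bool, createSecret func()) {
	if stsDetected {
		fmt.Println("operator detects STS enabled cluster")
		if returnOnDetect {
			return // short circuit: createSecret is never reached
		}
	}
	createSecret()
}

// secretsCreated runs one reconcile and reports how often createSecret ran.
func secretsCreated(stsDetected, returnOnDetect bool) int {
	calls := 0
	reconcileSketch(stsDetected, returnOnDetect, func() { calls++ })
	return calls
}

func main() {
	fmt.Println("with early return:", secretsCreated(true, true))
	fmt.Println("falling through:", secretsCreated(true, false))
}
```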

Contributor Author

It's the end of a long day for me and, again, I'm not seeing a good way to create a test on the controller with STS enabled. There might be one; I'm just kind of lost at the moment. Maybe we should add a card to Jira for "Streamline Credentials Request Controller Unit Tests and add STS enabled and feature gate flags"?

// The presence of an STSRoleARN within the AWSProviderSpec initiates creation of a secret containing IAM
// Role details necessary for assuming the IAM Role via Amazon's Secure Token Service.
// +optional
// +kubebuilder:validation:Pattern:="^arn:(?P<Partition>[^:\n]*):(?P<Service>[^:\n]*):(?P<Region>[^:\n]*):(?P<AccountID>[^:\n]*):(?P<Ignore>(?P<ResourceType>[^:\/\n]*)[:\/])?(?P<Resource>.*)$"
Member

The role ARN validation is causing CRD generation to fail, which for some reason fails silently when run via make update. The silent failure is possibly related to the openshift/api workaround.

'_output/tools/bin/controller-gen-v0.2.5' schemapatch:manifests="./manifests" paths="./pkg/apis/cloudcredential/v1" 'output:dir="./manifests"'
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: unable to parse string: invalid syntax (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: invalid char escape (at <input>:1:1)
/home/abutcher/go/src/github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1/types_aws.go:36:2: unable to parse string: invalid syntax (at <input>:1:1)
Error: not all generators ran successfully
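
The "invalid char escape" and "invalid syntax" messages are consistent with the double-quoted marker value being read like a Go interpreted string literal, where `\/` is not a valid escape. Whether controller-gen v0.2.5 parses markers exactly this way is an assumption here; the sketch below only demonstrates the Go standard-library behavior. It also checks that the pattern itself compiles and matches once it is written as a raw string. The sample ARN and the helper names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// arnPattern is the pattern from the marker, written as a Go raw string so
// that `\/` and `\n` reach the regexp engine untouched.
const arnPattern = `^arn:(?P<Partition>[^:\n]*):(?P<Service>[^:\n]*):(?P<Region>[^:\n]*):(?P<AccountID>[^:\n]*):(?P<Ignore>(?P<ResourceType>[^:\/\n]*)[:\/])?(?P<Resource>.*)$`

var arnRegexp = regexp.MustCompile(arnPattern)

// parsesAsGoString reports whether lit is a valid quoted Go string literal.
func parsesAsGoString(lit string) bool {
	_, err := strconv.Unquote(lit)
	return err == nil
}

// matchesARN reports whether s matches the ARN pattern.
func matchesARN(s string) bool {
	return arnRegexp.MatchString(s)
}

func main() {
	// Double-quoted: `\/` is rejected as an unknown escape sequence.
	fmt.Println("double-quoted parses:", parsesAsGoString(`"[:\/]"`))

	// Back-quoted: no escape processing happens, so the same text is accepted.
	fmt.Println("back-quoted parses:", parsesAsGoString("`[:\\/]`"))

	// The pattern itself is a valid Go regular expression.
	fmt.Println("sample ARN matches:", matchesARN("arn:aws:iam::123456789012:role/example-role"))
}
```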

stevekuznetsov and others added 4 commits June 27, 2023 12:59
…s for"

This reverts commit 6561e65.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
```
$ make update
```

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
For whatever reason, when the upstream target calls controller-gen, the
tool reads and writes *every* YAML file in the `./manifests` directory,
even files that the tool is not writing to during the generator step. We
copy our `00-config-custresdef.yaml` from the OpenShift API repo, which
uses a bespoke generator with different formatting on the output.
Therefore, when `controller-gen` is run on the YAML we copy from the API
repo, it re-formats it and causes the `git diff` check to fail. In this
override, we simply copy back in the YAMLs from the API repo after
generating, overwriting anything `controller-gen` may have done, so we
don't have this spurious failure.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Contributor Author

bentito commented Jun 27, 2023

In reply to #542 (comment)
There is another constants file floating around this codebase and it's confusing:

pkg/operator/constants/constants.go has it as "manual"

Contributor Author

bentito commented Jun 27, 2023

/test e2e-aws-manual-oidc

@jstuever
Contributor

Responding to #542 (comment)

I agree, the constants file is confusing. I had to dig in to understand how it is used. It looks like they are used solely for keys (string) in a mapping in the metrics operator. The mapping appears to be more of a status use case than a desired configuration. There also appear to be additional status options compared to CloudCredentialsMode type definition. Why the statuses were selected to be lowercase, I do not know.

Contributor

openshift-ci bot commented Jun 27, 2023

@bentito: all tests passed!

2uasimojo added a commit to 2uasimojo/cloud-credential-operator that referenced this pull request Jun 27, 2023
We spent some time in openshift#511 and again in openshift#542 trying to reconcile CRDs.
The problem is that we want to *generate* the CredentialsRequest CRD
from code in this repo, but *use* (copy) the CloudCredential CRD from
openshift/api, which we vendor. But we invoke controller-gen through
build-machinery-go, and it does unexpected things to the latter, which
breaks validation.

With this commit, we move the CredentialsRequest CRD to a `generated`
subdirectory and the CloudCredential CRD to an `imported` subdirectory.
This lets us go back to the simpler invocation of bmg's tooling while
keeping everything in the shape we expect.

One more quirk: Because build-machinery-go starts defining dependency
chains for targets like `update`, we need to start defining that
dependency chain *before* we import the bmg libs to ensure that we
copy/generate CRDs *before* we include them in bindata.
@stevekuznetsov
Contributor

/lgtm

@abutcher gave his blessing last week, the new e2e are passing, @bentito let's do this!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 28, 2023
@2uasimojo
Member

/approve

proxy for @abutcher

Contributor

openshift-ci bot commented Jun 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, bentito, stevekuznetsov

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2023
@openshift-merge-robot openshift-merge-robot merged commit 0c629a5 into openshift:master Jun 28, 2023
@stevekuznetsov
Contributor

/cherry-pick release-4.13

@openshift-cherrypick-robot

@stevekuznetsov: #542 failed to apply on top of branch "release-4.13":

Applying: Add & logic - new token CredReq.spec.cred* fields
Applying: Add a timed access token detection capability
Applying: Add test suite and util funcs for detect STS
Applying: Add test - detect STS & new token fields present
Applying: Add e2e AWS STS Secret creation test
Using index info to reconstruct a base tree...
M	Makefile
A	test/e2e/aws/sts/actutator_e2e_test.go
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): test/e2e/aws/sts/actutator_e2e_test.go deleted in HEAD and modified in Add e2e AWS STS Secret creation test. Version Add e2e AWS STS Secret creation test of test/e2e/aws/sts/actutator_e2e_test.go left in tree.
Auto-merging Makefile
CONFLICT (content): Merge conflict in Makefile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0005 Add e2e AWS STS Secret creation test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

wking added a commit to wking/cloud-credential-operator that referenced this pull request Dec 20, 2024
e9f9cc6 (Add & logic - new token CredReq.spec.cred* fields,
2023-06-27, openshift#542) created the STS-specific branch here, and shifted the
pre-existing hasRecentlySynced check to the non-STS branch.  But
that's leading to hot update loops, as the reconciler bangs away
bumping status.lastSyncTimestamp (which we've had since the initial
cloud-cred operator pull request [1]).  For example in this recent CI
run [2]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
  --
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
  --
  time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

With this commit, we'll perform the same back-off in the STS case too,
to avoid flooding the Kube API server with status.lastSyncTimestamp
updates.

[1]: openshift@a6d385a#diff-69794ca0db76a04660e3355ba9b824f34e7af1030d0a8114903d11847201c410R46
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848
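
The back-off described in the commit message above can be illustrated with a small sketch. The name `hasRecentlySynced` comes from the commit message, but the implementation, the one-hour interval, and the `countSyncs` helper below are assumptions made for illustration, not the operator's code. The idea is that a reconcile which fires shortly after a previous sync skips the sync, and therefore does not bump `status.lastSyncTimestamp` again.

```go
package main

import (
	"fmt"
	"time"
)

// hasRecentlySynced is a hypothetical reimplementation: it reports whether
// the last sync happened less than minInterval before now.
func hasRecentlySynced(lastSync, now time.Time, minInterval time.Duration) bool {
	if lastSync.IsZero() {
		return false // never synced
	}
	return now.Sub(lastSync) < minInterval
}

// countSyncs replays reconcile triggers, given as second offsets from an
// arbitrary start time, and returns how many of them would actually sync
// and update the last-sync timestamp.
func countSyncs(triggerOffsets []int, minIntervalSeconds int) int {
	start := time.Date(2024, 12, 5, 21, 44, 49, 0, time.UTC)
	minInterval := time.Duration(minIntervalSeconds) * time.Second
	var lastSync time.Time
	syncs := 0
	for _, off := range triggerOffsets {
		now := start.Add(time.Duration(off) * time.Second)
		if hasRecentlySynced(lastSync, now, minInterval) {
			continue // back off: no sync, no timestamp update
		}
		lastSync = now
		syncs++
	}
	return syncs
}

func main() {
	triggers := []int{0, 3, 6, 3600}
	fmt.Println("syncs without back-off:", countSyncs(triggers, 0))
	fmt.Println("syncs with a one-hour back-off:", countSyncs(triggers, 3600))
}
```

Without a back-off every trigger syncs, which matches the hot loop visible in the log excerpt; with it, triggers a few seconds apart collapse into a single sync.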
wking added a commit to wking/cloud-credential-operator that referenced this pull request Dec 26, 2024
e9f9cc6 (Add & logic - new token CredReq.spec.cred* fields,
2023-06-27, openshift#542) created the STS-specific branch here, and shifted the
pre-existing hasRecentlySynced check to the non-STS branch.  But
that's leading to hot update loops, as the reconciler bangs away
bumping status.lastSyncTimestamp (which we've had since the initial
cloud-cred operator pull request [1]).  For example in this recent CI
run [2]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
  --
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
  --
  time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

With this commit, we'll perform the same back-off in the STS case too,
to avoid flooding the Kube API server with status.lastSyncTimestamp
updates.

I'm also adding a "NOT" to the logged cloudCredsSecretUpdated field,
because the following 'if' condition is !cloudCredsSecretUpdated.  The
lack of "NOT" seems to have been an accidental oversight when the logging
fields were added in 0a0d849 (Changes to address PR comments from
Steve ~3d ago, 2023-06-27, openshift#542).

[1]: openshift@a6d385a#diff-69794ca0db76a04660e3355ba9b824f34e7af1030d0a8114903d11847201c410R46
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848
wking added a commit to wking/cloud-credential-operator that referenced this pull request Jan 3, 2025
…condition

Adding a "NOT" to the logged cloudCredsSecretUpdated field, because
the following 'if' condition is !cloudCredsSecretUpdated.  The lack of
"NOT" seems to have been an accidental oversight when the logging fields
were added in 0a0d849 (Changes to address PR comments from Steve
~3d ago, 2023-06-27, openshift#542).

I'm also adding isInfrastructureUpdated logging to catch up with
cea55c6 (Added implementation for AWS Day2 Tag reconcilation
Support, 2024-09-24, openshift#759), when it was added to the 'if' condition
but overlooked in field logging.
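To illustrate why the logged field names should mirror the condition, here is a small sketch in which the skip decision is derived from the logged fields themselves, so the two cannot drift apart. Everything beyond the two variable names is an assumption: `syncInputs`, `skipFields` and `shouldSkip` are hypothetical, and the commit message does not say how `isInfrastructureUpdated` enters the real condition, so the sketch simply assumes it is negated as well.

```go
package main

import "fmt"

// syncInputs holds the booleans a hypothetical 'if' condition evaluates.
type syncInputs struct {
	cloudCredsSecretUpdated bool
	isInfrastructureUpdated bool
}

// skipFields returns log fields named after the exact terms of the
// condition, including the negation, so a reader of the log sees the
// values the condition actually tested.
func skipFields(in syncInputs) map[string]bool {
	return map[string]bool{
		"NOT cloudCredsSecretUpdated": !in.cloudCredsSecretUpdated,
		// Assumption: the real condition may use this term differently.
		"NOT isInfrastructureUpdated": !in.isInfrastructureUpdated,
	}
}

// shouldSkip is true only when every logged term is true, which ties the
// decision to the fields that are logged.
func shouldSkip(in syncInputs) bool {
	for _, ok := range skipFields(in) {
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	in := syncInputs{cloudCredsSecretUpdated: false, isInfrastructureUpdated: true}
	fmt.Println("fields:", skipFields(in))
	fmt.Println("skip sync:", shouldSkip(in))
}
```

Deriving the decision from the logged map is a design choice made for the sketch; the point is only that a term added to the condition then appears in the log automatically.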
wking added a commit to wking/cloud-credential-operator that referenced this pull request Jan 3, 2025
e9f9cc6 (Add & logic - new token CredReq.spec.cred* fields,
2023-06-27, openshift#542) created the STS-specific branch here, and shifted the
pre-existing hasRecentlySynced check to the non-STS branch.  But
that's leading to hot update loops, as the reconciler bangs away
bumping status.lastSyncTimestamp (which we've had since the initial
cloud-cred operator pull request [1]).  For example in this recent CI
run [2]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
  --
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
  --
  time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

With this commit, we'll perform the same back-off in the STS case too,
to avoid flooding the Kube API server with status.lastSyncTimestamp
updates.

[1]: openshift@a6d385a#diff-69794ca0db76a04660e3355ba9b824f34e7af1030d0a8114903d11847201c410R46
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848
ming1013 pushed a commit to ming1013/cloud-credential-operator that referenced this pull request Dec 15, 2025
CCO-366 Add ability to detect AWS STS and behave accordingly

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

7 participants