OCPBUGS-15365: *: use a filtered LIST + WATCH on Secrets for AWS STS #545
|
Would like some guidance for what kinds of tests we want to add to this :) |
pkg/aws/actuator/actuator.go
    // userPolicy param empty because in passthrough mode this doesn't really have any meaning
-   err = a.syncAccessKeySecret(cr, accessKeyID, secretAccessKey, existingSecret, "", logger)
+   err = a.syncAccessKeySecret(ctx, cr, accessKeyID, secretAccessKey, existingSecret, "", logger)
These changes were not strictly necessary but it seemed like latent tech debt and since I was touching the client calls, I figured we could fix them.
pkg/operator/controller.go
    // AddToManager adds all Controllers to the Manager
    func AddToManager(m manager.Manager, explicitKubeconfig string) error {
        rules := clientcmd.NewDefaultClientConfigLoadingRules()
Controller-runtime does not have support for server-side-apply yet, and by far the best way for us to add labels to existing objects surgically, and without conflict problems, is using server-side-apply. I create a client-go client and thread it through in order to allow that.
    })

    if _, err := r.mutatingClient.Secrets(secret.Namespace).Apply(ctx, applyConfig, metav1.ApplyOptions{
        Force: true, // we're the authoritative owner of this field and should not allow anyone to stomp it
@deads2k would love a gut-check that this does what I think it does, given that the applyConfig is set up the way it is - is there a single owner for all of metadata.labels?
Labels are divided by name, so you only own the labels that you're trying to set, not all the labels available. At least that's my memory of it. Is the reality different?
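To make the "labels are divided by name" point concrete, here is a deliberately simplified toy model in plain Go. It is not the real managedFields format and not the PR's code: the `object` type, the `apply` method, the manager name `cco`, and the label key `example.com/watched` are all made up for illustration. It also assumes one owner per key, whereas real server-side apply allows shared ownership and removes fields that an applier stops sending.

```go
package main

import "fmt"

// object is a toy stand-in for a Secret's metadata.labels plus a record of
// which field manager owns each label key. Hypothetical; not managedFields.
type object struct {
	labels map[string]string
	owners map[string]string // label key -> field manager
}

// apply models one manager applying a set of labels. A key owned by another
// manager with a different value is a conflict unless force is set. Keys that
// are not part of this apply are left untouched, along with their owners.
func (o *object) apply(manager string, labels map[string]string, force bool) error {
	for k, v := range labels {
		owner, owned := o.owners[k]
		if owned && owner != manager && o.labels[k] != v && !force {
			return fmt.Errorf("conflict on label %q owned by %q", k, owner)
		}
	}
	for k, v := range labels {
		o.labels[k] = v
		o.owners[k] = manager
	}
	return nil
}

func main() {
	s := &object{
		labels: map[string]string{"team": "payments"},
		owners: map[string]string{"team": "user"},
	}
	// Forcing an apply of our own label does not take over the user's label.
	if err := s.apply("cco", map[string]string{"example.com/watched": "true"}, true); err != nil {
		fmt.Println("apply failed:", err)
		return
	}
	fmt.Println("owner of team:", s.owners["team"])
	fmt.Println("owner of example.com/watched:", s.owners["example.com/watched"])
}
```

In this model, forcing only matters for keys named in the apply; everything else on the object keeps its current owner, which matches the recollection above.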
|
/hold It is not clear to me if labeling the root credentials in |
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #545      +/-   ##
==========================================
+ Coverage   47.84%   48.43%   +0.59%
==========================================
  Files          93       93
  Lines       11488    11958     +470
==========================================
+ Hits         5496     5792     +296
- Misses       5359     5538     +179
+ Partials      633      628       -5
|
|
/retitle PORTENABLE-526: *: label the secrets we interact with |
|
After some chats with @abutcher and @deads2k :
Since STS feature flag does not exist before that PR merges, we don't have to worry about pulling in old data / upgrades / etc. |
|
sgtm but why can't we mutate the kube-system cred (just to label it)? because it may be user provided/user-managed? alternatively, if it's not associated w/ credreq, why do we need to label it? |
|
It's user-provided, so David says we can't touch it. And we would want to label it because controller-runtime makes it exceedingly difficult to have a consistent experience when every object you want to GET is not in your cache, and the factoring today would require us to have two caches |
|
@stevekuznetsov: This pull request references Jira Issue OCPBUGS-15365, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@stevekuznetsov: This pull request references Jira Issue OCPBUGS-15365, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
|
/jira refresh |
|
@stevekuznetsov: This pull request references Jira Issue OCPBUGS-15365, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact:
|
/test e2e-aws-ovn |
e5706ed to ae2828f
|
Was sending a label selector instead of a field selector, oops. Should be good now. |
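For anyone unsure how the two selector kinds differ on the wire, here is a small stdlib-only sketch. `labelSelector` and `fieldSelector` are separate query parameters on Kubernetes LIST and WATCH requests. The helper name `listSecretsPath`, the label key `example.com/managed`, and the Secret name `aws-creds` are assumptions for illustration and are not taken from this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// listSecretsPath builds the request path for a LIST of Secrets in a namespace.
// labelSelector and fieldSelector are distinct query parameters; empty values
// are omitted from the query string.
func listSecretsPath(namespace, labelSelector, fieldSelector string) string {
	q := url.Values{}
	if labelSelector != "" {
		q.Set("labelSelector", labelSelector)
	}
	if fieldSelector != "" {
		q.Set("fieldSelector", fieldSelector)
	}
	path := "/api/v1/namespaces/" + url.PathEscape(namespace) + "/secrets"
	if enc := q.Encode(); enc != "" {
		path += "?" + enc
	}
	return path
}

func main() {
	// Label selectors match on metadata.labels (hypothetical key shown here).
	fmt.Println(listSecretsPath("kube-system", "example.com/managed=true", ""))
	// Field selectors match on object fields such as metadata.name.
	fmt.Println(listSecretsPath("kube-system", "", "metadata.name=aws-creds"))
}
```

Sending an expression under the wrong parameter changes what the server matches against (labels versus object fields), so the two are not interchangeable.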
|
/test e2e-aws-manual-oidc |
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
1f3d8b4 to caf857f
|
/test e2e-aws-manual-oidc |
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
8336770 to c0b4a13
|
/test e2e-aws-manual-oidc |
|
/lgtm |
|
/hold cancel |
|
@abutcher will need an approval as well! |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: abutcher, stevekuznetsov
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@stevekuznetsov: all tests passed! Full PR test history. Your PR dashboard.
|
@stevekuznetsov: Jira Issue OCPBUGS-15365: All pull requests linked via external trackers have merged:
Jira Issue OCPBUGS-15365 has been moved to the MODIFIED state.
Even when awsSTSIAMRoleARN is empty, we want the label so that
pkg/cmd/operator's NewOperator's filteredWatchPossible label-selector
can find these Secrets. Then the controller will notice if they're
deleted (so it can update the CredentialsRequest status to point that
out) or when they haven't been changed (so it can avoid "I can't find
the Secret!" overly-frequent bumping in the hasRecentlySynced
calculation, because it thinks crSecretExists=false). And we want the
annotation, so it's clear why the Secret needs to exist (because of the
annotation-referenced CredentialsRequest).

The risk here is that we might end up contending over label/annotation
presence with the external controller that is populating the
'credentials' data inside the Secret. But the alternative of an
unfiltered Secret informer in the client is still too
resource-intensive, as described in the filteredWatchPossible comment
and the a58a09c (*: use a filtered LIST + WATCH on Secrets for AWS STS,
2023-06-29, openshift#545) commit that added the filteredWatchPossible
logic.

Additional labels and annotations are properties that external
controllers should be able to accept. For example, [1] has ArgoCD
discussing:

  apiVersion: argoproj.io/v1alpha1
  kind: ApplicationSet
  spec:
    # (...)
    preservedFields:
      annotations: ["my-custom-annotation"]
      labels: ["my-custom-label"]

to ignore annotations and labels injected by external-to-ArgoCD
controllers, which is what the CCO-specific annotation/label I'm
touching now would be.

Moving to 48d6ccc (pkg/operator: correctly fetch CA for AWS minter,
2023-07-19, openshift#575)'s LiveClient avoids confusing CreateOrPatch.
With the cached .Client, it would have:

1. Failed to retrieve an unlabeled Secret, because the
   externally-created Secret lacked the label that the Client's
   filteredWatchPossible informer is filtered on.
2. Thought that it should Create a new Secret.
3. Had that Create attempt fail on 'secrets "$NAME" already exists'.

With the LiveClient, that becomes:

1. Successfully retrieved an unlabeled Secret, with the uncached reader.
2. Thought that it should Patch the Secret.
3. Successfully Patch the Secret.
4. Once the Patch sets the label, future attempts to Get the Secret
   through the filtered informer cache will succeed.

[1]: https://argo-cd.readthedocs.io/en/release-2.13/operator-manual/applicationset/Controlling-Resource-Modification/#preserving-changes-made-to-an-applications-annotations-and-labels
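The two sequences in the commit message above can be reproduced with a toy in-memory model. This is a sketch under stated assumptions, not the operator's code: `apiServer`, `cachedGet`, `liveGet`, `createOrPatch`, and the label key `example.com/watched` are all hypothetical names, and the "cache" is just a label check in front of the same map.

```go
package main

import (
	"errors"
	"fmt"
)

const watchedLabel = "example.com/watched" // hypothetical label key

var (
	errNotFound      = errors.New("not found")
	errAlreadyExists = errors.New("already exists")
)

type secret struct {
	name   string
	labels map[string]string
}

// apiServer is a toy in-memory store standing in for the cluster.
type apiServer struct{ secrets map[string]*secret }

// liveGet reads straight from the store, like an uncached reader.
func (a *apiServer) liveGet(name string) (*secret, error) {
	if s, ok := a.secrets[name]; ok {
		return s, nil
	}
	return nil, errNotFound
}

// cachedGet models a label-filtered informer cache: unlabeled Secrets are invisible.
func (a *apiServer) cachedGet(name string) (*secret, error) {
	s, err := a.liveGet(name)
	if err != nil || s.labels[watchedLabel] != "true" {
		return nil, errNotFound
	}
	return s, nil
}

func (a *apiServer) create(s *secret) error {
	if _, ok := a.secrets[s.name]; ok {
		return errAlreadyExists
	}
	a.secrets[s.name] = s
	return nil
}

// createOrPatch has the Get-then-Create-or-Patch shape; get is the reader it trusts.
func (a *apiServer) createOrPatch(name string, get func(string) (*secret, error)) error {
	existing, err := get(name)
	if errors.Is(err, errNotFound) {
		return a.create(&secret{name: name, labels: map[string]string{watchedLabel: "true"}})
	}
	if err != nil {
		return err
	}
	if existing.labels == nil {
		existing.labels = map[string]string{}
	}
	existing.labels[watchedLabel] = "true" // stands in for the Patch
	return nil
}

func main() {
	// An externally created, unlabeled Secret already exists.
	a := &apiServer{secrets: map[string]*secret{"creds": {name: "creds"}}}
	fmt.Println("cached reader:", a.createOrPatch("creds", a.cachedGet))
	fmt.Println("live reader:", a.createOrPatch("creds", a.liveGet))
	_, err := a.cachedGet("creds")
	fmt.Println("cached get after patch:", err)
}
```

With the filtered reader the model takes the Create branch and fails on the existing object; with the live reader it patches, after which the filtered reader can see the Secret.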
…ecrets OCPBUGS-15365: *: use a filtered LIST + WATCH on Secrets for AWS STS
The status quo for this controller is to LIST + WATCH all Secrets on the cluster. This consumes
more resources than necessary on clusters where users put other data in Secrets themselves, as we
hold that data in our cache and never do anything with it. The reconcilers mainly need to react to
changes in Secrets created for CredentialsRequests, which they control and can label, allowing us
to filter the LIST + WATCH down and hold the minimal set of data in memory. However, two caveats:
- admin credentials are stored in user-provided Secrets, and we need to watch those, but we can't
  label them
- Secrets created by previous versions of this controller will not be labelled
We could solve the second issue with an interim release of this controller that labels all previous
Secrets, but does not restrict the watch stream.
Due to the way that controller-runtime closes over the client/cache concepts, it's difficult to
solve the first issue, though, since we'd need two sets of clients and caches, both for Secrets,
and ensure that we use one for client access to Secrets we're creating or mutating and the other
when we're interacting with admin credentials. Not impossible to do, but tricky to implement and
complex.
Until we undertake that effort, we simplify the problem space: we only try to filter the LIST +
WATCH when AWS STS mode is enabled. First, this mode is brand new, so we can be reasonably sure
that there are no previous Secrets on the cluster, and we make the filtering best-effort in order
to check whether that assumption holds. Second, AWS STS mode only runs in clusters without admin
credentials, so if we apply the filter, we should not see failures downstream from clients that
hope to see those objects but can't.
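As a rough illustration of the best-effort decision described above, the following stdlib-only sketch returns true only when STS mode is on and every Secret the controller cares about already carries the label. The exact check the operator performs is not shown in this description, so the function body, the `secret` type, and the label key `example.com/watched` are assumptions made for the example.

```go
package main

import "fmt"

const watchedLabel = "example.com/watched" // hypothetical label key

type secret struct {
	name   string
	labels map[string]string
}

// filteredWatchPossible is a hypothetical decision helper: filtering is only
// attempted in STS mode, and only if no Secret of interest is missing the
// label, since a filtered LIST + WATCH would otherwise hide that Secret.
func filteredWatchPossible(stsEnabled bool, secretsOfInterest []secret) bool {
	if !stsEnabled {
		return false
	}
	for _, s := range secretsOfInterest {
		if s.labels[watchedLabel] != "true" {
			return false
		}
	}
	return true
}

func main() {
	labeled := secret{name: "a", labels: map[string]string{watchedLabel: "true"}}
	unlabeled := secret{name: "b"}

	fmt.Println(filteredWatchPossible(true, []secret{labeled}))
	fmt.Println(filteredWatchPossible(true, []secret{labeled, unlabeled}))
	fmt.Println(filteredWatchPossible(false, []secret{labeled}))
}
```

Falling back to the unfiltered LIST + WATCH when the check fails costs memory but keeps behavior correct, which is the trade-off the description argues for.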