-
Notifications
You must be signed in to change notification settings - Fork 7k
feat: respecting rbac for resource exclusions/inclusions proposal #13479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d0ec891
7f7b7ae
6975db6
35c096d
0052a8a
277079b
199e5bd
5ba4ace
08628b8
1e7fbc8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| --- | ||
| title: Respect RBAC for Resource Inclusions/Exclusions | ||
|
|
||
| authors: | ||
| - "@gdsoumya" | ||
| - "@alexmt" | ||
|
|
||
| sponsors: | ||
| - TBD | ||
|
|
||
| reviewers: | ||
| - @jannfis | ||
|
|
||
| approvers: | ||
| - @jannfis | ||
|
|
||
| creation-date: 2023-05-03 | ||
|
|
||
| --- | ||
|
|
||
| # Enhancement Idea | ||
|
|
||
| This is a proposal to provide the ability to configure argocd controller, to respect the current RBAC permissions | ||
| when handling resources besides the already existing resource inclusions and exclusions. | ||
|
|
||
| ## Summary | ||
|
|
||
| Argo CD administrator will be able to configure in `argocd-cm`, whether to enable or disable(default) the feature where the controller will | ||
| only monitor resources that the current service account allows it to read. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Some users restrict the access of the argocd to specific resources using rbac and this feature will enable them to continue | ||
| using argocd without having to manually configure resource exclusions for all the resources that they don't want argocd to be managing. | ||
|
|
||
| ## Proposal | ||
|
|
||
| The configuration for this will be present in the `argocd-cm`, we will add new boolean field `resource.respectRBAC` in the | ||
| cm which can be set to `true` to enable this feature, by default the feature is disabled. | ||
|
|
||
| For the implementation there are 3 proposals : | ||
|
|
||
| 1. Modify `gitops-engine` pkg to make a `SelfSubjectAccessReview` request before adding any resource to the watch list, in this approach we are making an extra | ||
| api server call to check if controller has access to the resource, this does increase the no. of kubeapi calls made but is more accurate. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Re: The number of kubeapi calls. In clusters with a lot of APIs (e.g. OpenShift comes with 200 APIs installed by default), that's already at least a burst of 200 API calls in a very short time frame. The default QPS for the client-side rate limiter in the kube API client is 50 with a burst of 100 (2 * QPS). With additional SelfSubjectAccessReview, this numbers would increase to a burst of 400 calls. Things get even worse when you consider namespace-scoped mode. Because the watches would be established for each of the APIs per managed namespaces. Considering the 200+ APIs example, this would result in at least 400 calls per managed namespace. This seems to be numbers that can easily break the API server.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The default in kube has actually been increased to either 300 or 500. But Argo's default is currently set lower. We should set ours to the new default
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, agreed. 50/100 is a very conservative number. I know that in the OpenShift client, this has been changed for a while (I think to 350 for the burst at least), and if upstream K8s has adopted a similar increase by default, we should do it as well. People can still fine-tune it using ARGOCD_K8S_CLIENT_QPS and ARGOCD_K8S_CLIENT_BURST when they want more conservative rate limits, but I guess the better user experience is to have them increased by default.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jessesuen it seems that the default in the kube client is 5 and 10 respectively: https://pkg.go.dev/k8s.io/client-go@v0.27.3/rest#pkg-constants (https://github.com/kubernetes/client-go/blob/ebad5fbb0d96ae12547fd19468316a604bce5ccf/rest/config.go#L43-L46) |
||
| 2. Modify `gitops-engine` pkg to check for forbidden/unauthorized errors when listing for resources, this is more efficient approach as the | ||
| no. of kubeapi calls made does not change, but there is a chance of false positives as similar errors can be returned from kubeapi server or env specific proxies in other situations | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder what kind of false positives that would be? And, if there's a false-positive, chances are that SelfSubjectAccessReview will return positive, but the API server will reply with a 403 when trying to establish the watch, then again resulting in a hard error? |
||
| 3. Combine approaches 1 and 2, in this controller will check the api response for the list call, and if it receives forbidden/unauthorized it will make the `SelfSubjectAccessReview` call. | ||
| This approach is accurate and at the same time, only makes extra api calls if the list calls fail in the first place. | ||
|
Comment on lines
+47
to
+48
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This strongly depends on the use-case of the user and what kind of permissions they are willing to give to a specific Argo CD instance. For example, if there's an instance that is only allowed to manage 10 out of 200 APIs, you would still have 190 additional calls to the SelfSubjectAccessReview.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we maybe make the use of SubjectSelfAccessReview configurable? Then people can have the trade-off they want - on clusters with a low amount of APIs, they can configure the mechanism to the highest precision while on clusters with high amount of APIs, they trade precision against speed.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure I think we can have another config option that enables the stricter check. Will update the proposal with this
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated. |
||
|
|
||
| In all solutions, once controller determines that it does not have access to the resource it will stop monitoring it. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if the user adds/removes access later ?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A controller restart will be necessary
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Controller automatically does full "resync" every day. So it will auto-discover RBAC change once a day
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah yes ^^ though if you want it instantly a restart is needed.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alexmt The watches will be re-established every 10 minutes as well, right? I think part of that is also rediscovery changes in available APIs. I guess with that, auto-discovery of RBAC changes would also happen all 10 minutes instead of only after cache expiry?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think controller is retrying only in-progress watches every 10 minutes. So it will notice if RBAC no longer allows accessing resource but won't notice if RBAC is allowing new resources
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But if you add new CRDs to the cluster, Argo CD will pick those up without the need for a restart. Is that due to an informer on the CRD API? As for the restart, I think a cluster-cache resync (which can be triggered via Argo CD's API) should be sufficient? Maybe we can add a new functionality that Argo CD will also re-try (probe) the APIs it received a 401 for previously? In a configurable interval, which defaults to the same duration as the watch timeout (10 minutes)? Anyway, I think this is implementation detail and could even be added later on. Just keeping this here as a potential idea. |
||
|
|
||
| ### Implementation decision | ||
|
|
||
| It was decided that we will go with approach 3 from the above list, but instead of a boolean flag we will have the `resource.respectRBAC` take 3 configuration options for the users : | ||
| - `strict` : This will perform both the checks i.e. whether the list call response is forbidden/unauthorized and if it is make the `SelfSubjectAccessReview` call to confirm. | ||
| - `normal` : This will only check whether the list call response is forbidden/unauthorized and skip `SelfSubjectAccessReview` call. | ||
| - unset/empty : This will disable the feature and controller will continue to monitor all resources. | ||
|
|
||
| NOTE: By default `resource.respectRBAC` will be unset or `""` which disables the feature | ||
|
|
||
| Users who are okay with an increase in kube api server calls can opt for strict option while users who are concerned with higher api calls can compromise on the accuracy and opt for the normal option. | ||
|
|
||
| ## Security Considerations and Risks | ||
|
|
||
| There are no particular security risks associated with this change, this proposal rather improves the argocd controller | ||
| to not access/monitor resources that it does not have permission to access. | ||
|
|
||
| ## Upgrade / Downgrade Strategy | ||
|
|
||
| There is no special upgrade strategy needed, all existing argocd configmaps will continue to work | ||
| and old configs without the `resource.respectRBAC` config will cause no change in argocd controllers behavior. | ||
|
|
||
| While downgrading to older version, if the user had configured `resource.respectRBAC` previously this would be ignored completely | ||
| and argocd would revert to its default behavior of trying to monitor all resources. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be safe to enable by default? If the controller doesn't have access to a resource, respecting RBAC will only help it avoid 403s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep it enabled by default, but that might suddenly change the behavior of the controller when a user upgrades which I thought might not be welcomed by some users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I think we'd want to everywhere target enabling by default. But agreed, best to move cautiously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also be voting for kind of a cautious move here. Make it a feature toggle, test it extensively, and then move forward to enable it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think by default for now at least this would be disabled and opt in for the user.