docs/proposals/004-cluster-name-as-id.md (149 additions, 0 deletions)
---
title: Cluster Name as ID
authors:
- "@denysvitali" # Authors' github accounts here.
sponsors:
- TBD # List all interested parties here.
reviewers:
- "@alexmt"
- TBD
approvers:
- "@alexmt"
- TBD

creation-date: 2023-03-07
last-updated: 2023-03-07
---

# Cluster Name as unique identifier


<!-- ## Open Questions [optional] -->


## Summary

Currently, clusters are identified by their API URL. Throughout the codebase, we assume that the API URL is unique within an ArgoCD instance, and thus it is used as an identifier.
With this proposal, we suggest switching to a (namespaced) Cluster Name oriented approach, so that the same API URL can be used multiple times, but with different service accounts / different namespaces.

> **Review comment (Member):** When we say Cluster Name, does it mean the name of the cluster specified in the `.data.name` of the secret? Or is it the name of the secret resource `metadata.name`? Since `.data.name` is not a mandatory field, it could be difficult to check for the uniqueness of the key. However, the secret name is mandatory, and combining it with the namespace will result in a unique key.

The result could be similar to [Traefik Service Names](https://doc.traefik.io/traefik/routing/services/) (e.g. `cluster-1@argo-cd`), to achieve namespaced unique identifiers / clusters.

## Motivation

There are [many different ways to achieve multi-tenancy in Kubernetes](https://kubernetes.io/blog/2021/04/15/three-tenancy-models-for-kubernetes/), mainly divided into "Cluster/Control Plane as a Service" and "Namespaces as a Service".

For clarification purposes, we'll consider the following examples:
- One cluster per team, per environment (e.g: `https://team1-dev.kubernetes.example.com`, `default`)
- One cluster per team, namespaced (e.g: `https://team1.kubernetes.example.com`, with namespaces `dev`, `staging` and `prod`)
- One huge cluster, divided by namespaces (e.g: `https://kubernetes.example.com`, `team1-dev`, `team1-staging`, ...)

For this proposal, we're going to tackle an issue with the last two cases, assuming different credentials for each namespace. These last two cases can be seen as specific implementations of the "Namespaces as a Service" case.

Consider the situation where a team is given three (or more) namespaces, for
example:

- `team1-dev`
- `team1-staging`
- `team1-prod`

Assume that for security purposes, Service Accounts are only allowed to act on one namespace.
This effectively means that in the example above `team-1` would have 3 service accounts to deploy to Kubernetes, one for each environment.

With the current implementation of ArgoCD (2.6.3), it is not possible to support this use case
without ugly workarounds (e.g. a reverse proxy or subdomains).

The problem is that, in multi-environment and multi-tenant clusters with different service accounts per environment,
ArgoCD won't work reliably.

Due to the current indexing by API URL (e.g: `https://shared-cluster.kubernetes.example.com`), one cannot use the following clusters:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: "cluster-a"
  namespace: argo-cd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: "cluster-a"
  namespaces: "team1-dev"
  server: "https://kubernetes.example.com"
  config: "{}" # tlsClientConfig for team1-dev
---
apiVersion: v1
kind: Secret
metadata:
  name: "cluster-b"
  namespace: argo-cd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: "cluster-b"
  namespaces: "team1-staging"
  server: "https://kubernetes.example.com"
  config: "{}" # tlsClientConfig for team1-staging
```

The issue the user experiences is non-deterministic. Since the list of clusters
is indexed by the cluster API URL (`https://kubernetes.example.com`), sometimes ArgoCD will try
to deploy / verify the resources on the correct cluster with the correct credentials (e.g. checking `team1-dev` with `cluster-a`), but other times it might use the wrong credentials (e.g. checking `team1-dev` with `cluster-b`).
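The collision can be sketched with a plain map keyed by server URL. This is a simplified stand-in for the real indexing code, not Argo CD's actual API; `Cluster` and `indexByServer` are illustrative names only:

```go
package main

import "fmt"

// Cluster is a simplified stand-in for Argo CD's cluster record
// (hypothetical struct, for illustration only).
type Cluster struct {
	Name   string
	Server string
}

// indexByServer mimics the current behaviour: clusters are keyed by
// API URL, so two secrets sharing a server silently collapse into one.
func indexByServer(clusters []Cluster) map[string]Cluster {
	idx := make(map[string]Cluster)
	for _, c := range clusters {
		idx[c.Server] = c // later entries overwrite earlier ones
	}
	return idx
}

func main() {
	clusters := []Cluster{
		{Name: "cluster-a", Server: "https://kubernetes.example.com"},
		{Name: "cluster-b", Server: "https://kubernetes.example.com"},
	}
	idx := indexByServer(clusters)
	// Only one entry survives; which credentials win depends on the
	// order secrets happen to be processed in, hence the
	// non-deterministic behaviour described above.
	fmt.Println(len(idx), idx["https://kubernetes.example.com"].Name)
}
```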

### Goals

- Users are able to use multiple "clusters" with the same API URL

### Non-Goals

<!-- TODO: Define Non-Goals ? -->

## Proposal

- Indexing is performed on `name@namespace` (for example `cluster-a@argo-cd`)
- The code only assumes the `name@namespace` to be unique
- The code assumes an API URL can be repeated

> **Review comment (Member, @blakepettersson, May 28, 2023):** Could the default not be to still use the cluster URL? If we were to add a namespace field to the cluster secret, on subsequent lookups with `db.GetCluster` we could add a namespace parameter:
>
> ```go
> m.db.GetCluster(context.Background(), app.Spec.Destination.Server, app.Spec.Destination.Namespace)
> ```
>
> In `GetCluster` we could retrieve all the clusters by server and determine if any of them has a matching namespace field in the secret. If any of the cluster secrets has a matching namespace attribute, that cluster would be returned.
>
> If none of the cluster secrets in the list has a namespace attribute, the same behaviour as today would happen (i.e. the first entry from the cluster secret list is returned).
>
> If any of the cluster secrets has a namespace attribute which does not match the namespace passed to `GetCluster`, those secrets get excluded from the cluster secret list. Then the first entry in the list gets returned (i.e. the same behaviour as today, but clusters with namespaces set are excluded from selection).
>
> The challenge that remains is how to handle caching...

> **Review comment (Author):** This will still come with limitations imposed by the fact that we're indexing by API URL. Ideally, you should be able to connect N times to the same "cluster" (Kubernetes API) with a different set of credentials to perform different actions, maybe on the exact same namespace. If you just assume that adding a "namespace" discriminant is enough, I'm sure other corner cases will pop up. I think the basic idea of having 1 Cluster = 1 API URL might be wrong.

> **Review comment (Member):** In the context of Argo CD, do you have any concrete use cases where having separate cluster credentials per namespace would not be enough? This in theory could be extended to cover more dimensions than just namespaces, but IMO I don't see a reason why this shouldn't be enough.

> **Review comment (Author):** One that I can think of at the moment, but we haven't used yet, is the ability to differentiate between two operations:
>
> - Read state of the cluster (R/O credentials)
> - Write state to the cluster (R/W credentials)
>
> Now, this might be nitpicking again, but it just shows how the assumption "one credential per namespace" could be wrong.

> **Review comment (Member):** We'll have to agree to disagree 😄 In case @crenshaw-dev or @jannfis does not chime in before then, I'd recommend you join the contributors meeting on Thursday (17.15 CET) where we can discuss this in more detail.

> **Review comment (Author):** Don't worry, I'm not strongly opinionated towards my approach; I just find it might be more flexible for the future to have them indexed by name instead. As long as our use case is covered (the one I've opened a bug about), I'm all good 😊.

> **Review comment (Member):** There was a discussion about this proposal a bit more than a week ago; the gist of that discussion was that having a more fleshed-out proposal (and implementation) of #7689 would be preferable (@alexmt can give more context as to why).

> **Review comment:** We have a multi-tenant cluster where tenants have a "deployer" service account in their namespaces. These tenants want to use ArgoCD to deploy to their respective namespaces, but this is currently impossible due to the issue described. The cluster is run by a separate team from the team that manages ArgoCD, so it is not allowed to make ArgoCD a cluster-admin on the cluster and allow everyone to deploy to their namespaces. As long as ArgoCD can match on the name of the cluster, this would be a non-issue, as we can add the same cluster as different "ArgoCD clusters" and applications would point to the correct cluster name. Maybe "destination" is a better word than "cluster" in this case :)
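The proposed identifier can be sketched as a small helper. `clusterKey` is an illustrative name (not an existing Argo CD function), assuming the Traefik-style `name@namespace` form described above:

```go
package main

import "fmt"

// clusterKey builds the proposed namespaced identifier, e.g.
// "cluster-a@argo-cd". Hypothetical helper, for illustration only.
func clusterKey(name, namespace string) string {
	return fmt.Sprintf("%s@%s", name, namespace)
}

func main() {
	// Both secrets point at the same API URL, yet get distinct keys,
	// so the index no longer collapses them into one entry.
	fmt.Println(clusterKey("cluster-a", "argo-cd"))
	fmt.Println(clusterKey("cluster-b", "argo-cd"))
}
```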

### Use cases


#### Use case 1:

As a user, I want to use multiple clusters with the same API URL but different credentials.

### Implementation Details/Notes/Constraints [optional]



### Detailed examples

### Security Considerations

N/A

### Risks and Mitigations


**Using the name as the identifier might cause problems if a cluster with the same name is specified in another namespace.**

Mitigation: do what Traefik does and generate unique identifiers in the form `name@namespace`.


### Upgrade / Downgrade Strategy

As we assume the `name@namespace` identifiers are unique within a cluster, and this uniqueness is already enforced by Kubernetes (you can't have two resources of the same type with the same name in the same namespace), the upgrade strategy is pretty straightforward.

Since we're not breaking any functionality for existing users, downgrading would not cause any issues for those who are using ArgoCD as it currently is as of v2.6.3.

> **Review comment (Member):** I agree that this wouldn't break any existing features. But we are changing the cache key, and I wonder how it would affect users when they upgrade. Would it invalidate the existing caches and result in the re-syncing of resources?
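One way to address the cache-key concern above would be a backward-compatible key: secrets without a namespace discriminant keep today's URL-based key, so existing cache entries survive an upgrade. This is a hypothetical sketch, not part of the proposal's decided design; `legacyOrNamespacedKey` is an invented name:

```go
package main

import "fmt"

// legacyOrNamespacedKey sketches a backward-compatible cache key:
// secrets that carry a namespace get the new "name@namespace" form,
// while untouched installs stay keyed by server URL, so nothing is
// invalidated on upgrade. Hypothetical helper, for illustration only.
func legacyOrNamespacedKey(name, namespace, server string) string {
	if namespace == "" {
		return server // pre-upgrade behaviour, cache entries unchanged
	}
	return name + "@" + namespace
}

func main() {
	fmt.Println(legacyOrNamespacedKey("", "", "https://kubernetes.example.com"))
	fmt.Println(legacyOrNamespacedKey("cluster-a", "argo-cd", "https://kubernetes.example.com"))
}
```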

## Drawbacks

None

## Alternatives

- Users can use a reverse proxy so that each API URL effectively becomes unique. This is an ugly solution, but it would work.