Conversation
6cea8ab to e494e6f
klizhentas
left a comment
@hugoShaka @r0mant @reedloden your token-based approach makes sense, but the design and the code have to be audited.
I looked into Kubernetes joining some time ago and the approach described under "Approach 1" roughly lines up with what I concluded was the best path forward.
I have a few thoughts here as this work is quite similar to the CircleCI and GitHub joining work I've completed recently.
One thing that occurs to me is that we shouldn't rule this out as not being useful for Teleport Cloud customers. With some minor tweaks to this RFD, it could become possible to let Kubernetes workloads join a Teleport cluster even when the Auth Server is not running in their Kubernetes cluster.
In cases where their Kubernetes API server is publicly exposed, we could allow them to configure the address of the API server within the ProvisionToken spec; if this address is not configured, the Auth Server could fall back to the k8s API detectable from its environment.
In reality, many Kubernetes API servers are not publicly exposed in their entirety; however, the Kubernetes documentation on Service Account tokens does mention that just exposing the discovery endpoints through some kind of service is a valid option:
In many cases, Kubernetes API servers are not available on the public internet, but public endpoints that serve cached responses from the API server can be made available by users or service providers. In these cases, it is possible to override the jwks_uri in the OpenID Provider Configuration so that it points to the public endpoint, rather than the API server's address, by passing the --service-account-jwks-uri flag to the API server. Like the issuer URL, the JWKS URI is required to use the https scheme.
Even if we decide against supporting this in the initial version, I think it's worth acknowledging this within the RFD and stating why we don't want to support it.
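For context, the override the Kubernetes docs describe above maps to two kube-apiserver flags. A rough sketch of the relevant fragment of a kube-apiserver static pod manifest (the issuer/JWKS URLs are placeholders, not something this RFD prescribes):

# Sketch only: fragment of a kube-apiserver static pod spec showing the flags
# referenced in the Kubernetes docs quoted above. URLs are placeholders.
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --service-account-issuer=https://oidc.example.com
        - --service-account-jwks-uri=https://oidc.example.com/openid/v1/jwks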
We should aim for consistency with the CircleCI and GitHub work and pull the configuration for Kubernetes joining into a separate block rather than reusing the overloaded spec.allow, e.g.:
spec:
  k8s:
    kube_api_server: ""
    allow:
      - service_account: "my-namespace:my-service-account"

I'd also suggest (as shown) that we allow the URL of a Kubernetes API server to be directly configured. When this value is omitted, the Auth Server should try to detect this from its environment, but by making it configurable, we make it possible for automatic joining from Kubernetes clusters that the Auth Server is not within.
I think we want to rely on Kubernetes to validate the tokens instead of using the OIDC discovery endpoint and validating them on our own. Kubernetes does much more than just checking validity, audience, and expiry, and we cannot benefit from that without valid Kubernetes credentials.
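To illustrate what the Auth Server would delegate to Kubernetes (a sketch with made-up values, not final RFD content): it would POST a TokenReview containing the pod's token and get back the resolved identity, including pod-binding info for bound tokens, which plain OIDC validation would not give us.

# Sketch: TokenReview request the Auth Server would send to the Kubernetes API.
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "<JWT presented by the joining pod>"
---
# Illustrative response: Kubernetes resolves the service account and, for bound
# tokens, also reports which pod the token is bound to.
apiVersion: authentication.k8s.io/v1
kind: TokenReview
status:
  authenticated: true
  user:
    username: system:serviceaccount:my-namespace:my-service-account
    extra:
      authentication.kubernetes.io/pod-name: ["teleport-proxy-0"]
      authentication.kubernetes.io/pod-uid: ["<pod uid>"]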
In this context, making the apiserver configurable makes sense, but this would imply:
- having a way to trust the remote Kubernetes certs. Either adding them to the trust store or passing them in the resource.
- a way to pass Kubernetes credentials (I was planning on only using the in-cluster kube client because it made everything easy)
We cannot rely on env vars for this because this would not work if the user creates different tokens targeting distinct Kubernetes clusters. We would either have to extend the token definition to contain a working kubeconfig, or have the user provide kubeconfig files on all auth nodes. We might also encounter extra issues with cloud-provider-specific auth methods.
I totally see the value of being able to trust additional clusters though. If you're OK with it, I suggest we mention this as a possible next step in the RFD but keep the "remote apiserver" feature out of the first implementation to keep it focused on our current Helm chart issues.
Awesome, I wasn't sure how much additional work the special k8s API offered on token validation, but it definitely sounds like it's bringing value.
I think I definitely agree w/ keeping this as a next step - just useful to make sure we record what's in and out of scope.
cab94c1 to 0d5b6a1
strideynet
left a comment
Looks good - just a few random nits.
Co-authored-by: Noah Stride <noah.stride@goteleport.com>
r0mant
left a comment
lgtm with a couple of comments
- As a Teleport power user, I want to easily join my custom sets of Teleport nodes on Kubernetes so that I have less join automation to build and maintain.
I don't know if deploying nodes as in "SSH nodes" in the same Kube cluster has much value tbh since we have Kube Access; I would maybe say "Teleport services like Machine ID or plugins".
I was thinking about kubernetes_service, app_service, db_service, discovery_service, windows_desktop_service. I'll make the list explicit.
Joining nodes from outside the Kubernetes cluster would imply generating and exporting a Kubernetes token from the service account. While this pattern is feasible, it does not provide much added value compared to generating a static token from Teleport directly.
Would it make sense to provide more details on how it would work, maybe in a "future work" section? I just feel like having a "Kubernetes" join method and not supporting agents joining from other Kube clusters is strange. Like, if you run everything on Kubernetes, then you'd probably want to use the same join method even if your stuff is in different clusters? For comparison, the IAM join method supports nodes joining from different AWS accounts.
It should work the same way, except Teleport uses the kubeconfig from the resource instead of the inCluster config.
My biggest concern is that managed Kubernetes offerings like GKE and EKS require more than a kubeconfig to interact with them. Suddenly we would have to support all the ways to pass service accounts to Teleport, plus each cloud's auth provider in the Go code. The spec would look like:
kind: token
version: v2
metadata:
  name: proxy-token
  expires: "3000-01-01T00:00:00Z"
spec:
  roles: ["Proxy"]
  k8s:
    allow:
      - service_account: "my-namespace:my-service-account"
    # empty authType defaults to "inCluster" for compatibility
    authType: "inCluster|kubeconfig|gcp|aws|azure|rancher|..."
    kubeconfig:
      certificate-authority-data: base64
      server: http://1.2.3.4
      # either set token or user && client-*
      token: my-secret-token
      user: client-sa-name
      client-certificate-data: base64
      client-key-data: base64
    gcp:
      # optional SA account key/name
      # if they are empty, teleport should try to use ambient credentials (see google.DefaultTokenSource)
      service-account-key: base64
      service-account-name: "foo@sa-project.iam.gserviceaccount.com"
      # mandatory fields to identify the cluster
      project: cluster-project
      location: us-east1
      cluster-name: my-cluster
    aws:
      # optional SA account id/secret
      # if they are empty, Teleport should try to use ambient credentials
      key-id: ASDFGHJK
      key-secret: QWERTYUIOP
      # mandatory
      region: us-east-1
      account-id: 1234567890
      cluster-name: my-cluster
    azure:
      # [...]
    rancher:
      # [...]

The users would have to map the cloud's service account to a Kubernetes user/group and create a role binding to allow this group to use the TokenReview API. This step is cloud-specific and can be straightforward (as in GKE) or complex (hello AWS).
I'll add those details to the RFD so they can be reviewed at the same time by the auditors.
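For reference, the RBAC needed to call the TokenReview API is small; a hedged sketch follows (resource names are illustrative, and the mapped subject depends entirely on the cloud's identity mapping). The built-in system:auth-delegator ClusterRole grants the same permission.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: teleport-token-review   # illustrative name
rules:
  - apiGroups: ["authentication.k8s.io"]
    resources: ["tokenreviews"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: teleport-token-review   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: teleport-token-review
subjects:
  - kind: User
    # The mapped identity of the remote Teleport Auth Server; what this looks
    # like is cloud-specific (e.g. a Google service account on GKE).
    name: teleport-auth@example-project.iam.gserviceaccount.com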
reedloden
left a comment
Approving based on #18659 (review)
This commit adds a new joinMethod as described in #17905. This method allows pods running in the same Kubernetes cluster as the auth servers to join the Teleport cluster. It relies on Kubernetes tokens to establish trust. The goal is to be able to deploy proxies and auths separately and join them in a single cluster. Pre Kubernetes 1.20, the tokens are static, long-lived, and not bound to pods. We support them for compatibility reasons. Starting with Kubernetes 1.20, tokens are bound to pods (and starting with 1.21 they can be mounted through projected volumes). Starting with 1.21 we should only accept bound tokens. The chart will ensure tokens are properly mounted with projected volumes so we can benefit from the 1h to 10min token lifetime.
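As a rough illustration of the projected-volume mount mentioned above (a sketch, not the chart's final values; names, path, and audience are placeholders):

# Pod spec fragment: mount a bound, short-lived service account token via a
# projected volume instead of relying on the legacy long-lived secret token.
volumes:
  - name: join-sa-token
    projected:
      sources:
        - serviceAccountToken:
            path: token
            expirationSeconds: 600        # 10 minutes is the minimum Kubernetes accepts
            audience: teleport.example.com
containers:
  - name: teleport
    volumeMounts:
      - name: join-sa-token
        mountPath: /var/run/secrets/teleport/join-sa-token
        readOnly: true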
Backport #18659

### About the backport

This backport provides a rollback path from v12 to v11. Helm users upgrading to v12 end up with Kubernetes tokens automatically created. Currently, if they roll back to v11, the token is unknown and Teleport crashes during cache warming. Once this is backported, users will have a v11 version they can roll back to if the v12 upgrade fails.

### Original commit message

This commit adds a new joinMethod as described in #17905. This method allows pods running in the same Kubernetes cluster as the auth servers to join the Teleport cluster. It relies on Kubernetes tokens to establish trust. The goal is to be able to deploy proxies and auths separately and join them in a single cluster. Pre Kubernetes 1.20, the tokens are static, long-lived, and not bound to pods. We support them for compatibility reasons. Starting with Kubernetes 1.20, tokens are bound to pods (and starting with 1.21 they can be mounted through projected volumes). Starting with 1.21 we should only accept bound tokens. The chart will ensure tokens are properly mounted with projected volumes so we can benefit from the 1h to 10min token lifetime.
Rendered version
This RFD proposes a mechanism to join Teleport nodes living in the same Kubernetes cluster as the auth nodes. This is part of our Q4 efforts to improve the Helm experience. This might also benefit the cloud hosting; I did not focus on this use case but asked a couple of questions here.
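For a concrete picture of the agent side, a sketch reusing the existing join_params config syntax (the "kubernetes" method value is what this RFD proposes; addresses and token name are illustrative and may change):

# teleport.yaml on the joining pod (illustrative values)
teleport:
  auth_server: teleport-auth.teleport.svc.cluster.local:3025
  join_params:
    method: kubernetes
    token_name: proxy-token
proxy_service:
  enabled: true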