RFD 5: Kubernetes Service #4455

Merged 1 commit into master from rfd/5-kubernetes-service on Oct 28, 2020

Conversation

@awly (Contributor) commented Oct 5, 2020:

Proposal for a standalone kubernetes_service, separate from proxy_service.

Updates #3952

```yaml
# New format:
kubernetes_service:
  enabled: yes # default "no"
```
Contributor:

With 5.0, if the recommendation is that an admin starts a Teleport agent within each Kubernetes cluster, would that mean the Auth/Proxy on the root could have this set to false? (This would be similar to ssh_service being false, so it would follow the same pattern.)

Contributor (author):

The admin can set it to false.
The proxy still needs to listen on a k8s port to forward requests, though.

Contributor:

This is somewhat confusing - the idea that you could start a kubernetes_service and give it a public_addr but it still isn't actually the thing that's responsible for doing the listening. It doesn't work without a corresponding proxy, right?

Contributor:

Ping @awly for comment here

Contributor (author):

Sorry, missed this.
A proxy_service with kube_listen_addr set is a user-facing endpoint.
A kubernetes_service is a gateway to a single k8s cluster within a Teleport cluster, and is only reachable through a proxy.

Some scenarios:

  • single teleport cluster, single k8s cluster - use both kubernetes_service and proxy_service with kube_listen_addr
  • single teleport cluster, multiple k8s clusters - auth_service and proxy_service on one box, separate pods with only kubernetes_service in each cluster
  • root teleport cluster without local k8s cluster, leaf teleport clusters with k8s clusters - root proxy needs kube_listen_addr but not kubernetes_service; leaf proxies need kube_listen_addr, with kubernetes_service in one process or separate pods running just kubernetes_service

There's a consistent requirement - if you want k8s integration, your proxies must set kube_listen_addr and clients only talk to kubernetes_service through a proxy.
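
For illustration, here is a minimal sketch of the first scenario, with placeholder addresses and cluster name (the exact field set is defined later in this RFD):

```yaml
# Sketch of scenario 1: a single process runs both services.
proxy_service:
  enabled: yes
  kube_listen_addr: "0.0.0.0:3026"  # user-facing k8s endpoint
kubernetes_service:
  enabled: yes
  k8s_cluster_name: "main"          # placeholder name for the local k8s cluster
```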


The Kubernetes service implements all the common features:
- connecting to an Auth server directly or via a Proxy tunnel
- registration using join tokens
Contributor:

Would we want to split out a separate token type, or keep the node token generic for Kubernetes?

```sh
$ tctl nodes add
The invite token: 3abaf5364b23483b29b18d23091e2397
This token will expire in 30 minutes

Run this on the new node to join the cluster:

> teleport start \
   --roles=node \
   --token=3abaf5364b23483b29b18d23091e2397 \
   --ca-pin=sha256:47e50cb6138cfa11587508e334d99fa26b9832a030a7b20c9ab7b2b9e77f4206 \
   --auth-server=172.31.1.91:3025
```

Contributor (author):

Yeah, it would use a different role for registration: tctl tokens add --type=kubernetes
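
Hypothetically, the registration flow would mirror the node example above (the role name and flags here are assumptions, not final syntax):

```sh
# Generate a join token with the kubernetes type (assumed syntax):
$ tctl tokens add --type=kubernetes

# Then start the agent inside the k8s cluster:
$ teleport start \
    --roles=kubernetes \
    --token=<token> \
    --ca-pin=sha256:<pin> \
    --auth-server=teleport.example.com:3025
```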

@webvictim (Contributor) left a comment:

This is a reasonable suggestion. I have some concerns about migrations and how we can persuade people to do the right thing, particularly when many have already built complicated setups to work around the fact that we've always had the hard "one k8s cluster = one proxy" requirement in the past. This is good news for them in theory, but it will require work to migrate. Some people have designed their entire architecture around the knowledge that you need an auth/proxy pair (and thus stateful storage) in every k8s leaf cluster.

Possible mitigations:

  • Existing proxy_service setups inside pods which link back as leaf clusters could maybe add themselves as kubernetes_service instead with the same credentials once upgraded to v5.0.0, or at least use a transparent rotation-type mechanism to get new credentials with a Kubernetes type instead of a Proxy type. It's going to be hard to automatically do the right thing for many though.
  • Another mitigation is to put a lot of work into our Helm chart and make sure that our Helm flow for starting a lone kubernetes_service inside a pod and linking it back to an existing Teleport cluster is really really simple - I think this would be a fairly common use case.

Other concerns:

  • At the moment the majority of people don't just deploy Teleport running proxy_service inside pods - they have to deploy both auth and proxy (with stateful storage) because this is how you link a leaf cluster back.

    • Is the idea that kubernetes_service will encapsulate the entire reverse tunnel authz/n part as well, so that you could go from running auth_service and proxy_service to just running kubernetes_service with exactly the same experience?
      • If we can encourage people to migrate early without any downtime (and remove the need for them to provide stateful storage for every leaf) then the whole idea becomes more compelling.
    • If you still wanted SSH access as well as k8s, would you also need to deploy both auth_service and proxy_service along with kubernetes_service to register as a leaf cluster and get this?
    • How would this integrate with the existing leaf cluster model?
  • We are releasing AAP with v5.0.0, which is also going to add app_service to /etc/teleport.yaml - both changes would land in the same release and make the minimal config more complicated:

From:

```yaml
teleport:
  auth_token: blah
  ca_pin: sha256:blah
  log:
    severity: INFO
    output: stderr
auth_service:
  enabled: no
proxy_service:
  enabled: no
ssh_service:
  enabled: yes
```

To:

```yaml
teleport:
  auth_token: blah
  ca_pin: sha256:blah
  log:
    severity: INFO
    output: stderr
auth_service:
  enabled: no
app_service:
  enabled: no
kubernetes_service:
  enabled: no
proxy_service:
  enabled: no
ssh_service:
  enabled: yes
```

This isn't necessarily a deal-breaker, but it's worth noting that historically, the default behaviour of all *_service flags in the config is enabled: yes unless you explicitly specify enabled: no. This could lead to a situation where people with v4.4.x config files end up having a couple of new services enabled on them after upgrading to v5.0.0 without any notification.

  • I don't know what the mitigation for this is - I'm just saying that there's a lot of cruft involved in decoupling the k8s part from the proxy part. I think it's worth exploring, but we should be careful not to add legacy config migrations/paths that we'll need to support until the end of time.


To encourage users to migrate, all new config fields (`k8s_cluster_name` and
`labels`) will be added to the new service definition only.
`kubconfig_file` in the old section will behave as before - only extracting
Contributor:

Suggested change:
- `kubconfig_file` in the old section will behave as before - only extracting
+ `kubeconfig_file` in the old section will behave as before - only extracting

Comment on lines 88 to 89
`kubconfig_file` in the old section will behave as before - only extracting
`current-context` and registering that as `k8s_cluster_name`. In the new
Contributor:

I think we'd have to come up with some logic for what happens if two services (one proxy_service, one kubernetes_service) try to register with the same name - to handle potential migration scenarios where people have an existing working setup but want to start switching over to the new style.

Contributor (author):

There can be multiple teleport binaries reporting the same k8s cluster in heartbeats (e.g. HA proxy setup inside the cluster).
When routing requests, we can send the request to any endpoint that claims to support a given k8s cluster name.
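
For example (a hypothetical sketch), two agents deployed in the same k8s cluster would both heartbeat the same name, and a proxy can route to either one:

```yaml
# Hypothetical HA sketch: the same config runs in two separate pods.
# Both agents heartbeat the same name; proxies pick any live endpoint.
kubernetes_service:
  enabled: yes
  k8s_cluster_name: "prod"
```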

there's still a lot of authn/z and audit complexity).

This RFD complements the [Kubernetes 5.0 enhancements
design](https://docs.google.com/document/d/1cS6J2d_xBcJMWPewWPjdOZyrDHLKdqmgF1QLLo4E1YI).
Contributor:

Can you please port the rest of the doc into this RFD too? I was re-reading it today and found new commands tsh kube clusters and tsh kube login that I wanted to discuss.

This will help give a full picture of what you are proposing here.

@awly force-pushed the rfd/5-kubernetes-service branch from b44f581 to a6500f2 on October 6, 2020
@awly (Contributor, author) commented Oct 6, 2020:

@klizhentas moved most of the doc contents into this RFD.

@webvictim

  • migration
    • existing user setups should work as before with no change in behavior; if they have leaf clusters and 1:1 mapping, it should stay that way and keep working without registering as kubernetes_service in root
    • the motivation to migrate is:
      • simpler config (no trusted clusters)
      • 1:N mapping from teleport binary to k8s clusters (lower resource usage)
      • RBAC
      • no need for in-cluster persistence (storage cost)
    • also, @klizhentas suggested providing a helm one-liner to set up an in-cluster agent
  • mixing ssh and k8s access
    • you can either set up a leaf cluster inside k8s as before
    • or you can have your SSH nodes join up to the root cluster directly
  • minimal teleport.yaml
    • both application_service and kubernetes_service will have enabled: no as default
    • this is for backwards compatibility: an upgrade to 5.0 without config changes shouldn't enable new non-trivial features

@awly mentioned this pull request on Oct 7, 2020

#### Non-k8s proxy

A separate "gateway" proxy can run with `proxy_service.kubernetes.enabled: yes`
Contributor:

Why use this instead of the new kubernetes_service?

@awly (Contributor, author) commented Oct 7, 2020:

TODO(me):

  • add kube_listen_addr to proxy_service
  • examples for common config scenarios
  • clearly describe what proxy_service and kubernetes_service do (at a high level)
  • when kubernetes_service doesn't have listen_addr set, only do reverse tunneling

@awly mentioned this pull request on Oct 8, 2020
@awly force-pushed the rfd/5-kubernetes-service branch from a6500f2 to 836ab0a on October 8, 2020
@awly (Contributor, author) commented Oct 8, 2020:

Updated and added more config examples.
PTAL

awly pushed a commit that referenced this pull request Oct 8, 2020
This plumbs config fields only, they have no effect yet.

Also, remove `cluster_name` from `proxy_config.kubernetes`. This field
will only exist under `kubernetes_service` per
#4455
@awly force-pushed the rfd/5-kubernetes-service branch from 836ab0a to 037626b on October 19, 2020
@awly (Contributor, author) commented Oct 19, 2020:

Ping @klizhentas @russjones @benarent, still need approval here

awly pushed a commit that referenced this pull request Oct 19, 2020
* Fix local etcd test failures when etcd is not running

* Add kubernetes_service to teleport.yaml

This plumbs config fields only, they have no effect yet.

Also, remove `cluster_name` from `proxy_config.kubernetes`. This field
will only exist under `kubernetes_service` per
#4455

* Handle IPv6 in kubernetes_service and rename label fields

* Disable k8s cluster name defaulting in user TLS certs

Need to implement service registration first.

```yaml
# New format:
kubernetes_service:
  enabled: yes
  public_addr: [k8s.example.com:3026]
```
@webvictim (Contributor) commented Oct 19, 2020:

Should we still be able to specify a public_addr here? Is it needed when the proxy is intended to be responsible for providing the endpoint that kubectl connects to and routing traffic to the appropriate cluster?

Just trying to figure out what the use case is for having a separate public_addr when the proxy is the inbound point for the traffic.

Edit: Finished the document and I now understand that there's a case for having public_addr and listen_addr set when the kubernetes_service is connected directly to the auth server (as they're needed to route inbound traffic)

Contributor (author):

Correct - public_addr is for a proxy to figure out where this service lives.
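
For illustration, a sketch of the direct-to-auth case, with placeholder values and the fields as proposed in this RFD:

```yaml
# Sketch: kubernetes_service connected directly to the auth server.
# listen_addr accepts inbound traffic; public_addr tells proxies where to dial.
kubernetes_service:
  enabled: yes
  listen_addr: "0.0.0.0:3026"
  public_addr: [k8s.example.com:3026]
```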

in the `auth_servers` field of `teleport.yaml`.

When connecting over a reverse tunnel, `kubernetes_service` will not listen on
any local port, unless its `listen_addr` is set.
Contributor:

I think this is different to the way that ssh_service works currently - I don't believe ssh_service ever listens when connected over a reverse tunnel, meaning there's no way to make an inbound connection to port 3022. I feel like the behaviour of kubernetes_service when connected over a reverse tunnel should be the same, unless there's a really compelling use case to make it listen locally?

Contributor (author):

So you mean that listen_addr should be disallowed when connecting over a tunnel?
I can see 2 small (but not deal-breaker) problems:

  • listen_addr is validated at startup, before the process knows whether it's talking to an auth server or a proxy
  • an admin may choose to set listen_addr on all instances, but only set auth_server to a proxy on some of them; on tunneled instances, the listener will remain active but unused; on regular instances, it will be dialed by the proxies; no need to template listen_addr in your config management

Contributor (author):

As I start writing relevant code, I understand what you mean better.
I'll disallow both connecting via proxy tunnel and using a local listen_addr because:

  • it's simpler to implement (no need to merge connections from multiple listeners)
  • it behaves more like the ssh_service
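
Under that rule, a config like this hypothetical one would be rejected at startup:

```yaml
# Hypothetical invalid combination under the proposed rule:
teleport:
  auth_servers: ["proxy.example.com:3080"]  # a proxy address => reverse tunnel
kubernetes_service:
  enabled: yes
  listen_addr: "0.0.0.0:3026"  # local listener disallowed when tunneling
```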

```sh
$ tsh kube clusters
```
Contributor:

Do end users need to do anything else with kubectl for multiple clusters, or does it just work out of the box? https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/

Contributor (author):

tsh login will configure all known clusters as kubectl contexts.
To switch between them, end users call either tsh kube login $cluster_name or kubectl config use-context $cluster_name.
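
A hypothetical session (listing output and context names are illustrative, not final):

```sh
$ tsh login --proxy=teleport.example.com   # configures kubectl contexts for all known clusters
$ tsh kube clusters                        # list k8s clusters known to the Teleport cluster
$ tsh kube login $cluster_name             # switch kubectl to that cluster
$ kubectl config use-context $cluster_name # equivalent, via kubectl directly
```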

awly pushed a commit that referenced this pull request Oct 21, 2020
This is a shorthand for the larger kubernetes section:
```
proxy_service:
  kube_listen_addr: "0.0.0.0:3026"
```
is equivalent to:
```
proxy_service:
  kubernetes:
    enabled: yes
    listen_addr: "0.0.0.0:3026"
```

This shorthand is meant to be used with the new `kubernetes_service`:
#4455
It reduces confusion when both `proxy_service` and `kubernetes_service`
are configured in the same process.
@awly force-pushed the rfd/5-kubernetes-service branch from 037626b to 9696d75 on October 27, 2020
@awly force-pushed the rfd/5-kubernetes-service branch from 9696d75 to 5a813d2 on October 28, 2020
@awly merged commit 025143d into master on Oct 28, 2020
@awly deleted the rfd/5-kubernetes-service branch on October 28, 2020