
tsh doesn't target the correct Kubernetes cluster if logging directly into a non-root cluster #3693

Closed
webvictim opened this issue May 11, 2020 · 8 comments · Fixed by #3735
Assignees: awly
Labels: bug, c-sn (Internal Customer Reference), kubernetes-access, tsh (Teleport's command line tool for logging into nodes running Teleport)

Comments

@webvictim
Contributor

What happened: I ran tsh login --proxy=example-main.gravitational.co:3080 gus-test-cluster-seven and logged into a Teleport cluster. This caused my ~/.tsh/profile to get updated:

web_proxy_addr: example-main.gravitational.co:3080
ssh_proxy_addr: example-main.gravitational.co:3023
kube_proxy_addr: example-main.gravitational.co:3026
user: [email protected]
cluster: gus-test-cluster-seven

Any tsh operation correctly targets the leaf cluster (gus-test-cluster-seven). Any kubectl operation actually targets the root cluster (example-main.gravitational.co) instead:

# shows leaf cluster, correct
$ tsh login --proxy example-main.gravitational.co gus-test-cluster-seven
If browser window does not open automatically, open it by clicking on the link:
 http://127.0.0.1:45231/92fdaadb-e589-47f6-9b18-51996768b0e1
> Profile URL:  https://example-main.gravitational.co:3080
  Logged in as: [email protected]
  Cluster:      gus-test-cluster-seven
  Roles:        clusteradmin*
  Logins:       root
  Valid until:  2020-05-12 04:59:53 -0300 ADT [valid for 12h0m0s]
  Extensions:   permit-agent-forwarding, permit-port-forwarding, permit-pty


* RBAC is only available in Teleport Enterprise
  https://gravitational.com/teleport/docs/enterprise

# Teleport nodes in the leaf cluster, correct
$ tsh ls
Node Name                Address         Labels                                                                                                                            
------------------------ --------------- --------------------------------------------------------------------------------------------------------------------------------- 
ip-172-20-44-170         100.96.7.1:3022 kubelet-labels=kops.k8s.io/instancegroup=nodes|kubernetes.io/role=node|node-role.kubernetes.io/node=, node-type=kubernetes-worker 
                                         running-on=host                                                                                                                   
ip-172-20-46-62          100.96.6.0:3022 kubelet-labels=kops.k8s.io/instancegroup=nodes|kubernetes.io/role=node|node-role.kubernetes.io/node=, node-type=kubernetes-worker 
                                         running-on=host                                                                                                                   
teleport-c7bf6647d-fwrgg 127.0.0.1:3022                    

# shows leaf cluster, correct
$ kubectl config current-context
gus-test-cluster-seven

# this k8s worker is in the root cluster, not the leaf (INCORRECT)
$ kubectl -v=7 get nodes
I0511 17:01:03.752529 2447430 loader.go:375] Config loaded from file:  /home/gus/.kube/config
I0511 17:01:03.756955 2447430 round_trippers.go:420] GET https://example-main.gravitational.co:3026/api/v1/nodes?limit=500
I0511 17:01:03.756967 2447430 round_trippers.go:427] Request Headers:
I0511 17:01:03.756972 2447430 round_trippers.go:431]     User-Agent: kubectl/v1.16.2 (linux/amd64) kubernetes/c97fe50
I0511 17:01:03.756976 2447430 round_trippers.go:431]     Accept: application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json
I0511 17:01:03.891339 2447430 round_trippers.go:446] Response Status: 200 OK in 134 milliseconds
NAME                                             STATUS   ROLES    AGE    VERSION
gke-teleport-demo-cluster-pool-1-afbf7cbd-x5ck   Ready    <none>   403d   v1.14.10-gke.27

Rerunning the login command does not fix this. I have to explicitly switch to a different cluster (tsh login --proxy=example-main.gravitational.co:3080 example-main.gravitational.co) and back again (tsh login --proxy=example-main.gravitational.co:3080 gus-test-cluster-seven) to fix the issue.

What you expected to happen: Logging directly into a leaf cluster with tsh login --proxy=root-cluster leaf-cluster should cause all tsh and kubectl operations to target that leaf cluster.

How to reproduce it (as minimally and precisely as possible): See above.

Manually clearing out all unrelated entries in my ~/.kube/config doesn't seem to help. Deleting ~/.tsh/profile doesn't help.

Environment

  • Teleport version (use teleport version): Teleport Enterprise v4.2.9 git:v4.2.9-0-ga4bd6c36 go1.13.2

  • Tsh version (use tsh version): Teleport v4.2.9 git:v4.2.9-0-ga4bd6c36 go1.13.2 (also happens with tsh 4.2.8)

  • OS (e.g. from /etc/os-release): Fedora 31

  • Where are you running Teleport? (e.g. AWS, GCP, Dedicated Hardware): Root in GKE, leaf in AWS (created by kops)

@webvictim changed the title from "tsh doesn't target the correct Kubernetes cluster if there is a cluster set in ~/.tsh/profile" to "tsh doesn't target the correct Kubernetes cluster if logging into is a cluster set in ~/.tsh/profile" May 11, 2020
@webvictim changed the title to "tsh doesn't target the correct Kubernetes cluster if logging into a non-root cluster" May 11, 2020
@webvictim changed the title to "tsh doesn't target the correct Kubernetes cluster if logging directly into a non-root cluster" May 11, 2020
@benarent added the kubernetes-access and tsh labels May 11, 2020
@webvictim added the bug label May 12, 2020
@benarent
Contributor

@russjones This can lead to some pretty bad UX, e.g. accidentally shutting down a production cluster when you think it's staging. I can see this being an issue for S; should we put it into 4.3?

@benarent added the R2 and c-sn (Internal Customer Reference) labels May 12, 2020
@benarent
Contributor

@awly Do you have any thoughts on this issue?

@awly
Contributor

awly commented May 18, 2020

#3639 looks related, since we use tc.SiteName for the k8s cluster address, but that should be fixed in 4.2.9.
Let me set up a repro case...

@awly self-assigned this May 18, 2020
@awly
Contributor

awly commented May 18, 2020

Hmm, actually, the issue might be different.
By design, ~/.kube/config targets the root cluster proxy. Metadata in the client TLS cert should tell the root proxy to forward traffic to the leaf proxy (instead of sending requests to the root k8s cluster). It seems this re-routing is failing.
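
To make the intended routing concrete, here is a minimal Go sketch of the decision described above, assuming a hypothetical subject attribute for the routing metadata (the OID and field layout below are illustrative stand-ins, not Teleport's actual cert encoding):

package main

import (
    "crypto/x509"
    "crypto/x509/pkix"
    "encoding/asn1"
    "fmt"
)

// routeToClusterOID is a hypothetical private OID standing in for whatever
// extension Teleport actually uses to carry the routing metadata.
var routeToClusterOID = asn1.ObjectIdentifier{1, 3, 9999, 1, 1}

// routeFromCert returns the cluster the root proxy should forward k8s
// traffic to: the value of the routing attribute in the cert subject if
// present, otherwise the root cluster itself.
func routeFromCert(cert *x509.Certificate, rootCluster string) string {
    for _, attr := range cert.Subject.Names {
        if attr.Type.Equal(routeToClusterOID) {
            if s, ok := attr.Value.(string); ok && s != "" {
                return s
            }
        }
    }
    return rootCluster
}

func main() {
    // A cert issued at login without routing metadata: traffic stays at root.
    plain := &x509.Certificate{Subject: pkix.Name{CommonName: "[email protected]"}}
    fmt.Println(routeFromCert(plain, "example-main.gravitational.co"))

    // A cert carrying the routing attribute: traffic is forwarded to the leaf.
    routed := &x509.Certificate{Subject: pkix.Name{
        CommonName: "[email protected]",
        Names: []pkix.AttributeTypeAndValue{
            {Type: routeToClusterOID, Value: "gus-test-cluster-seven"},
        },
    }}
    fmt.Println(routeFromCert(routed, "example-main.gravitational.co"))
}

With no routing attribute present, the fallback to the root cluster reproduces exactly the misrouting reported in this issue.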

@awly
Contributor

awly commented May 19, 2020

OK, I think I got it working.
We issue TLS certs one way at login (over the HTTP/JSON API) and re-issue them a different way later (via gRPC, it seems).
Those two flows differ all the way through, including how they build the TLS certs.

As a stopgap, I updated the login flow to support the same k8s-specific fields we already support when re-issuing: RouteToCluster (to indicate where to forward k8s requests) and KubernetesUsers/Groups (to populate the correct Subject fields in the cert). PR incoming.

In the long term, we should refactor and unify all of this. I'm still hazy on all the moving pieces and what the ideal design should look like.
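
As a rough illustration of that stopgap, here is a hedged Go sketch of a shared request shape both issuance flows could populate; the struct and helper are hypothetical, and only the RouteToCluster and KubernetesUsers/Groups field names come from the description above:

package main

import "fmt"

// certRequest is a hypothetical, simplified shape for the fields both
// issuance flows (login over HTTP/JSON, re-issue over gRPC) need to agree
// on; it is illustrative, not Teleport's actual type.
type certRequest struct {
    Username         string
    RouteToCluster   string   // cluster the root proxy should forward k8s requests to
    KubernetesUsers  []string // k8s users to place in the cert Subject
    KubernetesGroups []string // k8s groups to place in the cert Subject
}

// loginCertRequest shows the stopgap: the login flow now fills the same
// k8s-specific fields the re-issue flow already did, so a direct login to
// a leaf cluster routes kubectl traffic there from the start.
func loginCertRequest(user, cluster string, kubeUsers, kubeGroups []string) certRequest {
    return certRequest{
        Username:         user,
        RouteToCluster:   cluster,
        KubernetesUsers:  kubeUsers,
        KubernetesGroups: kubeGroups,
    }
}

func main() {
    req := loginCertRequest("[email protected]", "gus-test-cluster-seven",
        []string{"gus"}, []string{"system:masters"})
    fmt.Printf("%+v\n", req)
}

Unifying both flows around one such shared shape is essentially the long-term refactor suggested above.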

@awly
Contributor

awly commented May 26, 2020

Reopening to backport to 4.2

@awly
Contributor

awly commented Jun 26, 2020

This is not working for SSO logins yet, unfortunately.
I'll need to do the plumbing through all the OIDC/SAML/GitHub callback hell.
My bad for not testing that flow.

@benarent
Contributor

benarent commented Jul 1, 2020

Closing this issue as @awly and I tested this with SSO.

@benarent closed this as completed Jul 1, 2020