
add support for running agent helm chart on persistent volume #7123

Merged
knisbet merged 16 commits into master from kevin/master/helm-stateful2 on Jul 22, 2021

Conversation

@knisbet
Contributor

@knisbet knisbet commented Jun 1, 2021

Problem: In Teleport Cloud we are set up to automatically rotate the join tokens for the teleport cluster. The helm chart isn't currently compatible with this, as the join token gets added as a Kubernetes secret but is not kept in sync with the cluster.

This is an addition that optionally allows the pods to be deployed as a StatefulSet with a persistent volume from Kubernetes/the cloud provider, to maintain the state directory through pod restarts. I believe the Teleport team is also looking at using the Kubernetes API to store this state, but I don't know if/when that will be available, so this seemed like a reasonable solution.

Submitting it here if the core team is interested, otherwise we'll have to fork the chart for use in the cloud.

I originally approached this as a set of replacements on the deployment file, changing the API version and other parameters inline, but that approach seemed fragile given the number of tweaks. So I went with this approach instead: disable the Deployment and add the StatefulSet when persistent storage is enabled. This largely duplicates the object definition, so certain updates will need to be made in both places.

Also, I'm not super familiar with the linting approach used with helm, but I think I added the relevant tests.

@knisbet knisbet requested review from awly and webvictim June 1, 2021 14:03
@webvictim
Contributor

#7096 does a similar thing but seems a lot simpler (although it doesn't add linting as your PR does) - what's the incentive for using a StatefulSet instead?

@knisbet
Contributor Author

knisbet commented Jun 1, 2021

It's not totally clear to me how #7096 is meant to work, whether it's attempting to use a localDir for persistence or some other way to map a single volume. Since the Deployment controller doesn't normally account for persistence, I'm not sure this is a recommended model; it could run into problems when, say, bumping the container tag, where the Deployment controller would want to replace the pod with a new one and temporarily run two instances against the same persistence store.

The StatefulSet controller is better geared towards handling persistence. In this approach the prerequisite is that the cluster is connected to a cloud environment / storage controller with a StorageClass defined; the volumes are then provisioned by Kubernetes in the cloud environment when the chart is installed. The StatefulSet controller also understands the persistence model: it won't try to run more than one pod attached to the same volume, or schedule pods in a different AZ than the volume. It will probably also be more appropriate if we want to create redundant deployments by increasing the replica count.

TL;DR: StatefulSets are geared towards applications with state; Deployments are geared towards stateless applications (or more specifically, pods with ephemeral storage).
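For context, a trimmed sketch of what the StatefulSet variant described here could look like (the names, image tag, and storage class are illustrative, not the chart's actual template):

```yaml
# Illustrative only: a StatefulSet binds each pod to its own
# PersistentVolumeClaim via volumeClaimTemplates, so the data
# directory survives pod restarts and re-creations.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: teleport-kube-agent        # hypothetical name
spec:
  serviceName: teleport-kube-agent
  replicas: 1
  selector:
    matchLabels:
      app: teleport-kube-agent
  template:
    metadata:
      labels:
        app: teleport-kube-agent
    spec:
      containers:
        - name: teleport
          image: quay.io/gravitational/teleport:6.2.0   # illustrative tag
          volumeMounts:
            - name: data
              mountPath: /var/lib/teleport
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp2       # assumed; depends on the cluster
        resources:
          requests:
            storage: 128Mi
```

The key difference from a Deployment is the volumeClaimTemplates section: the controller creates and tracks one PVC per ordinal pod, rather than treating every replacement pod as disposable.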

@webvictim
Contributor

@klizhentas @awly What are your thoughts on this from a product standpoint? Are we looking to prioritise the work to use the Kubernetes API for certificate storage, or should we look to merge a chart-based solution for this in the short term?

@awly
Contributor

awly commented Jun 3, 2021

Storing state in k8s API (#4832) is not on the horizon any time soon.
Adding support for persistent volumes in our chart in the meantime makes sense.
The only reason I didn't add it originally was cross-platform compatibility.

@knisbet will your change work across at least AWS/GCP/Azure?

@knisbet
Contributor Author

knisbet commented Jun 4, 2021

@knisbet will your change work across at least AWS/GCP/Azure?

It should...

This really just leverages the Kubernetes support for persistent volumes, which can integrate with the various cloud providers and different plugins. It just requires having or configuring a StorageClass (https://kubernetes.io/docs/concepts/storage/storage-classes/) that matches the particular cloud provider; when installing the helm chart, you point at the StorageClass created by the customer, which carries the cloud-provider-specific parameters.
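For example, on AWS a customer-created StorageClass might look something like the following (the name here is hypothetical; the provisioner and parameters are cloud-provider-specific):

```yaml
# Illustrative StorageClass for AWS EBS; other clouds swap in their
# own provisioner (e.g. GCE PD or Azure Disk) and parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: teleport-agent-storage    # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

The chart install would then reference that name via the chart's storageClassName value (assuming the values layout in this PR), and Kubernetes would provision the volumes in the cloud environment.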

Contributor

@awly awly left a comment


Overall I like this change.
To release it as a part of our public Helm chart we should at least document how to set this up (maybe link to cloud provider-specific docs if available).
Also, if it's possible to migrate from a non-stateful setup, we should document how. If it's not possible, we need a warning somewhere.

storage:
  enabled: false
  storageClassName: ""
  requests: 128Mi
Contributor

I actually have no idea how big the teleport storage needs are.
Do you have any numbers on the real-world size?
If this needs to be increased after deployment, will existing data be preserved?

Contributor Author

@knisbet knisbet Jun 8, 2021

I actually have no idea how big the teleport storage needs are.
Do you have any numbers on the real-world size?

Over the last 7 days that I've had this deployed (idle), it's used 19MiB. I may have assumed incorrectly that, because this is a node agent and logs/audit/etc. are meant to be centralized, the default for this volume can be relatively small.

Is there something I might be missing that would suggest a larger default may be needed here?

If this needs to be increased after deployment, will existing data be preserved?

It depends on how the storage class is set up, and presumably on whether the block driver supports it. The storage class has a boolean field called allowVolumeExpansion; if it's set, the volume can be expanded without data loss. It will interrupt the pod temporarily, since the volume needs to be unmapped and the filesystem expanded (Edit: that's on AWS; other systems might support online expansion).
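As a sketch, expansion requires the StorageClass to opt in; after that, raising the requested size on the bound PersistentVolumeClaim triggers the resize (names below are hypothetical):

```yaml
# Illustrative StorageClass that permits in-place volume expansion.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage        # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
allowVolumeExpansion: true
```

With that in place, bumping spec.resources.requests.storage on the PVC (e.g. via kubectl edit pvc) grows the underlying volume; data is preserved, though the pod may restart while the filesystem is expanded.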

Contributor

I think this volume can be fairly small - no logs are really kept locally (they all go to Docker/containerd) and the sqlite database for the agent itself should only be holding a keypair and some cache data.

Comment thread examples/chart/teleport-kube-agent/templates/deployment.yaml Outdated
Comment thread examples/chart/teleport-kube-agent/templates/statefulset.yaml Outdated
Comment thread examples/chart/teleport-kube-agent/templates/statefulset.yaml
@knisbet
Contributor Author

knisbet commented Jun 8, 2021

Overall I like this change.
To release it as a part of our public Helm chart we should at least document how to set this up (maybe link to cloud provider-specific docs if available).

Are these agent helm charts documented anywhere in general? I see a few references to the other chart in the docs, and I guess there is a values reference for this chart (https://goteleport.com/docs/kubernetes-access/helm/reference/#teleport-kube-agent), which is very confusing to read, but I don't see any docs specific to the agent charts.

Also, if it's possible to migrate from non-stateful setup, we should document how. If it's not possible, we need a warning somewhere.

It's not really a migration: when not using the volume, every startup of teleport is a new installation. When adding the persistent volume, the first start of the pod "installs" teleport by persisting the /var/lib/teleport directories, so subsequent creations of the pod just reuse the data from the last startup. So I don't think there is really a migration here, other than you no longer need long-lived tokens (or, like us, have to manually reset the token every time the pod restarts).
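Under the values layout quoted earlier in the review, enabling persistence would then be a values override along these lines (a sketch; only the key names come from the PR, the storage class value is assumed):

```yaml
# Values override for the teleport-kube-agent chart (sketch).
# Key names taken from the storage block quoted in the review above.
storage:
  enabled: true
  storageClassName: gp2    # assumed; whatever StorageClass the cluster provides
  requests: 128Mi
```

With storage.enabled left false, the chart behaves as before and each pod start is a fresh install.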

Contributor

@webvictim webvictim left a comment


There aren't decent docs for the teleport-kube-agent chart AFAIK - the only real reference is under https://goteleport.com/docs/kubernetes-access/controls/#kubernetes-labels.

@inertial-frame @benarent Do we have plans to write guides for using teleport-kube-agent similar to those for teleport-cluster?

I think that to merge this, we would at least need to add the additional values to the chart reference - unclear or not, the reference is the canonical source for all the chart's options.

Comment thread examples/chart/teleport-kube-agent/templates/statefulset.yaml

@awly
Contributor

awly commented Jun 9, 2021

@knisbet
Contributor Author

knisbet commented Jun 10, 2021

@knisbet @webvictim yeah, we can just expand comments in values.yaml and add a section in https://github.com/gravitational/teleport/blob/master/examples/chart/teleport-kube-agent/README.md

Added some more text in 25533c5

Mainly trying to follow how it's already set up.

Comment thread examples/chart/teleport-kube-agent/values.yaml Outdated
Comment thread examples/chart/teleport-kube-agent/values.yaml Outdated
Comment on lines +111 to +113
# nodeSelector to apply for pod assignment
# https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
nodeSelector: {}
Contributor

Please add this to the correct section of https://github.com/gravitational/teleport/blob/master/docs/pages/kubernetes-access/helm/reference.mdx as well to keep everything documented. You can just copy a section and change the appropriate parts.

Contributor Author

Alright, I added to the reference doc.

Co-authored-by: Gus Luxton <gus@goteleport.com>
@knisbet knisbet requested a review from webvictim July 21, 2021 14:46
Contributor

@webvictim webvictim left a comment


LGTM with a few nits

Comment thread docs/pages/kubernetes-access/helm/reference.mdx Outdated
Comment thread docs/pages/kubernetes-access/helm/reference.mdx Outdated
Comment thread docs/pages/kubernetes-access/helm/reference.mdx Outdated
Comment thread docs/pages/kubernetes-access/helm/reference.mdx Outdated
@corkrean corkrean added the c-bi Internal Customer Reference label Jul 22, 2021
@knisbet knisbet merged commit 4cacc2c into master Jul 22, 2021
Contributor

@russjones russjones left a comment


Bot.

@knisbet knisbet deleted the kevin/master/helm-stateful2 branch July 22, 2021 16:26
webvictim pushed a commit that referenced this pull request Aug 20, 2021
* add support for running agent on persistent volume

* address review feedback

* add more comments on the fields

* update statefulset to be compatible with upstream change
webvictim added a commit that referenced this pull request Aug 26, 2021
* Add ability to configure postStart handler for teleport-cluster chart (#7168)

* Add support for PDB with the teleport-cluster helm chart (#7138)

* Allow teleport-cluster-agent chart to use an existing volume for the data directory (#7096)

* Mount teleport-tls to the init container for the teleport-cluster helm chart (#7166)

* add support for running agent helm chart on persistent volume (#7123)

* helm: Make auth type configurable (#7508)

* Add imagePullSecrets in kube-agent chart (#6941)

Co-authored-by: Gus Luxton <gus@goteleport.com>
Co-authored-by: Stefan Sedich <stefan.sedich@gmail.com>
Co-authored-by: Kevin Nisbet <kevin@gravitational.com>
Co-authored-by: Antonio Gurgel <66754096+antonio-te@users.noreply.github.com>
Co-authored-by: Aymen Memni <7290430+amemni@users.noreply.github.com>

Labels

c-bi Internal Customer Reference helm

6 participants