RFD 0167: Automatic updates change proposal by bernardjkim · Pull Request #39217 · gravitational/teleport

bernardjkim · 2024-03-11T22:23:35Z

This RFD proposes changes to the automatic updates design. The current design has a number of limitations that are incompatible with the needs of Teleport Cloud. This RFD provides an overview of the current issues and some potential solutions, with minimal implementation details. If the proposed changes are approved, a separate execution plan with more implementation details will be created.

Related Issues:

github-actions · 2024-03-11T22:24:10Z

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.



fheinecke · 2024-03-13T06:29:38Z

+
+Yum has a similar feature that can exclude packages from a system update. This can be done by specifying teleport-ent to be excluded in the /etc/yum.conf file.
+
+Step 3: The Teleport installation process should no longer rely on the package manager to download teleport-ent packages. Instead, the Teleport proxy will now serve the latest compatible version of the teleport-ent package. The Teleport updater will then be responsible for downloading and installing the teleport-ent packages from the Teleport proxy.


What's the plan for getting the artifacts securely delivered to the proxy?

I'm not sure we want to tunnel 500MiB blobs through the proxy to 10k agents in the same maintenance window. We would get poor availability and slow downloads, and pay for a lot of bandwidth.

I would rather have the artifacts stored in an S3 bucket/CDN/whatever and have the proxy serve the bucket address and desired version.

If we move away from the package manager, we must solve the trust issue, exactly like we had to do for Kubernetes. The artifacts must be signed. The public key can be baked into the updater, or retrieved dynamically, maybe from the proxy. Serving the key from the proxy offers fewer guarantees if someone can MITM with valid TLS, but it would allow us to rotate the signing key without contacting all customers to redeploy.

> I would rather have the artifacts stored in an S3 bucket/CDN/whatever and have the proxy serve the bucket address and desired version.

Being able to curl off the proxy would be nice, but it should redirect to a CDN.

I'm not sure we want to send the CDN URL from the proxy. This trades a lot of security guarantees for a bit of usability (if we own the domain, we can host whatever we want on it).

- If the agent reads the CDN URL from the proxy redirect, someone can take over the proxy and have the agents install arbitrary things, especially if the proxy also serves the signature keys.
- If the agent reads the version from the proxy but has a built-in CDN URL, someone taking over the proxy cannot force agents to install arbitrary content. Only content hosted on the Teleport CDN can be downloaded, so an attacker would need to take over the CDN as well (otherwise the only thing they can do is a downgrade attack).

I think a middle ground would be having the proxy return the artifact name and the updater set the domain. This way, we can move artifacts and change the CDN structure remotely, but agents are not installing something random. It would allow users to configure the updater to pull through a proxy/mirror/cache if they have company policies in place regarding downloads.
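A minimal sketch of that middle ground, assuming the proxy returns only a file name and the updater owns the scheme and host. The base URL, the function name `artifact_url`, and the artifact file name in the usage note are hypothetical and only illustrate the idea:

```python
from urllib.parse import quote, urlsplit

# Hypothetical built-in download origin; the real CDN layout is not
# specified in this thread.
BUILTIN_CDN_BASE = "https://cdn.example.com/teleport/"


def artifact_url(artifact_name: str, base_url: str = BUILTIN_CDN_BASE) -> str:
    """Build a download URL from a proxy-supplied artifact name.

    The proxy only chooses the file name. The scheme and host always come
    from base_url, which is either built in or set by the local operator
    (for example, to point at a company mirror or cache).
    """
    parts = urlsplit(artifact_name)
    if parts.scheme or parts.netloc:
        raise ValueError("artifact name must not be a URL")
    if not artifact_name or "/" in artifact_name or "\\" in artifact_name:
        raise ValueError("artifact name must be a single path segment")
    if artifact_name in (".", ".."):
        raise ValueError("artifact name must not be a relative path marker")
    if not base_url.endswith("/"):
        base_url += "/"
    # Percent-encode anything unusual so the name cannot alter the URL shape.
    return base_url + quote(artifact_name, safe="")
```

With this shape, `artifact_url("teleport-ent-v15.1.4-linux-amd64-bin.tar.gz")` resolves under the built-in origin, while a compromised proxy that returns `https://evil.example/x.tar.gz` or `../x.tar.gz` gets a `ValueError` instead of a download.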

However, I'd love the install script from `curl https://goteleport.com/static/install.sh | bash -s 15.1.4` to be served by the proxy.

So the installation method would be `curl https://mytenant.teleport.sh/something/install.sh | bash`.

What SLO do we have for auto updates and/or cloud? What SLO do we need to have for the release service (currently don't have any)?

> If the agent reads the CDN URL from the proxy redirect, someone can take over the proxy and have the agents install arbitrary things, especially if the proxy also serves the signature keys.

> So the installation method would be `curl https://mytenant.teleport.sh/something/install.sh | bash`.

Is the difference here just that it's less scary for the initial install?

If I take over the proxy, I can serve arbitrary install and discover scripts.

> If I take over the proxy, I can serve arbitrary install and discover scripts.

Installation time and update time are not equivalent. If you take over a proxy, you can compromise new agents by changing the bash script served at install time. However, you cannot compromise existing agents, so it is not a 0-click escalation.

This is already the case for regular joins: the agent just trusts whatever auth cert is returned by the proxy during the initial join. The agent can be stolen if you have valid TLS certs. However, an already enrolled agent will check the cluster CAs and cannot be stolen. The automatic update mechanism should provide the same security guarantees.

Proving the bash script's authenticity is hard because:

- We don't have established trust except via the OS trust store, which is kinda weak.
- The dynamic part injected by the proxy (version, URL, automatic updates, oss/ent, architecture, package manager, ...) blocks us from hashing the bash script.

I'm not sure we have to solve the "bootstrapping in a compromised environment" problem as Teleport doesn't even solve it for a regular agent join (CA pins are not used when joining via proxy).

sclevine · 2024-03-13T06:43:15Z

+### Deprecate the stable/cloud teleport-ent package
+Currently, the teleport-ent-updater package requires the teleport-ent package as a dependency. This means that the user must install the latest version of the teleport-ent package which may or may not be compatible with their Teleport control plane, or they must first specify a compatible version of teleport-ent to install. This puts unnecessary burden on the user, and complicates the installation process.
+
+Step 1: To remove this burden from the user and simplify the installation process, the Teleport updater will support an install command. The install command accepts the necessary configuration and then installs the latest compatible version of the teleport-ent package for the user.


We discussed removing the package entirely. Are you still working on this section?

hugoShaka · 2024-03-13T15:39:35Z

+## Current Limitations
+


I would like to add another current limitation: the automatic update process only covers the agents. Not the integrations.

Since v15 we have Teleport Operator users running against Cloud instances, and we already had users self-hosting plugins (see slack link).

I suspect that as Teleport Cloud grows and operator adoption increases, similar issues will arise. I think the proposed solution should be designed to be extended in the future to cover the integrations, at least self-hosted plugins and the operator.

hugoShaka · 2024-03-13T15:53:58Z

+![publishing](assets/0167-auto-updates-change-proposal/kube-deployment.png)
+
+## Current Limitations
+


Another limitation, which Cloud is not facing directly, is that automatic updates are Cloud-only. Because AUs are not properly supported and adopted everywhere, the Teleport code has a special "if Cloud" execution path. This increases the difference between Cloud and self-hosted, and with it the probability of bugs. This path is typically less tested and expensive to maintain.

Getting rid of the special case would prevent incidents like the discovery one from 2 weeks ago.


sclevine · 2024-04-01T17:39:09Z

+
+Step 2: Teleport must provide an alternate method of securely downloading the latest compatible version of Teleport. The Teleport CDN already serves Teleport artifacts and their SHA256 checksums. To ensure authenticity of these Teleport downloads, Teleport will now need to sign these artifacts and provide the digital signatures. The public key required to verify the digital signature will be baked into the Teleport updater.
+
+Step 3: The Teleport updater must control all Teleport updates. To ensure this, the Teleport updater will no longer rely on the system package manager to install/update Teleport. Users will no longer be able to manually update Teleport through the system package manager. Instead, the Teleport updater will now download the Teleport artifacts from the Teleport CDN along with the SHA256 checksum and the digital signature. The Teleport updater will verify the Teleport artifact and install Teleport into the /var/lib/teleport directory.
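As a rough illustration of the integrity half of this step, the sketch below checks downloaded bytes against a checksum line in `sha256sum` format. The function name is hypothetical. The digital signature check is deliberately not shown, since the signature scheme and key management are still under discussion here; a checksum alone proves integrity, not authenticity.

```python
import hashlib
import hmac


def sha256_matches(artifact: bytes, checksum_line: str) -> bool:
    """Check artifact bytes against a `sha256sum`-style line ("<hex>  <name>")."""
    fields = checksum_line.split()
    if not fields:
        return False
    expected = fields[0].lower()
    actual = hashlib.sha256(artifact).hexdigest()
    # Constant-time comparison; not strictly required for a public checksum,
    # but it is harmless and avoids special-casing later.
    return hmac.compare_digest(actual.encode(), expected.encode())
```

An updater following this step would refuse to install when `sha256_matches` returns `False`, and would only proceed to installation after the signature over the artifact (or over the checksum file) has also been verified.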


> digital signature

I think @russjones had some thoughts about scope and key management for this part.

sclevine · 2024-04-01T17:46:32Z

+
+Step 3: The Teleport updater must control all Teleport updates. To ensure this, the Teleport updater will no longer rely on the system package manager to install/update Teleport. Users will no longer be able to manually update Teleport through the system package manager. Instead, the Teleport updater will now download the Teleport artifacts from the Teleport CDN along with the SHA256 checksum and the digital signature. The Teleport updater will verify the Teleport artifact and install Teleport into the /var/lib/teleport directory.
+
+The Teleport updater must be able to resolve conflict for these two situations:


A few questions here:

For existing installations, would it make sense to run teleport-upgrade install automatically in post-install scripts, given that we can detect the proxy address?

Should we remove the existing teleport package if it exists, or just ensure that the auto-upgrader version of teleport is executed by the systemd service?

> For existing installations, would it make sense to run teleport-upgrade install automatically in post-install scripts, given that we can detect the proxy address?

I was thinking the install script would run teleport-upgrade install right after installing the updater. Is what you're thinking a bit different? What would be the benefit of running teleport-upgrade install automatically in a post-install script?

> Should we remove the existing teleport package if it exists, or just ensure that the auto-upgrader version of teleport is executed by the systemd service?

Hmm, I'm thinking the second option would be more resilient? I'm thinking if the teleport package is still available for installation, there is probably a good chance that users reinstall the package unintentionally?

sclevine · 2024-04-01T17:54:03Z

+The agent version should be made easier to configure using the Kubernetes or Cloud API. Modifying the agent version should not require reconciliation to be paused, and it should not require the Teleport proxy to be redeployed.
+
+A simple solution is to configure the Teleport proxies to now read the agent version from a monitored file on disk. Teleport Cloud will be able to easily and dynamically modify the agent version via the Kubernetes API.


Chatting with @russjones, we should centralize this configuration to a Teleport resource. For Cloud, we can schedule work to allow our controllers to make the auth changes directly.

Are we thinking we add configuration to this autoupdate_version resource proposed in the client tools rfd?

    kind: autoupdate_version
    spec:
      # tools_version is the version of client tools the cluster will advertise.
      # Can be auto (match the version of the proxy) or an exact semver formatted
      # version.
      tools_version: auto|X.Y.Z
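To make the `auto|X.Y.Z` semantics concrete, here is a small sketch of how a consumer of this resource might resolve the advertised version. The function name and the simplified version pattern are assumptions for illustration; they are not taken from the client tools RFD.

```python
import re

# Simplified X.Y.Z pattern with an optional pre-release suffix.
# Full semver allows more (for example build metadata).
_SEMVER = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?")


def resolve_tools_version(tools_version: str, proxy_version: str) -> str:
    """Return the version to advertise for a given tools_version setting."""
    if tools_version == "auto":
        # "auto" means: match whatever version the proxy is running.
        return proxy_version
    if _SEMVER.fullmatch(tools_version):
        return tools_version
    raise ValueError(f"tools_version must be 'auto' or X.Y.Z, got {tools_version!r}")
```

If agent versions were added to the same resource, a parallel field could be resolved the same way, which keeps the "auto" behavior consistent between client tools and agents.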

sclevine · 2024-04-01T18:01:06Z

+
+Step 4: The Teleport documentation should be updated to include a new section with instructions about how a user can build their own update automation.
+
+### Agent Version Management


Can you add the proposed API for this?

Seems like we need at least three fields:

- Version
- Upgrade time OR immediate (can be a specific timestamp -- Cloud can translate window -> time)
- Jitter duration

And out-of-scope for this RFD, but eventually:

- Client bucket ID, for staged rollouts (upgrader holds off until the ID matches locally configured value)
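One way to picture how the three fields might fit together is sketched below. All names (`AgentUpdatePlan`, `scheduled_update_time`) are hypothetical, and the uniform jitter is an assumption; the bucket ID is left out because it is out of scope here.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AgentUpdatePlan:
    version: str                     # target agent version, e.g. "15.1.4"
    start_time: Optional[datetime]   # None means "immediate"
    jitter: timedelta                # maximum random delay added per agent


def scheduled_update_time(plan: AgentUpdatePlan, now: datetime,
                          rng: random.Random) -> datetime:
    """Return the earliest time this agent should begin updating.

    Each agent adds its own random offset in [0, jitter] so that a large
    fleet does not start downloading at the same instant.
    """
    base = now if plan.start_time is None else plan.start_time
    offset_seconds = rng.uniform(0, plan.jitter.total_seconds())
    return base + timedelta(seconds=offset_seconds)
```

Passing the random source in explicitly keeps the function easy to test, and a zero jitter makes every agent start exactly at `start_time` (or at `now` for immediate updates).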

sclevine · 2025-03-04T20:21:38Z

Superseded by #39805.
Nearly all of the suggestions (including install script changes) were released in v17.3.

@bernardjkim closing this for now, but feel free to reopen if you have any concerns

Add rfd 0167

77a20fb

bernardjkim requested review from fheinecke, fspmarshall, hugoShaka and sclevine March 11, 2024 22:23

github-actions Bot requested a review from kimlisa March 11, 2024 22:24

github-actions Bot added rfd Request for Discussion size/md labels Mar 11, 2024

github-actions Bot requested a review from tcsc March 11, 2024 22:24

fheinecke added the no-changelog Indicates that a PR does not require a changelog entry label Mar 11, 2024

kimlisa removed their request for review March 12, 2024 07:12

Update

a4949b8

- Describe goals of the change
- Add more details about reducing installation complexity

fheinecke reviewed Mar 13, 2024


sclevine reviewed Mar 13, 2024


hugoShaka reviewed Mar 13, 2024


bernardjkim added 3 commits March 13, 2024 17:01

Update

c66ce2e

- Add details about downloads from Teleport CDN
- Add details about updating the Teleport updater

Update

551e15a

- Add limitations for self-hosted
- Add limitations for integrations

Add details about signing Teleport artifacts

6694303

sclevine reviewed Mar 14, 2024


Comment thread rfd/0167-auto-updates-change-proposal.md

Update

9820388

- Add suggestion to remove schedule from updater
- Add suggestion to simplify agent version management
- Add suggestion to serve install script from proxy

jentfoo self-requested a review March 29, 2024 21:34

sclevine reviewed Apr 1, 2024


Update version management

b97bb96

sclevine mentioned this pull request Apr 3, 2024

Added RFD 0144 - Client Tools Updates #39805

Merged

sclevine closed this Mar 4, 2025




4 participants
