
RFD 0167: Automatic updates change proposal #39217

Closed

bernardjkim wants to merge 7 commits into master from rfd/0167-auto-updates-change-proposal

Contributor

@bernardjkim bernardjkim commented Mar 11, 2024


This RFD proposes a change to the automatic updates design. The current design has a number of limitations that are incompatible with the needs of Teleport Cloud. This RFD provides an overview of the current issues and some potential solutions, with minimal implementation details. If the change proposals are approved, a separate execution plan with more implementation details will be created.

Related Issues:

@github-actions
Contributor

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

@github-actions github-actions Bot requested a review from kimlisa March 11, 2024 22:24
@github-actions github-actions Bot added rfd Request for Discussion size/md labels Mar 11, 2024
@github-actions github-actions Bot requested a review from tcsc March 11, 2024 22:24

@fheinecke fheinecke added the no-changelog Indicates that a PR does not require a changelog entry label Mar 11, 2024
@kimlisa kimlisa removed their request for review March 12, 2024 07:12
- Describe goals of the change
- Add more details about reducing installation complexity

Yum has a similar feature that can exclude packages from a system update. This can be done by specifying teleport-ent to be excluded in the /etc/yum.conf file.

Step 3: The Teleport installation process should no longer rely on the package manager to download teleport-ent packages. Instead, the Teleport proxy will now serve the latest compatible version of the teleport-ent package. The Teleport updater will then be responsible for downloading and installing the teleport-ent packages from the Teleport proxy.
Contributor

What's the plan for getting the artifacts securely delivered to the proxy?

Contributor

@hugoShaka hugoShaka Mar 13, 2024

I'm not sure we want to tunnel 500MiB blobs through the proxy to 10k agents in the same maintenance window. We'd have poor availability and slow downloads, and we'd pay for a lot of bandwidth.
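To put rough numbers on this concern, the sketch below computes the aggregate egress if every one of the 10k agents pulls a 500 MiB artifact through the proxy. The one-hour maintenance window is an illustrative assumption, not a figure from this thread.

```go
package main

import "fmt"

const (
	bytesPerMiB = 1 << 20
	bytesPerTiB = 1 << 40
)

// egress returns the total data transferred (in TiB) and the sustained
// throughput (in Gbit/s) needed to deliver one artifact of artifactMiB
// to each of the given agents within windowSeconds.
func egress(artifactMiB, agents, windowSeconds float64) (totalTiB, gbitPerSec float64) {
	totalBytes := artifactMiB * bytesPerMiB * agents
	totalTiB = totalBytes / bytesPerTiB
	gbitPerSec = totalBytes * 8 / windowSeconds / 1e9
	return totalTiB, gbitPerSec
}

func main() {
	// 500 MiB artifact, 10k agents, hypothetical 1-hour maintenance window.
	tib, gbps := egress(500, 10_000, 3600)
	fmt.Printf("total: %.2f TiB, sustained: %.2f Gbit/s\n", tib, gbps)
	// prints "total: 4.77 TiB, sustained: 11.65 Gbit/s"
}
```

A longer window or jittered rollout lowers the sustained throughput, but not the total volume that would flow through the proxy.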

I would rather have the artifacts stored in an S3 bucket/CDN/whatever and have the proxy serve the bucket address and desired version.

If we move away from the package manager, we must solve the trust issue, exactly as we had to do for Kubernetes. The artifacts must be signed. The public key can be baked into the updater or retrieved dynamically, perhaps from the proxy. Serving the key from the proxy offers weaker guarantees if someone can MITM with valid TLS, but it would allow us to rotate the signing key without asking all customers to redeploy.

Member

> I would rather have the artifacts stored into an S3 bucket/CDN/whatever and the proxy to serve the bucket address and desired version.

Being able to curl off the proxy would be nice, but it should redirect to a CDN.

Contributor

@hugoShaka hugoShaka Mar 13, 2024

I'm not sure we want to send the CDN URL from the proxy. This trades a lot of security guarantees for a bit of usability (if we own the domain, we can host whatever we want on it).

  • if the agent reads the CDN URL from the proxy redirect: someone can take over the proxy and have the agents install arbitrary things, especially if the proxy also serves the signature keys
  • if the agent reads the version from the proxy but has a built-in CDN URL, someone taking over the proxy cannot force agents to install arbitrary content. Only content hosted in the Teleport CDN can be downloaded, so an attacker would need to take over the CDN as well (otherwise, the only thing they can do is a downgrade attack)

I think a middle ground would be having the proxy return the artifact name and the updater set the domain. This way, we can move artifacts and change the CDN structure remotely, but agents will not install something random. It would also allow users to configure the updater to pull through a proxy/mirror/cache if they have company policies in place regarding downloads.
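A minimal sketch of this middle ground, assuming the proxy returns only an artifact name and the updater owns the base URL (compiled in, but overridable to point at a mirror). The base URL, the artifact name pattern, and the function names are hypothetical; the point is that a compromised proxy can choose *which* file to fetch but not *where* to fetch it from.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// defaultBaseURL is a placeholder for the CDN base URL compiled into the
// updater. Users could override it to pull through an internal mirror.
const defaultBaseURL = "https://cdn.example.com/"

// artifactName only accepts a single path segment of safe characters, so the
// proxy cannot inject another host, a scheme, or "../" path traversal.
var artifactName = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]*$`)

// artifactURL joins the updater-controlled base URL with the
// proxy-supplied artifact name.
func artifactURL(base, name string) (string, error) {
	if !artifactName.MatchString(name) {
		return "", fmt.Errorf("invalid artifact name %q", name)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" {
		return "", fmt.Errorf("base URL must use https: %q", base)
	}
	return u.JoinPath(name).String(), nil
}

func main() {
	for _, name := range []string{
		"teleport-ent-v15.1.4-linux-amd64-bin.tar.gz", // hypothetical artifact name
		"../../evil",
		"https://attacker.example/payload",
	} {
		u, err := artifactURL(defaultBaseURL, name)
		fmt.Println(u, err)
	}
}
```

Signature verification is still required on top of this, since it only narrows where content can come from.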

Contributor

However, I'd love for the install script from `curl https://goteleport.com/static/install.sh | bash -s 15.1.4` to be served by the proxy.

So the installation method would be `curl https://mytenant.teleport.sh/something/install.sh | bash`

Contributor

What SLO do we have for auto updates and/or cloud? What SLO do we need to have for the release service (currently don't have any)?

Member

> if the agent reads the CDN URL from the proxy redirect: someone can takeover the proxy and have the agents install arbitrary things, especially if the proxy also serves the signature keys

> So the installation method would be `curl https://mytenant.teleport.sh/something/install.sh | bash`

Is the difference here just that it's less scary for the initial install?

If I take over the proxy, I can serve arbitrary install and discover scripts.

Contributor

@hugoShaka hugoShaka Mar 13, 2024

> If I take over the proxy, I can serve arbitrary install and discover scripts.

Installation time and update time are not equivalent. If you take over a proxy, you can compromise new agents by changing the bash script served at install time. However, you cannot compromise existing agents; it is not a 0-click escalation.

This is already the case for regular joins: the agent just trusts whatever auth cert is returned by the proxy during the initial join. The agent can be stolen if you have valid TLS certs. However, an already enrolled agent will check the cluster CAs and cannot be stolen. The automatic update mechanism should provide the same security guarantees.

Proving the bash script authenticity is hard because:

  • we don't have established trust except via the OS trust store, which is kinda weak
  • the dynamic part injected by the proxy (version, URL, automatic updates, oss/ent, architecture, package manager, ...) blocks us from hashing the bash script

I'm not sure we have to solve the "bootstrapping in a compromised environment" problem as Teleport doesn't even solve it for a regular agent join (CA pins are not used when joining via proxy).

### Deprecate the stable/cloud teleport-ent package
Currently, the teleport-ent-updater package requires the teleport-ent package as a dependency. This means that the user must either install the latest version of the teleport-ent package, which may or may not be compatible with their Teleport control plane, or first specify a compatible version of teleport-ent to install. This puts an unnecessary burden on the user and complicates the installation process.

Step 1: To remove this burden from the user and simplify the installation process, the Teleport updater will support an install command. The install command accepts the necessary configuration and then installs the latest compatible version of the teleport-ent package for the user.
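As a sketch of how an install command might pick "the latest compatible version" from a list of candidate releases, assume (as a simplification for this example only) that compatible means the same major version as the control plane and not newer than it. The function names and version list are hypothetical, and in practice the target version would more likely be advertised by the cluster than computed locally.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type version [3]int

// parseVersion parses a plain "X.Y.Z" string with an optional leading "v".
// Pre-release and build metadata are not handled in this sketch.
func parseVersion(s string) (version, error) {
	var v version
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid version %q", s)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether a sorts before b.
func (a version) less(b version) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestCompatible returns the newest candidate that shares the control
// plane's major version and is not newer than the control plane.
func latestCompatible(controlPlane string, candidates []string) (string, error) {
	cp, err := parseVersion(controlPlane)
	if err != nil {
		return "", err
	}
	best, found := "", false
	var bestV version
	for _, c := range candidates {
		v, err := parseVersion(c)
		if err != nil {
			continue // skip malformed entries
		}
		if v[0] != cp[0] || cp.less(v) {
			continue // different major, or newer than the control plane
		}
		if !found || bestV.less(v) {
			best, bestV, found = c, v, true
		}
	}
	if !found {
		return "", fmt.Errorf("no version compatible with %s", controlPlane)
	}
	return best, nil
}

func main() {
	v, err := latestCompatible("15.1.4", []string{"14.3.6", "15.0.2", "15.1.4", "15.1.9", "16.0.0"})
	fmt.Println(v, err) // 15.1.4 <nil>
}
```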
Member

We discussed removing the package entirely. Are you still working on this section?

Comment on lines +75 to +76
## Current Limitations

Contributor

I would like to add another current limitation: the automatic update process covers only the agents, not the integrations.

Since v15, we have had Teleport Operator users running against Cloud instances, and we already had users self-hosting plugins (see Slack link).

I suspect that as Teleport Cloud grows and operator adoption increases, similar issues will arise. I think the proposed solution should be designed so that it can be extended in the future to cover the integrations, at least self-hosted plugins and the operator.

![publishing](assets/0167-auto-updates-change-proposal/kube-deployment.png)

## Current Limitations

Contributor

Another limitation, one that Cloud does not face directly, is that automatic updates are Cloud-only. Because AUs are not properly supported and adopted everywhere, Teleport code has a special "if Cloud" execution path. This widens the gap between Cloud and self-hosted and increases the probability of bugs. This path is typically less tested and expensive to maintain.

Getting rid of the special case would prevent incidents like the discovery incident from two weeks ago.

- Add details about downloads from Teleport CDN
- Add details about updating the Teleport updater
- Add limitations for self-hosted
- Add limitations for integrations
Comment thread rfd/0167-auto-updates-change-proposal.md
- Add suggestion to remove schedule from updater
- Add suggestion to simplify agent version management
- Add suggestion to serve install script from proxy
@jentfoo jentfoo self-requested a review March 29, 2024 21:34

Step 2: Teleport must provide an alternate method of securely downloading the latest compatible version of Teleport. The Teleport CDN already serves Teleport artifacts and their SHA256 checksums. To ensure the authenticity of these Teleport downloads, Teleport will now need to sign these artifacts and provide the digital signatures. The public key required to verify the digital signature will be baked into the Teleport updater.

Step 3: The Teleport updater must control all Teleport updates. To ensure this, the Teleport updater will no longer rely on the system package manager to install/update Teleport. Users will no longer be able to manually update Teleport through the system package manager. Instead, the Teleport updater will now download the Teleport artifacts from the Teleport CDN along with the SHA256 checksum and the digital signature. The Teleport updater will verify the Teleport artifact and install Teleport into the /var/lib/teleport directory.
Member

> digital signature

I think @russjones had some thoughts about scope and key management for this part.


The Teleport updater must be able to resolve conflict for these two situations:
Member

A few questions here:

  • For existing installations, would it make sense to run teleport-upgrade install automatically in post-install scripts, given that we can detect the proxy address?
  • Should we remove the existing teleport package if it exists, or just ensure that the auto-upgrader version of teleport is executed by the systemd service?

Contributor Author

> For existing installations, would it make sense to run teleport-upgrade install automatically in post-install scripts, given that we can detect the proxy address?

I was thinking the install script would run teleport-upgrade install right after installing the updater. Is what you're thinking a bit different? What would be the benefit of running teleport-upgrade install automatically in a post-install script?

> Should we remove the existing teleport package if it exists, or just ensure that the auto-upgrader version of teleport is executed by the systemd service?

Hmm, I'm thinking the second option would be more resilient? If the teleport package is still available for installation, there is probably a good chance that users will reinstall it unintentionally.

Comment on lines +144 to +146
The agent version should be made easier to configure using the Kubernetes or Cloud API. Modifying the agent version should not require reconciliation to be paused, and it should not require the Teleport proxy to be redeployed.

A simple solution is to configure the Teleport proxies to now read the agent version from a monitored file on disk. Teleport Cloud will be able to easily and dynamically modify the agent version via the Kubernetes API.
Member

Chatting with @russjones, we should centralize this configuration to a Teleport resource. For Cloud, we can schedule work to allow our controllers to make the auth changes directly.

Contributor Author

Are we thinking of adding configuration to the autoupdate_version resource proposed in the client tools RFD?

```yaml
kind: autoupdate_version
spec:
  # tools_version is the version of client tools the cluster will advertise.
  # Can be auto (match the version of the proxy) or an exact semver formatted
  # version.
  tools_version: auto|X.Y.Z
```


Step 4: The Teleport documentation should be updated to include a new section with instructions about how a user can build their own update automation.

### Agent Version Management
Member

Can you add the proposed API for this?

Seems like we need at least three fields:

  • Version
  • Upgrade time OR immediate (can be a specific timestamp -- Cloud can translate window -> time)
  • Jitter duration

And out-of-scope for this RFD, but eventually:

  • Client bucket ID, for staged rollouts (the upgrader holds off until the ID matches the locally configured value)

@sclevine
Member

sclevine commented Mar 4, 2025

Superseded by #39805.
Nearly all of the suggestions (including install script changes) were released in v17.3.

@bernardjkim closing this for now, but feel free to reopen if you have any concerns

@sclevine sclevine closed this Mar 4, 2025

Labels

no-changelog Indicates that a PR does not require a changelog entry rfd Request for Discussion size/md


4 participants