RFD: cloud agent auto upgrades ux #24865

Closed
xinding33 wants to merge 1 commit into master from xinding33/rfd-cloud-auto-upgrades

Contributor

@xinding33 xinding33 commented Apr 20, 2023

Teleport will add support for Agent Auto Upgrades in v13.0.0. This PR adds an RFD that details tactical UX changes to expose the Agent Auto Upgrade capabilities to Teleport Cloud Admins.

@github-actions github-actions Bot requested review from jimbishopp and zmb3 April 20, 2023 00:16
@github-actions github-actions Bot added rfd Request for Discussion size/sm labels Apr 20, 2023
@xinding33 xinding33 requested a review from r0mant April 20, 2023 00:17
@zmb3 zmb3 changed the title Add rfd for cloud agent auto upgrades ux RFD: cloud agent auto upgrades ux Apr 20, 2023
Comment on lines +54 to +56
2. Notify Admins via Cluster Alerts when Unaccompanied Agents are detected. The
Cluster Alert should also present a docs link informing Admins how to
properly add an auto upgrader service.
Contributor

By design, we can't really know if this is the case. The updater is Teleport-independent so that the agent can fail and still be updated. The agent only knows whether it should export its maintenance schedule, not whether there's an updater reading it.

We can detect unaccompanied agents in two ways:

  • The agent is not configured to export its maintenance schedule. As described earlier, this is not 100% equivalent to being unaccompanied: an agent can be accompanied without a maintenance schedule (update asap), or it can export the schedule but have a broken/suspended updater or no updater at all.
  • The agent is not running the right version. This second approach might be closer to what we want to achieve: not having to deal with version skew.
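
To make the difference between the two signals concrete, here is a minimal sketch. The `Agent` record and its field names are hypothetical, not Teleport types, and it assumes the target version is known as a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    # Hypothetical record; field names are assumptions for illustration only.
    hostname: str
    version: str            # version the agent reports, e.g. "13.0.0"
    exports_schedule: bool  # whether the agent is configured to export its maintenance schedule


def not_exporting_schedule(agents):
    """Signal 1: agents not configured to export a maintenance schedule."""
    return [a.hostname for a in agents if not a.exports_schedule]


def version_mismatch(agents, target_version):
    """Signal 2: agents not running the target version."""
    return [a.hostname for a in agents if a.version != target_version]


agents = [
    Agent("node-a", "13.0.0", True),
    Agent("node-b", "12.2.4", True),   # exports a schedule, but its updater may be broken or absent
    Agent("node-c", "13.0.0", False),  # up to date without a schedule (e.g. "update asap")
]

print(not_exporting_schedule(agents))      # ['node-c']
print(version_mismatch(agents, "13.0.0"))  # ['node-b']
```

The two signals flag different agents here: only the version check catches `node-b`, which is the agent that actually causes version skew.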

Collaborator

@xinding33 Can we just update our existing "your agents are outdated" alerts with a link to docs on how to enable auto-upgrades for Cloud customers?

Contributor Author

@hugoShaka You're right, we likely don't need to actually identify Unaccompanied Agents. Admins will be happy as long as we solve the following:

  1. Notify Admins when Agents are not running the latest version.
  2. Help Admins identify exactly which Agent(s) is/are problematic.
  3. Help Admins "fix" said Agent(s) once and for all by providing instructions on 1) how to upgrade the Agent and 2) how to add an Agent Auto Upgrader Service to the Agent.

Contributor Author

@r0mant IMO, that's not sufficient because it doesn't satisfy 2 and 3 of the comment above.

Contributor

@xinding33 we've started discussing implementation of this RFD and need some clarification on a few points:

> Help Admins identify exactly which Agent(s) is/are problematic.

Do we actually want users to be able to list all agents that are older than a given version and/or aren't configured to use an upgrader, or do we just want the alert message to include some examples of offending agents to help users catch agents that fell through the cracks?

Actually being able to list offending agents is obviously nicer, but is also a much more complex ask since it requires us to either implement the logic for tracking and filtering on upgrader status for each service type (brittle and annoying to maintain), or to enable the per-agent "instance" heartbeat which we switched to disabled by default when we opted to use the more weakly coordinated upgrade system (known to cause scalability issues for some etcd-based deployments). Both options are doable, but neither is really ideal.

Including a few hostnames in an alert is comparatively easy, since each auth server can directly observe information about the agents connected to it, and write them to a resource. But that has the downside of only really being useful in the case where agents are generally OK, but one or two fell through the cracks.

> Help Admins "fix" said Agent(s) once and for all by providing instructions on 1) how to upgrade the Agent and 2) how to add an Agent Auto Upgrader Service to the Agent.

Are you thinking of something more than just pointing a user at the docs? If so what?
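
As a rough illustration of the cheaper option (a few example hostnames in the alert rather than a full listing), here is a minimal sketch. It assumes each auth server already has a hostname-to-version mapping for its connected agents and that versions are plain `MAJOR.MINOR.PATCH` strings; the function names and message wording are hypothetical.

```python
def parse_version(version):
    """Parse "13.0.0" or "v13.0.0" into a comparable tuple (no pre-release handling)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def outdated_hostnames(agents, target):
    """Return sorted hostnames whose version is older than the target.

    `agents` is a hypothetical mapping of hostname -> reported version.
    """
    target_tuple = parse_version(target)
    return sorted(h for h, v in agents.items() if parse_version(v) < target_tuple)


def alert_message(agents, target, max_examples=3):
    """Build an alert that names at most `max_examples` outdated agents."""
    outdated = outdated_hostnames(agents, target)
    if not outdated:
        return None
    shown = ", ".join(outdated[:max_examples])
    remaining = len(outdated) - max_examples
    suffix = f" (and {remaining} more)" if remaining > 0 else ""
    return f"{len(outdated)} agent(s) are older than {target}: {shown}{suffix}"


connected = {
    "db-1": "12.4.0",
    "kube-1": "13.0.0",
    "node-7": "12.9.9",
    "app-2": "11.3.1",
    "node-9": "12.1.0",
}
print(alert_message(connected, "13.0.0"))
# 4 agent(s) are older than 13.0.0: app-2, db-1, node-7 (and 1 more)
```

This shows the limitation mentioned above: once more than a handful of agents are outdated, the alert only hints at the problem, and `outdated_hostnames` would have to be exposed as a real listing to be actionable.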

Contributor Author

@fspmarshall

> Do we actually want users to be able to list all agents that are older than a given version and/or aren't configured to use an upgrader, or do we just want the alert message to include some examples of offending agents to help users catch agents that fell through the cracks?

Yes, we want users to be able to list all agents that are out-of-date. That's where the real value is because then users can take that list and fix everything. I don't have opinions on how we should go about the implementation.

> Are you thinking of something more than just pointing a user at the docs? If so what?

We definitely want to link to docs but if we can provide a one-liner that helps users fix the problem 90% of the time, that'd be an excellent addition.

Comment on lines +28 to +29
* (Teleport) Agent: The `teleport` process which can run one more more Teleport
services.
Contributor

nit: this definition covers teleport processes running auth and proxy, even if cloud users are not supposed to run them.

reduce the significant workload associated with manually managing and upgrading
a fleet of Teleport Agents.

## How
Contributor

This section doesn't differentiate between tenants enrolled in automatic updates and tenants who are not. Even if in the end we want to enroll everyone, we might want to explain that this only applies to the tenants enrolled in this new mode.

Comment on lines +67 to +68
* An easy (ideally one command) way to add an auto upgrader service to an
Unaccompanied Teleport Agent.
Contributor

I think this is already doable: `apt install teleport-ent-updater` and `helm upgrade --reuse-values my-agent-release teleport/teleport-kube-agent --set updater.enabled=true`.

My biggest question is how we expose this command to the user. Through Teleport Discover? Docs-only? Can we have something Teleport Assist-style that runs the command on behalf of the user?
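
Whichever surface we pick, it would need to choose the right one-liner for the install method. A minimal sketch, assuming only the two commands quoted above; the function name and the `install_method` values are hypothetical, and `my-agent-release` is a placeholder for the user's Helm release name.

```python
import shlex


def enable_updater_command(install_method, helm_release="my-agent-release"):
    """Return the one-liner that adds the updater for a given install method.

    Only the two commands from this thread are covered; other install
    methods are rejected rather than guessed.
    """
    if install_method == "apt":
        args = ["apt", "install", "teleport-ent-updater"]
    elif install_method == "helm":
        args = [
            "helm", "upgrade", "--reuse-values", helm_release,
            "teleport/teleport-kube-agent", "--set", "updater.enabled=true",
        ]
    else:
        raise ValueError(f"no known one-liner for install method: {install_method!r}")
    # shlex.join quotes arguments so the result is safe to paste into a shell.
    return shlex.join(args)


print(enable_updater_command("apt"))
print(enable_updater_command("helm", helm_release="prod-agent"))
```

The same lookup could back a docs snippet, a Discover step, or the text of a cluster alert.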

Collaborator

@fspmarshall @fheinecke If Teleport is already installed, will installing an updater package be sufficient to enable auto-upgrades?

@hugoShaka I think we should include this in the guides you're working on, for starters.

Contributor

Yes, installing the updater package on an existing teleport agent is all you need to do. The updater package's install process automatically configures teleport to start exporting schedules and restarts it.

Comment on lines +61 to +62
* Any commands exposed in the Teleport Web UI, including those in Teleport
Discover.
Contributor

I think most of the work was already done in #22731, especially for Helm.

The only remaining thing would be to change the package name for apt/yum installs.

services.
* (Teleport) Cluster Alerts: Messages in the Teleport Web UI and appropriate
Teleport CLIs that alert Practitioners of relevant concerns.
* Unaccompanied Agent: A Teleport Agent deployed without an accompanying auto upgrader
Collaborator

It feels like we've defined this as the inverse of an accompanied agent, but we haven't yet defined what an accompanied agent or an auto upgrader service is.

* Unaccompanied Agent: A Teleport Agent deployed without an accompanying auto upgrader
service.

## Why
Collaborator

I would mention the driver for making it easy to keep agents up to date: so that customers' agents stay compatible with the control plane that we host.

Contributor

Are we happy with a customer who does not rely on our automatic updates but has strong automation that quickly aligns their agents' version with our target version?

Contributor Author

@xinding33 xinding33 Apr 27, 2023

@hugoShaka Theoretically, we're OK with that. But in practice, very few customers actually do that. We want to make it easy for customers to adopt best practices with the least amount of friction, so we want them to do it our way unless they know exactly what they're doing.


## How

There are two key tactical UX change that will increase the adoption of Agent
Collaborator

Suggested change
There are two key tactical UX change that will increase the adoption of Agent
There are two key tactical UX changes that will increase the adoption of Agent


The first tactical change requires modifications to:

* https://goteleport.com/download/.
Collaborator

Have we decided how we want to determine whether it's a cloud or self-hosted user that's downloading from the Downloads page? IIRC that was the main question. Do we want to add "scopes" to the Downloads page similar to how we have scopes in the documentation? Or something else?

Contributor Author

@r0mant IMO, we shouldn't add scope to the Downloads page. Rather, we should add required options/flags in the package manager commands that differentiate between cloud and self-hosted.

@r0mant r0mant requested review from fheinecke and fspmarshall May 9, 2023 17:13

1. Where possible, push Admins to deploy Teleport Agents via supported package
managers (i.e. `helm`, `apt`, and `yum`) as this will deploy the
`teleport-ent-cloud-updater` package alongside the `teleport` package. It's
Contributor

Suggested change
`teleport-ent-cloud-updater` package alongside the `teleport` package. It's
`teleport-ent-cloud-updater` package alongside the `teleport-ent` package. It's

Comment on lines +47 to +49
1. Where possible, push Admins to deploy Teleport Agents via supported package
managers (i.e. `helm`, `apt`, and `yum`) as this will deploy the
`teleport-ent-cloud-updater` package alongside the `teleport` package. It's
Contributor

> as this will deploy the `teleport-ent-cloud-updater` package alongside the `teleport` package

To be clear, installing teleport still only installs teleport, but installing the updater causes teleport to also be installed if it wasn't already.

@russjones
Contributor

I am going to close this one out since it's been superseded by Managed Updates.

@russjones russjones closed this Aug 27, 2025