RFD: cloud agent auto upgrades ux #24865
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,68 @@ | ||||||
| --- | ||||||
| authors: Xin Ding (xin@goteleport.com) | ||||||
| state: draft | ||||||
| --- | ||||||
|
|
||||||
| # RFD 120 - Cloud Agent Auto Upgrades UX | ||||||
|
|
||||||
| ## Required Approvals | ||||||
|
|
||||||
| * Engineering: @jimbishopp && @r0mant | ||||||
| * Product: @klizhentas | ||||||
| * Security: @reedloden | ||||||
|
|
||||||
| ## What | ||||||
|
|
||||||
| Teleport will add support for Agent Auto Upgrades in v13.0.0. This document | ||||||
| details tactical UX changes to expose the Agent Auto Upgrade capabilities to | ||||||
| Teleport Cloud Admins. | ||||||
|
|
||||||
| ### Terminology | ||||||
|
|
||||||
| * (Teleport) Practitioner: People who are actively using Teleport in their | ||||||
| day-to-day. | ||||||
| * (Teleport) Admins: A subset of Practitioners. Admins deploy, configure, and | ||||||
| otherwise set up Teleport. | ||||||
| * (Teleport) End Users: A subset of Practitioners. End Users use Teleport to | ||||||
| access infrastructure resources. | ||||||
| * (Teleport) Agent: The `teleport` process which can run one or more Teleport | ||||||
| services. | ||||||
| * (Teleport) Cluster Alerts: Messages in the Teleport Web UI and appropriate | ||||||
| Teleport CLIs that alert Practitioners of relevant concerns. | ||||||
| * Unaccompanied Agent: A Teleport Agent deployed without an accompanying auto upgrader | ||||||
|
Collaborator
It feels like we've defined this as the inverse of an accompanied agent, but we haven't defined what an accompanied agent or what an auto upgrader service is yet. |
||||||
| service. | ||||||
|
|
||||||
| ## Why | ||||||
|
Collaborator
I would mention the driver for making it easy to keep agents up to date - so that customers' agents stay compatible with their control plane that we host.
Contributor
Are we happy with a customer not relying on our automatic updates but with strong automation quickly aligning their agents' version with our target version?
Contributor
Author
@hugoShaka Theoretically, we're ok with that. But in practice, very few customers actually do that. We want to make it easy for customers to adopt best practice with the least amount of friction, so we want them to do it our way unless they know exactly what they're doing. |
||||||
|
|
||||||
| Simply adding support for Agent Auto Upgrades doesn't help if Admins don't adopt | ||||||
| it. We want all Teleport customers to enable Agent Auto Upgrades in order to | ||||||
| reduce the significant workload associated with manually managing and upgrading | ||||||
| a fleet of Teleport Agents. | ||||||
|
|
||||||
| ## How | ||||||
|
Contributor
This section doesn't differentiate between tenants enrolled in automatic updates and tenants who are not. Even if in the end we want to enroll everyone, we might want to explain that this only applies to the tenants enrolled in this new mode. |
||||||
|
|
||||||
| There are two key tactical UX changes that will increase the adoption of Agent | ||||||
|
Collaborator
Suggested change
|
||||||
| Auto Upgrades among Teleport Cloud Admins. | ||||||
|
|
||||||
| 1. Where possible, push Admins to deploy Teleport Agents via supported package | ||||||
| managers (i.e. `helm`, `apt`, and `yum`) as this will deploy the | ||||||
| `teleport-ent-cloud-updater` package alongside the `teleport` package. It's | ||||||
|
Contributor
Suggested change
Comment on lines
+47
to
+49
Contributor
To be clear, installing teleport still only installs teleport, but installing the updater causes teleport to also be installed if it wasn't already. |
||||||
| important that we provide a method (potentially via an optional flag with a | ||||||
| sane default) for Admins to point the auto upgrader service to the right | ||||||
| server. Post installation, the script should also inform Admins about how | ||||||
| Agent Auto Upgrades works, with an accompanying docs link to more details. | ||||||
| 2. Notify Admins via Cluster Alerts when Unaccompanied Agents are detected. The | ||||||
| Cluster Alert should also present a docs link informing Admins how to | ||||||
| properly add an auto upgrader service. | ||||||
|
Comment on lines
+54
to
+56
Contributor
By design, we can't really know if this is the case. The updater is teleport-independent so that the agent can fail and still be updated. The agent only knows if it should export its maintenance schedule, not if there's an updater reading it. We can detect unaccompanied agents in two ways:
Collaborator
@xinding33 Can we just update our existing "your agents are outdated" alerts with a link to docs on how to enable auto-upgrades for Cloud customers?
Contributor
Author
@hugoShaka You're right, we likely don't need to technically actually identify Unaccompanied Agents. Admins will be happy as long as we solve the following:
Contributor
Author
@r0mant IMO, that's not sufficient because it doesn't satisfy 2 and 3 of the comment above.
Contributor
@xinding33 we've started discussing implementation of this RFD and need some clarification on a few points:
Do we actually want users to be able to list all agents that are older than a given version and/or aren't configured to use an upgrader, or do we just want the alert message to include some examples of offending agents to help users catch agents that fell through the cracks? Actually being able to list offending agents is obviously nicer, but is also a much more complex ask since it requires us to either implement the logic for tracking and filtering on upgrader status for each service type (brittle and annoying to maintain), or to enable the per-agent "instance" heartbeat which we switched to disabled by default when we opted to use the more weakly coordinated upgrade system (known to cause scalability issues for some etcd-based deployments). Both options are doable, but neither is really ideal. Including a few hostnames in an alert is comparatively easy, since each auth server can directly observe information about the agents connected to it, and write them to a resource. But that has the downside of only really being useful in the case where agents are generally OK, but one or two fell through the cracks.
Are you thinking of something more than just pointing a user at the docs? If so what?
Contributor
Author
Yes, we want users to be able to list all agents that are out-of-date. That's where the real value is because then users can take that list and fix everything. I don't have opinions on how we should go about the implementation.
We definitely want to link to docs but if we can provide a one-liner that helps users fix the problem 90% of the time, that'd be an excellent addition. |
||||||
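To make the "list all agents that are out-of-date" request in the thread above concrete, here is a minimal Python sketch of the filtering step only. It is not Teleport's implementation: the `Agent` record, the `parse_version` and `outdated_agents` helpers, and the sample inventory are all hypothetical, and the sketch assumes the hostnames and versions have already been collected somewhere. How to collect them is exactly the open question discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical inventory record; not a Teleport API type."""
    hostname: str
    version: str  # e.g. "13.0.1" or "v13.0.1"


def parse_version(version: str) -> tuple[int, ...]:
    """Parse 'v13.0.1' or '13.0.1' into (13, 0, 1).

    Any pre-release suffix after '-' is ignored in this sketch.
    """
    core = version.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def outdated_agents(agents: list[Agent], target: str) -> list[Agent]:
    """Return agents older than the target version, sorted by hostname."""
    target_key = parse_version(target)
    stale = [a for a in agents if parse_version(a.version) < target_key]
    return sorted(stale, key=lambda a: a.hostname)


# Made-up sample data, for illustration only.
inventory = [
    Agent("db-01", "13.0.0"),
    Agent("app-02", "12.3.4"),
    Agent("node-03", "v13.1.0"),
    Agent("kube-04", "12.10.1"),
]
stale_hosts = [a.hostname for a in outdated_agents(inventory, "13.0.0")]
print(stale_hosts)
```

Versions are compared as integer tuples rather than strings so that `12.10.1` sorts after `12.3.4`. A list produced this way could be surfaced in full, or truncated to a few example hostnames inside a Cluster Alert, which are the two options weighed in the thread.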
|
|
||||||
| The first tactical change requires modifications to: | ||||||
|
|
||||||
| * https://goteleport.com/download/. | ||||||
|
Collaborator
Have we decided how we want to determine whether it's a cloud or self-hosted user that's downloading from the Downloads page? IIRC that was the main question. Do we want to add "scopes" to the Downloads page similar to how we have scopes in the documentation? Or something else?
Contributor
Author
@r0mant IMO, we shouldn't add scope to the Downloads page. Rather, we should add required options/flags in the package manager commands that differentiate between cloud and self-hosted. |
||||||
| * Any commands exposed in the Teleport Web UI, including those in Teleport | ||||||
| Discover. | ||||||
|
Comment on lines
+61
to
+62
Contributor
I think most of the work was already done in: #22731, especially for Helm. The only remaining thing would be to change the package name for apt/yum installs. |
||||||
|
|
||||||
| The second tactical change requires: | ||||||
|
|
||||||
| * Detection of and Cluster Alerts associated with Unaccompanied Teleport Agents. | ||||||
| * An easy (ideally one command) way to add an auto upgrader service to an | ||||||
| Unaccompanied Teleport Agent. | ||||||
|
Comment on lines
+67
to
+68
Contributor
I think this is already doable. My biggest question is how do we expose this command to the user? Through teleport discover? Docs-only? Can we have something Teleport Assist-style that runs the command on the behalf of the users?
Collaborator
@fspmarshall @fheinecke Will installing an updater package if you already have teleport installed be sufficient for enabling auto-upgrades? @hugoShaka I think we should include this in the guides you're working on, for starters.
Contributor
Yes, installing the updater package on an existing teleport agent is all you need to do. The updater package's install process automatically configures teleport to start exporting schedules and restarts it. |
||||||
nit: this definition covers teleport processes running auth and proxy, even if cloud users are not supposed to run them.