Add Trigger, Rollback, ForceDone autoupdate RPCs #52931
Conversation
```go
tries := 0
const maxTries = 3

for {
```
Curious about the motivation for repeatedly retrying on conflict here. Is there something we could watch instead?
(Not blocking, just want to understand the architecture better.)
The point of "optimistic" locking is that in the happy path where there's no contention, you get a single read and a single write. The downside is that under contention you'll have to retry once or twice, and it's a little nicer to do a couple of retries in the auth, where we're closer to the backend, rather than retrying from the client. There's no way to know in advance whether a write will succeed, or to wait for it to succeed; the only point at which that decision can be made is the conditional update itself.
I understand the intent behind optimistic locking, but it's unusual to me to see it applied with an arbitrary retry count directly on the backend, vs., e.g., retrying a transaction. I'm assuming we don't have a transaction-like concept given the multitude of backends, and the optimistic lock is our only tool? (Just want to understand this better, not suggesting a change)
This is, in fact, a transaction (-ish): get some things, make some decisions based on them, figure out what to write and what conditions must hold for the write to go through without retrying the whole transaction, then attempt to apply the write. The underlying DynamoDB RPC used in more complicated situations involving more than one item (which we surface as `(backend.Backend).AtomicWrite`) is literally called dynamodb:TransactWriteItems; for simpler scenarios where the condition is on the revision of the same item being written, it's just a conditional write (exposed as `(backend.Backend).ConditionalUpdate`).
* Add Trigger, Rollback, ForceDone autoupdate RPCs
* Add all_started_groups bool + switch to group set
* fix error type
* Add rollout mutation functions (#52930)
* Add Trigger, Rollback, ForceDone autoupdate RPCs (#52931)
  * Add all_started_groups bool + switch to group set
  * fix error type
* Align semver libs (#52795)
  * Convert autoupdate version handling to coreos/go-semver
  * get the right version in installer endpoint + get rid of x/mod/semver
  * depguard x/mod/semver
  * Add nolint rules for existing x/mod/semver usages
  * Add depguard explanation
* Add autoupdate trigger/mark-done/rollback commands (#52933)
* Add updater info in Hello (#53911)
* Introduce autoupdate_agent_report proto types (#54175)
  * Fix tests + remove delete all RPC
* Move updater info proto from authclient to types (#54236)
* Report updater info in Hello (#53938)
  * Add UUID to Hello
  * Fix after rebase
* Send goodbye even when doing soft-reload (#54176)
  * Save and replay Goodbye on connect
  * Add SoftReload flag to Goodbye
  * SendGoodbye -> SetAndSendGoodbye
* Display update group in `tctl inventory` (#54324)
* Add autoupdate manual rollout audit events (#52934)
  * Add autoupdate trigger/mark-done/rollback audit events
  * Remove useless resource metadata and add groups to audit event
  * Add events to web UI
* Add autoupdate_agent_report backend service (#54333)
  * Saner resource validation
* Add agent rollout cache + service + client (#54772)
  * fix after rebase + add event in tests
  * fix autoupdate agent report event streaming
  * Fix backport: slog -> logrus
* Generate autoupdate agent report periodically (#54865)
  * fix proto field lookup + address feedback
  * fix tests + add license
* Add omission info in autoupdate report (#55001)
* Add agent counters to autoupdate_agent_rollout proto (#55096)
  * int64 -> uint64
* Add reports to client and rewrite mockClient using testify (#55097)
  * When adding ListAutoUpdateAgentReports() to the Client interface, the mock client turned out not to support List endpoints; instead of expanding the custom mock system, the mock client was rewritten with the standard testify/mock library.
  * checkIfEmpty -> checkIfCallsWereDone
* Make halt-on-error autoupdate strategy use agent reports (#55116)
  * Make report helpers reusable for time-based strategy
* Set the agent count when reconciling time-based rollouts (#55152)
* Fix flaky `TestServer_generateAgentVersionReport` (#56015)
* [v18] Add autoupdate agent report commands (#56495)
* autoupdate canary support: proto messages (#56259)
* autoupdate canary support: inventory and auth primitives (#56261)
* autoupdate canary support: tctl (#56473)
  * `tctl autoupdate agents status` now displays groups in the canary state properly
  * add `--force` flag to `tctl autoupdate agents start-update`
* autoupdate canary support: modulate proxy response (#56468)
  * The Teleport Proxy service's find and ping endpoints now fetch the updater ID from the request parameters and look up whether the requestor is a canary; if it is, the requestor is told to update.
* autoupdate canary support: rollout controller (#56467)
  * Adds canary support to the autoupdate_agent_rollout controller when the strategy is "halt-on-error"
  * Fix backport: add inventory clock + deal with edoardo breaking everything
  * Fix tests after backport
  * lint authproto -> clientproto
* Fix autoupdate canary sampling for the catch-all group
* Tune the canary logic (#56926)
  * Users can now specify how many canaries they want; instead of looking at the current group size, we rely on user input
  * max canary 10 -> 5 (max message size not done yet)
  * fix a bug causing the start date to be reset when doing canary -> active
* Reliably detect update.yaml after soft reloads
* always send group in agent hello (#55071)
* Fix detection on initial install
* Always persist new configuration
* fix tests relying on go 1.24
* fix crd snapshot tests + fix linter issue

Co-authored-by: Stephen Levine <stephen.levine@goteleport.com>
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
PR 2/4 adding manual rollout control as specified in RFD 184. This PR adds the manual rollout RPCs; audit events and tctl commands will be added in a follow-up PR.