Run the readiness logic synchronously#62178

Merged
espadolini merged 4 commits into master from espadolini/synchronous-processstate on Dec 12, 2025

Contributor

@espadolini espadolini commented Dec 11, 2025

This PR moves the lib/service.processState machinery to the LocalSupervisor level, so that (*LocalSupervisor).BroadcastEvent can update the process state synchronously instead of relying on a monitoring goroutine. This avoids readiness problems caused by events being broadcast before the monitoring goroutine is in place, whether because of sequencing or because of a goroutine race.

changelog: fixed Teleport instances running the Auth Service sometimes not becoming ready during initialization
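
To illustrate the idea in the description, here is a minimal sketch of a supervisor whose broadcast method updates readiness state synchronously under a mutex. It is not Teleport's actual code: the `Supervisor`, `Event`, and `readiness` types and their methods are hypothetical names chosen for this example, and the "ready once every expected component has reported" rule is an assumed simplification of the real state machine.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a supervisor event.
type Event struct {
	Name string
}

// readiness is a toy state machine (an assumption for this sketch):
// the process is ready once every expected component has reported.
type readiness struct {
	pending map[string]struct{}
}

func (r *readiness) update(ev Event) { delete(r.pending, ev.Name) }
func (r *readiness) ready() bool     { return len(r.pending) == 0 }

// Supervisor owns the readiness state directly, so no separate
// monitoring goroutine has to be running for events to be counted.
type Supervisor struct {
	mu    sync.Mutex
	state readiness
	subs  []chan Event
}

// NewSupervisor builds a supervisor that waits for the given components.
func NewSupervisor(components ...string) *Supervisor {
	pending := make(map[string]struct{}, len(components))
	for _, c := range components {
		pending[c] = struct{}{}
	}
	return &Supervisor{state: readiness{pending: pending}}
}

// BroadcastEvent updates the readiness state before returning, then
// forwards the event to any subscribers that already exist.
func (s *Supervisor) BroadcastEvent(ev Event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state.update(ev) // synchronous: cannot be missed by a late listener
	for _, ch := range s.subs {
		select {
		case ch <- ev:
		default: // never block the broadcaster on a slow subscriber
		}
	}
}

// Ready reports whether every expected component has reported.
func (s *Supervisor) Ready() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state.ready()
}

func main() {
	s := NewSupervisor("auth", "proxy")
	// Events are broadcast before any subscriber or monitor exists.
	s.BroadcastEvent(Event{Name: "auth"})
	fmt.Println(s.Ready()) // false
	s.BroadcastEvent(Event{Name: "proxy"})
	fmt.Println(s.Ready()) // true
}
```

In this sketch an event broadcast before any listener is registered still advances the state, which is the property the description attributes to the synchronous approach. A design that tracks state in a goroutine reading from a subscription channel would lose such early events unless they were buffered.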

@espadolini espadolini added this pull request to the merge queue Dec 12, 2025
Merged via the queue into master with commit 25f9819 Dec 12, 2025
42 of 43 checks passed
@espadolini espadolini deleted the espadolini/synchronous-processstate branch December 12, 2025 11:27
@backport-bot-workflows
Contributor

@espadolini See the table below for backport results.

| Branch | Result |
| --- | --- |
| branch/v18 | Failed |

21KennethTran pushed a commit that referenced this pull request Jan 6, 2026
* Run the readiness logic synchronously

* Document special event handling in BroadcastEvent

* Make NewSupervisor fallible

* Move logging out of the processState state machine
Member

Would you be able to backport this to v17? Is it feasible? I just got a TestDebugService hit on v17.

Contributor

@hugoShaka hugoShaka Mar 17, 2026

I vote against backporting this. Each time we changed readiness we broke things, and this happened many times. v17 should be kept stable. This is a high risk change for a low reward.

Contributor Author

We don't have a great track record in touching readiness without breaking it and v17 is supposed to be stable; unless the test proves to be especially flaky in v17 I'd prefer not to touch anything. 😬

The backport would also not be clean without including something like #61620 (partial #59667 and #59907), which adds to the risk of either getting a tweaked backport wrong or adding more changes than necessary to a stable release.

Member

Understandable! Thanks.

4 participants