Restore relative node expiry to original behavior #38258

Merged: 1 commit merged on Feb 15, 2024

Conversation

rosstimothy
Contributor

@rosstimothy rosstimothy commented Feb 15, 2024

When relative node expiry was originally implemented, it ran only on Auth and propagated delete events downstream. That was quickly changed in #12002 because the event system could not keep up in clusters with high churn, which resulted in buffer overflow errors in downstream caches. The event system has since been overhauled to use FanoutV2, which no longer has problems with bursts of events. The change to stop propagating delete events also prevents node watchers on the local cache from ever expiring a server, since they never receive a delete event.

This reverts some of the changes from #12002 so that relative expiry once again runs only on the auth cache and emits delete events. Each relative expiry interval is still limited to removing at most a fixed number of nodes.

Closes #37527

Changelog: Fix a bug that could cause expired SSH servers to appear in the Web UI until the Proxy is restarted.

@rosstimothy rosstimothy added the backport/branch/v14, no-changelog, and backport/branch/v15 labels Feb 15, 2024
@rosstimothy rosstimothy marked this pull request as ready for review February 15, 2024 16:13
@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from bl-nero February 15, 2024 20:01
@rosstimothy rosstimothy force-pushed the tross/relative_expiry branch from 5d63589 to 78caa9a Compare February 15, 2024 21:45
@zmb3
Collaborator

zmb3 commented Feb 15, 2024

/excludeflake *

@rosstimothy rosstimothy added this pull request to the merge queue Feb 15, 2024
Merged via the queue into master with commit 9c94deb Feb 15, 2024
33 checks passed
@rosstimothy rosstimothy deleted the tross/relative_expiry branch February 15, 2024 22:30
@public-teleport-github-review-bot

@rosstimothy See the table below for backport results.

Branch Result
branch/v14 Create PR
branch/v15 Create PR

@rosstimothy rosstimothy removed the no-changelog label Feb 15, 2024

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

Development

Successfully merging this pull request may close these issues.

Unified Resource Cache contains expired items
4 participants