Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fingerprint: convert consul and vault fingerprinters to be reloadable #24526

Merged
merged 1 commit into from
Jan 27, 2025

Conversation

shoenig
Copy link
Contributor

@shoenig shoenig commented Nov 21, 2024

This PR changes the Consul and Vault fingerprint implementations to be
reloadable rather than periodic. Reasons described in the issue.

Closes: #24049

This PR changes the Consul and Vault fingerprint implementations to be
reloadable rather than periodic. Reasons described in the issue.

Closes: #24049
@shoenig shoenig force-pushed the cv-reloadable-fingerprints branch from 21ed04c to 0689759 Compare December 5, 2024 14:16
@shoenig shoenig marked this pull request as ready for review December 5, 2024 16:30
@shoenig shoenig requested review from a team as code owners December 5, 2024 16:30
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jrasell jrasell self-assigned this Jan 21, 2025
@jrasell jrasell added the backport/1.9.x backport to 1.9.x release line label Jan 27, 2025
@jrasell jrasell merged commit 1356880 into main Jan 27, 2025
29 checks passed
@jrasell jrasell deleted the cv-reloadable-fingerprints branch January 27, 2025 09:20
tgross added a commit that referenced this pull request Feb 12, 2025
In #24526 we updated the Consul and Vault fingerprints so that they are no
longer periodic. This fixed a problem that cluster admins reported where rolling
updates of Vault or Consul would cause a thundering herd of fingerprint updates
across the whole cluster.

But if Consul/Vault is not available during the initial fingerprint, it will
never get fingerprinted again. This is challenging for cluster updates and black
starts because the implicit service startup ordering may require
reloads. Instead, have the fingerprinter run periodically but mark that it has
made its first successful fingerprint of all Consul/Vault clusters. At that
point, we can skip further periodic updates. The `Reload` method will reset the
mark and allow the subsequent fingerprint to run normally.

Fixes: #25097
Ref: #24526
Ref: #24049
tgross added a commit that referenced this pull request Feb 12, 2025
In #24526 we updated the Consul and Vault fingerprints so that they are no
longer periodic. This fixed a problem that cluster admins reported where rolling
updates of Vault or Consul would cause a thundering herd of fingerprint updates
across the whole cluster.

But if Consul/Vault is not available during the initial fingerprint, it will
never get fingerprinted again. This is challenging for cluster updates and black
starts because the implicit service startup ordering may require
reloads. Instead, have the fingerprinter run periodically but mark that it has
made its first successful fingerprint of all Consul/Vault clusters. At that
point, we can skip further periodic updates. The `Reload` method will reset the
mark and allow the subsequent fingerprint to run normally.

Fixes: #25097
Ref: #24526
Ref: #24049
tgross added a commit that referenced this pull request Feb 12, 2025
In #24526 we updated the Consul and Vault fingerprints so that they are no
longer periodic. This fixed a problem that cluster admins reported where rolling
updates of Vault or Consul would cause a thundering herd of fingerprint updates
across the whole cluster.

But if Consul/Vault is not available during the initial fingerprint, it will
never get fingerprinted again. This is challenging for cluster updates and black
starts because the implicit service startup ordering may require
reloads. Instead, have the fingerprinter run periodically but mark that it has
made its first successful fingerprint of all Consul/Vault clusters. At that
point, we can skip further periodic updates. The `Reload` method will reset the
mark and allow the subsequent fingerprint to run normally.

Fixes: #25097
Ref: #24526
Ref: #24049
tgross added a commit that referenced this pull request Feb 13, 2025
…25102)

In #24526 we updated the Consul and Vault fingerprints so that they are no
longer periodic. This fixed a problem that cluster admins reported where rolling
updates of Vault or Consul would cause a thundering herd of fingerprint updates
across the whole cluster.

But if Consul/Vault is not available during the initial fingerprint, it will
never get fingerprinted again. This is challenging for cluster updates and black
starts because the implicit service startup ordering may require
reloads. Instead, have the fingerprinter run periodically but mark that it has
made its first successful fingerprint of all Consul/Vault clusters. At that
point, we can skip further periodic updates. The `Reload` method will reset the
mark and allow the subsequent fingerprint to run normally.

Fixes: #25097
Ref: #24526
Ref: #24049
tgross added a commit that referenced this pull request Feb 21, 2025
In #24526 we updated Consul and Vault fingerprinting so that we no longer
periodically fingerprint. In #25102 we made it so that we fingerprint
periodically on start until the first fingerprint, in order to tolerate Consul
or Vault not being available on start. For clusters not running Consul, this
leads to a warn-level log every 15s. This same log exists for Vault, but Vault
support is opt-in via `vault.enable = true` whereas you have to manually disable
the fingerprinter for Consul.

Make it so that we only log a failed Consul fingerprint once per Consul
cluster. Reset the gate on this once we have a successful fingerprint, so that
we get the logs after a reload if Consul is unavailable.

Ref: #24526
Ref: #25102
Fixes: #25181
tgross added a commit that referenced this pull request Feb 21, 2025
In #24526 we updated Consul and Vault fingerprinting so that we no longer
periodically fingerprint. In #25102 we made it so that we fingerprint
periodically on start until the first fingerprint, in order to tolerate Consul
or Vault not being available on start. For clusters not running Consul, this
leads to a warn-level log every 15s. This same log exists for Vault, but Vault
support is opt-in via `vault.enable = true` whereas you have to manually disable
the fingerprinter for Consul.

Make it so that we only log a failed Consul fingerprint once per Consul
cluster. Reset the gate on this once we have a successful fingerprint, so that
we get the logs after a reload if Consul is unavailable.

Ref: #24526
Ref: #25102
Fixes: #25181
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.9.x backport to 1.9.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make Consul/Vault fingerprints reloadable, not periodic
2 participants