Skip to content

Mpuncel/sds hc sequence#17712

Closed
mpuncel wants to merge 11 commits intoenvoyproxy:mainfrom
mpuncel:mpuncel/sds-hc-sequence
Closed

Mpuncel/sds hc sequence#17712
mpuncel wants to merge 11 commits intoenvoyproxy:mainfrom
mpuncel:mpuncel/sds-hc-sequence

Conversation

@mpuncel
Copy link
Contributor

@mpuncel mpuncel commented Aug 13, 2021

Commit Message: Make health check loop wait for any required SDS secrets to be loaded before starting.
Additional Description: This should avoid an issue where Envoy might take over a minute to warm clusters with default settings when SDS and active health checking are used. The issue was caused by health checks starting before secrets are ready and then waiting for no_traffic_interval (default 60s) before trying again. This change makes health checks wait for SDS secrets to be ready before starting.
Risk Level: Medium
Testing: Concurrency test covers the bug that caused this to be reverted initially, and was tested locally with --runs_per_test 100, confirming test failures when the bug is present and none when it was fixed.
Docs Changes: None
Release Notes: Included
Platform Specific Features: None
Fixes #12389, #15977, #17529

mpuncel added 11 commits April 29, 2021 11:48
This commit is a revival of
envoyproxy#13516 which aimed to solve
envoyproxy#12389. That PR was merged and
then reverted due to a deadlock bug, which is still present in this
commit.

The fix will be applied in a subsequent commit in order to make the
resolution clearer to PR reviewers.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
The bug was caused by not queuing callbacks in addReadyCb() while
holding the lock. This means that you could make the decision to NOT run
the callback, get a secret concurrently which would run all callbacks,
and then add it to the queue at which point it would never run.

This also changed the name of addReadyCb() to addHealthCheckingReadyCb()
on HostImpl, and made the logic add a callback for the health checking
transport socket. That's the socket that we actually want health checks
to wait on which may be different from the data plane socket.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
* main: (687 commits)
  ci: set build debug information from env (envoyproxy#17635)
  ext_authz: do the authentication even the direct response is set (envoyproxy#17546)
  upstream: various cleanups in connection pool code (envoyproxy#17644)
  owners: promote Dmitry to maintainer (envoyproxy#17642)
  quiche: client session supports creating bidi stream (envoyproxy#17543)
  Update HTTP/2 METADATA documentation. (envoyproxy#17637)
  ext_proc: Check validity of the :status header (envoyproxy#17596)
  test: add ASSERT indicating that gRPC stream has not been started yet (envoyproxy#17614)
  ensure that the inline cookie header will be folded correctly  (envoyproxy#17560)
  cluster_manager: Make ClusterEntry a class instead of a struct (envoyproxy#17616)
  owners: make Raúl a Thrift senior extension maintainer (envoyproxy#17641)
  quiche: update QUICHE dependency (envoyproxy#17618)
  Delete mock for removed RouteEntry::perFilterConfig() method (envoyproxy#17623)
  REPO_LAYOUT.md: fix outdated link (envoyproxy#17626)
  hcm: forbid use of detection extensions with use_remote_addr/xff_num_trusted_hops (envoyproxy#17558)
  thrift proxy: add request shadowing support (envoyproxy#17544)
  ext_proc: Ensure that timer is always cancelled (envoyproxy#17569)
  Proposal: Add CachePolicy interface to allow for custom cache behavior (envoyproxy#17362)
  proto: fix verify to point at v3 only (envoyproxy#17622)
  api: move generic matcher proto to its own package (envoyproxy#17096)
  ...
@mpuncel
Copy link
Contributor Author

mpuncel commented Aug 13, 2021

This PR is just #16236 reopened. It went stale while I was on parental leave. I haven't quite finished addressing PR feedback there, I'll add more context later today.

@mpuncel
Copy link
Contributor Author

mpuncel commented Aug 16, 2021

had to force push to fix some signatures so will open a new PR again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Should HC activation be delayed until needed secrets are available?

1 participant