-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linkerd-proxy using stale endpoints #11480
Comments
Thank you for the report, we have a few confirmed cases of this and are working on a fix. Just to make sure this is the same issue we see elsewhere:
|
2.11.4
yes
I looked at the metrics and they don't seem to flatline. In our case it seems there is not a single destination controller affected but rather a small number of pods running linkerd-proxy at the beginning and then over time the amount of pods affected is growing. So it seems most of the pods still get their endpoint updates but some are using outdated lists. |
## edge-23.10.3 This edge release fixes issues in the proxy and destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains other bugfixes and updates dependencies to include patches for the security advisories [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 and GHSA-c827-hfw6-qwvm. * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination, if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], and [#10590]) * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unused service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484] and [linkerd2-proxy#2486]) * Added `INFO`-level logging to the proxy when endpoints are added or removed from a load balancer. These logs are enabled by default, and can be disabled by [setting the proxy log level][proxy-log-level] to `warn,linkerd=info,linkerd_proxy_balance=warn` or similar ([linkerd2-proxy#2486]) * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449]) * Added missing `imagePullSecrets` to `linkerd-jaeger` ServiceAccount ([#11504]) * Updated the control plane's dependency on the `golang.google.org/grpc` Go package to include patches for [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 ([#11496]) * Updated dependencies on `rustix` to include patches for GHSA-c827-hfw6-qwvm ([linkerd2-proxy#2488] and [#11512]). [#10590]: #10590 [#11279]: #11279 [#11483]: #11483 [#11449]: #11449 [#11480]: #11480 [#11504]: #11504 [#11504]: #11512 [linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480 [linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484 [linkerd2-proxy#2486]: linkerd/linkerd2-proxy#2486 [linkerd2-proxy#2488]: linkerd/linkerd2-proxy#2488 [proxy-log-level]: https://linkerd.io/2.14/tasks/modifying-proxy-log-level/ [CVE-2023-44487]: GHSA-qppj-fm5r-hxr3
The latest edge release, |
This stable release fixes issues in the proxy and Destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains a bug fix for profile resolutions for pods bound on host ports and includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 * Control Plane * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination, if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], [#10590]) * Fixed an issue where the Destination controller would not update pod metadata for profile resolutions for a pod accessed via the host network (e.g. HostPort endpoints) ([#11334]) * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several dependencies (including Go's gRPC and net libraries) * Proxy * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449]) * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unusred service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484]) [#10590]: #10590 [#11279]: #11279 [#11483]: #11483 [#11480]: #11480 [#11334]: #11334 [#11449]: #11449 [CVE-2023-44487]: GHSA-qppj-fm5r-hxr3 [linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480 [linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484 Signed-off-by: Matei David <[email protected]>
This stable release fixes issues in the proxy and Destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains a bug fix for profile resolutions for pods bound on host ports and includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 * Control Plane * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination, if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], [#10590]) * Fixed an issue where the Destination controller would not update pod metadata for profile resolutions for a pod accessed via the host network (e.g. HostPort endpoints) ([#11334]) * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several dependencies (including Go's gRPC and net libraries) * Proxy * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449]) * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unusred service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484]) [#10590]: #10590 [#11279]: #11279 [#11483]: #11483 [#11480]: #11480 [#11334]: #11334 [#11449]: #11449 [CVE-2023-44487]: GHSA-qppj-fm5r-hxr3 [linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480 [linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484 --------- Signed-off-by: Matei David <[email protected]> Co-authored-by: Alejandro Pedraza <[email protected]> Co-authored-by: Oliver Gould <[email protected]>
This fix is now in stable-2.14.2. |
What is the issue?
Since the update to 2.13.5 we experience sporadic issues where the linkerd-proxy seems to connect to endpoints that don't exist anymore since days.
We think it's triggered by an issue to connect to linkerd-destination (which is a different problem).
Restarting linkerd-destination solves the issue.
We also compared endpoints from
linkerd diagnostics endpoints myservice...
withkubectl get endpoints myservice
and they seem to match. So we don't think that linkerd-destination contains stale data but rather the proxies.The affected proxies did not have any pending endpoints after the issue, but we currently don't have the data to understand what it looked like during the issue
How can it be reproduced?
Not clear. We updated to 2.13.5 and since then had 3 issues over the course of a few days.
Logs, error output, etc
output of
linkerd check -o short
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
maybe
The text was updated successfully, but these errors were encountered: