-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeing errors/panics when trying to upgrade past 2.14.1 #12010
Comments
Thanks for moving this over @alpeb. A minor update here, we've tried again with an upgrade in our staging environment, this time to |
Fixes #12010 ## Problem We're observing crashes in the destination controller in some scenarios, due to data race as described in #12010. ## Cause The problem is the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309), and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines and the map is not guarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue. ## Solution Added a new mutex for `AddressSet`, used whenever its fields are accessed/mutated. As mutexes cannot be copied, we can only pass a pointer around, and so any use of `AddressSet` ended up being a pointer. ## Repro Install linkerd core and injected emojivoto and patch the endpoint translator to include a sleep call that will help surfacing the race (don't install the patch in the cluster; we'll only use it locally below): <details> <summary>endpoint_translator.go diff</summary> ```diff diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go index d1018d5f9..7d5abd638 100644 --- a/controller/api/destination/endpoint_translator.go +++ b/controller/api/destination/endpoint_translator.go @@ -5,6 +5,7 @@ import ( "reflect" "strconv" "strings" + "time" pb "github.com/linkerd/linkerd2-proxy-api/go/destination" "github.com/linkerd/linkerd2-proxy-api/go/net" @@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) { } func (et *endpointTranslator) add(set watcher.AddressSet) { for id, address := range set.Addresses { + time.Sleep(1 * time.Second) et.availableEndpoints.Addresses[id] = address } ``` </details> Then create these two Server manifests: <details> <summary>emoji-web-server.yml</summary> ```yaml apiVersion: policy.linkerd.io/v1beta2 kind: Server metadata: namespace: emojivoto name: web-http labels: app.kubernetes.io/part-of: emojivoto app.kubernetes.io/name: web app.kubernetes.io/version: v11 spec: podSelector: matchLabels: app: web-svc port: http proxyProtocol: HTTP/1 ``` </details> <details> <summary>emoji-web-server-opaque.yml</summary> ```yaml apiVersion: policy.linkerd.io/v1beta2 kind: Server metadata: namespace: emojivoto name: web-http labels: app.kubernetes.io/part-of: emojivoto app.kubernetes.io/name: web app.kubernetes.io/version: v11 spec: podSelector: matchLabels: app: web-svc port: http proxyProtocol: opaque ``` </details> In separate consoles run the patched destination service and a destination client: ```bash HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211 ``` ```bash go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80 ``` And run this to continuously switch the `proxyProtocol` field: ```bash while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done ``` Tou'll see the following data race report in the Destination controller logs: <details> <summary>destination logs</summary> ```console ================== WARNING: DATA RACE Write at 0x00c0006d30e0 by goroutine 178: github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm() <autogenerated>:1 +0x64 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:246 +0x81 k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate() <autogenerated>:1 +0x1f k8s.io/client-go/tools/cache.(*processorListener).run.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:970 +0x1f4 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x41 k8s.io/apimachinery/pkg/util/wait.BackoffUntil() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xbe k8s.io/apimachinery/pkg/util/wait.JitterUntil() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x10a k8s.io/apimachinery/pkg/util/wait.Until() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x9b k8s.io/client-go/tools/cache.(*processorListener).run() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:966 +0x38 k8s.io/client-go/tools/cache.(*processorListener).run-fm() <autogenerated>:1 +0x33 k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x86 Previous read at 0x00c0006d30e0 by goroutine 360: github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166 github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45 ``` </details> ## Extras This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
@alpeb thanks for opening a PR with the fix! Do you happen to know if this bug was introduced after |
I confirmed the issue was introduced in |
@alpeb Our environment does have a fair amount of churn in our linkerd data plane. Lots of pods being created due to HPAs. Though to be clear, that refers to our production environment. While our staging environment also has those HPAs, the level of churn there is significantly lower than our production environment. And we never deployed past 2.14.1 in production, because we observed this issue in our staging environment. I'm sorry for not reporting this sooner! Next time, I'll file something right away. |
Fixes #12010 ## Problem We're observing crashes in the destination controller in some scenarios, due to data race as described in #12010. ## Cause The problem is the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309), and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines and the map is not guarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue. ## Solution Make a shallow copy of `pp.addresses` in the endpoints watcher and only pass that to the listeners. It's a shallow copy because we avoid making copies of the pod reference in there, knowing it won't get mutated. ## Repro Install linkerd core and injected emojivoto and patch the endpoint translator to include a sleep call that will help surfacing the race (don't install the patch in the cluster; we'll only use it locally below): <details> <summary>endpoint_translator.go diff</summary> ```diff diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go index d1018d5f9..7d5abd638 100644 --- a/controller/api/destination/endpoint_translator.go +++ b/controller/api/destination/endpoint_translator.go @@ -5,6 +5,7 @@ import ( "reflect" "strconv" "strings" + "time" pb "github.com/linkerd/linkerd2-proxy-api/go/destination" "github.com/linkerd/linkerd2-proxy-api/go/net" @@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) { } func (et *endpointTranslator) add(set watcher.AddressSet) { for id, address := range set.Addresses { + time.Sleep(1 * time.Second) et.availableEndpoints.Addresses[id] = address } ``` </details> Then create these two Server manifests: <details> <summary>emoji-web-server.yml</summary> ```yaml apiVersion: policy.linkerd.io/v1beta2 kind: Server metadata: namespace: emojivoto name: web-http labels: app.kubernetes.io/part-of: emojivoto app.kubernetes.io/name: web app.kubernetes.io/version: v11 spec: podSelector: matchLabels: app: web-svc port: http proxyProtocol: HTTP/1 ``` </details> <details> <summary>emoji-web-server-opaque.yml</summary> ```yaml apiVersion: policy.linkerd.io/v1beta2 kind: Server metadata: namespace: emojivoto name: web-http labels: app.kubernetes.io/part-of: emojivoto app.kubernetes.io/name: web app.kubernetes.io/version: v11 spec: podSelector: matchLabels: app: web-svc port: http proxyProtocol: opaque ``` </details> In separate consoles run the patched destination service and a destination client: ```bash HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211 ``` ```bash go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80 ``` And run this to continuously switch the `proxyProtocol` field: ```bash while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done ``` You'll see the following data race report in the Destination controller logs: <details> <summary>destination logs</summary> ```console ================== WARNING: DATA RACE Write at 0x00c0006d30e0 by goroutine 178: github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm() <autogenerated>:1 +0x64 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:246 +0x81 k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate() <autogenerated>:1 +0x1f k8s.io/client-go/tools/cache.(*processorListener).run.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:970 +0x1f4 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x41 k8s.io/apimachinery/pkg/util/wait.BackoffUntil() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xbe k8s.io/apimachinery/pkg/util/wait.JitterUntil() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x10a k8s.io/apimachinery/pkg/util/wait.Until() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x9b k8s.io/client-go/tools/cache.(*processorListener).run() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:966 +0x38 k8s.io/client-go/tools/cache.(*processorListener).run-fm() <autogenerated>:1 +0x33 k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x86 Previous read at 0x00c0006d30e0 by goroutine 360: github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166 github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45 ``` </details> ## Extras This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
This release addresses some issues in the destination service that could cause it to behave unexpectedly when processing updates. * Fixed a race condition in the destination service that could cause panics under very specific conditions ([#12022]; fixes [#12010]) * Changed how updates to a `Server` selector are handled in the destination service. When a `Server` that marks a port as opaque no longer selects a resource, the resource's opaqueness will reverted to default settings ([#12031]; fixes [#11995]) * Introduced Helm configuration values for liveness and readiness probe timeouts and delays ([#11458]; fixes [#11453]) (thanks @jan-kantert!) [#12010]: #12010 [#12022]: #12022 [#11995]: #11995 [#12031]: #12031 [#11453]: #11453 [#11458]: #11458 Signed-off-by: Matei David <[email protected]>
Fixes #12010 ## Problem We're observing crashes in the destination controller in some scenarios, due to data race as described in #12010. ## Cause The problem is the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309), and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines and the map is not guarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue. ## Solution Make a shallow copy of `pp.addresses` in the endpoints watcher and only pass that to the listeners. It's a shallow copy because we avoid making copies of the pod reference in there, knowing it won't get mutated. ## Repro Install linkerd core and injected emojivoto and patch the endpoint translator to include a sleep call that will help surfacing the race (don't install the patch in the cluster; we'll only use it locally below): <details> <summary>endpoint_translator.go diff</summary> ```diff diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go index d1018d5f9..7d5abd638 100644 --- a/controller/api/destination/endpoint_translator.go +++ b/controller/api/destination/endpoint_translator.go @@ -5,6 +5,7 @@ import ( "reflect" "strconv" "strings" + "time" pb "github.com/linkerd/linkerd2-proxy-api/go/destination" "github.com/linkerd/linkerd2-proxy-api/go/net" @@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) { } func (et *endpointTranslator) add(set watcher.AddressSet) { for id, address := range set.Addresses { + time.Sleep(1 * time.Second) et.availableEndpoints.Addresses[id] = address } ``` </details> Then create these two Server manifests: <details> <summary>emoji-web-server.yml</summary> ```yaml apiVersion: policy.linkerd.io/v1beta2 kind: Server metadata: namespace: emojivoto name: web-http labels: app.kubernetes.io/part-of: emojivoto app.kubernetes.io/name: web app.kubernetes.io/version: v11 spec: podSelector: matchLabels: app: web-svc port: http proxyProtocol: HTTP/1 ``` </details> <details> <summary>emoji-web-server-opaque.yml</summary> ```yaml apiVersion: policy.linkerd.io/v1beta2 kind: Server metadata: namespace: emojivoto name: web-http labels: app.kubernetes.io/part-of: emojivoto app.kubernetes.io/name: web app.kubernetes.io/version: v11 spec: podSelector: matchLabels: app: web-svc port: http proxyProtocol: opaque ``` </details> In separate consoles run the patched destination service and a destination client: ```bash HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211 ``` ```bash go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80 ``` And run this to continuously switch the `proxyProtocol` field: ```bash while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done ``` You'll see the following data race report in the Destination controller logs: <details> <summary>destination logs</summary> ```console ================== WARNING: DATA RACE Write at 0x00c0006d30e0 by goroutine 178: github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173 github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm() <autogenerated>:1 +0x64 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:246 +0x81 k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate() <autogenerated>:1 +0x1f k8s.io/client-go/tools/cache.(*processorListener).run.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:970 +0x1f4 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x41 k8s.io/apimachinery/pkg/util/wait.BackoffUntil() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xbe k8s.io/apimachinery/pkg/util/wait.JitterUntil() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x10a k8s.io/apimachinery/pkg/util/wait.Until() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x9b k8s.io/client-go/tools/cache.(*processorListener).run() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:966 +0x38 k8s.io/client-go/tools/cache.(*processorListener).run-fm() <autogenerated>:1 +0x33 k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1() /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x86 Previous read at 0x00c0006d30e0 by goroutine 360: github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166 github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1() /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45 ``` </details> ## Extras This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
This stable release back-ports bugfixes and improvements from recent edge releases. * Introduced support for arbitrary labels in the `podMonitors` field in the control plane Helm chart (thanks @jseiser!) ([#11222]; fixes [#11175]) * Added a `prometheusUrl` field for the heartbeat job in the control plane Helm chart (thanks @david972!) ([#11343]; fixes [#11342]) * Updated the Destination controller to return `INVALID_ARGUMENT` status codes properly when a `ServiceProfile` is requested for a service that does not exist. ([#11980]) * Reduced the load on the Destination controller by only processing Server updates on workloads affected by the Server ([#12017]) * Changed how updates to a `Server` selector are handled in the destination service. When a `Server` that marks a port as opaque no longer selects a resource, the resource's opaqueness will reverted to default settings ([#12031]; fixes [#11995]) * Fixed a race condition in the destination service that could cause panics under very specific conditions ([#12022]; fixes [#12010]) * Fixed an issue where inbound policy could be incorrect after certain policy resources are deleted ([#12088]) [#11222]: #11222 [#11175]: #11175 [#11343]: #11343 [#11342]: #11342 [#11980]: #11980 [#12017]: #12017 [#11995]: #11995 [#12031]: #12031 [#12010]: #12010 [#12022]: #12022 [#12088]: #12088 Signed-off-by: Alex Leong <[email protected]> Signed-off-by: David ALEXANDRE <[email protected]> Signed-off-by: Justin S <[email protected]> Co-authored-by: Oliver Gould <[email protected]> Co-authored-by: Alejandro Pedraza <[email protected]> Co-authored-by: David ALEXANDRE <[email protected]> Co-authored-by: Justin Seiser <[email protected]>
Discussed in #11927
Originally posted by nealharris January 12, 2024
This feels like a bug to me, but I'm not exactly sure how to reproduce it, so I'm not yet willing to open this as an issue in the project.
We're currently running Linkerd 2.14.1 in production. We try to keep up to date, and always run the latest stable release. However, for each stable release after 2.14.1, we've had some trouble when trying to upgrade.
Specifically, when we upgrade in our staging environment, after a few days, we notice restarts in the destination controller. When looking at logs, I see things like
The above are from our most recent attempt to upgrade, which is for
2.14.8
.The text was updated successfully, but these errors were encountered: