Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linkerd is altering status on HTTPRoutes it doesn't own #11659

Closed
quyenhoang96 opened this issue Nov 28, 2023 · 4 comments · Fixed by #11705 or #11764
Closed

Linkerd is altering status on HTTPRoutes it doesn't own #11659

quyenhoang96 opened this issue Nov 28, 2023 · 4 comments · Fixed by #11705 or #11764
Assignees
Labels

Comments

@quyenhoang96
Copy link

What is the issue?

When I config Envoy-Gateway https://github.com/envoyproxy/gateway.
Then I apply HTTPRoute, the control plan of linkerd watch resource of envoy-gateway, and change it. So it make the envoy-gateway error. Help me how to fix it.

How can it be reproduced?

When I config Envoy-Gateway https://github.com/envoyproxy/gateway.
Then I apply HTTPRoute, the control plan of linkerd watch resource of envoy-gateway, and change it. So it make the envoy-gateway error. Help me how to fix it.

Logs, error output, etc

When I config Envoy-Gateway https://github.com/envoyproxy/gateway.
Then I apply HTTPRoute, the control plan of linkerd watch resource of envoy-gateway, and change it. So it make the envoy-gateway error. Help me how to fix it.

output of linkerd check -o short

Environment

Possible solution

No response

Additional context

When I config Envoy-Gateway https://github.com/envoyproxy/gateway.
Then I apply HTTPRoute, the control plan of linkerd watch resource of envoy-gateway, and change it. So it make the envoy-gateway error. Help me how to fix it.

Would you like to work on fixing this bug?

yes

@kflynn
Copy link
Member

kflynn commented Nov 28, 2023

I'm going to resolve this as a dup of envoyproxy/gateway#2248 – I'm involved with Envoy Gateway, too, so I think we can track it in just the one project. 🙂

@kflynn kflynn closed this as completed Nov 28, 2023
@kflynn
Copy link
Member

kflynn commented Nov 30, 2023

Upon investigation... yeah, we think that this is a Linkerd bug. 🙁 Reopening this issue.

How to reproduce

Start with an empty cluster.

1. Install Linkerd.

Install Linkerd without messing with Gateway API CRDs:

linkerd install --crds --set enableHttpRoutes=false | k apply -f -
linkerd install --set enableHttpRoutes=false | k apply -f -

At this point Linkerd will not start running, which surprised me, but the next step fixes that.

2. Install Envoy Gateway.

This is from the Envoy Gateway quickstart:

helm install eg oci://docker.io/envoyproxy/gateway-helm --version v0.6.0 \
             -n envoy-gateway-system --create-namespace

(Note that eg here is an abbreviation for Envoy Gateway, not "for example".)

3. Wait for Linkerd and Envoy Gateway to be running.

kubectl rollout status -n envoy-gateway deploy
kubectl rollout status -n linkerd deploy

4. Create a GatewayClass, Gateway, and HTTPRoute.

Again from the Envoy Gateway quickstart:

kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/latest/quickstart.yaml -n default

5. Observe the bug.

kubectl get httproutes.gateway.networking.k8s.io backend -o yaml

The status will show

status:
  parents: []

instead of the proper status from Envoy Gateway.

Hypothesis

It appears that Envoy Gateway correctly notices that our HTTPRoute has an entry in parentRefs pointing to Envoy Gateway, so it does all the reconciliation that it needs to and then updates the HTTPRoute's status.

Linkerd correctly sees the status update, but then incorrectly decides to update the status itself. I think that what Linkerd is trying to do is an empty update with the Merge strategy, but it appears that Merge incorrectly overwrites the parents list with the (defaulted-to-?)empty list from Linkerd.

Overall, I think that Linkerd shouldn't be attempting any update of a route that it doesn't own, and I think that maybe that starts with altering the check in policy-controller/k8s/status/src/index.rs at line 129?

                    } else if id.gkn.group == k8s_gateway_api::HttpRoute::group(&()) {

@kflynn kflynn reopened this Nov 30, 2023
@kflynn kflynn changed the title Conflic control plan with envoy-gateway Linkerd is altering status on HTTPRoutes it doesn't own Nov 30, 2023
@kflynn
Copy link
Member

kflynn commented Nov 30, 2023

Part of this fix is also going to be to add conformance tests for mesh/gateway-controller interaction, as tracked in kubernetes-sigs/gateway-api#2630.)

@quyenhoang96
Copy link
Author

Hi @kflynn
Thank you for your assistance. Currently we are deploying linkerd + envoy-gateway for production. Can you push this bug to priority?

adleong added a commit that referenced this issue Dec 11, 2023
Fixes #11659

When the policy controller updates the status of an HttpRoute, it will override any existing statuses on that resource.  

We update the policy controller to take into account any statuses which are not controlled by Linkerd already on the resource.  It will patch the final statuses to be the combination of thee non-Linkerd statuses plus the Linkerd statuses.

Signed-off-by: Alex Leong <[email protected]>
alpeb added a commit that referenced this issue Dec 14, 2023
## edge-23.12.2

This edge release includes a restructuring of the proxy's balancer along with
accompanying new metrics. The new minimum supported Kubernetes version is 1.22.

* Restructured the proxy's balancer ([#11750]): balancer changes may now occur
  independently of request processing. Fail-fast circuit breaking is enforced on
  the balancer's queue so that requests can't get stuck in a queue indefinitely.
  This new balancer is instrumented with new metrics: request (in-queue) latency
  histograms, failfast states, discovery updates counts, and balancer endpoint
  pool sizes.
* Changed how the policy controller updates HTTPRoute status so that it doesn't
  affect statuses from other non-linkerd controllers ([#11705]; fixes [#11659])

[#11750]: #11750
[#11705]: #11705
[#11659]: #11659
alpeb added a commit that referenced this issue Dec 14, 2023
## edge-23.12.2

This edge release includes a restructuring of the proxy's balancer along with
accompanying new metrics. The new minimum supported Kubernetes version is 1.22.

* Restructured the proxy's balancer ([#11750]): balancer changes may now occur
  independently of request processing. Fail-fast circuit breaking is enforced on
  the balancer's queue so that requests can't get stuck in a queue indefinitely.
  This new balancer is instrumented with new metrics: request (in-queue) latency
  histograms, failfast states, discovery updates counts, and balancer endpoint
  pool sizes.
* Changed how the policy controller updates HTTPRoute status so that it doesn't
  affect statuses from other non-linkerd controllers ([#11705]; fixes [#11659])

[#11750]: #11750
[#11705]: #11705
[#11659]: #11659
adleong added a commit that referenced this issue Dec 20, 2023
Fixes #11659

When the policy controller updates the status of an HttpRoute, it will override any existing statuses on that resource.  

We update the policy controller to take into account any statuses which are not controlled by Linkerd already on the resource.  It will patch the final statuses to be the combination of thee non-Linkerd statuses plus the Linkerd statuses.

Signed-off-by: Alex Leong <[email protected]>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
3 participants