Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XDS Ring Hash inconsistent after client and server restarts #6927

Closed
k-raval opened this issue Jan 18, 2024 · 31 comments · Fixed by #7156
Closed

XDS Ring Hash inconsistent after client and server restarts #6927

k-raval opened this issue Jan 18, 2024 · 31 comments · Fixed by #7156

Comments

@k-raval
Copy link

k-raval commented Jan 18, 2024

GRPC Version - 1.60.1
Go Version - 1.19.1
OS - Linux 5.4.0-81-generic #91-Ubuntu SMP aarch64 GNU/Linux

Setup:

  • Kubernetes cluster with Istio (1.20.2) installed
  • 3 GRPC servers (S1, S2, S3) run as one kubernetes deployment with 3 replicas
  • 3 GRPC clients (C1, C2, C3) run as one kubernetes deployment with 3 replicas
  • Istio virtual service and destination rule pointing to server-service (and load balancer policy as consistent hash based on GRPC header)
  • Servers and Clients using GRPC XDS feature
  • Clients send a Hello message to server over GRPC with metadata as a header field - "XYZ"
  • The GRPC XDS ring hash is supposed to do hashing on this header and route it to one of the servers
  • ALL clients are repeating this in a loop and with same field value, and hence all messages are reaching just 1 server (which has been chosen based on hash of value "XYZ")

Problem:
When one server and one client each are killed (or crash at the same time) and they restart back up again, now the hashing pattern changes. Some clients send the message to one server, but other clients send the message to S2. The header field is still the same, however different clients now hash it to different destinations.

Expectation:
No matter in what order client or server go down and come back up, the final hashing logic should be such that a given field value should be hashed to exactly same destination.

Other Information:
Instead of using GRPC XDS hashing, if we try to use Envoy sidecar which does its own hashing, we see the expected behaviour always. However, we want to use GRPC XDS features as it avoids one hop to Envoy proxy and gives us better performance.

@easwars - we talked about similar in a question/answer format some time back #5933 . Appreciate your help here. thanks in advance.

@k-raval
Copy link
Author

k-raval commented Jan 18, 2024

A point to add - Since this setup is on Kubernetes when the clients or servers crash and come back up, they come up with different IP address/names. Just for info if that makes any difference.

@k-raval
Copy link
Author

k-raval commented Jan 30, 2024

Hi @ginayeh , any update ? let me know if more info is needed. Thanks.

@k-raval
Copy link
Author

k-raval commented Feb 6, 2024

Hi @zasweq , @ginayeh - any update ? awaiting your inputs. Thanks.

@ginayeh
Copy link
Contributor

ginayeh commented Feb 6, 2024

Hi @k-raval, thanks for the patience. Are you only seeing this inconsistent behavior when both client AND server are killed? Would you mind sharing your example code to reproduce the ring hash inconsistent behavior?

@k-raval
Copy link
Author

k-raval commented Feb 7, 2024

To answer your question, when I kill either one server or one client independently, this behaviour is not observed, but when I kill a server and client simultaneously, this problem is observed.

Attaching the codebase here. Directory contains:

  1. Client code. <go build -o client cmd/client/main.go>
  2. Server code <go build -o server cmd/server/main.go>
  3. server.yaml file - Replace the path of local dir from which server binary is mounted, search for path - /home/kraval/test/grpctest
  4. client.yaml file - Replace the path of local dir from which client binary is mounted, search for path - /home/kraval/test/grpctest

Environment:
K3s based Kubernetes cluster
Istio installed using Helm chart (just the Istio-base and Istiod, Ingress Gateway not required)

Steps:

  1. Start server - kubectl apply -f server.yaml
  2. Wait for servers to start, it will start 3 replicas using a deployment
  3. Start client - kubectl apply -f client.yaml
  4. Again 3 replicas of client will start.
  5. Client has 4 different unique strings, 1-AAAA...., 2-BBBB...., 3-CCCC...., 4-DDDD....
  6. With each of above unique string used as a GRPC Metadata pair, the message is sent to server
  7. As we have setup Istio rules to hash on above string which in turn map to GRPC XDS Ring hash, the outgoing messages from client are hashed on above and they reach appropriate server.
  8. Note the server logs of 3 servers. Each unique string reaches exactly 1 server. e.g. 1-AAAA reaches any server, say server-1, irrespective of the client sending it (1/2/3). Hence, you see 3 messages of 1-AAAA reaching the server (one from each client).
  9. kubectl delete pod (pick any one server and client to kill)
  10. Client and servers come back up. Again observe all the 3 server logs
  11. It is observed that one particular unique key gets hashed to different destinations by different clients. You start seeing 1-BBBB messages reaching server-1 as well as at server-2. So, different clients are doing different hashing now.

Thanks. Let me know if you want to try something.
hashtest.tgz

@k-raval k-raval removed their assignment Feb 12, 2024
@k-raval
Copy link
Author

k-raval commented Feb 12, 2024

Hi @ginayeh , let me know if you need any help in reproducing the issue. Some more info : I used calico as CNI on the K3s cluster and my Linux was Ubuntu 22.04 with uname: Linux kartik-ubuntu 5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:37:39 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

@k-raval
Copy link
Author

k-raval commented Feb 19, 2024

Hi @ginayeh , any update so far ? let me know. thanks.

@arvindbr8 arvindbr8 assigned arvindbr8 and unassigned ginayeh Feb 28, 2024
@k-raval
Copy link
Author

k-raval commented Mar 4, 2024

Hi @arvindbr8 , let me know if you need any help in simulating this. awaiting your reply. thanks.

@arvindbr8
Copy link
Member

@k-raval -- We appreciate your patience here. Just taking a bit to come up to context with xDS Ring Hash consistencies. Will get back to you asap.

@arvindbr8
Copy link
Member

Hi @k-raval -- apologies. We are a bit thin on the ground with multiple teammates on vacation. Hence the delayed turn back time.

I would lean on to you for understanding the behavior of istiod here. Looking at your setup, it seems like you are expecting a header based ring hash load balancing. As per the spec:

header: gRPC will support this type of hash policy. This allows hashing on a request header. The header matching logic will behave the same way as specified for route matching, as per gRFC A28 (i.e., it will ignore headers with a -bin suffix and may need special handling for content-type).

And this is the code that actually generates the hash for each RPC (

case xdsresource.HashPolicyTypeHeader:
).

Based on the reading above, the header to use for hashing needs to come though the xDS response as a RouteAction in the hash_policy field. I'm not sure how the xDS response in an istio environment would look like and I would like your help here. Could you paste the RouteAction part of the xDS response here so that we could debug better?

PS: The particular scenario when the client and server restarts is one of those cases where the definition of persistence is different for different people. Especially because in gRPC when in the event of a down/upscale of backends (or even if there is a churn of backends) the ring changes. Also its important to note that a new gRPC channel could potentially create a different ring for the same backends based on the ordering of the endpoints response from the xdsResolver.

@k-raval
Copy link
Author

k-raval commented Mar 15, 2024

Thanks @arvindbr8.
I had started 3 clients and 3 servers and collected GRPC logs of one particular client to capture XDS response. Attaching the log file here.

For your comment on the ring formation, my comment is - irrespective of the order in which XDS resolver sends the list of server endpoints, the client can order them probably alphabetically to construct the ring, so that all different clients will follow the same behaviour and when we will take a hash on a parameter, it will result in choosing same destination server "across clients". Let me know if this makes sense ?

xds.txt

@arvindbr8
Copy link
Member

Thanks for the logs. The RouteAction looks correct to me with the hash_policy providing the header to use.

"hash_policy":{"header":{"header_name":"ULID"}}

Also, I should rephrase my previous comment about ring formation.

the ring changes

what I meant here is that, its hard to determine if the ring is going to look alike across all the clients. But the current hashing algorithm tries to ensure that an address update only a small portion of the keys in the ring are updates in the same binary. But when the client restarts, the shape of the ring might look totally different

For your comment on the ring formation, my comment is - irrespective of the order in which XDS resolver sends the list of server endpoints, the client can order them probably alphabetically to construct the ring, so that all different clients will follow the same behaviour and when we will take a hash on a parameter, it will result in choosing same destination server "across clients". Let me know if this makes sense ?

The ring hash LB policy is implemented as per the xDS spec. Keeping consistency over different runs of the client doesn't seem to be in the scope of this LB policy. Seems like Ring Hash LB Policy might be too barebones for your usecase here.

A55-xds-stateful-session-affinity.md might something you might be interested in- but with the caveats of client/server shutdown cases. But this feature is pending implementation in golang and has not been prioritized yet.

@k-raval
Copy link
Author

k-raval commented Mar 20, 2024

Hi @arvindbr8 , thanks for explanation.

My requirement is simple. For a given parameter (e.g. ULID string above), I want all the different clients independently started and stateless to hash it to SAME destination server endpoint. If this costs me un-equal load distribution, I am fine with it. If the server endpoints change (add/delete/crash-restart), the same ULID may get hashed to any other server endpoint, that is fine, but I need all the clients to follow the same behaviour independently without exchanging any state.

And I see this working with Envoy proxy doing the load balancing using same configuration. However, to avoid the cost of proxy, I wanted to used GRPC proxy less mode using XDS. Hence expected same behaviour here as well.

Let me know if there is any solution you may see since you are much more into the GRPC codebase.

Thanks again and awaiting your reply.

@arvindbr8
Copy link
Member

Let me followup with the team to get you better guidance for this usecase. Meanwhile, could you confirm that in the case with Envoy proxy you are also using the same ring hash LB policy?

Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Apr 15, 2024
@arvindbr8 arvindbr8 removed the stale label Apr 15, 2024
@arvindbr8
Copy link
Member

Removing the stale label to give more time on for the reproduction, since there might be a potential bug here.

@k-raval
Copy link
Author

k-raval commented Apr 16, 2024

Hi @arvindbr8, I was able to reproduce the issue with your branch with logs enabled.

Sequence of events:

  • 3 clients, 3 servers running
  • Client-2 and Server-2 killed at the same time.
  • Somehow, no issue was observed
  • Client-1 and Server-1 killed at the same time
  • Issue now observed.

Attaching logs:
client2.txt - client2 logs before restart
client2-restarted.txt - client2 logs after restart
client1.txt - client1 logs before restart
client1-restarted.txt - client1 logs after restart (issue is seen here in these logs)
client3.txt - client3 - never restarted, has all events - both client1 and client2 restart in logs.

testlogs.tgz

@arvindbr8
Copy link
Member

arvindbr8 commented Apr 17, 2024

Thanks.

Somehow, no issue was observed

Shows evidence of not being deterministic. I'm sort of inclined now to believe if there is an interaction with istiod ADS responses which makes the behavior of gRPC non deterministic. Anyways, I'll take a look

@arvindbr8
Copy link
Member

I feel like I have found what the issue could be. I have pushed some more commits to https://github.com/arvindbr8/grpc-go/tree/dbug_6927. This contains a potential fix and other debug messages (which would be useful in case the fix does not completely fix the issue)

Do you mind running the test again with the HEAD @ https://github.com/arvindbr8/grpc-go/tree/dbug_6927

@k-raval
Copy link
Author

k-raval commented Apr 18, 2024

Hi @arvindbr8 ,happy to inform that with your new commit, the issue seems to have been fixed. attaching the logs again for your review. thanks a lot for putting so much effort.

Same procedure I tried twice and each time I did two sets of restarts (each set consisting of one client and one server).
Attaching logs from both runs.

logsnew2.tgz
logsnew.tgz

@k-raval k-raval removed their assignment Apr 18, 2024
Copy link

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@arvindbr8
Copy link
Member

@k-raval I think this should fix the bug. Please feel free to reopen this issue if you still have concerns.

@arvindbr8 arvindbr8 assigned k-raval and unassigned arvindbr8 and k-raval Jun 4, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants