
Upstream the availability zone info string from KeyDB #700

Merged: 1 commit merged into valkey-io:unstable on Jun 27, 2024

Conversation

@JohnSully (Contributor) commented Jun 26, 2024

When Redis/Valkey/KeyDB is run in a cloud environment across multiple AZs, it is preferable to keep traffic local to an AZ, both for cost reasons and for latency. This is typically done when you are enabling reads on replicas with the READONLY command.

For this change we are creating a setting that is echoed back in the INFO command. We do not want to add the cloud SDKs as dependencies, and this is the easiest way around that. It is fairly trivial to grab the AZ from the cloud and push it into your settings file.
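For example, on AWS EC2 a small startup script along these lines could fetch the zone from instance metadata and append it to the server config before launch (a rough sketch only: the metadata calls follow the standard IMDSv2 flow, while the config path and the availability-zone key name reflect what this PR proposes):

```python
# Sketch: read this instance's AZ from EC2 instance metadata (IMDSv2) and
# append it to the server's config file before the server is started.
# The config file path is just an example; adjust it for your deployment.
import urllib.request

IMDS = "http://169.254.169.254/latest"

def fetch_az() -> str:
    # IMDSv2 requires a short-lived session token obtained with a PUT request.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()
    az_req = urllib.request.Request(
        f"{IMDS}/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urllib.request.urlopen(az_req, timeout=2).read().decode()

if __name__ == "__main__":
    az = fetch_az()  # e.g. "us-east-1a"
    with open("/etc/valkey/valkey.conf", "a") as conf:
        conf.write(f'\navailability-zone "{az}"\n')
```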

Currently at Snapchat we have a custom client that reads this from the server after connecting and preferentially uses that server if the AZ string matches its internally configured AZ.

In the future it would be ideal to use this information when performing failover, or even to expose it in CLUSTER NODES.

@hwware (Member) commented Jun 26, 2024

@JohnSully The DCO needs to be signed off, thanks.

@JohnSully force-pushed the availability-zone branch from 1c2868a to 7ad6d51 on June 26, 2024 20:48
@JohnSully (Contributor, Author)

done

@madolson (Member) left a comment

I agree with your statement that it would be ideal to put this in CLUSTER SHARDS or CLUSTER SLOTS (we don't want clients using CLUSTER NODES since it's hard to release things in a compatible way), and we can propagate the information around a cluster, but if this is sufficient for your needs I'm okay with it.

@valkey-io/core-team thoughts?

Review comments on src/server.h and valkey.conf (outdated, resolved).
@madolson added the release-notes (This issue should get a line item in the release notes) and major-decision-pending (Major decision pending by TSC team) labels on Jun 26, 2024
@JohnSully force-pushed the availability-zone branch from 7ad6d51 to a9f8347 on June 26, 2024 20:57
codecov bot commented Jun 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.02%. Comparing base (ab38730) to head (76beec6).
Report is 1 commit behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #700      +/-   ##
============================================
- Coverage     70.20%   70.02%   -0.18%     
============================================
  Files           110      110              
  Lines         60104    60104              
============================================
- Hits          42195    42090     -105     
- Misses        17909    18014     +105     
Files Coverage Δ
src/config.c 78.39% <ø> (+0.33%) ⬆️
src/server.c 88.54% <ø> (ø)
src/server.h 100.00% <ø> (ø)

... and 14 files with indirect coverage changes

@JohnSully (Contributor, Author) commented Jun 26, 2024

Because we have >200 clusters and 150M QPS running successfully with this change, I'd suggest taking it as is and adding it to more locations later if desired. It makes sense to show it in the INFO command, and we can make future PRs to expose it in more places, although our client doesn't need it anywhere else.

@JohnSully force-pushed the availability-zone branch from 7ee0506 to efdbaa2 on June 26, 2024 21:00
@hwware (Member) commented Jun 26, 2024

> I agree with your statement that it would be ideal to put this in CLUSTER SHARDS or CLUSTER SLOTS (we don't want clients using CLUSTER NODES since it's hard to release things in a compatible way), and we can propagate the information around a cluster, but if this is sufficient for your needs I'm okay with it.
>
> @valkey-io/core-team thoughts?

I am ok with this information.

@JohnSully force-pushed the availability-zone branch from efdbaa2 to 76beec6 on June 26, 2024 21:05
@JohnSully (Contributor, Author)

It's ideal if we don't change the existing ABI; otherwise it will create an unnecessary compatibility change. It seems people want it exposed in more places, but that is a superset change, and you'd still want this exposed in INFO.

@JohnSully (Contributor, Author)

Looks like the test failure is a timeout; is this one known to be flaky?

@zuiderkwast (Contributor) left a comment

Hey John! Welcome to Valkey!

LGTM.

Should we recommend that clients implement this? Should we ask our official clients to implement it?

An observation: The client needs to issue INFO to each replica to find out which one to use. If we put it in CLUSTER SLOTS (in a future PR), then the client can find it faster and without connecting to all replicas. This optimization is probably relevant only for short-lived connections.
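Roughly, that per-replica probe would look something like the following (a sketch using redis-py against a server that reports the availability_zone field in INFO; hosts and ports are placeholders):

```python
# Sketch: probe each known replica with INFO and group the replicas by the
# availability zone they report. Host/port values below are placeholders.
import redis  # redis-py speaks RESP, so it also works against Valkey

replicas = [("10.0.1.5", 6379), ("10.0.2.7", 6379), ("10.0.3.9", 6379)]

def az_of(host: str, port: int) -> str:
    client = redis.Redis(host=host, port=port, socket_timeout=1)
    info = client.info()  # redis-py parses the INFO reply into a dict
    return info.get("availability_zone", "")

by_az: dict[str, list] = {}
for host, port in replicas:
    by_az.setdefault(az_of(host, port), []).append((host, port))

print(by_az)  # e.g. {"us-east-1a": [("10.0.1.5", 6379)], "us-east-1b": [...]}
```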

@PingXie (Member) commented Jun 27, 2024

> Should we recommend that clients implement this? Should we ask our official clients to implement it?

@zuiderkwast can you elaborate on the client support? This is part of the INFO output. Does any client provide a higher-level abstraction than a "map" or "dictionary" for the INFO output?

I am aligned on adding CLUSTER SHARDS/SLOTS support at a later time too. CLUSTER NODES is indeed a tricky one, and I have seen clients choke on new fields before. We can have that discussion when we get there.

@JohnSully (Contributor, Author) commented Jun 27, 2024

So this has been in KeyDB for a while, but I don't know of any clients other than our internal proxy service that have implemented support for it. Because of the way that works, we are already issuing an INFO on connect for a few other reasons, so we just grab it from there.

I think if we consult with other client authors on their preferences we can expose it in more places later. But you'd still want to be able to run INFO and quickly grab it even if it's available in other places.

An example of a human use of this: we've used it when debugging AZ imbalances where replicas aren't properly dispersed across zones.

@soloestoy (Member)

@JohnSully nice feature! And I'm also curious about how your users are utilizing this feature. As you mentioned, there's a proxy that acts as middleware to route users to a closer availability zone, but how is the proxy deployed? As I understand it, there could be two scenarios:

  1. The proxy could be deployed as a sidecar alongside the client on the same machine or within the same availability zone. In this case, the proxy is effectively part of the client, implementing the client's routing functionality.
  2. Proxies could also be a set of remote services deployed across multiple availability zones. In this scenario, clients might still need to access proxies across availability zones, which would mean that the proxies should have information about the availability zones they are deployed in.

Personally, I think having client-side native support for availability zone routing would be more universal.

Additionally, are there any experiences that you could share about the actual operation? For example, how to migrate across availability zones, and how to conduct availability zone-level failover when an entire availability zone fails? These experiences would be very helpful for us and the users, thank you.

@JohnSully (Contributor, Author) commented Jun 27, 2024

Hi @soloestoy

We implemented this at Snap because cross-AZ bandwidth costs are extremely expensive, so this setting saves us millions of dollars. It is not necessary to use our proxy to take advantage of it, but support would have to be added to the client library.

In our case we send an INFO command on connect and then store the AZ data internally. When we next update the cluster topology, we use this data to place all replicas in a priority queue and preferentially use the local server unless there is some issue with it. "Local" is defined as the remote server having an AZ string that matches the AZ configured in our client.
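In simplified form, the selection logic is roughly the following (a minimal sketch, not our actual client code; the names and the plain sort standing in for the priority queue are illustrative):

```python
# Sketch: order replicas so that healthy ones whose reported AZ matches the
# client's configured AZ come first; anything else is a fallback.
from dataclasses import dataclass

@dataclass
class Replica:
    host: str
    port: int
    az: str         # as reported by the server's availability_zone INFO field
    healthy: bool = True

def prefer_local(replicas: list[Replica], client_az: str) -> list[Replica]:
    # Sort key: healthy first, then matching AZ first. The sort is stable,
    # so the original order is preserved within each group.
    return sorted(replicas, key=lambda r: (not r.healthy, r.az != client_az))

# Usage: try the first candidate; on failure, fall through to the next one.
candidates = prefer_local(
    [Replica("10.0.1.5", 6379, "us-east-1a"),
     Replica("10.0.2.7", 6379, "us-east-1b")],
    client_az="us-east-1b",
)
```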

We do plan to open-source our proxy, but this feature is by no means tied to it; that just happens to be the first "client" that implemented it. The proxy is more about making our clusters work with our service mesh architecture.

@enjoy-binbin (Member)

So is the configuration item meant to be set by the cloud provider, or by the cluster admin?

It is a good idea to give the client more choice in doing this routing.

We also have a zoneid (int) configuration item in our fork; it is used to prioritize the local replica (in the same AZ) to be promoted to primary during cluster failover. We also have a proxy in front, and the traffic is routed to the AZ that is the same as, or close to, the application.

@JohnSully (Contributor, Author) commented Jun 27, 2024

We self-host, so we set it as part of our Kubernetes init scripts. Fully managed services could have the provider set it on startup as well. This is designed to be very flexible for lots of different environments, so it is not very opinionated: just a simple string that you can read back.

The benefit of the string over an int is that you can use the actual name your provider uses, so it's a bit easier to work with.

We typically use one node per AZ, so we haven't needed to prioritize a same-zone replica during failover (and so did not implement that), but it would be a logical next step.

@zuiderkwast (Contributor)

@enjoy-binbin @JohnSully If you need a hierarchy of zones, are there names for the different levels? I've seen words like "zone", "failure zone", "region" and "availability zone", e.g. this page: https://kubernetes.io/docs/setup/best-practices/multiple-zones/

Would it be enough to encode the hierarchy in one string? The example "us-east-1" looks to me like a hierarchy separated by dashes. If this is enough for all use cases, then I hope just one string config "availability-zone" is enough.
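For example, with AWS-style names a client could recover the region from the AZ string itself (a hypothetical convention for illustration, not something this PR defines):

```python
# Sketch: derive the region from an AWS-style AZ name such as "us-east-1a",
# where the region is the zone name minus its trailing zone letter.
def region_of(az: str) -> str:
    return az.rstrip("abcdefghijklmnopqrstuvwxyz")

assert region_of("us-east-1a") == "us-east-1"
```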

@JohnSully (Contributor, Author)

@zuiderkwast we do not have a need for hierarchies as we don't do cross-region replication. If those were needed, they could be a separate field. I'm hesitant to over-architect something without an exact idea of how it will get used, whereas this change, while small and targeted, is actually very impactful cost-wise if you are in the cloud.

@PingXie (Member) commented Jun 27, 2024

> If you need a hierarchy of zones, are there names for the different levels?

The exact hierarchy is IMO more of an operator or infra decision (where k8s is operating). If we would like to accommodate the future hierarchy use case, I would still like to go with a more generic name like location, which takes the semantic context out of the engine.

@JohnSully (Contributor, Author) commented Jun 27, 2024

If you do want to add regional support later, you need it to be semantically separate, since the cluster decisions you would make are different. You can't use the same field.

So we would either have to make the setting hierarchical now or just assume it will be a new field.

An AZ typically implies a 1-2 ms latency difference, whereas a separate region can be >100 ms.

@PingXie (Member) commented Jun 27, 2024

> An AZ typically implies a 1-2 ms latency difference, whereas a separate region can be >100 ms.

I was actually thinking about potentially smaller fault domains (than AZs) in the future-proofing context.

I haven't seen anyone deploy a cluster across regions, for both the latency and cost reasons you mentioned above. The latency would have a significant impact on the failover experience/reliability too.

Cross-region replication would make sense at the cluster level, where one cluster running in region 1 replicates from another running in region 2. I am not sure we will ever need the region information in the future.

For the record, availability-zone is not a blocker for me, though I do have a preference for a term like location that leaves the interpretation to the operators.

@JohnSully (Contributor, Author)

We had a discussion on this in the other thread as well. Ultimately I think Availability Zone is an industry-standard term (it even has its own Wikipedia page: https://en.wikipedia.org/wiki/Availability_zone), and since we have working client code using that term, I would prefer that we don't change it.

If it's not a blocker for you, I hope we can take the term as is.

@madolson added the major-decision-approved (Major decision approved by TSC team) label and removed the major-decision-pending (Major decision pending by TSC team) label on Jun 27, 2024
@JohnSully (Contributor, Author)

@madolson What is the remaining step to be merged?

@PingXie merged commit ad5704f into valkey-io:unstable on Jun 27, 2024
19 checks passed
@PingXie (Member) commented Jun 27, 2024

> @madolson What is the remaining step to be merged?

Merged. Great discussion, @JohnSully.

@JohnSully (Contributor, Author)

Thanks everyone! Excited to be done with my first contribution!

@JohnSully deleted the availability-zone branch on June 27, 2024 19:34
@madolson (Member)

> @madolson What is the remaining step to be merged?

I went to get lunch, sorry :)

@JohnSully (Contributor, Author)

I wasn't sure if there was a different part to the process. Enjoy your lunch :)

See you on the next PR

@zuiderkwast (Contributor) commented Jun 27, 2024

@JohnSully Regarding the process: API changes are "major decisions", so we need a majority of the core team to accept them. In this case 4 out of 6 [edit: all six] of us expressed that they're in favor, which is formal enough for merging. 😁

@madolson (Member)

> In this case 4 out of 6 [edit: all six] of us expressed that they're in favor, which is formal enough for merging

Yeah, I was going to go ask for a vote again, but then noticed everyone had already chimed in saying sure. I don't think any PR has gotten us all to comment on it in less than 24 hours.

@rueian (Contributor) commented Dec 25, 2024

Hi, could the availability_zone information also be added to the HELLO response? I think it would be more convenient for clients to parse than in the INFO response. #1487
