Upstream the availability zone info string from KeyDB #700
Conversation
@JohnSully The DCO needs to be signed off, thanks.
Done.
I agree with your statement that it would be ideal to put it in CLUSTER SHARDS or CLUSTER SLOTS (we don't want clients using CLUSTER NODES, since it's hard to release stuff in a compatible way) and we can propagate the information around a cluster, but if this is sufficient for your needs I'm okay with it.
@valkey-io/core-team thoughts?
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@             Coverage Diff              @@
##            unstable     #700      +/-  ##
=============================================
- Coverage      70.20%   70.02%   -0.18%
=============================================
  Files            110      110
  Lines          60104    60104
=============================================
- Hits           42195    42090     -105
- Misses         17909    18014     +105
Because we have >200 clusters and 150M QPS running successfully with this change, I'd suggest taking this as is and adding it to more locations later on if that is desired. It makes sense to show it in the INFO command, and we can make future PRs putting it in more places, although for our client we don't need it anywhere else.
I am ok with this information.
It's ideal if we don't change the existing ABI; otherwise it will create an unnecessary compatibility change. It seems people want it exposed in more places, but that is a superset change and you'd still want this exposed in INFO.
Looks like the test failure is a timeout; is this one known to be glitchy?
Hey John! Welcome to Valkey!
LGTM.
Should we recommend that clients implement this? Should we ask our official clients to implement it?
An observation: The client needs to issue INFO to each replica to find out which one to use. If we put it in CLUSTER SLOTS (in a future PR), then the client can find it faster and without connecting to all replicas. This optimization is probably relevant only for short-lived connections.
@zuiderkwast can you elaborate on the client support? This is part of the INFO output. Does any client provide a higher-level abstraction than a "map" or "dictionary" for the INFO output? I am aligned on adding CLUSTER SHARDS/SLOTS support at a later time too. CLUSTER NODES is indeed a tricky one, and I have seen clients choke on new fields before. We can have the discussion when we get there.
So this has been in KeyDB for a while, but I don't know if any clients other than our internal proxy service have implemented support for it. Because of the way that works, we are already issuing an INFO on connect for a few other reasons, so we just grab it from there. I think if we consult with other client authors on their preference we can expose it in more places later. But you'd still want to be able to run INFO and quickly grab it even if it's in other places. An example of a human use of this: we've used it when debugging AZ imbalances where replicas aren't properly dispersed across zones.
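As a rough illustration of that debugging workflow (not anything from this PR or the proxy itself), here is a minimal sketch assuming redis-py, made-up node addresses, and the zone being reported as availability_zone in the INFO Server section:

```python
# Sketch: group nodes by their reported AZ to spot replica placement imbalances.
# Assumptions: redis-py, illustrative node addresses, and the zone appearing
# as "availability_zone" in the INFO Server section.
from collections import defaultdict
import redis

NODES = [("10.0.1.10", 6379), ("10.0.2.11", 6379), ("10.0.3.12", 6379)]

by_az = defaultdict(list)
for host, port in NODES:
    r = redis.Redis(host=host, port=port, socket_timeout=2)
    az = r.info("server").get("availability_zone", "<unset>")
    by_az[az].append(f"{host}:{port}")

for az, members in sorted(by_az.items()):
    print(f"{az}: {len(members)} node(s): {', '.join(members)}")
```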
@JohnSully nice feature! And I'm also curious about how your users are utilizing this feature. As you mentioned, there's a proxy that acts as middleware to route users to a closer availability zone, but how is the proxy deployed? As I understand it, there could be two scenarios:
Personally, I think having client-side native support for availability zone routing would be more universal. Additionally, are there any experiences that you could share about the actual operation? For example, how to migrate across availability zones, and how to conduct availability-zone-level failover when an entire availability zone fails? These experiences would be very helpful for us and the users, thank you.
Hi @soloestoy We implemented this at Snap because x-AZ bandwidth costs are extremely expensive, so this setting saves us millions of dollars. It is not necessary to use our proxy to take advantage of it, but support would have to be added to the client library. In our case we send an INFO command on connect and then store the AZ data internally. When we next update the cluster topology we use this data to place all replicas in a priority queue and preferentially use the local server unless there is some issue with it. "Local" is defined as the remote server having a string that matches the AZ configured in our client. We do plan to open source our proxy, but this feature is by no means tied to it; that just happens to be the first "client" that implemented it. The proxy is more about making our clusters work with our service mesh architecture.
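For readers who want a concrete picture of that flow, a minimal sketch follows. This is not the Snap client; it assumes redis-py, hypothetical replica addresses, and that the zone shows up as availability_zone in the INFO Server section:

```python
# Sketch: prefer replicas whose reported AZ matches the client's own AZ.
# Assumptions: redis-py, made-up replica addresses, and the zone being
# exposed as "availability_zone" in the INFO Server section.
import redis

LOCAL_AZ = "us-east-1a"  # the AZ this client is configured with (hypothetical)
REPLICAS = [("replica-a.internal", 6379), ("replica-b.internal", 6379)]

def replicas_by_preference(local_az, candidates):
    """Return connections ordered with same-AZ replicas first."""
    ranked = []
    for host, port in candidates:
        conn = redis.Redis(host=host, port=port, socket_timeout=2)
        az = conn.info("server").get("availability_zone", "")
        # Rank 0 = same AZ as the client, rank 1 = everything else.
        ranked.append((0 if az == local_az else 1, conn))
    ranked.sort(key=lambda item: item[0])
    return [conn for _, conn in ranked]

preferred = replicas_by_preference(LOCAL_AZ, REPLICAS)[0]
preferred.execute_command("READONLY")  # enable reads on the replica
print(preferred.get("some-key"))
```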
So is the configuration item meant to be set by the cloud provider, or by the cluster admin? It is a good idea that the client has more choice to do this routing. We also have a zoneid (int) configuration item in our fork; it is used to prioritize the local replica (in the same AZ) to be promoted to the primary during cluster failover. We also have a proxy in front; the traffic will be routed to the AZ that is the same as or close to the application.
We self-host, so we set it as part of our Kubernetes init scripts. For fully managed services the provider could set it on startup as well. This is designed to be very flexible for lots of different environments, so it is not very opinionated: just a simple string that you can read back. The benefit of the string over an int is that you can use the actual name your provider uses, so it's a bit easier to work with. We typically use one node per AZ, so we haven't needed to prioritize a same-zone replica during failover (and so did not implement that), but it would be a logical next step.
@enjoy-binbin @JohnSully If you need a hierarchy of zones, are there names for the different levels? I've seen words like "zone", "failure zone", "region" and "availability zone", e.g. on this page: https://kubernetes.io/docs/setup/best-practices/multiple-zones/. Would it be enough to encode the hierarchy in one string? The example "us-east-1" looks to me like a hierarchy separated by dashes. If this is enough for all use cases, then I hope just one string config "availability-zone" is enough.
@zuiderkwast we do not have a need for hierarchies as we don't do x-region replication. If those were needed they could be a separate field. I'm hesitant to over-architect something without an exact idea of how it will get used, whereas this change, while small and targeted, is actually very impactful cost-wise if you are in the cloud.
The exact hierarchy is IMO more of an operator or infra decision (where k8s is operating). If we would like to accommodate the future hierarchy use case, I would still like to go with a more generic name.
If you do want to add regional support later, you need it to be semantically separate, since the cluster decisions you would make are different. You can't use the same field. So we would either have to make the setting hierarchical now or just assume it will be a new field. AZ typically implies a 1-2 ms latency difference, whereas a separate region can be >100 ms.
I was actually thinking about potentially smaller fault domains (than AZs) in the future-proofing context. I haven't seen anyone deploying a cluster across regions, for both the latency and cost reasons you mentioned above. The latency would have a significant impact on the failover experience/reliability too. Cross-region replication would make sense at the cluster level, where one cluster running in region 1 replicates from another running in region 2. I am not sure if we will ever need the region information in the future. For the record,
In the other thread we had a discussion on this as well; ultimately I think availability zone is an industry-standard term (it even has its own Wikipedia page: https://en.wikipedia.org/wiki/Availability_zone), and since we have working client code using that, I would prefer if we don't change it. If it's not a blocker for you, I hope we can take the term as is.
@madolson What is the remaining step to get this merged?
Merged. Great discussion, @JohnSully.
Thanks everyone! Excited to be done with my first contribution!
I went to get lunch, sorry :)
I wasn't sure if there was a different part to the process. Enjoy your lunch :) See you on the next PR
@JohnSully Regarding the process: API changes are "major decisions", so we need a majority of the core team to accept it. In this case
Yeah, I was going to go again and ask for a vote, but then noticed everyone had already chimed in saying sure. I don't think any PR has gotten us all to comment on it in less than 24 hours.
Hi, could the
When Redis/Valkey/KeyDB is run in a cloud environment across multiple AZs, it is preferable to keep traffic local to an AZ, both for cost reasons and for latency. This is typically done when you are enabling reads on replicas with the READONLY command.
For this change we are creating a setting that is echoed back in the INFO command. We do not want to add the cloud SDKs as dependencies, and this is the easiest way around that. It is fairly trivial to grab the AZ from the cloud and push it into your settings file.
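As one possible illustration of that init step (a sketch only: it assumes an EC2 IMDSv2 metadata endpoint, that the setting is named availability-zone as in this PR, and that it can be applied at runtime with CONFIG SET; on other providers, or if the config is startup-only, the same idea applies by writing it into the config file instead):

```python
# Sketch: read the AZ from EC2 instance metadata (IMDSv2) and push it into
# the server with CONFIG SET. The "availability-zone" config name follows
# this PR; the metadata endpoint and runtime settability are assumptions.
import urllib.request
import redis

def fetch_ec2_az():
    token_req = urllib.request.Request(
        "http://169.254.169.254/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()
    az_req = urllib.request.Request(
        "http://169.254.169.254/latest/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urllib.request.urlopen(az_req, timeout=2).read().decode()

az = fetch_ec2_az()                      # e.g. "us-east-1a"
r = redis.Redis(host="localhost", port=6379)
r.config_set("availability-zone", az)    # echoed back by INFO afterwards
print(r.info("server").get("availability_zone"))
```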
Currently at Snapchat we have a custom client that, after connecting, reads this from the server and will preferentially use that server if the AZ string matches its internally configured AZ.
In the future it would be ideal if we used this information when performing failover, or even exposed it in cluster nodes.