Upstream the availability zone info string from KeyDB #700
Conversation
@JohnSully The DCO needs to be signed off, thanks.
Done.
I agree with your statement that it would be ideal to put it in CLUSTER SHARDS or CLUSTER SLOTS (we don't want clients using CLUSTER NODES, since it's hard to release stuff in a compatible way) and we can propagate the information around a cluster, but if this is sufficient for your needs I'm okay with it.
@valkey-io/core-team thoughts?
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@             Coverage Diff              @@
##            unstable     #700      +/-  ##
=============================================
- Coverage      70.20%   70.02%   -0.18%
=============================================
  Files            110      110
  Lines          60104    60104
=============================================
- Hits           42195    42090     -105
- Misses         17909    18014     +105
Because we have >200 clusters and 150M QPS running successfully with this change, I'd suggest taking this as is and adding it to more locations later on if that is desired. It makes sense to show it in the INFO command, and we can make future PRs putting it in more places, although for our client we don't need it anywhere else.
I am ok with this information.
It's ideal if we don't change the existing ABI; otherwise it will create an unnecessary compatibility change. It seems people want it exposed in more places, but that is a superset change and you'd still want this exposed in INFO.
Looks like the test failure is a timeout; is this one known to be glitchy?
Hey John! Welcome to Valkey!
LGTM.
Should we recommend that clients implement this? Should we ask our official clients to implement it?
An observation: The client needs to issue INFO to each replica to find out which one to use. If we put it in CLUSTER SLOTS (in a future PR), then the client can find it faster and without connecting to all replicas. This optimization is probably relevant only for short-lived connections.
@zuiderkwast can you elaborate on the client support? This is part of the INFO output. Does any client provide a higher-level abstraction than a "map" or "dictionary" for the INFO output? I am aligned on adding CLUSTER SHARDS/SLOTS support at a later time too. CLUSTER NODES is indeed a tricky one, and I have seen clients choke on new fields before. We can have the discussion when we get there.
So this has been in KeyDB for a while, but I don't know if any clients other than our internal proxy service have implemented support for it. Because of the way that works, we are already issuing an INFO on connect for a few other reasons, so we just grab it from there. I think if we consult with other client authors on their preference we can expose it in more places later. But you'd still want to be able to run INFO and quickly grab it even if it's in other places. An example of a human use of this: we've used it when debugging AZ imbalances where replicas aren't properly dispersed across zones.
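As a rough illustration of that debugging workflow (not anything from this PR or the proxy itself), here is a minimal sketch assuming redis-py, made-up node addresses, and the zone being reported as availability_zone in the INFO Server section:

```python
# Sketch: group nodes by their reported AZ to spot replica placement imbalances.
# Assumptions: redis-py, illustrative node addresses, and the zone appearing
# as "availability_zone" in the INFO Server section.
from collections import defaultdict
import redis

NODES = [("10.0.1.10", 6379), ("10.0.2.11", 6379), ("10.0.3.12", 6379)]

by_az = defaultdict(list)
for host, port in NODES:
    r = redis.Redis(host=host, port=port, socket_timeout=2)
    az = r.info("server").get("availability_zone", "<unset>")
    by_az[az].append(f"{host}:{port}")

for az, members in sorted(by_az.items()):
    print(f"{az}: {len(members)} node(s): {', '.join(members)}")
```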
@JohnSully nice feature! And I'm also curious about how your users are utilizing this feature. As you mentioned, there's a proxy that acts as middleware to route users to a closer availability zone, but how is the proxy deployed? As I understand it, there could be two scenarios:
Personally, I think having client-side native support for availability zone routing would be more universal. Additionally, are there any experiences that you could share about the actual operation? For example, how to migrate across availability zones, and how to conduct availability-zone-level failover when an entire availability zone fails? These experiences would be very helpful for us and the users, thank you.
Hi @soloestoy We implemented this at Snap because x-AZ bandwidth costs are extremely expensive, so this setting saves us millions of dollars. It is not necessary to use our proxy to take advantage of it, but support would have to be added to the client library. In our case we send an INFO command on connect and then store the AZ data internally. When we next update the cluster topology we use this data to place all replicas in a priority queue and preferentially use the local server unless there is some issue with it. "Local" is defined as the remote server having a string that matches the AZ configured in our client. We do plan to open source our proxy, but this feature is by no means tied to it; that just happens to be the first "client" that implemented it. The proxy is more about making our clusters work with our service mesh architecture.
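For readers who want a concrete picture of that flow, a minimal sketch follows. This is not the Snap client; it assumes redis-py, hypothetical replica addresses, and that the zone shows up as availability_zone in the INFO Server section:

```python
# Sketch: prefer replicas whose reported AZ matches the client's own AZ.
# Assumptions: redis-py, made-up replica addresses, and the zone being
# exposed as "availability_zone" in the INFO Server section.
import redis

LOCAL_AZ = "us-east-1a"  # the AZ this client is configured with (hypothetical)
REPLICAS = [("replica-a.internal", 6379), ("replica-b.internal", 6379)]

def replicas_by_preference(local_az, candidates):
    """Return connections ordered with same-AZ replicas first."""
    ranked = []
    for host, port in candidates:
        conn = redis.Redis(host=host, port=port, socket_timeout=2)
        az = conn.info("server").get("availability_zone", "")
        # Rank 0 = same AZ as the client, rank 1 = everything else.
        ranked.append((0 if az == local_az else 1, conn))
    ranked.sort(key=lambda item: item[0])
    return [conn for _, conn in ranked]

preferred = replicas_by_preference(LOCAL_AZ, REPLICAS)[0]
preferred.execute_command("READONLY")  # enable reads on the replica
print(preferred.get("some-key"))
```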
So is the configuration item meant to be set by the cloud provider, or by the cluster admin? It is a good idea that the client has more choice to do this routing. We also have a zoneid (int) configuration item in our fork; it is used to prioritize the local replica (in the same AZ) to be promoted to the primary during cluster failover. We also have a proxy in front; the traffic will be routed to the AZ that is the same as or close to the application.
We self-host, so we set it as part of our Kubernetes init scripts. For fully managed services the provider could set it on startup as well. This is designed to be very flexible for lots of different environments, so it is not very opinionated: just a simple string that you can read back. The benefit of the string over an int is that you can use the actual name your provider uses, so it's a bit easier to work with. We typically use one node per AZ, so we haven't needed to prioritize a same-zone replica during failover (and so did not implement that), but it would be a logical next step.
@enjoy-binbin @JohnSully If you need a hierarchy of zones, are there names for the different levels? I've seen words like "zone", "failure zone", "region" and "availability zone", e.g. on this page: https://kubernetes.io/docs/setup/best-practices/multiple-zones/. Would it be enough to encode the hierarchy in one string? The example "us-east-1" looks to me like a hierarchy separated by dashes. If this is enough for all use cases, then I hope just one string config "availability-zone" is enough.
@zuiderkwast we do not have a need for hierarchies as we don't do x-region replication. If those were needed they could be a separate field. I'm hesitant to over-architect something without an exact idea of how it will get used, whereas this change, while small and targeted, is actually very impactful cost-wise if you are in the cloud.
The exact hierarchy is IMO more of an operator or infra decision (where k8s is operating). If we would like to accommodate the future hierarchy use case, I would still like to go with a more generic name.
If you do want to add regional support later, you need it to be semantically separate, since the cluster decisions you would make are different. You can't use the same field. So we would either have to make the setting hierarchical now or just assume it will be a new field. AZ typically implies a 1-2 ms latency difference, whereas a separate region can be >100 ms.
I was actually thinking about potentially smaller fault domains (than AZs) in the future-proofing context. I haven't seen anyone deploying a cluster across regions, for both the latency and cost reasons you mentioned above. The latency would have a significant impact on the failover experience/reliability too. Cross-region replication would make sense at the cluster level, where one cluster running in region 1 replicates from another running in region 2. I am not sure if we will ever need the region information in the future. For the record,
In the other thread we had a discussion on this as well; ultimately I think availability zone is an industry-standard term (it even has its own Wikipedia page: https://en.wikipedia.org/wiki/Availability_zone), and since we have working client code using that, I would prefer if we don't change it. If it's not a blocker for you, I hope we can take the term as is.
@madolson What is the remaining step to get this merged?
Merged. Great discussion, @JohnSully.
Thanks everyone! Excited to be done with my first contribution!
I went to get lunch, sorry :)
I wasn't sure if there was a different part to the process. Enjoy your lunch :) See you on the next PR
@JohnSully Regarding the process: API changes are "major decisions", so we need a majority of the core team to accept it. In this case
Yeah, I was going to go again and ask for a vote, but then noticed everyone had already chimed in saying sure. I don't think any PR has gotten us all to comment on it in less than 24 hours.
Hi, could the
When Redis/Valkey/KeyDB is run in a cloud environment across multiple AZs, it is preferable to keep traffic local to an AZ, both for cost reasons and for latency. This is typically done when you are enabling reads on replicas with the READONLY command.
For this change we are creating a setting that is echoed back in the INFO command. We do not want to add the cloud SDKs as dependencies, and this is the easiest way around that. It is fairly trivial to grab the AZ from the cloud and push it into your settings file.
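As one possible illustration of that init step (a sketch only: it assumes an EC2 IMDSv2 metadata endpoint, that the setting is named availability-zone as in this PR, and that it can be applied at runtime with CONFIG SET; on other providers, or if the config is startup-only, the same idea applies by writing it into the config file instead):

```python
# Sketch: read the AZ from EC2 instance metadata (IMDSv2) and push it into
# the server with CONFIG SET. The "availability-zone" config name follows
# this PR; the metadata endpoint and runtime settability are assumptions.
import urllib.request
import redis

def fetch_ec2_az():
    token_req = urllib.request.Request(
        "http://169.254.169.254/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()
    az_req = urllib.request.Request(
        "http://169.254.169.254/latest/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urllib.request.urlopen(az_req, timeout=2).read().decode()

az = fetch_ec2_az()                      # e.g. "us-east-1a"
r = redis.Redis(host="localhost", port=6379)
r.config_set("availability-zone", az)    # echoed back by INFO afterwards
print(r.info("server").get("availability_zone"))
```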
Currently at Snapchat we have a custom client that, after connecting, reads this from the server and will preferentially use that server if the AZ string matches its internally configured AZ.
In the future it would be ideal if we used this information when performing failover, or even exposed it in cluster nodes.