Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dual stack and client-specific IPs in cluster #736

Merged
merged 23 commits into from
Jul 10, 2024

Conversation

zuiderkwast
Copy link
Contributor

@zuiderkwast zuiderkwast commented Jul 2, 2024

New configs:

  • cluster-announce-client-ipv4
  • cluster-announce-client-ipv6

New module API function:

  • ValkeyModule_GetClusterNodeInfoForClient, takes a client id and is otherwise just like its non-ForClient cousin.

If configured, one of these IP addresses are reported to each client in CLUSTER SLOTS, CLUSTER SHARDS, CLUSTER NODES and redirects, replacing the IP (custer-announce-ip or the auto-detected IP) of each node. Which one is reported to the client depends on whether the client is connected over IPv4 or IPv6.

Benefits:

  • This allows clients using IPv4 to get the IPv4 addresses of all cluster nodes and IPv6 clients to get the IPv6 addresses.
  • This allows the IPs visible to clients to be different to the IPs used between the cluster nodes due to NAT'ing.

The information is propagated in the cluster bus using new Ping extensions. (Old nodes without this feature ignore unknown Ping extensions.)

This adds another dimension to CLUSTER SLOTS reply. It now depends on the client's use of TLS, the IP address family and RESP version. Refactoring: The cached connection type definition is moved from connection.h (it actually has nothing to do with the connection abstraction) to server.h and is changed to a bitmap, with one bit for each of TLS, IPv6 and RESP3.

Fixes #337

The cached conn type enum is replaced by an integer bitmap with one bit
for TLS, IPv6 and RESP3.

The definition is moved from connection.h to server.h. It has actually
nothing to do with the connection.h abstraction.

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Copy link

codecov bot commented Jul 2, 2024

Codecov Report

Attention: Patch coverage is 79.25926% with 28 lines in your changes missing coverage. Please review.

Project coverage is 70.27%. Comparing base (752b6ee) to head (7e237fc).
Report is 13 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #736      +/-   ##
============================================
+ Coverage     70.20%   70.27%   +0.06%     
============================================
  Files           111      111              
  Lines         60242    60365     +123     
============================================
+ Hits          42295    42419     +124     
+ Misses        17947    17946       -1     
Files Coverage Δ
src/cluster.c 88.30% <100.00%> (+0.03%) ⬆️
src/config.c 78.65% <100.00%> (+0.25%) ⬆️
src/connection.h 93.50% <ø> (ø)
src/networking.c 85.53% <100.00%> (+0.16%) ⬆️
src/server.h 100.00% <ø> (ø)
src/module.c 9.65% <11.11%> (+<0.01%) ⬆️
src/cluster_legacy.c 85.81% <78.02%> (-0.11%) ⬇️

... and 18 files with indirect coverage changes

@zuiderkwast zuiderkwast changed the title Dual IPs and client-specific IPs in cluster Dual stack and client-specific IPs in cluster Jul 2, 2024
Copy link
Member

@hwware hwware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, only 2 comments for this PR

valkey.conf Show resolved Hide resolved
valkey.conf Outdated Show resolved Hide resolved
Signed-off-by: Viktor Söderqvist <[email protected]>
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.c Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.h Outdated Show resolved Hide resolved
src/cluster_legacy.c Show resolved Hide resolved
…alled with)

Updates code for human nodename too, accordingly.

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am done reviewing. Overall LGTM with 2 areas that I hope we can improve:

  1. boilerplate code reduction
  2. naming consistency

Also, I think we will need a new VM_GetClusterNodeInfo module API that takes a client * argument.

src/cluster_legacy.c Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/cluster_legacy.c Show resolved Hide resolved
src/cluster_legacy.c Outdated Show resolved Hide resolved
src/module.c Outdated Show resolved Hide resolved
Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
and add 'static' for some internal functions.

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks great! :-)

src/cluster_legacy.c Show resolved Hide resolved
src/cluster_legacy.c Show resolved Hide resolved
src/cluster_legacy.h Outdated Show resolved Hide resolved
valkey.conf Show resolved Hide resolved
Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
@zuiderkwast zuiderkwast added cluster release-notes This issue should get a line item in the release notes needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. labels Jul 7, 2024
Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@zuiderkwast zuiderkwast merged commit a323dce into valkey-io:unstable Jul 10, 2024
19 checks passed
@zuiderkwast zuiderkwast deleted the dual-ip-cluster branch July 10, 2024 11:53
zvi-code pushed a commit to zvi-code/valkey-z that referenced this pull request Jul 14, 2024
New configs:

* `cluster-announce-client-ipv4`
* `cluster-announce-client-ipv6`

New module API function:

* `ValkeyModule_GetClusterNodeInfoForClient`, takes a client id and is
otherwise just like its non-ForClient cousin.

If configured, one of these IP addresses are reported to each client in
CLUSTER SLOTS, CLUSTER SHARDS, CLUSTER NODES and redirects, replacing
the IP (`custer-announce-ip` or the auto-detected IP) of each node.
Which one is reported to the client depends on whether the client is
connected over IPv4 or IPv6.

Benefits:

* This allows clients using IPv4 to get the IPv4 addresses of all
cluster nodes and IPv6 clients to get the IPv6 clients.
* This allows the IPs visible to clients to be different to the IPs used
between the cluster nodes due to NAT'ing.

The information is propagated in the cluster bus using new Ping
extensions. (Old nodes without this feature ignore unknown Ping
extensions.)

This adds another dimension to CLUSTER SLOTS reply. It now depends on
the client's use of TLS, the IP address family and RESP version.
Refactoring: The cached connection type definition is moved from
connection.h (it actually has nothing to do with the connection
abstraction) to server.h and is changed to a bitmap, with one bit for
each of TLS, IPv6 and RESP3.

Fixes valkey-io#337

---------

Signed-off-by: Viktor Söderqvist <[email protected]>
PingXie added a commit that referenced this pull request Aug 29, 2024
Fix a bug in isValidAuxChar where valid characters '.' and ':' were
incorrectly included in the banned charset. This issue affected the
validation of auxiliary fields in the nodes.conf file used by Valkey in
cluster mode, particularly when handling IPv4 and IPv6 addresses. The
code now correctly allows '.' and ':' as valid characters, ensuring
proper handling of these fields. Comments were added to clarify the use
of the banned charset.
 
Related to #736

---------

Signed-off-by: Ping Xie <[email protected]>
madolson pushed a commit that referenced this pull request Sep 2, 2024
Fix a bug in isValidAuxChar where valid characters '.' and ':' were
incorrectly included in the banned charset. This issue affected the
validation of auxiliary fields in the nodes.conf file used by Valkey in
cluster mode, particularly when handling IPv4 and IPv6 addresses. The
code now correctly allows '.' and ':' as valid characters, ensuring
proper handling of these fields. Comments were added to clarify the use
of the banned charset.
 
Related to #736

---------

Signed-off-by: Ping Xie <[email protected]>
madolson pushed a commit that referenced this pull request Sep 3, 2024
Fix a bug in isValidAuxChar where valid characters '.' and ':' were
incorrectly included in the banned charset. This issue affected the
validation of auxiliary fields in the nodes.conf file used by Valkey in
cluster mode, particularly when handling IPv4 and IPv6 addresses. The
code now correctly allows '.' and ':' as valid characters, ensuring
proper handling of these fields. Comments were added to clarify the use
of the banned charset.
 
Related to #736

---------

Signed-off-by: Ping Xie <[email protected]>
@erikschul
Copy link

erikschul commented Sep 15, 2024

I think cluster-announce-client-ipv4 works for me. Without it, the MOVED ip is the private ip.

But, the cluster information shows the client-ipv4 instead of the correct private IP?

I create the cluster using

docker exec -it valkey-master1 valkey-cli --cluster create   172.18.0.11:7001   172.18.0.12:7002   172.18.0.13:7003
>>> Performing hash slots allocation on 3 nodes...
Primary[0] -> Slots 0 - 5460
Primary[1] -> Slots 5461 - 10922
Primary[2] -> Slots 10923 - 16383
M: fb48fb622b43a33d664745b8216c4ae301b7a386 172.18.0.11:7001
   slots:[0-5460] (5461 slots) master
M: 84ffa91847fd4c54ad69d01ead5d884ef4662d57 172.18.0.12:7002
   slots:[5461-10922] (5462 slots) master
M: c2a0cee3e478e14a83a5299944f912d0f1cf625c 172.18.0.13:7003
   slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster

but the result changes the IP to the public IP:

>>> Performing Cluster Check (using node 172.18.0.11:7001)
M: e17a7207e99e33c81ff0d8b0efa45ffbc4f0b91d 172.18.0.11:7001
   slots:[0-5460] (5461 slots) master
M: 48e7859071ee82bdfda8d8fd7abbc9020b31f0a0 192.168.211.128:7002
   slots:[5461-10922] (5462 slots) master
M: f4abd3b03585c678b172096a8733b8d43021a3b8 192.168.211.128:7003
   slots:[10923-16383] (5461 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
docker exec -it valkey-master1 valkey-cli -p 7001 cluster nodes
48e7859071ee82bdfda8d8fd7abbc9020b31f0a0 192.168.211.128:7002@17002 master - 0 1726399184026 2 connected 5461-10922
e17a7207e99e33c81ff0d8b0efa45ffbc4f0b91d 192.168.211.128:7001@17001 myself,master - 0 0 1 connected 0-5460
f4abd3b03585c678b172096a8733b8d43021a3b8 192.168.211.128:7003@17003 master - 0 1726399185042 3 connected 10923-16383

The weird thing is that it works. The cluster ports (17001-17003) are not exposed. Can the cluster appear to work correctly even if these ports are not reachable?
I'm not sure if the problem is that the ports are not used, and that the IP is set incorrectly,
or if this is just a display issue with the cluster nodes command.
etc.

@zuiderkwast
Copy link
Contributor Author

Hi Erik,

well, the valkey nodes use the the private IP and cluster ports (1700x) when they talk to each other. The cluster ports don't need to be reachable by clients.

I agree it's a bit weird that CLUSTER NODES displays the cluster ports together with the client IPs. I don't know how we can display it in a backward-compatible way. Do you have a better idea?

CLUSTER SLOTS doesn't show the cluster ports. It's only for clients.

@erikschul
Copy link

@zuiderkwast Oh, right, it makes sense that from the client's perspective (valkey-cli), the IPs are overridden.

But then how can you monitor the actual configuration?
I would argue that the CLUSTER commands shouldn't have the IPs overridden, except if client implementations use this information for hashing the topology view etc.

Is there another way to monitor the cluster health and configuration? It makes debugging complicated when the private ips are not known.
Perhaps they can be implemented as "VALKEY CLUSTER INFO" to allow for a custom implementation, perhaps showing both public and private ips. It could also help to have a server name.

@zuiderkwast
Copy link
Contributor Author

Client libraries are supposed to use CLUSTER SLOTS to discover the cluster slot mapping, but we know that several clients still use CLUSTER NODES for this. Breaking clients seems risky.

Are you relying on CLUSTER NODES in some admin script or tool that will break with this new config?

There's yet another command: CLUSTER SHARDS. This one is more extensible with key-value fields so we can probably consider including all IP variants there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[NEW] Dual stack (IPv4 & IPv6)
5 participants