Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

valkey-cli make source node ignores NOREPLICAS when doing the last CLUSTER SETSLOT #928

Merged
merged 2 commits into from
Aug 23, 2024

Conversation

enjoy-binbin
Copy link
Member

@enjoy-binbin enjoy-binbin commented Aug 21, 2024

This fixes #899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:

  1. Primary calling blockClientForReplicaAck, waiting its replica.
  2. Its replica reconfiguring itself as a replica of other shards due to
    replica migration and disconnect from the old primary.
  3. The old primary never got the chance to receive the ack, so it got a
    timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

…USTER SETSLOT

This fixes valkey-io#899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas were automatically migrates to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, so we ignore
this error in the source node side.

Signed-off-by: Binbin <[email protected]>
@enjoy-binbin enjoy-binbin requested a review from PingXie August 21, 2024 07:41
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Aug 21, 2024
Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.59%. Comparing base (e1b3629) to head (6b0fed8).
Report is 12 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #928      +/-   ##
============================================
- Coverage     70.61%   70.59%   -0.02%     
============================================
  Files           112      112              
  Lines         61509    61512       +3     
============================================
- Hits          43434    43427       -7     
- Misses        18075    18085      +10     
Files Coverage Δ
src/valkey-cli.c 55.52% <ø> (+0.06%) ⬆️

... and 13 files with indirect coverage changes

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enjoy-binbin!

src/valkey-cli.c Outdated Show resolved Hide resolved
src/valkey-cli.c Outdated Show resolved Hide resolved
tests/unit/cluster/replica-migration.tcl Outdated Show resolved Hide resolved
tests/unit/cluster/replica-migration.tcl Outdated Show resolved Hide resolved
tests/unit/cluster/replica-migration.tcl Outdated Show resolved Hide resolved
Co-authored-by: Ping Xie <[email protected]>

Signed-off-by: Binbin <[email protected]>
@enjoy-binbin enjoy-binbin merged commit 8045994 into valkey-io:unstable Aug 23, 2024
47 checks passed
@enjoy-binbin enjoy-binbin deleted the fix_setslot_cli branch August 23, 2024 08:22
madolson pushed a commit that referenced this pull request Sep 2, 2024
…USTER SETSLOT (#928)

This fixes #899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <[email protected]>
madolson pushed a commit that referenced this pull request Sep 3, 2024
…USTER SETSLOT (#928)

This fixes #899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <[email protected]>
PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…USTER SETSLOT (valkey-io#928)

This fixes valkey-io#899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…USTER SETSLOT (valkey-io#928)

This fixes valkey-io#899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
…USTER SETSLOT (valkey-io#928)

This fixes valkey-io#899. In that issue, the primary is cluster-allow-replica-migration no
and its replica is cluster-allow-replica-migration yes.

And during the slot migration:
1. Primary calling blockClientForReplicaAck, waiting its replica.
2. Its replica reconfiguring itself as a replica of other shards due to
replica migration and disconnect from the old primary.
3. The old primary never got the chance to receive the ack, so it got a
timeout and got a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] cluster rebalance --cluster-weight <node>=0 fails with clusterManagerMoveSlot: NOREPLICAS error
2 participants