Description
Describe the problem
We have observed replicas on a decommissioning node (<3) that were never replaced by the corresponding range leaseholder node over an 80-minute period.
The stall was resolved by manually enqueueing the blocking ranges into the leaseholder's replicate queue.
This is surprising, because the replica scanner should check each replica against all of the store's replica queues once every 10 minutes. Manually enqueueing the ranges via the advanced debug page, without skipping the should-queue check, showed that if the scanner had called shouldQueue on these replicas, they would have been enqueued. There is no indication that this ever happened.
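For illustration, a rough sketch of the manual workaround (the node ID, range ID, and the crdb_internal.kv_enqueue_replica built-in and its exact signature are assumptions; in this incident the ranges were actually enqueued through the DB Console's advanced debug page):
# Find ranges that still hold a replica on the decommissioning node
# (node 3 is a placeholder; node and store IDs coincide in this setup).
rp sql $cluster:1 -- -e 'SELECT range_id FROM crdb_internal.ranges WHERE replicas @> ARRAY[3]'
# Enqueue one of those ranges (1234 is a placeholder) into the replicate queue
# without skipping the shouldQueue check. Assumed signature:
# kv_enqueue_replica(range_id, queue_name, skip_should_queue).
rp sql $cluster:1 -- -e "SELECT crdb_internal.kv_enqueue_replica(1234, 'replicate', false)"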
To Reproduce
Attempts at reproducing the issue haven't been successful so far. The methods tested are shown below.
Details
Set up the cluster and run the workload (rp is shorthand for roachprod):
export cluster=austen-decom-repro
alias rp=roachprod
roachprod create $cluster -n 41
roachprod put $cluster ./artifacts/cockroach cockroach
roachprod start $cluster:1-40
rp sql $cluster:1 -- -e 'CREATE DATABASE kv2'
rp sql $cluster:1 -- -e 'ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5'
rp sql $cluster:1 -- -e 'ALTER DATABASE kv2 CONFIGURE ZONE USING num_replicas = 5'
rp run $cluster:1 -- './cockroach workload run kv --init --splits=6000 --min-block-bytes=16384 --max-block-bytes=16384 --insert-count=100000000 --max-rate=100 {pgurl:1}'
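Optional sanity check, not part of the original repro: confirm up-replication to 5 replicas has finished before starting the decommission chain.
# Counts ranges that do not yet have 5 replicas; wait for this to reach 0.
rp sql $cluster:1 -- -e 'SELECT count(*) FROM crdb_internal.ranges WHERE array_length(replicas, 1) < 5'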
Chain decommissions:
for iteration in $(seq 1 10000); do
echo "Starting iteration $iteration"
# Loop through nodes
for node in $(seq 20 40); do
echo "Processing node $node (Iteration $iteration)"
# Run drain command
rp run $cluster:$node -- './cockroach node drain --insecure --self'
# Run decommission command
rp run $cluster:$node -- './cockroach node decommission --insecure --self'
# Wipe the node
rp wipe $cluster:$node
# Put artifacts
rp put $cluster:$node ./artifacts/cockroach
# Start the node
rp start $cluster:$node
echo "Finished processing node $node (Iteration $iteration)"
echo "------------------------"
done
echo "Finished iteration $iteration"
echo "========================"
done

Restart a node every 3 minutes:
for iteration in $(seq 1 10000); do
# Loop through nodes
for node in $(seq 2 19); do
echo "Restarting node $node (Iteration $iteration)"
# Restart the node
rp stop $cluster:$node
# Start the node
rp start $cluster:$node
sleep 180
echo "Finished restarting node $node (Iteration $iteration)"
echo "------------------------"
done
echo "Finished iteration $iteration"
echo "========================"
done

Expected behavior
The replica scanner should enqueue decommissioning ranges into the replicate queue at least once every replica_count * 100ms (the minimum scanner interval) or every 10 minutes, whichever is greater.
Decommissioning should not stall because of this.
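As a rough sanity check, using the repro setup above as a stand-in: ~6,000 ranges at 5 replicas each is roughly 30,000 replicas spread across 40 stores, i.e. about 750 replicas per store, so replica_count * 100ms comes to about 75 seconds, well under the 10-minute floor. Each replica should therefore have been re-checked by the scanner roughly every 10 minutes, or about 8 times during the 80-minute stall.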
Environment:
- CockroachDB version
v23.1.22; this may affect other versions as well, but it has so far only been observed on a v23.1.22 cluster.
Additional context
Manual intervention is required to complete a decommission.
Jira issue: CRDB-41920