replication: reduce the requeue time for GetReplicationInfo #263

Madhu-1 · 2022-11-07T13:31:30Z

Reduce the schedule time by half when requeueing, to get the latest update sooner and to avoid inconsistency between the last sync time in the VR and in the storage system.

With this, the user sees RPO updates that are not stuck in a bad schedule race, i.e., VolumeReplication checks and finds the sync time as t-5m, and just after that the storage system updates it to t+x. If we check every 1/2 of the schedule, we will update it to t+x by t+s/2.

Signed-off-by: Madhu Rajanna [email protected]
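
To make the race above concrete, here is a minimal Go sketch. It is not code from this PR: the function name `observationDelay` and the minute-based model are assumptions chosen for illustration. It computes how long a storage-side sync goes unnoticed when the controller requeues every `s` minutes versus every `s/2` minutes.

```go
package main

import "fmt"

// observationDelay is a hypothetical helper (not part of the PR). It returns
// how many minutes pass between a storage sync at syncAt and the first
// controller check at or after it. Checks happen at firstCheck,
// firstCheck+requeue, firstCheck+2*requeue, and so on.
// requeue must be positive.
func observationDelay(syncAt, firstCheck, requeue int) int {
	check := firstCheck
	for check < syncAt {
		check += requeue
	}
	return check - syncAt
}

func main() {
	const schedule = 10 // assumed storage schedule s, in minutes

	// Bad race: the storage system syncs one minute after a controller check.
	const firstCheck, syncAt = 0, 1

	fmt.Println(observationDelay(syncAt, firstCheck, schedule))   // requeue every s: prints 9
	fmt.Println(observationDelay(syncAt, firstCheck, schedule/2)) // requeue every s/2: prints 4
}
```

In this model the sync is picked up after 9 minutes with a full-schedule requeue and after 4 minutes with a half-schedule requeue, which is the t+s/2 bound described above.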

Rakshith-R · 2022-11-07T13:46:08Z

controllers/replication.storage/volumereplication_controller.go

+	// Reduce the schedule time by half to get the latest update and also to
+	// avoid the inconsistency between the last sync time in the VR and the
+	// Storage system.
+	return scheduleTime / 2


Shouldn't this be done by the entity creating VRC in the first place?

I don't think it is logical to add this logic at this level.

Please explain why you think this logic should not be added here. Please check the commit message for a proper explanation of why this is done.

> Reduce the schedule time by half to get the latest update and also to avoid the inconsistency between the last sync time in
> the VR and the Storage system.

SchedulingTime from the VRC implies the time interval at which we schedule a re-queue to check for updates, right?
Can the halving be done at a level above?

> The user can see updates for RPO that are not stuck in a bad schedule race, i.e, VolumeReplication checks and finds sync time as t-5m, and just after that storage system updates it to t+x. If we checked every 1/2 of the schedule we will update it to t+x in t+s/2

By checking at 1/2 of the schedule, are we not disregarding the provided scheduling time altogether?
Is there a reason that the entity handling this cannot create the VRC with half of the required schedule?

> SchedulingTime from the VRC implies the time interval between which we will schedule a re-queue to check for updates right ?
> Can the reducing the time by half done at a above level ?

This is not true. This time is specific to the storage: it is the interval at which a snapshot of the RBD image is taken (for example, in Ceph). We are reusing the same time here as the requeue interval to check the LastSyncTime/update time of the replication.

> By checking 1/2 of the schedule, we are not obeying the provided scheduling time at all?
> Is there a reason that the entity handling this cannot create VRC with half of the required schedule?

Please check the explanation above. The schedule interval is not meant to be the GetReplicationInfo requeue time; it is the storage replication interval, and we are reusing it to get updates on the replication time. If we keep both the same, there is a difference between the VR LastSyncTime and the storage LastSyncTime. To keep the two in step, we halve the time so that the VR LastSyncTime and the storage LastSyncTime stay almost the same.
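
As a rough illustration of the point above, the requeue interval can be derived from the storage-level `schedulingInterval` parameter without changing what is passed to the storage driver. The sketch below is not the code merged in this PR: the helper name `requeueInterval`, the `fallback` argument, and the assumption that the value is a Go duration string accepted by `time.ParseDuration` are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// requeueInterval is a hypothetical helper (not the PR's implementation).
// It reads the vendor-specific schedulingInterval parameter and returns half
// of it as the requeue interval. The parameter itself is left untouched, so
// the storage system still replicates on the full interval.
// If the parameter is absent, unparsable, or not positive, fallback is used.
func requeueInterval(parameters map[string]string, fallback time.Duration) time.Duration {
	raw, ok := parameters["schedulingInterval"]
	if !ok || raw == "" {
		return fallback
	}
	// Assumption: the value is a Go duration string such as "10m".
	interval, err := time.ParseDuration(raw)
	if err != nil || interval <= 0 {
		return fallback
	}
	return interval / 2
}

func main() {
	params := map[string]string{"schedulingInterval": "10m"}
	fmt.Println(requeueInterval(params, time.Hour)) // prints 5m0s
}
```

Keeping the halving inside the controller means whoever creates the VRC still specifies the replication interval they actually want from the storage system.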

Okay.

https://github.com/csi-addons/kubernetes-csi-addons/blob/main/docs/volumereplicationclass.md
Can we add a description there of how it is used, both in the parameters section passed to the csi-driver and for the requeue time of reconcile, and mention it in the function description too?

controllers/replication.storage/volumereplication_controller.go

ShyamsundarR · 2022-11-07T15:17:46Z

docs/volumereplicationclass.md

@@ -21,5 +21,8 @@ spec:
  parameters:
    replication.storage.openshift.io/replication-secret-name: secret-name
    replication.storage.openshift.io/replication-secret-namespace: secret-namespace
+    # This is storage vendor specific configuration. if present, half of the schedulingInterval


(nits)

Fix the spelling of VolumeReplication.

(reword?) "Will be used to make the requeue of the GetVolumeReplication Info to update the LastSyncTime in the VolumeReplicaiton CR" to "will be used to requeue VolumeReplication resource for lastSyncTime info updates"

I would actually not state this at all... Just state something like: schedulingInterval is a vendor specific parameter. It is used to set the replication scheduling interval for storage volumes that are replication enabled using related VolumeReplication resource

Reduce the schedule time by half to get the latest update and also to avoid the inconsistency between the last sync time in the VR and the Storage system.

The user can see updates for RPO that are not stuck in a bad schedule race, i.e., VR checks and finds sync time as t-5m and just after that storage system updates it to t+x. If we checked every 1/2 of schedule we will update it to t+x in t+s/2.

Signed-off-by: Madhu Rajanna <[email protected]>

Rakshith-R · 2022-11-07T16:10:25Z

docs/volumereplicationclass.md

@@ -21,5 +21,8 @@ spec:
  parameters:
    replication.storage.openshift.io/replication-secret-name: secret-name
    replication.storage.openshift.io/replication-secret-namespace: secret-namespace
+    # schedulingInterval is a vendor specific parameter. It is used to set the
+    # replication scheduling interval for storage volumes that are replication
+    # enabled using related VolumeReplication resource


Add a line about requeueing of the VR, if specified, too?

mergify bot requested review from nixpanic, Rakshith-R, yati1998 and Yuggupta27 November 7, 2022 13:32

Rakshith-R requested changes Nov 7, 2022

Madhu-1 force-pushed the improve-lastUpdateTime branch from 7cb01f6 to 3342712 Compare November 7, 2022 13:46

Madhu-1 requested a review from ShyamsundarR November 7, 2022 13:48

Madhu-1 force-pushed the improve-lastUpdateTime branch from 3342712 to 5d909ac Compare November 7, 2022 14:16

Madhu-1 requested a review from Rakshith-R November 7, 2022 14:17

ShyamsundarR reviewed Nov 7, 2022

Madhu-1 force-pushed the improve-lastUpdateTime branch from 5d909ac to 76add31 Compare November 7, 2022 14:29

Madhu-1 requested a review from ShyamsundarR November 7, 2022 14:29

Madhu-1 force-pushed the improve-lastUpdateTime branch from 76add31 to b176a51 Compare November 7, 2022 14:49

ShyamsundarR reviewed Nov 7, 2022

Madhu-1 force-pushed the improve-lastUpdateTime branch from b176a51 to 54dd513 Compare November 7, 2022 15:42

Madhu-1 requested a review from ShyamsundarR November 7, 2022 15:42

ShyamsundarR approved these changes Nov 7, 2022

Rakshith-R approved these changes Nov 7, 2022

mergify bot merged commit 8b40a09 into csi-addons:main Nov 7, 2022
