Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replication: reduce the reqeue time for GetReplicationInfo #263

Merged
merged 1 commit into from
Nov 7, 2022

Conversation

Madhu-1
Copy link
Member

@Madhu-1 Madhu-1 commented Nov 7, 2022

Reduce the schedule time by half to get the latest update and also to avoid the inconsistency between the last sync time in
the VR and the Storage system.

The user can see updates for RPO that are not stuck in a bad schedule race, i.e, VolumeReplication checks and finds sync time as t-5m, and just after that storage system updates it to t+x. If we checked every 1/2 of the schedule we will update it to t+x in t+s/2

Signed-off-by: Madhu Rajanna [email protected]

Comment on lines +424 to +436
// Reduce the schedule time by half to get the latest update and also to
// avoid the inconsistency between the last sync time in the VR and the
// Storage system.
return scheduleTime / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be done by the entity creating VRC in the first place?

I don't think it is logical to to add this logic at this level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why you dont think this logic should not be added here. Please check the commit message for the proper explanation of why this is done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce the schedule time by half to get the latest update and also to avoid the inconsistency between the last sync time in
the VR and the Storage system.

SchedulingTime from the VRC implies the time interval between which we will schedule a re-queue to check for updates right ?
Can the reducing the time by half done at a above level ?

The user can see updates for RPO that are not stuck in a bad schedule race, i.e, VolumeReplication checks and finds sync time as t-5m, and just after that storage system updates it to t+x. If we checked every 1/2 of the schedule we will update it to t+x in t+s/2

By checking 1/2 of the schedule, we are not obeying the provided scheduling time at all?
Is there a reason that the entity handling this cannot create VRC with half of the required shedule?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchedulingTime from the VRC implies the time interval between which we will schedule a re-queue to check for updates right ?
Can the reducing the time by half done at a above level ?

This is not true, This is the time specific to the storage, what is the interval to take the snapshot of the rbd image(for example ceph). we are here using the same time to Requeue to check the LastSyncTime/Update time of the replication.

By checking 1/2 of the schedule, we are not obeying the provided scheduling time at all?
Is there a reason that the entity handling this cannot create VRC with half of the required schedule?

Please Check the above explanation, the schedule interval is not meant for GetReplicationInfo Requeue time it's for the Storage Replication interval time, we are using the same time to get the updates on the replication time. if we keep both the same, there is a kind of diff between the VR LastSyncTime and the storage LastSyncTime. to keep both the same, we are decreasing the time by half so that we can keep the LastSyncTime and the storageLastSyncTime to almost the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay,

https://github.com/csi-addons/kubernetes-csi-addons/blob/main/docs/volumereplicationclass.md
Can we add description here and how it used in parameters section passed to csi-driver
And for requeue time of reconcile
and mention it in function description too .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, ptal

@Madhu-1 Madhu-1 force-pushed the improve-lastUpdateTime branch from 7cb01f6 to 3342712 Compare November 7, 2022 13:46
@Madhu-1 Madhu-1 requested a review from ShyamsundarR November 7, 2022 13:48
@Madhu-1 Madhu-1 force-pushed the improve-lastUpdateTime branch from 3342712 to 5d909ac Compare November 7, 2022 14:16
@Madhu-1 Madhu-1 requested a review from Rakshith-R November 7, 2022 14:17
@Madhu-1 Madhu-1 force-pushed the improve-lastUpdateTime branch from 5d909ac to 76add31 Compare November 7, 2022 14:29
@Madhu-1 Madhu-1 requested a review from ShyamsundarR November 7, 2022 14:29
@Madhu-1 Madhu-1 force-pushed the improve-lastUpdateTime branch from 76add31 to b176a51 Compare November 7, 2022 14:49
@@ -21,5 +21,8 @@ spec:
parameters:
replication.storage.openshift.io/replication-secret-name: secret-name
replication.storage.openshift.io/replication-secret-namespace: secret-namespace
# This is storage vendor specific configuration. if present, half of the schedulingInterval

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nits)

  • VolumeReplication spell
  • (reword?) Will be used to make the requeue of the GetVolumeReplication Info to update the LastSyncTime in the VolumeReplicaiton CR to will be used to requeue VolumeReplication resource for lastSyncTime info updates
    • I would actually not state this at all... Just state something like: schedulingInterval is a vendor specific parameter. It is used to set the replication scheduling interval for storage volumes that are replication enabled using related VolumeReplication resource

Reduce the schedule time by half to get the
latest update and also to avoid the
inconsistency between the last sync time in
the VR and the Storage system.

The user can see updates for RPO that are not
stuck in a bad schedule race i.e VR checks
and finds sync time as t-5m and just after
that storage system updates it to t+x.
If we checked every 1/2 of schedule we will
update it to t+x in t+s/2

Signed-off-by: Madhu Rajanna <[email protected]>
@Madhu-1 Madhu-1 force-pushed the improve-lastUpdateTime branch from b176a51 to 54dd513 Compare November 7, 2022 15:42
@Madhu-1 Madhu-1 requested a review from ShyamsundarR November 7, 2022 15:42
@@ -21,5 +21,8 @@ spec:
parameters:
replication.storage.openshift.io/replication-secret-name: secret-name
replication.storage.openshift.io/replication-secret-namespace: secret-namespace
# schedulingInterval is a vendor specific parameter. It is used to set the
# replication scheduling interval for storage volumes that are replication
# enabled using related VolumeReplication resource
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line about re queueing of VR if specified too?

@mergify mergify bot merged commit 8b40a09 into csi-addons:main Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants