CFE-1131: AWS Tags DAY2 Update #297
base: master
Conversation
@anirudhAgniRedhat: This pull request references CFE-1131 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/hold
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: anirudhAgniRedhat The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed 4d92e01 to 14ecc0e
Force-pushed d68d71a to 44e34e7
/unhold
go ebsTagsController.Run(ctx)

klog.Info("EBS Volume Tag Controller is running")
nit: TBD - cleanup info logs before merge.
klog.Infof("Updating EBS tags for volume ID %s with tags: %v", volumeID, mergedTags) | ||
|
||
// Create or update the tags | ||
_, err = ec2Client.CreateTags(&ec2.CreateTagsInput{ |
What is the behaviour when the number of tags exceeds 50?
Hey @TrilokGeer, I tested the scenario: the AWS client returns a bad-request error stating Error updating tags for volume vol-08d7a36e4b077873f: TagLimitExceeded: More than 50 tags specified, status code: 400, request id: 364561fd-6319-40e2-b3d1-ff110954e73c
We are not reconciling again on this request. What else do you think we should do beyond this?
An error event or message must be set on the operator status to indicate the error.
/cc @jsafrane
Please update cluster-storage-operator to add token-minter sidecar.
if err != nil {
	klog.Errorf("Error updating tags for volume %s: %v", volumeID, err)
There should be a retry with exponential backoff, especially when CreateTags calls are throttled by AWS.
That probably implies a queue of PVs.
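For illustration only, a minimal sketch of such a retry, wrapping the aws-sdk-go v1 CreateTags call in wait.ExponentialBackoff; the backoff parameters and the isThrottleError helper are assumptions for this sketch, not code from the PR:

```go
import (
	"errors"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/ec2"
	"k8s.io/apimachinery/pkg/util/wait"
)

// isThrottleError reports whether the error looks like AWS throttling
// (error codes assumed for this sketch).
func isThrottleError(err error) bool {
	var aerr awserr.Error
	if errors.As(err, &aerr) {
		return aerr.Code() == "RequestLimitExceeded" || aerr.Code() == "Throttling"
	}
	return false
}

// createTagsWithBackoff retries CreateTags with exponential backoff when AWS throttles us.
func createTagsWithBackoff(ec2Client *ec2.EC2, volumeID string, tags []*ec2.Tag) error {
	backoff := wait.Backoff{Duration: time.Second, Factor: 2.0, Steps: 5}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		_, err := ec2Client.CreateTags(&ec2.CreateTagsInput{
			Resources: []*string{aws.String(volumeID)},
			Tags:      tags,
		})
		if err == nil {
			return true, nil // done
		}
		if isThrottleError(err) {
			return false, nil // throttled: try again after the next backoff step
		}
		return false, err // permanent error: stop retrying
	})
}
```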
Force-pushed 146634c to 858b442
infraInformer := c.commonClient.ConfigInformers.Config().V1().Infrastructures().Informer()

// Add event handler to process updates only when ResourceTags change
infraInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
We don't have to do this. If we are using factory.WithInformers, we will reconcile this within the Sync function.
Hey @gnufied, I was a bit confused by this! Basically, I don't want to run reconciliation on every change to the Infrastructure resource; I only need to reconcile when there is a change in infra.Status.PlatformStatus.AWS.ResourceTags.
Can you suggest a better way to do this so that we can avoid the unnecessary compute?
How will this work when the controller is restarted? You are also doing WithInformers below, and hence any change in the infra object will still trigger Sync.
So the most bulletproof way of ensuring that we don't unnecessarily process all PVs is to store the information that we have already processed these PVs somewhere persistent.
What we have currently is the worst of both worlds. If I were to design this, I would probably make a hash of the sorted tags and annotate the PV with that tag hash. If the tag-hash annotation on the PV and the computed tag hash don't differ, I would not update the PV; otherwise I would.
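As a rough illustration of that suggestion, a sketch that hashes the sorted tags and compares the result with a PV annotation; the annotation key value and the helper names are assumptions, not the final code:

```go
import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"

	configv1 "github.com/openshift/api/config/v1"
	v1 "k8s.io/api/core/v1"
)

// tagHashAnnotationKey: illustrative value only.
const tagHashAnnotationKey = "ebs.openshift.io/volume-tags-hash"

// computeTagsHash returns a stable hash of the tags, independent of their original order.
func computeTagsHash(tags []configv1.AWSResourceTag) string {
	sorted := append([]configv1.AWSResourceTag{}, tags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	h := sha256.New()
	for _, t := range sorted {
		fmt.Fprintf(h, "%s=%s;", t.Key, t.Value)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// pvNeedsTagUpdate reports whether the PV's recorded hash differs from the computed one.
func pvNeedsTagUpdate(pv *v1.PersistentVolume, tags []configv1.AWSResourceTag) bool {
	return pv.Annotations[tagHashAnnotationKey] != computeTagsHash(tags)
}
```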
> How will this work when the controller is restarted? You are also doing WithInformers below, and hence any change in the infra object will still trigger Sync.

What I am thinking is that on every restart I would like to run the sync function and update the volumes if there is a change; further, on each change in resource tags we would run the sync again.

> So, what we have currently is the worst of both worlds. If I were to design this, I would probably make a hash of the sorted tags and annotate the PV with that tag hash. If the tag-hash annotation on the PV and the computed tag hash don't differ, I would not update the PV; otherwise I would.

This looks like an easy way to manage this! I will then add a new field to the controller struct which holds the updated hash of the sorted map of tags; in each reconciliation I will update tags only if the hash differs. Does this sound better to you?
But that is not what I said. I said we should store the hash of the sorted tags on the PV objects as an annotation and compare it with the current tags we are about to apply. We should only apply tags with AWS if the hashes differ.
Storing them just in memory doesn't help us much.
Added. Thanks for the suggestion!
Force-pushed 858b442 to fe7b17e
if err != nil {
	return err
}
err = c.processInfrastructure(infra, ec2Client)
So, does this controller need to be opt-in, or a default controller? I don't assume every OCP customer wants this feature, and if tagging were to fail after an OCP upgrade, their clusters would be degraded and we would have a support nightmare.
cc @jsafrane
It is presented as enabled by default + no opt out for all HyperShift clusters in the enhancement.
}

// startFailedQueueWorker runs a worker that processes failed batches independently
func (c *EBSVolumeTagsController) startFailedQueueWorker(ctx context.Context) {
I am not sure if we need this function at all.
So the reason I brought in this change is that I would like to retry the batches that failed to update tags. Here I would like to update the volumes' tags in serial order (one by one); see the discussion link. I cannot reuse the same sync function here, or else you could suggest a better way to do this.
If we are going with a per-PV hash, then we are not going to need this, right?
The points I want to make sure of here are:
- I definitely need to batch volumes in order not to hit the throttling condition (see the batching sketch after this list).
- On failure we would need to retry.
- Since we are using batch APIs, the AWS SDK will return an error if any one VolumeID in the batch hits any error (validation, auth, permission, etc.), and none of the VolumeIDs in that batch will get their tags updated. In this condition I would like to add a worker queue that handles the serial update of the PVs' tags and retries the updates with exponential back-off.
We need to figure out the trade-off: either we update all PVs via the AWS APIs in serial order in the first place and retry using the same sync function, or we call batches of volumes in the sync function and retry the failures serially via the queue worker until the queue is empty.
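For reference, a minimal sketch of the batching mentioned above (the batch size is whatever the controller chooses; nothing here is taken from the PR):

```go
import v1 "k8s.io/api/core/v1"

// batchPersistentVolumes splits the PVs into slices of at most batchSize, so a single
// CreateTags call never carries more volumes than we are comfortable sending at once.
func batchPersistentVolumes(pvs []*v1.PersistentVolume, batchSize int) [][]*v1.PersistentVolume {
	var batches [][]*v1.PersistentVolume
	for start := 0; start < len(pvs); start += batchSize {
		end := start + batchSize
		if end > len(pvs) {
			end = len(pvs)
		}
		batches = append(batches, pvs[start:end])
	}
	return batches
}
```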
> I definitely need to batch volumes in order not to hit the throttling condition.

That is fine. You are already batching PVs in fetchPVsAndUpdateTags.

> Now on failure we would need to retry!!

We are unnecessarily complicating this. The controller is going to resync every 20 minutes anyway, so it will try to tag all the PVs which aren't tagged. So do we even need to keep a separate queue for failed PVs? I am also afraid that your failed worker queue is going to race with the normal controller resync.
What is the point of a separate failed-worker queue when every 20 minutes we are going to try and sync tags for all PVs which don't have a matching tag hash? If you really want a separate worker queue, you will have to redesign the whole thing so that at least they are not racy. But I don't really understand the point of doing it.
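For context, the 20-minute resync referred to here comes from the controller builder; a sketch of that wiring with openshift/library-go's factory package, where c, the informers, and recorder are placeholders for whatever the PR actually constructs:

```go
import (
	"context"
	"time"

	"github.com/openshift/library-go/pkg/controller/factory"
	"github.com/openshift/library-go/pkg/operator/events"
)

// runEBSVolumeTagsController sketches how the controller could be wired so that Sync runs
// both on informer events and on a periodic resync.
func runEBSVolumeTagsController(ctx context.Context, c *EBSVolumeTagsController,
	infraInformer, pvInformer factory.Informer, recorder events.Recorder) {
	controller := factory.New().
		WithSync(c.Sync).                         // Sync has the factory.SyncFunc signature
		WithInformers(infraInformer, pvInformer). // any change to these objects triggers Sync
		ResyncEvery(20 * time.Minute).            // periodic resync retries PVs with a stale tag hash
		ToController("EBSVolumeTagsController", recorder)
	go controller.Run(ctx, 1)
}
```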
Hey @gnufied, I completely agree that retrying the failures is making the change unnecessarily complicated.
@jsafrane, I had a chat with @TrilokGeer regarding this. IMHO we should drop the idea of adding a Degraded condition on tag-update failure: the cluster should not be degraded just because applying tags failed. The sync will retry tagging the volumes within the resync period anyway, so we do not immediately need to retry the failures and can remove this queue worker.
Also, we are already emitting warning events from this controller when a tag update fails, so the user will know that the update failed for a certain reason, and we can also think about alerts based on that.
/cc @TrilokGeer
Can you also share your views on this?
BTW are we planning to backport this PR to older releases?
@gnufied I guess we would need to backport this to 4.17. Slack Thread.
So I noticed that you removed the failed-worker logic. The thing is, I was talking to @jsafrane offline and he has convinced me that we do need some kind of additional logic so that we can retry tagging of "failed" PVs one by one, rather than in a batch. This will ensure that one bad apple in a batch doesn't prevent tagging of the rest of the PVs.
But we need to be careful when doing this:
- We should make sure that PVs which will be tagged via the failed worker don't get processed via the regular controller resync (so no race).
- I would move the entire failed-worker code into a separate file.
Ack.
I believe we can remove the ResyncEvery parameter from the controller builder, as it is overkill for us; why do we want to run the sync function every 20 minutes if nothing has changed in the resource tags?
Alternatively, if you really think resync is important here, I think we can handle the race condition by using another annotation on PVs for tagging status and filtering based on the tag hash and the status in the annotation. But this will also cost us some volume update calls, and we would further need to handle cases where we are not able to update the status within batches. WDYT?
Force-pushed a9b8b54 to 42579da
/retest
failedQueue: workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),
}

go c.startFailedQueueWorker(ctx)
Problem with starting a worker here is, we are not waiting for caches to be synced before starting stuff. We shouldn't be starting workers at this stage.
I have removed the worker from here; we now initialize the worker in a postStartHook after the caches are synced and the controller is started.
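A generic sketch of that ordering with client-go, where the informer argument and the hook name are placeholders:

```go
import (
	"context"
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// postStartHook waits for the informer caches before the failed-queue worker starts,
// so the worker never acts on a cold cache.
func (c *EBSVolumeTagsController) postStartHook(ctx context.Context, infraInformer cache.SharedIndexInformer) error {
	if !cache.WaitForCacheSync(ctx.Done(), infraInformer.HasSynced) {
		return fmt.Errorf("timed out waiting for caches to sync")
	}
	go c.startFailedQueueWorker(ctx) // safe now: caches are warm
	return nil
}
```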
Force-pushed e70bfeb to c0e7dd8
Force-pushed c1ae14d to c62693a
Force-pushed c62693a to 869a428
Added a PR to add the token-minter sidecar. PTAL!
/retest
}

// getInfrastructure retrieves the Infrastructure resource in OpenShift
func (c *EBSVolumeTagsController) getInfrastructure(ctx context.Context) (*configv1.Infrastructure, error) {
Why are we reading an object from the informer with exponential backoff? See how we do this in other operators: https://github.com/openshift/vmware-vsphere-csi-driver-operator/blob/master/pkg/operator/vspherecontroller/vspherecontroller.go#L167
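For comparison, reading the Infrastructure object straight from the informer's lister needs no retry at all; a sketch, assuming the commonClient field already used elsewhere in this PR:

```go
import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

// getInfrastructure reads the cluster-scoped Infrastructure object from the informer cache.
func (c *EBSVolumeTagsController) getInfrastructure() (*configv1.Infrastructure, error) {
	infra, err := c.commonClient.ConfigInformers.Config().V1().Infrastructures().Lister().Get("cluster")
	if err != nil {
		return nil, fmt.Errorf("failed to get Infrastructure resource: %w", err)
	}
	return infra, nil
}
```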
if infra.Status.PlatformStatus != nil && infra.Status.PlatformStatus.AWS != nil {
	infraRegion = infra.Status.PlatformStatus.AWS.Region
}
ec2Client, err := c.getEC2Client(ctx, infraRegion)
IMO, we shouldn't create the ec2Client here until we know we are going to need to tag PV objects. Since we are planning to backport this feature, for most infra objects there may not even be any tags that need to be synced to AWS, so this controller will be a no-op for many clusters.
This means, at the very minimum, I would check whether the infra object has a non-nil infra.Status.PlatformStatus.AWS.ResourceTags before going ahead with EC2 client creation.
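A sketch of that early return, checking the infra object before any EC2 client is created (the placement inside Sync is an assumption):

```go
import configv1 "github.com/openshift/api/config/v1"

// awsResourceTags returns the user-requested tags, or nil when the platform status
// carries none, letting Sync bail out before it builds an EC2 client.
func awsResourceTags(infra *configv1.Infrastructure) []configv1.AWSResourceTag {
	if infra.Status.PlatformStatus == nil || infra.Status.PlatformStatus.AWS == nil {
		return nil
	}
	return infra.Status.PlatformStatus.AWS.ResourceTags
}
```

Sync could then return early when awsResourceTags(infra) is empty and only construct the client otherwise.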
return nil
}

func (c *EBSVolumeTagsController) listPersistentVolumesWithRetry(ctx context.Context) ([]*v1.PersistentVolume, error) {
Same comment as before: there is no need to list PV objects from an informer with exponential backoff. The informer has cached objects; it isn't supposed to fail.
func newAndUpdatedTags(resourceTags []configv1.AWSResourceTag) []*ec2.Tag {
	// Convert map back to slice of ec2.Tag
	var tags []*ec2.Tag
	for _, tag := range resourceTags {
I am confused. The function name says it is supposed to "merge" tags, but all I see is this function returning the configv1.AWSResourceTag entries as ec2.Tag objects. I do not see any "merge" happening.
func (c *EBSVolumeTagsController) handleTagUpdateFailure(batch []*v1.PersistentVolume, error error) {
	errorMessage := fmt.Sprintf("Error updating tags for volume %v: %v", batch, error)
	for _, pv := range batch {
		klog.Errorf("Error updating volume %v tags: %v", pv.Name, errorMessage)
So, I do not think dumping full PV objects in the error log is a good idea here. It will make logs very noisy. Also, didn't we already add the full PV object to the error object?
Please do not use error as a variable name, since error is also an interface. This confuses tools.
})

if err != nil {
	return fmt.Errorf("error creating tags for volume %v: %v", pvBatch, err)
Aren't we appending full PV objects again and again to same error message?
}

// Set or update the tag hash annotation
pv.Annotations[tagHashAnnotationKey] = hash
We should call DeepCopy before modifying an object we retrieved from the informer. We do not want to modify objects retrieved from informers directly.
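A sketch of that pattern; c.kubeClient and the helper name are placeholders for whatever the controller actually holds:

```go
import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setTagHashAnnotation mutates a deep copy of the cached PV and updates that copy,
// leaving the informer's object untouched.
func (c *EBSVolumeTagsController) setTagHashAnnotation(ctx context.Context, pv *v1.PersistentVolume, hash string) error {
	pvCopy := pv.DeepCopy()
	if pvCopy.Annotations == nil {
		pvCopy.Annotations = map[string]string{}
	}
	pvCopy.Annotations[tagHashAnnotationKey] = hash
	_, err := c.kubeClient.CoreV1().PersistentVolumes().Update(ctx, pvCopy, metav1.UpdateOptions{})
	return err
}
```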
// Update the PV with the new annotations
err = c.updateVolumeWithRetry(ctx, volume)
if err != nil {
	klog.Errorf("Error updating PV annotations for volume %s: %v", volume.Name, err)
If we are already going to handle these PVs via the failedWorkerQueue, why does the code in updateVolumeWithRetry need its own local exponential backoff?
return nil
}

// Process the volumes in batches
I mentioned in my original comment that the main controller should not try to update PVs which will be processed by the failed worker queue. I do not see that scenario handled here.
return errors.New("context canceled, stopping failed queue worker for EBS Volume Tags") // Stop the goroutine when the context is canceled
default:
	// Get the next failed volume from the queue
	item, quit := c.failedQueue.Get()
This code is very hard to read; can you split it into multiple smaller functions, please?
@anirudhAgniRedhat: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This PR introduces a custom EBSVolumeTagController that monitors the OpenShift Infrastructure resource for changes in AWS ResourceTags. When tags are updated, the controller automatically fetches all AWS EBS-backed PersistentVolumes (PVs) in the cluster, retrieves their volume IDs, and updates the associated EBS tags in AWS.
Key Changes:
Monitors Infrastructure resource for AWS ResourceTags updates.
Directly fetches all PVs provisioned by the AWS EBS CSI driver (ebs.csi.aws.com).
Updates AWS EBS tags by merging new and existing tags using the AWS SDK (a condensed sketch of the flow follows below).
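For orientation only, a condensed sketch of how one sync pass could fit together, reusing the hypothetical helpers sketched earlier in this conversation (awsResourceTags, computeTagsHash, batchPersistentVolumes); c.tagBatch is likewise a placeholder, and the real implementation in this PR may differ:

```go
// syncOnce: read tags from the Infrastructure object, skip PVs whose tag-hash
// annotation is already current, and tag the remaining PVs in batches.
func (c *EBSVolumeTagsController) syncOnce(ctx context.Context, infra *configv1.Infrastructure, pvs []*v1.PersistentVolume) error {
	tags := awsResourceTags(infra)
	if len(tags) == 0 {
		return nil // nothing requested; the controller is a no-op on this cluster
	}
	hash := computeTagsHash(tags)
	var stale []*v1.PersistentVolume
	for _, pv := range pvs {
		if pv.Annotations[tagHashAnnotationKey] != hash {
			stale = append(stale, pv)
		}
	}
	for _, batch := range batchPersistentVolumes(stale, 10) {
		if err := c.tagBatch(ctx, batch, tags, hash); err != nil { // hypothetical helper
			return err
		}
	}
	return nil
}
```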