-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
increase controller concurrency and cpu requests #2862
Conversation
This commit ups the cpu request for for all our installed compopents (cdi-deployment, cdi-apiserver, cdi-uploadproxy, cdi-operator) for 10m (1% of a core) to 100m (10% of a core). The main driver of this is BZ: 2216038. Without this change, it is pretty easy to create a large number of concurrent clone operations and get token timeout errors. Upping resource requests and concurrency addresses the issue in a very direct way. Signed-off-by: Michael Henriksen <[email protected]>
/lgtm |
delta := time.Since(syncState.dv.ObjectMeta.CreationTimestamp.Time) | ||
log.V(3).Info("Adding extended DataVolume token took", "delta", delta) | ||
} | ||
syncState.dv = syncState.dvMutated.DeepCopy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good but I'm wondering about this section, does this introduce new behavior? Should we test it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing this appeared to reduce updateStatus failures/retries. The point is that once the dv has been updated the "baseline" and mutated DV are the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird, syncUpdate is usually the last thing we do in the loop, how come something else runs updates after it
EDIT:
nvm I see we updateStatus after it. makes sense.
But updateStatus seems to work on a fresh copy of the DV from the cache, which should have the updates..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! But the fact is that the "fresh copy" from the cache is often stale (informer didn't receive update yet). Would be better if updatestatus actually uses the copy returned by update in that case. I swear that adding this assignment was reducing errors/retries. Seems unlikely but maybe the additional time to do the assignment gives more time for the informer to update.
Anyway, worst case this assignment has no actual effect. I can remove it if you'd like but it does seem more correct to me that "dv" and "dvMutated" should be equal at that point. The fact that neither value is referenced again is kind of beside the point. That may change in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may change in the future
That is my concern; if we ever have to look at dv/dvmutated diff to compute some status transition,
this may throw us off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should spend a lot of time wondering about hypotheticals. But what are you suggesting we do here @akalenyu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess that's not very likely, I am fine with keeping it
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: akalenyu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-v1.57 |
@mhenriks: new pull request created: #2867 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does / why we need it:
This commit ups the cpu request for for all our installed compopents (cdi-deployment, cdi-apiserver, cdi-uploadproxy, cdi-operator) for 10m (1% of a core) to 100m (10% of a core).
Also make all controllers handle 3 concurrent requests.
Without this change, it is pretty easy to create a large number of concurrent clone operations and get token timeout errors. Upping resource requests and concurrency addresses the issue in a very direct way.
I experimented with a bunch of other approaches including creating a controller just for refreshing tokens but this is the lowest touch (codewise) and most highly effective solution.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes # https://bugzilla.redhat.com/show_bug.cgi?id=2216038
Special notes for your reviewer:
Release note: