-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: update failure in requeue #245
fix: update failure in requeue #245
Conversation
/test pull-kueue-test-integration-main |
3 similar comments
/test pull-kueue-test-integration-main |
/test pull-kueue-test-integration-main |
/test pull-kueue-test-integration-main |
dba9d51
to
542f8fa
Compare
pkg/queue/manager.go
Outdated
@@ -300,13 +300,14 @@ func (m *Manager) RequeueWorkload(ctx context.Context, info *workload.Info, imme | |||
return false | |||
} | |||
|
|||
q.AddIfNotPresent(info) | |||
newInfo := workload.NewInfo(&w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you think this solves the problem?
If there was a change in the object, the event would update the object, and AddIfNotPresent
would prevent it to be added again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to the log, we are always failing with update operation, this must be the reason that we stored the old version workload. So we should ensure to store the latest workload. The reason might be that the failed api update with the old version requeue earlier than the newest one, then we stored the old ones. We can also find some clues from the log:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that screenshot suggests that it wasn't actually added. So I'm not sure if this solves the issue or maybe it obscures a different one.
What's the link for this failure? I want to look at the entire logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we are actually rejected the newer version. Thanks for the extra explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add unit tests?
pkg/queue/manager.go
Outdated
@@ -300,13 +300,14 @@ func (m *Manager) RequeueWorkload(ctx context.Context, info *workload.Info, imme | |||
return false | |||
} | |||
|
|||
q.AddIfNotPresent(info) | |||
newInfo := workload.NewInfo(&w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we are actually rejected the newer version. Thanks for the extra explanation.
pkg/queue/manager.go
Outdated
cq := m.clusterQueues[q.ClusterQueue] | ||
if cq == nil { | ||
return false | ||
} | ||
|
||
added := cq.RequeueIfNotPresent(info, immediate) | ||
added := cq.RequeueIfNotPresent(newInfo, immediate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change this for just Requeue
instead of RequeueIfNotPresent
?
It might not be necessary, as one of the routines trying to requeue will have the chance to push the newest version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg.
542f8fa
to
16bdd92
Compare
address the comments. @alculquicondor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please squash.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, kerthcet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: kerthcet <[email protected]>
ea8535b
to
2d4812b
Compare
The #248 bug occurs, I'll take a spike. |
/hold could it be related to this PR? Have you seen the same error for other PRs? |
I think it's not related. I used to met this error for multi times in my local dev env. I think we can ignore the error in the pr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/hold cancel
/retest |
…pstream-release-0.1 Automated cherry pick of #245: fix: always requeue with the latest object
Signed-off-by: kerthcet [email protected]
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Part of #241
Special notes for your reviewer: