fix: update failure in requeue #245

Merged

Conversation

kerthcet
Contributor

@kerthcet kerthcet commented May 3, 2022

Signed-off-by: kerthcet [email protected]

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of #241

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 3, 2022
@kerthcet kerthcet marked this pull request as draft May 3, 2022 08:52
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 3, 2022
@kerthcet
Contributor Author

kerthcet commented May 3, 2022

/test pull-kueue-test-integration-main

3 similar comments

@kerthcet kerthcet changed the title [WIP][DO-NOT-REVIEW]fix: update failure in requeue fix: update failure in requeue May 3, 2022
@kerthcet kerthcet marked this pull request as ready for review May 3, 2022 10:57
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 3, 2022
@kerthcet kerthcet force-pushed the bug/udpate-failure-in-requeue branch from dba9d51 to 542f8fa Compare May 3, 2022 11:12
@@ -300,13 +300,14 @@ func (m *Manager) RequeueWorkload(ctx context.Context, info *workload.Info, imme
return false
}

q.AddIfNotPresent(info)
+newInfo := workload.NewInfo(&w)
Contributor

Can you explain why you think this solves the problem?

If there was a change in the object, the event would update the object, and AddIfNotPresent would prevent it from being added again.

Contributor Author

According to the log, the update operation always fails, which must be because we stored an old version of the workload. So we should make sure we store the latest workload. A likely cause is that the failed API update carrying the old version requeues earlier than the newest one, so we end up storing the old one. The log also offers some clues:
image

Contributor

But that screenshot suggests that it wasn't actually added. So I'm not sure if this solves the issue or maybe it obscures a different one.

What's the link for this failure? I want to look at the entire logs.

Contributor Author

image

From the context, we can tell that `assume-1` came from the old version and was requeued successfully, while `assume-2` came from the new version and failed to requeue because the old workload had already been requeued.

Contributor

Right, we are actually rejecting the newer version. Thanks for the extra explanation.
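
Editor's sketch of the race discussed in this thread, assuming a queue that keys entries by workload name and refuses duplicates. `Info`, `Queue`, `NewQueue`, and `Stored` are hypothetical stand-ins written for illustration, not Kueue's actual types; only the `AddIfNotPresent` name comes from the diff.

```go
package main

import "fmt"

// Info is a hypothetical stand-in for a queued workload snapshot.
type Info struct {
	Name            string
	ResourceVersion int
}

// Queue is a toy queue keyed by workload name (an assumption for this sketch).
type Queue struct {
	items map[string]Info
}

// NewQueue returns an empty toy queue.
func NewQueue() *Queue { return &Queue{items: map[string]Info{}} }

// AddIfNotPresent stores info only when no entry with the same name exists.
// It reports whether the entry was added.
func (q *Queue) AddIfNotPresent(info Info) bool {
	if _, ok := q.items[info.Name]; ok {
		return false
	}
	q.items[info.Name] = info
	return true
}

// Stored returns the snapshot currently held for name, if any.
func (q *Queue) Stored(name string) (Info, bool) {
	info, ok := q.items[name]
	return info, ok
}

func main() {
	q := NewQueue()
	stale := Info{Name: "wl", ResourceVersion: 1}
	fresh := Info{Name: "wl", ResourceVersion: 2}

	// The requeue carrying the old snapshot arrives first and is accepted.
	fmt.Println(q.AddIfNotPresent(stale))
	// The requeue carrying the newer snapshot is rejected as a duplicate.
	fmt.Println(q.AddIfNotPresent(fresh))

	// The queue is left holding the stale snapshot, so later updates
	// built from it would conflict with the newer object.
	got, _ := q.Stored("wl")
	fmt.Println(got.ResourceVersion)
}
```

Under these assumptions the stale snapshot wins purely because it arrived first, which matches the `assume-1` / `assume-2` ordering described above.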

Contributor

@alculquicondor alculquicondor left a comment

Can you add unit tests?

@@ -300,13 +300,14 @@ func (m *Manager) RequeueWorkload(ctx context.Context, info *workload.Info, imme
return false
}

q.AddIfNotPresent(info)
+newInfo := workload.NewInfo(&w)
cq := m.clusterQueues[q.ClusterQueue]
if cq == nil {
return false
}

-added := cq.RequeueIfNotPresent(info, immediate)
+added := cq.RequeueIfNotPresent(newInfo, immediate)
Contributor

Should we change this to just Requeue instead of RequeueIfNotPresent?

It might not be necessary, as one of the routines trying to requeue will have the chance to push the newest version.
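
Editor's sketch of why an add-if-not-present call can be enough once every caller requeues the latest object. It assumes that `w` in the diff is the freshly fetched workload. `Workload`, `Manager`, `NewManager`, and `RequeueLatest` are hypothetical names for illustration only; the `latest` map plays the role of the API server or client cache.

```go
package main

import "fmt"

// Workload is a hypothetical stand-in for the API object.
type Workload struct {
	Name            string
	ResourceVersion int
}

// Manager is a toy model: latest mimics the source of truth,
// queued mimics the queue contents.
type Manager struct {
	latest map[string]Workload
	queued map[string]Workload
}

// NewManager returns a Manager with empty maps.
func NewManager() *Manager {
	return &Manager{
		latest: map[string]Workload{},
		queued: map[string]Workload{},
	}
}

// RequeueLatest ignores the version carried by the caller's snapshot,
// looks up the latest object by name, and queues it only if nothing
// is queued under that name yet. It reports whether it added an entry.
func (m *Manager) RequeueLatest(snapshot Workload) bool {
	current, ok := m.latest[snapshot.Name]
	if !ok {
		return false // the workload no longer exists
	}
	if _, present := m.queued[current.Name]; present {
		return false
	}
	m.queued[current.Name] = current
	return true
}

func main() {
	m := NewManager()
	m.latest["wl"] = Workload{Name: "wl", ResourceVersion: 2}

	// A caller holding a stale snapshot arrives first, yet the latest
	// version is what gets queued.
	fmt.Println(m.RequeueLatest(Workload{Name: "wl", ResourceVersion: 1}))
	// A second caller is rejected, but nothing is lost: the queue
	// already holds the latest version.
	fmt.Println(m.RequeueLatest(Workload{Name: "wl", ResourceVersion: 2}))
	fmt.Println(m.queued["wl"].ResourceVersion)
}
```

In this model, whichever routine requeues first pushes the newest version, so overwriting on every call is not required.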

Contributor Author

sg.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2022
@kerthcet kerthcet force-pushed the bug/udpate-failure-in-requeue branch from 542f8fa to 16bdd92 Compare May 5, 2022 07:51
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 5, 2022
@kerthcet
Contributor Author

kerthcet commented May 5, 2022

Addressed the comments. @alculquicondor

Contributor

@alculquicondor alculquicondor left a comment

Please squash.

@alculquicondor
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, kerthcet

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2022
@kerthcet kerthcet force-pushed the bug/udpate-failure-in-requeue branch from ea8535b to 2d4812b Compare May 6, 2022 13:59
@kerthcet
Contributor Author

kerthcet commented May 6, 2022

The #248 bug occurred; I'll look into it.

@alculquicondor
Contributor

/hold

Could it be related to this PR? Have you seen the same error in other PRs?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2022
@kerthcet
Contributor Author

kerthcet commented May 6, 2022

I don't think it's related. I have hit this error multiple times in my local dev environment, so I think we can ignore it in this PR.

Contributor

@alculquicondor alculquicondor left a comment

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 6, 2022
@kerthcet
Contributor Author

kerthcet commented May 6, 2022

/retest

@k8s-ci-robot k8s-ci-robot merged commit b515c85 into kubernetes-sigs:main May 6, 2022
@kerthcet kerthcet deleted the bug/udpate-failure-in-requeue branch May 6, 2022 16:17
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…pstream-release-0.1

Automated cherry pick of #245: fix: always requeue with the latest object
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants