fix: Resource version incorrectly overridden for wfInformer list requests. Fixes #11948 #12133

Merged
merged 1 commit into from
Nov 13, 2023

Conversation

terrytangyuan
Member

terrytangyuan commented Nov 3, 2023

Potential fix for #11948

@@ -817,7 +817,7 @@ func (wfc *WorkflowController) enqueueWfFromPodLabel(obj interface{}) error {
 	return nil
 }

-func (wfc *WorkflowController) tweakListOptions(options *metav1.ListOptions) {
+func (wfc *WorkflowController) tweakListRequestListOptions(options *metav1.ListOptions) {
Member

Possibly: keep them in step with

tweakWatchRequestListOptions(options)
options.ResourceVersion = ""
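
A minimal sketch of how the suggestion above could look, assuming a local stand-in for `metav1.ListOptions` so that it compiles without client-go. The function bodies are assumptions for illustration, not the merged code; the label selector value is taken from the requests logged later in this thread.

```go
package main

import "fmt"

// ListOptions is a hypothetical stand-in for metav1.ListOptions,
// reduced to the two fields discussed in this review.
type ListOptions struct {
	LabelSelector   string
	ResourceVersion string
}

// tweakWatchRequestListOptions only narrows the selector and leaves
// ResourceVersion exactly as the informer set it (assumed body).
func tweakWatchRequestListOptions(options *ListOptions) {
	options.LabelSelector = "!workflows.argoproj.io/controller-instanceid"
}

// tweakListRequestListOptions reuses the watch tweak, then clears
// ResourceVersion for list requests only, as the reviewer proposes.
func tweakListRequestListOptions(options *ListOptions) {
	tweakWatchRequestListOptions(options)
	options.ResourceVersion = ""
}

func main() {
	list := &ListOptions{ResourceVersion: "238725385"}
	watch := &ListOptions{ResourceVersion: "238725385"}

	tweakListRequestListOptions(list)
	tweakWatchRequestListOptions(watch)

	fmt.Printf("list:  selector=%q resourceVersion=%q\n", list.LabelSelector, list.ResourceVersion)
	fmt.Printf("watch: selector=%q resourceVersion=%q\n", watch.LabelSelector, watch.ResourceVersion)
}
```

Composing the list tweak from the watch tweak keeps the shared selector logic in one place, so the two can only differ in how they treat `ResourceVersion`.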

Member Author

I don't have a preference here.

Member Author

terrytangyuan commented Nov 3, 2023

I am converting this to draft pending confirmation of test results from the community: #11948 (comment)

@terrytangyuan terrytangyuan marked this pull request as draft November 3, 2023 15:28
Member

jessesuen commented Nov 3, 2023

I compared the watch API call in the controller logs from before the change (v3.4.6):

GET https://172.20.0.1:443/apis/argoproj.io/v1alpha1/workflows?allowWatchBookmarks=true&labelSelector=%21workflows.argoproj.io%2Fcontroller-instanceid&resourceVersion=238725385&timeoutSeconds=483&watch=true 200 OK

and after (v3.4.11):

GET https://172.20.0.1:443/apis/argoproj.io/v1alpha1/workflows?allowWatchBookmarks=true&labelSelector=%21workflows.argoproj.io%2Fcontroller-instanceid&timeoutSeconds=407&watch=true 200 OK

Before, in v3.4.6, K8s would set resourceVersion=238725385 on the watch. In v3.4.11, resourceVersion is omitted entirely from the query parameters because of the previous change. I agree that this could cause a timing issue: the Kubernetes informer relies on continuing from the state of the world at a specific resourceVersion, but it can no longer do so because we omit resourceVersion from the watch call.

So I think the fix may work.
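
To illustrate the difference between the two logged requests, here is a rough sketch that builds the watch query string with and without a resourceVersion. `WatchOptions` and `watchQuery` are hypothetical names, and this uses only `net/url` from the standard library; it is not client-go's actual encoding path, only an approximation of the resulting query parameters.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// WatchOptions is a hypothetical stand-in for the subset of list options
// that show up in the logged watch requests.
type WatchOptions struct {
	LabelSelector   string
	ResourceVersion string
	TimeoutSeconds  int
}

// watchQuery encodes the options as a query string. An empty
// ResourceVersion is left out entirely, which mirrors the v3.4.11 request.
func watchQuery(o WatchOptions) string {
	q := url.Values{}
	q.Set("allowWatchBookmarks", "true")
	q.Set("labelSelector", o.LabelSelector)
	if o.ResourceVersion != "" {
		q.Set("resourceVersion", o.ResourceVersion)
	}
	q.Set("timeoutSeconds", strconv.Itoa(o.TimeoutSeconds))
	q.Set("watch", "true")
	return q.Encode() // keys are emitted in sorted order
}

func main() {
	selector := "!workflows.argoproj.io/controller-instanceid"

	// Watch that resumes from a known resourceVersion (as in v3.4.6).
	fmt.Println(watchQuery(WatchOptions{selector, "238725385", 483}))

	// Watch whose resourceVersion was cleared by the tweak (as in v3.4.11).
	fmt.Println(watchQuery(WatchOptions{selector, "", 407}))
}
```

With the version cleared, the server has no point to resume from, which is the gap between list and watch that the comment above describes.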

@terrytangyuan terrytangyuan marked this pull request as ready for review November 8, 2023 02:52
@terrytangyuan terrytangyuan merged commit 222d53c into master Nov 13, 2023
42 checks passed
@terrytangyuan terrytangyuan deleted the dev-fix-informer branch November 13, 2023 20:25
jiachengxu pushed a commit to akuity/argo-workflows that referenced this pull request Nov 17, 2023
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Dec 27, 2023
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this pull request May 9, 2024