fix: workflow DeleteFunc use Index data to improve processing speed (#14791) #14875
guanguxiansheng wants to merge 1 commit into argoproj:main
Conversation
|
@guanguxiansheng We encountered similar performance issues when migrating from version 3.5 to 3.6, and referencing this transformation resolved the problem for us as well. It looks like this deletion operation introduces significant overhead and slows down the event handler loop. |
|
Indeed, this is a good discovery. By design, an informer's event handler should not run any time-consuming operation; that work belongs in the workqueue. |
|
Getting pods from the informer does speed things up, but there's a problem: under high load, the local informer cache can become inconsistent with the remote API server. This operation can't tolerate that inconsistency (during reconcile, for example, a late update of the pod informer is tolerable), so it could leak pods whose finalizers are never removed. |
workflow/controller/controller.go
Outdated
  wf := obj.(*unstructured.Unstructured)
+ podObjs, err := wfc.PodController.GetPodsByIndex(indexes.WorkflowIndex, indexes.WorkflowIndexValue(wf.GetNamespace(), wf.GetName()))
  if err != nil {
-     logger.WithError(err).Error(ctx, "Failed to list pods")
+     logger.WithError(err).Error(ctx, "Failed to get pods by index")
  }
- for _, p := range podList.Items {
+ for _, podObj := range podObjs {
This is fine. Should we maybe also add a fallback in case the pods aren't in the index?
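One possible shape for such a fallback, as a minimal sketch only: try the local index first and query the API server when the index returns nothing. The `podSource` type and the `podsForWorkflow` and `mapSource` helpers are hypothetical stand-ins written for this illustration; they are not part of the Argo Workflows codebase or of client-go, and the pod names are invented.

```go
package main

import "fmt"

// podSource is a hypothetical abstraction over "something that can return
// the pod names belonging to a workflow key such as namespace/name".
type podSource func(workflowKey string) ([]string, error)

// podsForWorkflow consults the cheap local index first and falls back to the
// slower API-server lookup only when the index fails or returns no pods.
func podsForWorkflow(workflowKey string, fromIndex, fromAPI podSource) ([]string, error) {
	pods, err := fromIndex(workflowKey)
	if err == nil && len(pods) > 0 {
		return pods, nil
	}
	return fromAPI(workflowKey)
}

// mapSource builds a podSource backed by an in-memory map (test double).
func mapSource(m map[string][]string) podSource {
	return func(key string) ([]string, error) { return m[key], nil }
}

func main() {
	index := mapSource(map[string][]string{
		"argo/wf-a": {"wf-a-pod-1"},
	})
	api := mapSource(map[string][]string{
		"argo/wf-a": {"wf-a-pod-1"},
		"argo/wf-b": {"wf-b-pod-1"}, // not yet visible in the local cache
	})

	hit, _ := podsForWorkflow("argo/wf-a", index, api)
	miss, _ := podsForWorkflow("argo/wf-b", index, api)
	fmt.Println(hit, miss) // [wf-a-pod-1] [wf-b-pod-1]
}
```

The trade-off is that a workflow with genuinely no pods always pays for the API call, and that call would still run inside the event handler unless it is moved elsewhere.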
|
To solve the problem of incomplete index data, there are two ideas: 1. In the |
I prefer to move this loop outside of the event handler, to the workqueue. However, I think this approach is also acceptable, as large and multiple workflows generally do not coexist in a cluster. |
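A rough sketch of the hand-off idea, using only the standard library: the handler enqueues a key and returns, and a separate worker goroutine does the slow cleanup. The `cleanupQueue` type and its methods are hypothetical names for this illustration. A buffered channel stands in for a real workqueue here, which is an assumption that simplifies things: unlike an unbounded queue, `onDelete` would block once the buffer is full.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupQueue is a hypothetical, simplified stand-in for a workqueue.
type cleanupQueue struct {
	keys chan string
	wg   sync.WaitGroup
}

// newCleanupQueue starts one worker goroutine that runs cleanup for each
// enqueued key, in order.
func newCleanupQueue(size int, cleanup func(key string)) *cleanupQueue {
	q := &cleanupQueue{keys: make(chan string, size)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for key := range q.keys {
			cleanup(key)
		}
	}()
	return q
}

// onDelete is what the event handler would call: it only records the key and
// returns without waiting for the cleanup itself (while the buffer has room).
func (q *cleanupQueue) onDelete(namespace, name string) {
	q.keys <- namespace + "/" + name
}

// shutdown stops accepting keys and waits for the worker to drain the queue.
func (q *cleanupQueue) shutdown() {
	close(q.keys)
	q.wg.Wait()
}

func main() {
	var cleaned []string
	q := newCleanupQueue(16, func(key string) {
		cleaned = append(cleaned, key) // stands in for the slow pod cleanup
	})

	q.onDelete("argo", "wf-a")
	q.onDelete("argo", "wf-b")
	q.shutdown()

	fmt.Println(cleaned) // [argo/wf-a argo/wf-b]
}
```

With this split, a slow cleanup delays only other cleanups, not the add and update notifications that share the handler loop.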
|
@shuangkun @isubasinghe hi, I chose to get pod information from |
d1bf16e to 589899b
…essing speed (argoproj#14791) Made-with: Cursor Signed-off-by: guanguxiansheng <1439425373@qq.com>
Fixes #14791
Motivation
Try to fix #14791
Some performance issues were encountered during production. Controller processing performance degrades significantly when running workflows at scale. Adding a throttling queue doesn't significantly improve speed.
Modifications
Modify `DeleteFunc` in `controller.go` to use index data instead of querying the API server directly, which improves processing speed.
Documentation
In `shared_informer.go`, a single goroutine traverses `nextCh` and executes `AddFunc`, `UpdateFunc`, and `DeleteFunc`. If the `pods.List` request inside `DeleteFunc` is slow, the next element of `nextCh` is not processed until it returns, so `AddFunc` and `UpdateFunc` are blocked as well.
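The blocking effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not the client-go implementation: the `event` type and the `dispatch` and `handlerCost` functions are hypothetical, and the millisecond costs are invented. It models a single goroutine running handlers back to back and reports when each event would start.

```go
package main

import "fmt"

// event is a simplified informer notification (hypothetical type).
type event struct {
	kind string // "add", "update", or "delete"
	name string
}

// dispatch models one goroutine draining its queue: handlers run strictly
// one after another, so an event cannot start until every earlier handler
// has returned. cost gives each handler's simulated duration in
// milliseconds. The result is the simulated start time of each event.
func dispatch(events []event, cost func(event) int) []int {
	starts := make([]int, len(events))
	clock := 0
	for i, ev := range events {
		starts[i] = clock
		clock += cost(ev)
	}
	return starts
}

// handlerCost returns a cost function in which add and update take 1 ms and
// delete takes deleteMs. All figures are invented for illustration.
func handlerCost(deleteMs int) func(event) int {
	return func(ev event) int {
		if ev.kind == "delete" {
			return deleteMs
		}
		return 1
	}
}

func main() {
	events := []event{
		{"delete", "wf-a"},
		{"add", "wf-b"},
		{"update", "wf-c"},
	}
	// Delete handler that waits on a slow remote list call.
	fmt.Println("slow delete:", dispatch(events, handlerCost(500))) // slow delete: [0 500 501]
	// Delete handler that reads from a local index.
	fmt.Println("fast delete:", dispatch(events, handlerCost(1))) // fast delete: [0 1 2]
}
```

In the slow case, the add and update events wait the full 500 ms even though their own handlers are cheap, which matches the delay described above.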