Issue Faced
We are experiencing significant delays when deleting Routes in bulk. The problem becomes evident once the number of Routes in the cluster exceeds 10,000. During bulk deletion we observe noticeable latency and a clear spike in the etcd monitoring metrics.
Upon reviewing the code, I found the following segment, which seems to be the cause:
```go
// TODO: Maintain a reference count for each object without having to poll each time
func (u *upstreamClient) deleteCheck(ctx context.Context, obj *v1.Upstream) (bool, error) {
	// Lists every route from the cluster on each delete check (not from the cache).
	routes, _ := u.cluster.route.List(ctx)
	// Stream routes, by contrast, are read from the local cache.
	sroutes, _ := u.cluster.cache.ListStreamRoutes()
	if routes == nil && sroutes == nil {
		return true, nil
	}
	// Reject the deletion if any route or stream route still references this upstream.
	for _, route := range routes {
		if route.UpstreamId == obj.ID {
			return false, fmt.Errorf("can not delete this upstream, route.id=%s is still using it now", route.ID)
		}
	}
	for _, sroute := range sroutes {
		if sroute.UpstreamId == obj.ID {
			return false, fmt.Errorf("can not delete this upstream, stream_route.id=%s is still using it now", sroute.ID)
		}
	}
	return true, nil
}
```
The call routes, _ := u.cluster.route.List(ctx) lists every route in the cluster and iterates over all of them on every deletion, which adds unnecessary overhead. The latency spikes we see in etcd monitoring correspond to the etcd range calls this listing issues while Routes are being deleted.
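The TODO in the snippet above already points at one way to avoid this. Just to illustrate the idea (the type and method names below are mine, not from the project), a reference count that is updated whenever a route referencing an upstream is created or deleted would let the check answer in O(1) without listing anything:

```go
import "sync"

// Hypothetical sketch of the reference-count idea from the TODO: track how many
// routes point at each upstream so the delete check never has to list routes.
type upstreamRefCounter struct {
	mu   sync.Mutex
	refs map[string]int // upstream ID -> number of routes referencing it
}

func newUpstreamRefCounter() *upstreamRefCounter {
	return &upstreamRefCounter{refs: make(map[string]int)}
}

// Inc is called when a route referencing the upstream is created.
func (c *upstreamRefCounter) Inc(upstreamID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refs[upstreamID]++
}

// Dec is called when such a route is deleted.
func (c *upstreamRefCounter) Dec(upstreamID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.refs[upstreamID] > 0 {
		c.refs[upstreamID]--
	}
}

// InUse replaces the per-delete listing: the upstream may be removed
// only when no route references it anymore.
func (c *upstreamRefCounter) InUse(upstreamID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.refs[upstreamID] > 0
}
```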
I would like to understand why the code is fetching the routes from the cluster every time rather than from the cache. Was this a design decision or is it an oversight?
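For comparison, here is a rough sketch of what a cache-based check could look like; it mirrors the existing function but reads routes from the local cache the same way stream routes already are. The cache.ListRoutes() method is an assumption on my part, analogous to the cache.ListStreamRoutes() call already used:

```go
// Rough sketch only: reads routes from the local cache instead of the cluster.
// u.cluster.cache.ListRoutes() is assumed here, mirroring ListStreamRoutes().
func (u *upstreamClient) deleteCheckFromCache(obj *v1.Upstream) (bool, error) {
	routes, err := u.cluster.cache.ListRoutes()
	if err != nil {
		return false, err
	}
	sroutes, err := u.cluster.cache.ListStreamRoutes()
	if err != nil {
		return false, err
	}
	for _, route := range routes {
		if route.UpstreamId == obj.ID {
			return false, fmt.Errorf("can not delete this upstream, route.id=%s is still using it now", route.ID)
		}
	}
	for _, sroute := range sroutes {
		if sroute.UpstreamId == obj.ID {
			return false, fmt.Errorf("can not delete this upstream, stream_route.id=%s is still using it now", sroute.ID)
		}
	}
	return true, nil
}
```

If the cache is kept in sync by the controller, this would remove the per-delete etcd range calls entirely.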
Logs
No response
Steps to Reproduce
Install APISIX and the APISIX Ingress Controller.
Create more than 10,000 Route CRDs.
Delete the Route CRDs in bulk.
Environment
APISIX Ingress Controller Version:
We are using a self-developed version of the APISIX Ingress Controller based on an older release, so it differs from the latest official versions; however, the Route deletion code responsible for this performance issue is unchanged from the official versions.
Kubernetes Cluster Version: v1.24.4.
OS Version: CentOS 7.6 x86