Only process server updates on workloads affected by the server #12017

adleong · 2024-01-31T00:17:24Z

When the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload, because the Server update may have changed opaqueness for any of them. However, this is very CPU intensive for the destination controller, especially during resyncs, when we get Server updates for every Server resource in the cluster.

Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server. If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.
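As a rough illustration of the filtering described above (not the code in this PR), here is a minimal Go sketch. The `Server` and `Workload` types, the `selects` and `affectedWorkloads` names, and the restriction to exact-match labels are all assumptions made for the example; the real controller works with the Kubernetes API types and full label selectors.

```go
package main

// Workload is a simplified, hypothetical stand-in for a pod or
// ExternalWorkload tracked by the workload watcher.
type Workload struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// Server is a simplified, hypothetical stand-in for the Server resource.
// Only exact-match label selection is modeled here.
type Server struct {
	Namespace   string
	MatchLabels map[string]string
}

// selects reports whether the Server selects the workload: both must be in
// the same namespace and every selector label must be present on the
// workload with the same value. A nil Server selects nothing; an empty
// selector selects every workload in the namespace.
func selects(s *Server, w Workload) bool {
	if s == nil || s.Namespace != w.Namespace {
		return false
	}
	for key, want := range s.MatchLabels {
		if got, ok := w.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// affectedWorkloads returns the workloads selected by the old version of
// the Server or by the new version. Any other workload cannot have had its
// opaqueness changed by this update, so it can be skipped.
func affectedWorkloads(oldServer, newServer *Server, workloads []Workload) []Workload {
	var affected []Workload
	for _, w := range workloads {
		if selects(oldServer, w) || selects(newServer, w) {
			affected = append(affected, w)
		}
	}
	return affected
}
```

In this sketch, an add can be modeled by passing `nil` as the old Server and a delete by passing `nil` as the new one. Checking the old version matters because a workload that the Server used to select, and no longer does, may need its opaqueness recomputed too.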

Signed-off-by: Alex Leong <[email protected]>

alpeb

Noice! 🤠 🚢

zaharidichev

Very nice. I think this is certainly better than skipping resyncs completely.

alpeb · 2024-02-01T15:48:08Z

Does this address #11995?

adleong · 2024-02-01T18:42:22Z

@alpeb no it doesn't

This edge release contains performance and stability improvements to the Destination controller, and continues stabilizing support for ExternalWorkloads.

* Reduced the load on the Destination controller by only processing Server updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternalWorkloads ([#12016])

This edge release contains performance and stability improvements to the Destination controller, and continues stabilizing support for ExternalWorkloads.

* Reduced the load on the Destination controller by only processing Server updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternalWorkloads ([#12016])
* Restricted the number of IPs an ExternalWorkload can have ([#12026])

This stable release back-ports bugfixes and improvements from recent edge releases.

* Introduced support for arbitrary labels in the `podMonitors` field in the control plane Helm chart (thanks @jseiser!) ([#11222]; fixes [#11175])
* Added a `prometheusUrl` field for the heartbeat job in the control plane Helm chart (thanks @david972!) ([#11343]; fixes [#11342])
* Updated the Destination controller to return `INVALID_ARGUMENT` status codes properly when a `ServiceProfile` is requested for a service that does not exist ([#11980])
* Reduced the load on the Destination controller by only processing Server updates on workloads affected by the Server ([#12017])
* Changed how updates to a `Server` selector are handled in the destination service. When a `Server` that marks a port as opaque no longer selects a resource, the resource's opaqueness will be reverted to default settings ([#12031]; fixes [#11995])
* Fixed a race condition in the destination service that could cause panics under very specific conditions ([#12022]; fixes [#12010])
* Fixed an issue where inbound policy could be incorrect after certain policy resources are deleted ([#12088])

[#11222]: #11222
[#11175]: #11175
[#11343]: #11343
[#11342]: #11342
[#11980]: #11980
[#12017]: #12017
[#11995]: #11995
[#12031]: #12031
[#12010]: #12010
[#12022]: #12022
[#12088]: #12088

Signed-off-by: Alex Leong <[email protected]>
Signed-off-by: David ALEXANDRE <[email protected]>
Signed-off-by: Justin S <[email protected]>
Co-authored-by: Oliver Gould <[email protected]>
Co-authored-by: Alejandro Pedraza <[email protected]>
Co-authored-by: David ALEXANDRE <[email protected]>
Co-authored-by: Justin Seiser <[email protected]>

adleong added 2 commits January 30, 2024 23:53

Only process updates for workloads affected by the server

f14a8cc

Signed-off-by: Alex Leong <[email protected]>

re-add comment

444ca8d

Signed-off-by: Alex Leong <[email protected]>

adleong requested a review from a team as a code owner January 31, 2024 00:17

remove duplicate import

4dbdedf

Signed-off-by: Alex Leong <[email protected]>

alpeb approved these changes Jan 31, 2024

zaharidichev approved these changes Feb 1, 2024

adleong merged commit 3902b33 into main Feb 1, 2024
33 checks passed

adleong deleted the alex/sever-selector branch February 1, 2024 18:42

alpeb mentioned this pull request Feb 1, 2024

Change notes for edge-24.2.1 #12029

Merged

adleong mentioned this pull request Feb 19, 2024

stable 2.14.10 #12111

Merged
