Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only process server updates on workloads affected by the server #12017

Merged
merged 3 commits into from
Feb 1, 2024

Conversation

adleong
Copy link
Member

@adleong adleong commented Jan 31, 2024

We the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload. This is because the Server update may have changed opaqueness for that workload. However, this is very CPU intensive for the destination controller, especially during resyncs when we get Server updates for every Server resource in the cluster.

Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server. If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.

@adleong adleong requested a review from a team as a code owner January 31, 2024 00:17
Signed-off-by: Alex Leong <[email protected]>
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noice! 🤠 🚢

Copy link
Member

@zaharidichev zaharidichev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. I think this is certainly better than skipping resyncs completely.

@alpeb
Copy link
Member

alpeb commented Feb 1, 2024

Does this address #11995?

@adleong
Copy link
Member Author

adleong commented Feb 1, 2024

@alpeb no it doesn't

@adleong adleong merged commit 3902b33 into main Feb 1, 2024
33 checks passed
@adleong adleong deleted the alex/sever-selector branch February 1, 2024 18:42
alpeb added a commit that referenced this pull request Feb 1, 2024
This edge release contains performance and stability improvements to the
Destination controller, and continues stabilizing support for ExternalWorkloads.

* Reduced the load on the Destination controller by only processing Server
  updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in
  multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log
  an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to
  avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternWorkloads ([#12016])
alpeb added a commit that referenced this pull request Feb 1, 2024
This edge release contains performance and stability improvements to the
Destination controller, and continues stabilizing support for ExternalWorkloads.

* Reduced the load on the Destination controller by only processing Server
  updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in
  multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log
  an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to
  avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternWorkloads ([#12016])
alpeb added a commit that referenced this pull request Feb 2, 2024
This edge release contains performance and stability improvements to the
Destination controller, and continues stabilizing support for ExternalWorkloads.

* Reduced the load on the Destination controller by only processing Server
  updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in
  multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log
  an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to
  avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternWorkloads ([#12016])
* Restriced the number of IPs an ExternalWorkload can have ([#12026])
alpeb added a commit that referenced this pull request Feb 2, 2024
This edge release contains performance and stability improvements to the
Destination controller, and continues stabilizing support for ExternalWorkloads.

* Reduced the load on the Destination controller by only processing Server
  updates on workloads affected by the Server ([#12017])
* Changed how the Destination controller reacts to target clusters (in
  multicluster pod-to-pod mode) whose Server CRD is outdated: skip them and log
  an error instead of panicking ([#12008])
* Improved the leader election of the ExternalWorkloads Endpoints controller to
  avoid missing events ([#12021])
* Improved naming of EndpointSlices generated by ExternWorkloads ([#12016])
* Restriced the number of IPs an ExternalWorkload can have ([#12026])
adleong added a commit that referenced this pull request Feb 19, 2024
We the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload.  This is because the Server update may have changed opaqueness for that workload.  However, this is very CPU intensive for the destination controller, especially during resyncs when we get Server updates for every Server resource in the cluster.

Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server.  If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.

Signed-off-by: Alex Leong <[email protected]>
@adleong adleong mentioned this pull request Feb 19, 2024
adleong added a commit that referenced this pull request Feb 20, 2024
This stable release back-ports bugfixes and improvements from recent edge
releases.

* Introduced support for arbitrary labels in the `podMonitors` field in the
  control plane Helm chart (thanks @jseiser!) ([#11222]; fixes [#11175])
* Added a `prometheusUrl` field for the heartbeat job in the control plane Helm
  chart (thanks @david972!) ([#11343]; fixes [#11342])
* Updated the Destination controller to return `INVALID_ARGUMENT` status codes
  properly when a `ServiceProfile` is requested for a service that does not
  exist. ([#11980])
* Reduced the load on the Destination controller by only processing Server
  updates on workloads affected by the Server ([#12017])
* Changed how updates to a `Server` selector are handled in the destination
  service. When a `Server` that marks a port as opaque no longer selects a
  resource, the resource's opaqueness will reverted to default settings
  ([#12031]; fixes [#11995])
* Fixed a race condition in the destination service that could cause panics
  under very specific conditions ([#12022]; fixes [#12010])
* Fixed an issue where inbound policy could be incorrect after certain policy
  resources are deleted ([#12088])

[#11222]: #11222
[#11175]: #11175
[#11343]: #11343
[#11342]: #11342
[#11980]: #11980
[#12017]: #12017
[#11995]: #11995
[#12031]: #12031
[#12010]: #12010
[#12022]: #12022
[#12088]: #12088

Signed-off-by: Alex Leong <[email protected]>
Signed-off-by: David ALEXANDRE <[email protected]>
Signed-off-by: Justin S <[email protected]>
Co-authored-by: Oliver Gould <[email protected]>
Co-authored-by: Alejandro Pedraza <[email protected]>
Co-authored-by: David ALEXANDRE <[email protected]>
Co-authored-by: Justin Seiser <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants