dst: Stop overriding Host IP with Pod IP on HostPort lookup #11328
Merged
4aea55c
to
aa7629a
Compare
## Problem

When there's a pod with a `hostPort` entry, `GetProfile` requests targeting the host's IP and that `hostPort` return an endpoint profile with that pod's IP and `containerPort`. If that pod vanishes and another one on that same host with that same `hostPort` comes up, the existing `GetProfile` streams won't get updated with the new pod information (metadata, identity, protocol). That breaks the connectivity of the client proxy relying on that stream.

## Partial Solution

It should be less surprising for those `GetProfile` requests to return an endpoint profile with the same host IP and port requested, and leave it to the cluster's CNI to perform the translation to the corresponding pod IP and `containerPort`.

This PR performs that change, while continuing to return the corresponding pod's information alongside.

If the pod associated with that host IP and port changes, the client proxy won't lose connectivity, but the pod's information won't get updated (that'll be fixed in a separate PR).

A new unit test validating this has been added, which will be expanded to validate the changed pod information when that gets implemented.

## Details of Change

- We no longer do the HostPort->ContainerPort conversion, so the `getPortForPod` function was dropped.
- The `getPodByIp` function is now split in two: `getPodByHostIP` and `getPodByPodIP`, the latter being called only if the former doesn't return anything.
- The `createAddress` function is now simplified in that it just uses the passed IP to build the address. The passed IP will depend on which of the two functions just mentioned returned the pod (host IP or pod IP).
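To make the new lookup behavior concrete, here is a minimal sketch under stated assumptions. It is not the code from this PR: the `pod` and `address` types, the slice-based lookups, and the function signatures are hypothetical simplifications (the real functions query the Kubernetes API and return `corev1.Pod` objects). Only the function names and the fallback order are taken from the description above.

```go
package main

// pod is a hypothetical, simplified stand-in for a Kubernetes Pod.
type pod struct {
	name      string
	podIP     string
	hostIP    string
	hostPorts map[uint32]uint32 // hostPort -> containerPort
}

// address is a hypothetical reduction of the endpoint returned in a profile.
type address struct {
	ip   string
	port uint32
	pod  *pod
}

// getPodByHostIP returns the pod that exposes hostPort `port` on node `ip`,
// or nil if there is none. (Assumed signature, for illustration only.)
func getPodByHostIP(pods []*pod, ip string, port uint32) *pod {
	for _, p := range pods {
		if p.hostIP != ip {
			continue
		}
		if _, ok := p.hostPorts[port]; ok {
			return p
		}
	}
	return nil
}

// getPodByPodIP returns the pod whose own IP is `ip`, or nil.
// (Assumed signature, for illustration only.)
func getPodByPodIP(pods []*pod, ip string) *pod {
	for _, p := range pods {
		if p.podIP == ip {
			return p
		}
	}
	return nil
}

// lookup tries the host IP first and falls back to the pod IP, as the
// description states. The returned address always carries the IP and port
// that were requested; no HostPort->ContainerPort conversion takes place.
func lookup(pods []*pod, ip string, port uint32) (address, bool) {
	p := getPodByHostIP(pods, ip, port)
	if p == nil {
		p = getPodByPodIP(pods, ip)
	}
	if p == nil {
		return address{}, false
	}
	return address{ip: ip, port: port, pod: p}, true
}
```

In this sketch, replacing the pod behind a host IP and `hostPort` changes only the `pod` field of the result; the `ip` and `port` handed back to the client stay the same, which is why the client proxy keeps its connectivity.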
aa7629a
to
31a6623
Compare
alpeb
added a commit
that referenced
this pull request
Sep 4, 2023
Followup to #11328, based off of `alpeb/hostport-fixup-stopgap`.

Implements a new pod watcher, instantiated along the other ones in the Destination server. It's generic enough to catch all pod events in the cluster, so it's up to the subscribers to filter out the ones they're interested in, and to set up any metrics.

In the Destination server's `subscribeToEndpointProfile` method, we create a new `HostPortAdaptor` that is subscribed to the pod watcher, and forwards the pod and protocol updates to the `endpointProfileTranslator`. Server subscriptions are now handled by this adaptor and are recycled whenever the pod changes.

A new gauge metric `host_port_subscribers` has been created, tracking the number of subscribers for a given HostIP+port combination.

## Other Changes

- Moved the `server.createAddress` method into a static function in `endpoints_watcher.go`, for better reusability.
- The "Return profile for host port pods" test introduced in #11328 was extended to track the ensuing events after a pod is deleted and then recreated (:taco: to @adleong for the test).
- Given that the test consumes multiple events, we had to change the `profileStream` test helper to allow for the `GetProfile` call to block. Callers of `profileStream` now need to manually cancel the returned stream.
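The division of labor described in this commit message (a generic watcher that broadcasts every pod event, and a subscriber that filters for its own host IP and port) can be sketched roughly as follows. Every type and method here is a hypothetical stand-in written for illustration; none of it is the actual `HostPortAdaptor` or pod watcher code, and neither metrics nor Server subscriptions are modeled.

```go
package main

// podEvent is a hypothetical, flattened view of a pod add/update/delete.
type podEvent struct {
	hostIP   string
	hostPort uint32
	podName  string
	deleted  bool
}

// podListener is implemented by anything that wants to receive pod events.
type podListener interface {
	onPodEvent(ev podEvent)
}

// podWatcher broadcasts every event to every listener. It does no
// filtering itself; that is left to the subscribers.
type podWatcher struct {
	listeners []podListener
}

func (w *podWatcher) subscribe(l podListener) {
	w.listeners = append(w.listeners, l)
}

func (w *podWatcher) publish(ev podEvent) {
	for _, l := range w.listeners {
		l.onPodEvent(ev)
	}
}

// hostPortAdaptor keeps only the events for one host IP and port and
// forwards them. The forward callback stands in for the translator that
// writes to the GetProfile stream.
type hostPortAdaptor struct {
	hostIP  string
	port    uint32
	forward func(podName string)
}

func (a *hostPortAdaptor) onPodEvent(ev podEvent) {
	if ev.hostIP != a.hostIP || ev.hostPort != a.port {
		return // not the host IP and port this stream cares about
	}
	if ev.deleted {
		a.forward("") // the pod is gone; forward an empty pod name
		return
	}
	a.forward(ev.podName)
}
```

Keeping the watcher unaware of host ports is what makes it reusable: a different subscriber could filter the same event stream by any other criterion.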
olix0r
reviewed
Sep 5, 2023
olix0r
approved these changes
Sep 5, 2023
mateiidavid
approved these changes
Sep 6, 2023
Left two small comments, this looks great to me though! 🚢
alpeb
added a commit
that referenced
this pull request
Sep 6, 2023
mateiidavid
added a commit
that referenced
this pull request
Sep 7, 2023
This edge release introduces a fix for service discovery on endpoints that use hostPorts. Previously, the destination service would return the pod IP for the discovery request which could break connectivity on pod restart. To fix this, direct pod communication for a pod bound on a hostPort will always return the hostIP. In addition, this change fixes a security vulnerability (CVE-2023-2603) detected in the CNI plugin and proxy-init images and includes a number of other fixes and small improvements.

* Addressed security vulnerability CVE-2023-2603 in proxy-init and CNI plugin ([11296])
* Introduced resource requests/limits for the policy controller resource in the control plane helm chart ([11301])
* Fixed an issue where an empty `remoteDiscoverySelector` field in a multicluster link would cause all services to be mirrored ([11309])
* Removed time out from `linkerd multicluster gateways` command; when no metrics exist the command will return instantly ([11265])
* Improved help messaging for `linkerd multicluster link` ([11265])
* Changed hostPort lookup behaviour in the destination service; previously, endpoint lookups for pods bound on a hostPort would return the Pod IP which would result in loss of connectivity on pod restart, hostIPs are now always returned when a pod uses a hostPort ([11328])
* Updated HTTPRoute webhook rule to validate all apiVersions of the resource (thanks @mikutas!) ([11149])
* Fixed erroneous `skipped` messages when injecting namespaces with `linkerd inject` (thanks @mikutas!) ([10231])

[11309]: #11309
[11296]: #11296
[11328]: #11328
[11301]: #11301
[11265]: #11265
[11149]: #11149
[10231]: #10231

Signed-off-by: Matei David <[email protected]>
Merged
mateiidavid
added a commit
that referenced
this pull request
Sep 11, 2023
This edge release introduces a fix for service discovery on endpoints that use hostPorts. Previously, the destination service would return the pod IP for the discovery request which could break connectivity on pod restart. To fix this, direct pod communication for a pod bound on a hostPort will always return the hostIP. In addition, this release fixes a security vulnerability (CVE-2023-2603) detected in the CNI plugin and proxy-init images, and includes a number of other fixes and small improvements.

* Addressed security vulnerability CVE-2023-2603 in proxy-init and CNI plugin ([#11296])
* Introduced resource requests/limits for the policy controller resource in the control plane helm chart ([#11301])
* Fixed an issue where an empty `remoteDiscoverySelector` field in a multicluster link would cause all services to be mirrored ([#11309])
* Removed time out from `linkerd multicluster gateways` command; when no metrics exist the command will return instantly ([#11265])
* Improved help messaging for `linkerd multicluster link` ([#11265])
* Changed how hostPort lookups are handled in the destination service. Previously, when doing service discovery for an endpoint bound on a hostPort, the destination service would return the corresponding pod IP. On pod restart, this could lead to loss of connectivity on the client's side. The destination service now always returns host IPs for service discovery on an endpoint that uses hostPorts ([#11328])
* Updated HTTPRoute webhook rule to validate all apiVersions of the resource (thanks @mikutas!) ([#11149])
* Fixed erroneous `skipped` messages when injecting namespaces with `linkerd inject` (thanks @mikutas!) ([#10231])

[#11309]: #11309
[#11296]: #11296
[#11328]: #11328
[#11301]: #11301
[#11265]: #11265
[#11149]: #11149
[#10231]: #10231

---------

Signed-off-by: Matei David <[email protected]>
Co-authored-by: Eliza Weisman <[email protected]>
adamshawvipps
pushed a commit
to adamshawvipps/linkerd2
that referenced
this pull request
Sep 18, 2023
…11328) * stopgap fix for hostport staleness
adamshawvipps
pushed a commit
to adamshawvipps/linkerd2
that referenced
this pull request
Sep 18, 2023
mateiidavid
pushed a commit
that referenced
this pull request
Sep 20, 2023
* stopgap fix for hostport staleness

Problem:

When there's a pod with a `hostPort` entry, `GetProfile` requests targeting the host's IP and that `hostPort` return an endpoint profile with that pod's IP and `containerPort`. If that pod vanishes and another one on that same host with that same `hostPort` comes up, the existing `GetProfile` streams won't get updated with the new pod information (metadata, identity, protocol). That breaks the connectivity of the client proxy relying on that stream.

Partial Solution:

It should be less surprising for those `GetProfile` requests to return an endpoint profile with the same host IP and port requested, and leave it to the cluster's CNI to perform the translation to the corresponding pod IP and `containerPort`.

This PR performs that change, while continuing to return the corresponding pod's information alongside.

If the pod associated with that host IP and port changes, the client proxy won't lose connectivity, but the pod's information won't get updated (that'll be fixed in a separate PR).

A new unit test validating this has been added, which will be expanded to validate the changed pod information when that gets implemented.

Details of Change:

- We no longer do the HostPort->ContainerPort conversion, so the `getPortForPod` function was dropped.
- The `getPodByIp` function is now split in two: `getPodByPodIP` and `getPodByHostIP`, the latter being called only if the former doesn't return anything.
- The `createAddress` function is now simplified in that it just uses the passed IP to build the address. The passed IP will depend on which of the two functions just mentioned returned the pod (host IP or pod IP).
mateiidavid
added a commit
that referenced
this pull request
Sep 20, 2023
This stable release backports two fixes that address security vulnerabilities. The proxy's dependency on the webpki library has been updated to patch [RUSTSEC-2023-0052], a potential CPU usage denial-of-service attack when accepting a TLS handshake from an untrusted peer. In addition, the CNI and proxy-init images have been updated to patch [CVE-2023-2603] surfaced in the runtime image's libcap library. Finally, the release contains a backported fix for service discovery on endpoints that use hostPorts which could potentially disrupt connections on pod restarts.

* Control Plane
  * Changed how hostPort lookups are handled in the destination service. Previously, when doing service discovery for an endpoint bound on a hostPort, the destination service would return the corresponding pod IP. On pod restart, this could lead to loss of connectivity on the client's side. The destination service now always returns host IPs for service discovery on an endpoint that uses hostPorts [#11328]
* Proxy
  * Addressed security vulnerability [RUSTSEC-2023-0052] [#11389]
* CNI
  * Addressed security vulnerability [CVE-2023-2603] in proxy-init and CNI plugin [#11348]

[#11328]: #11328
[#11348]: #11348
[#11389]: #11389
[RUSTSEC-2023-0052]: https://rustsec.org/advisories/RUSTSEC-2023-0052.html
[CVE-2023-2603]: GHSA-wp54-pwvg-rqq5

Signed-off-by: Matei David <[email protected]>
Merged
mateiidavid
added a commit
that referenced
this pull request
Sep 21, 2023
This stable release introduces a fix for service discovery on endpoints that use hostPorts. Previously, the destination service would return the pod IP associated with the endpoint which could break connectivity on pod restarts. Discovery responses have been changed to instead return the host IP. This release also fixes an issue in the multicluster extension where an empty `remoteDiscoverySelector` field in the `Link` resource would cause all services to be exported. Finally, this release addresses two security vulnerabilities, [CVE-2023-2603] and [RUSTSEC-2023-0052] respectively, and includes numerous other fixes and enhancements.

* CLI
  * Fixed `linkerd check --proxy` incorrectly checking the proxy version of pods in the `completed` state (thanks @mikutas!) ([#11295]; fixes [#11280])
  * Fixed erroneous `skipped` messages when injecting namespaces with `linkerd inject` (thanks @mikutas!) ([#10231])
* CNI
  * Addressed security vulnerability [CVE-2023-2603] in proxy-init and CNI plugin ([#11296])
* Control Plane
  * Changed how hostPort lookups are handled in the destination service. Previously, when doing service discovery for an endpoint bound on a hostPort, the destination service would return the corresponding pod IP. On pod restart, this could lead to loss of connectivity on the client's side. The destination service now always returns host IPs for service discovery on an endpoint that uses hostPorts ([#11328])
  * Updated HTTPRoute webhook rule to validate all apiVersions of the resource (thanks @mikutas!) ([#11149])
* Helm
  * Removed unnecessary `linkerd.io/helm-release-version` annotation from the `linkerd-control-plane` Helm chart (thanks @mikutas!) ([#11329]; fixes [#10778])
  * Introduced resource requests/limits for the policy controller resource in the control plane helm chart ([#11301])
* Multicluster
  * Fixed an issue where an empty `remoteDiscoverySelector` field in a multicluster link would cause all services to be mirrored ([#11309])
  * Removed time out from `linkerd multicluster gateways` command; when no metrics exist the command will return instantly ([#11265])
  * Improved help messaging for `linkerd multicluster link` ([#11265])
* Proxy
  * Addressed security vulnerability [RUSTSEC-2023-0052] in the proxy ([#11361])

[CVE-2023-2603]: GHSA-wp54-pwvg-rqq5
[RUSTSEC-2023-0052]: https://rustsec.org/advisories/RUSTSEC-2023-0052.html
[#11295]: #11295
[#11280]: #11280
[#11361]: #11361
[#11329]: #11329
[#10778]: #10778
[#11309]: #11309
[#11296]: #11296
[#11328]: #11328
[#11301]: #11301
[#11265]: #11265
[#11149]: #11149
[#10231]: #10231

Signed-off-by: Matei David <[email protected]>
Merged
mateiidavid
added a commit
that referenced
this pull request
Sep 25, 2023
* stable-2.14.1

Signed-off-by: Matei David <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Co-authored-by: Eliza Weisman <[email protected]>
mateiidavid
added a commit
that referenced
this pull request
Sep 25, 2023
Signed-off-by: Matei David <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Co-authored-by: Alejandro Pedraza <[email protected]>
Co-authored-by: Eliza Weisman <[email protected]>
alpeb
added a commit
that referenced
this pull request
Sep 27, 2023
alpeb
added a commit
that referenced
this pull request
Sep 28, 2023
…kup changes (#11334)

Followup to #11328

Implements a new pod watcher, instantiated along the other ones in the Destination server. It also watches Servers and carries all the logic from ServerWatcher, which has now been decommissioned.

The `CreateAddress()` function has been moved into the PodWatcher, because we now call it on every update: the pod associated with an ip:port might change, and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duplicated in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` was also moved for similar reasons.

Other things to note about PodWatcher:

- It publishes a new pair of metrics, `ip_port_subscribers` and `ip_port_updates`, leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event.
- Finally, since endpointProfileTranslator's `endpoint` (`*pb.WeightedAddr`) is no longer a static object, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
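The readiness rule mentioned for `updatePod()` amounts to change detection: a stream update goes out only when readiness differs from the last value sent. The following is a minimal sketch of that idea alone. The `readinessPublisher` type and its fields are hypothetical and are not taken from the PodWatcher source, which has more cases to deal with.

```go
package main

// readinessPublisher is a hypothetical helper that remembers the last
// readiness value it sent, so repeated lifecycle events carrying the same
// readiness do not produce duplicate stream updates.
type readinessPublisher struct {
	known bool            // false until the first value has been sent
	ready bool            // last readiness value sent
	send  func(ready bool) // stand-in for writing to the profile stream
}

// updatePod is called on every pod lifecycle event. It forwards the
// readiness only if it is the first value seen or differs from the last
// one sent.
func (p *readinessPublisher) updatePod(ready bool) {
	if p.known && p.ready == ready {
		return // nothing changed; skip the duplicate update
	}
	p.known = true
	p.ready = ready
	p.send(ready)
}
```

Feeding this helper the readiness sequence true, true, false, false, true results in three sends (true, false, true) rather than five.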
mateiidavid
pushed a commit
that referenced
this pull request
Oct 26, 2023