dst: Stop overriding Host IP with Pod IP on HostPort lookup #11328

Merged
merged 3 commits into from
Sep 6, 2023

@alpeb alpeb commented Sep 1, 2023

Problem

When there's a pod with a hostPort entry, GetProfile requests targeting the host's IP and that hostPort return an endpoint profile with that pod's IP and containerPort. If that pod vanishes and another one on that same host with that same hostPort comes up, the existing GetProfile streams won't get updated with the new pod information (metadata, identity, protocol).

That breaks the connectivity of the client proxy relying on that stream.

Partial Solution

It should be less surprising for those GetProfile requests to return an endpoint profile with the same host IP and port requested, and leave it to the cluster's CNI to perform the translation to the corresponding pod IP and containerPort.

This PR makes that change, while continuing to return the corresponding pod's information alongside.

If the pod associated with that host IP and port changes, the client proxy won't lose connectivity, but the pod's information won't get updated (that'll be fixed in a separate PR).

A new unit test validating this has been added, which will be expanded to validate the changed pod information when that gets implemented.

Details of Change

  • We no longer do the HostPort->ContainerPort conversion, so the getPortForPod function was dropped.
  • The getPodByIp function is now split in two: getPodByHostIP and getPodByPodIP, the latter being called only if the former doesn't return anything.
  • The createAddress function is now simplified in that it just uses the passed IP to build the address. The passed IP depends on which of the two functions just mentioned returned the pod (host IP or pod IP).
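
The lookup flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual Linkerd code: the `Pod` and `Address` types, the slice-based lookup, and the `resolve` helper are all hypothetical stand-ins (the real functions work against the Kubernetes API cache), and only the function names `getPodByHostIP`, `getPodByPodIP` and `createAddress` come from the description.

```go
package main

import "fmt"

// Pod is a hypothetical, pared-down stand-in for a Kubernetes pod.
type Pod struct {
	Name      string
	PodIP     string
	HostIP    string
	HostPorts []uint32 // hostPort values declared by the pod's containers
}

// Address pairs the IP and port handed back to the client with the pod
// (if any) whose metadata accompanies them.
type Address struct {
	IP   string
	Port uint32
	Pod  *Pod
}

// String renders the address as ip:port.
func (a Address) String() string { return fmt.Sprintf("%s:%d", a.IP, a.Port) }

// getPodByHostIP returns the pod on hostIP that declares port as a hostPort.
func getPodByHostIP(pods []*Pod, hostIP string, port uint32) *Pod {
	for _, p := range pods {
		if p.HostIP != hostIP {
			continue
		}
		for _, hp := range p.HostPorts {
			if hp == port {
				return p
			}
		}
	}
	return nil
}

// getPodByPodIP returns the pod whose own IP matches podIP.
func getPodByPodIP(pods []*Pod, podIP string) *Pod {
	for _, p := range pods {
		if p.PodIP == podIP {
			return p
		}
	}
	return nil
}

// createAddress uses the IP and port exactly as passed in; there is no
// hostPort to containerPort translation.
func createAddress(pod *Pod, ip string, port uint32) Address {
	return Address{IP: ip, Port: port, Pod: pod}
}

// resolve (hypothetical helper) tries the host IP lookup first and falls
// back to the pod IP lookup only if nothing was found.
func resolve(pods []*Pod, ip string, port uint32) Address {
	pod := getPodByHostIP(pods, ip, port)
	if pod == nil {
		pod = getPodByPodIP(pods, ip)
	}
	return createAddress(pod, ip, port)
}
```

Calling `resolve` with a host IP and hostPort returns that same host IP and port, with the matching pod attached only as metadata; the translation to the pod IP and containerPort is left to the CNI.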

@alpeb alpeb requested a review from a team as a code owner September 1, 2023 20:17
@alpeb alpeb force-pushed the alpeb/hostport-fixup-stopgap branch from 4aea55c to aa7629a Compare September 1, 2023 20:29
@alpeb alpeb force-pushed the alpeb/hostport-fixup-stopgap branch from aa7629a to 31a6623 Compare September 1, 2023 20:38
alpeb added a commit that referenced this pull request Sep 4, 2023
Followup to #11328, based off of `alpeb/hostport-fixup-stopgap`.

Implements a new pod watcher, instantiated along the other ones in the
Destination server. It's generic enough to catch all pod events in the
cluster, so it's up to the subscribers to filter out the ones they're
interested in, and to set up any metrics.

In the Destination server's `subscribeToEndpointProfile` method, we
create a new `HostPortAdaptor` that is subscribed to the pod watcher,
and forwards the pod and protocol updates to the
`endpointProfileTranslator`. Server subscriptions are now handled by this
adaptor and are recycled whenever the pod changes.
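
As a rough illustration of that split of responsibilities, the sketch below has a watcher that broadcasts every pod event and an adaptor that filters for one host IP and port before forwarding. All types, fields and constructors here are hypothetical simplifications; only the `HostPortAdaptor` name is taken from the description, and metrics are left out.

```go
package main

import "sync"

// PodEvent is a hypothetical, simplified pod update.
type PodEvent struct {
	HostIP   string
	HostPort uint32
	PodName  string
}

// PodListener receives every event the watcher publishes.
type PodListener interface {
	OnPodEvent(ev PodEvent)
}

// PodWatcher fans out all pod events; it does no filtering of its own.
type PodWatcher struct {
	mu        sync.Mutex
	listeners map[PodListener]struct{}
}

// NewPodWatcher returns a watcher with no subscribers.
func NewPodWatcher() *PodWatcher {
	return &PodWatcher{listeners: make(map[PodListener]struct{})}
}

// Subscribe registers a listener for all future events.
func (w *PodWatcher) Subscribe(l PodListener) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.listeners[l] = struct{}{}
}

// Unsubscribe removes a previously registered listener.
func (w *PodWatcher) Unsubscribe(l PodListener) {
	w.mu.Lock()
	defer w.mu.Unlock()
	delete(w.listeners, l)
}

// Publish snapshots the listeners under the lock and notifies them outside
// it, so a listener may call Subscribe or Unsubscribe without deadlocking.
func (w *PodWatcher) Publish(ev PodEvent) {
	w.mu.Lock()
	snapshot := make([]PodListener, 0, len(w.listeners))
	for l := range w.listeners {
		snapshot = append(snapshot, l)
	}
	w.mu.Unlock()
	for _, l := range snapshot {
		l.OnPodEvent(ev)
	}
}

// HostPortAdaptor keeps only the events for one host IP and port and
// forwards them (in the PR, to the endpoint profile translator).
type HostPortAdaptor struct {
	hostIP  string
	port    uint32
	forward func(PodEvent)
}

// NewHostPortAdaptor builds an adaptor for a single hostIP:port pair.
func NewHostPortAdaptor(hostIP string, port uint32, forward func(PodEvent)) *HostPortAdaptor {
	return &HostPortAdaptor{hostIP: hostIP, port: port, forward: forward}
}

// OnPodEvent drops events that do not match the adaptor's host IP and port.
func (a *HostPortAdaptor) OnPodEvent(ev PodEvent) {
	if ev.HostIP == a.hostIP && ev.HostPort == a.port {
		a.forward(ev)
	}
}
```

Keeping the filter in the subscriber means the watcher itself stays generic, at the cost of every subscriber seeing every pod event.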

A new gauge metric `host_port_subscribers` has been created, tracking
the number of subscribers for a given HostIP+port combination.

## Other Changes

- Moved the `server.createAddress` method into a static function in
  `endpoints_watcher.go`, for better reusability.
- The "Return profile for host port pods" test introduced in #11328 was
  extended to track the ensuing events after a pod is deleted and then
  recreated (:taco: to @adleong for the test).
- Given that test consumes multiple events, we had to change the
  `profileStream` test helper to allow for the `GetProfile` call to
  block. Callers to `profileStream` now need to manually cancel the
  returned stream.
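
The last point, a helper that lets the call block and leaves cancellation to the caller, might look roughly like the sketch below. It is not the real `profileStream` helper: `profileFunc`, `testStream` and `startProfileStream` are hypothetical names, and the blocking `GetProfile` call is replaced by a plain function that runs until its `done` channel closes.

```go
package main

import (
	"context"
	"sync"
)

// profileFunc stands in for a blocking GetProfile call: it sends updates
// and only returns once done is closed.
type profileFunc func(done <-chan struct{}, send func(update string))

// testStream collects updates from a call running in the background.
type testStream struct {
	mu      sync.Mutex
	updates []string
	cancel  context.CancelFunc
	exited  chan struct{}
}

// startProfileStream runs call in a goroutine and returns immediately.
// The caller owns the stream and must call Cancel when finished.
func startProfileStream(call profileFunc) *testStream {
	ctx, cancel := context.WithCancel(context.Background())
	s := &testStream{cancel: cancel, exited: make(chan struct{})}
	go func() {
		defer close(s.exited)
		call(ctx.Done(), func(update string) {
			s.mu.Lock()
			defer s.mu.Unlock()
			s.updates = append(s.updates, update)
		})
	}()
	return s
}

// Cancel stops the background call and waits for it to return.
func (s *testStream) Cancel() {
	s.cancel()
	<-s.exited
}

// Updates returns a copy of the updates received so far.
func (s *testStream) Updates() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.updates...)
}
```

A test would start the stream, trigger the pod deletion and recreation, inspect `Updates()`, and finally call `Cancel()`; forgetting the cancel leaves the goroutine running.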
@alpeb alpeb changed the title stopgap fix for hostport staleness dst: Stop overriding Host IP with Pod IP on HostPort lookup Sep 5, 2023
@mateiidavid mateiidavid left a comment

Left two small comments, this looks great to me though! 🚢

@alpeb alpeb merged commit 3d1a3e0 into main Sep 6, 2023
@alpeb alpeb deleted the alpeb/hostport-fixup-stopgap branch September 6, 2023 15:35
alpeb added a commit that referenced this pull request Sep 6, 2023
mateiidavid added a commit that referenced this pull request Sep 7, 2023
This edge release introduces a fix for service discovery on endpoints that use
hostPorts. Previously, the destination service would return the pod IP for the
discovery request which could break connectivity on pod restart. To fix this,
direct pod communication for a pod bound on a hostPort will always return the
hostIP. In addition, this change fixes a security vulnerability (CVE-2023-2603)
detected in the CNI plugin and proxy-init images and includes a number of other
fixes and small improvements.

* Addressed security vulnerability CVE-2023-2603 in proxy-init and CNI plugin
  ([11296])
* Introduced resource requests/limits for the policy controller resource in the
  control plane helm chart ([11301])
* Fixed an issue where an empty `remoteDiscoverySelector` field in a
  multicluster link would cause all services to be mirrored ([11309])
* Removed timeout from `linkerd multicluster gateways` command; when no
  metrics exist the command will return instantly ([11265])
* Improved help messaging for `linkerd multicluster link` ([11265])
* Changed hostPort lookup behaviour in the destination service; previously,
  endpoint lookups for pods bound on a hostPort would return the Pod IP, which
  would result in loss of connectivity on pod restart. Host IPs are now always
  returned when a pod uses a hostPort ([11328])
* Updated HTTPRoute webhook rule to validate all apiVersions of the resource
  (thanks @mikutas!) ([11149])
* Fixed erroneous `skipped` messages when injecting namespaces with `linkerd
  inject` (thanks @mikutas!) ([10231])

[11309]: #11309
[11296]: #11296
[11328]: #11328
[11301]: #11301
[11265]: #11265
[11149]: #11149
[10231]: #10231

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid mentioned this pull request Sep 7, 2023
mateiidavid added a commit that referenced this pull request Sep 11, 2023
This edge release introduces a fix for service discovery on endpoints that use
hostPorts. Previously, the destination service would return the pod IP for the
discovery request which could break connectivity on pod restart. To fix this,
direct pod communication for a pod bound on a hostPort will always return the
hostIP. In addition, this release fixes a security vulnerability (CVE-2023-2603)
detected in the CNI plugin and proxy-init images, and includes a number of other
fixes and small improvements.

* Addressed security vulnerability CVE-2023-2603 in proxy-init and CNI plugin
  ([#11296])
* Introduced resource requests/limits for the policy controller resource in the
  control plane helm chart ([#11301])
* Fixed an issue where an empty `remoteDiscoverySelector` field in a
  multicluster link would cause all services to be mirrored ([#11309])
* Removed timeout from `linkerd multicluster gateways` command; when no
  metrics exist the command will return instantly ([#11265])
* Improved help messaging for `linkerd multicluster link` ([#11265])
* Changed how hostPort lookups are handled in the destination service.
  Previously, when doing service discovery for an endpoint bound on a hostPort,
  the destination service would return the corresponding pod IP. On pod
  restart, this could lead to loss of connectivity on the client's side. The
  destination service now always returns host IPs for service discovery on an
  endpoint that uses hostPorts ([#11328])
* Updated HTTPRoute webhook rule to validate all apiVersions of the resource
  (thanks @mikutas!) ([#11149])
* Fixed erroneous `skipped` messages when injecting namespaces with `linkerd
  inject` (thanks @mikutas!) ([#10231])

[#11309]: #11309
[#11296]: #11296
[#11328]: #11328
[#11301]: #11301
[#11265]: #11265
[#11149]: #11149
[#10231]: #10231

---------

Signed-off-by: Matei David <[email protected]>
Co-authored-by: Eliza Weisman <[email protected]>
adamshawvipps pushed a commit to adamshawvipps/linkerd2 that referenced this pull request Sep 18, 2023
…11328)

* stopgap fix for hostport staleness

## Problem

When there's a pod with a `hostPort` entry, `GetProfile` requests
targeting the host's IP and that `hostPort` return an endpoint profile
with that pod's IP and `containerPort`. If that pod vanishes and another
one on that same host with that same `hostPort` comes up, the existing
`GetProfile` streams won't get updated with the new pod information
(metadata, identity, protocol).

That breaks the connectivity of the client proxy relying on that stream.

## Partial Solution

It should be less surprising for those `GetProfile` requests to return
an endpoint profile with the same host IP and port requested, and leave
it to the cluster's CNI to perform the translation to the corresponding pod
IP and `containerPort`.

This PR makes that change, while continuing to return the corresponding
pod's information alongside.

If the pod associated with that host IP and port changes, the client proxy
won't lose connectivity, but the pod's information won't get updated
(that'll be fixed in a separate PR).

A new unit test validating this has been added, which will be expanded
to validate the changed pod information when that gets implemented.

## Details of Change

- We no longer do the HostPort->ContainerPort conversion, so the
  `getPortForPod` function was dropped.
- The `getPodByIp` function is now split in two: `getPodByPodIP`
  and `getPodByHostIP`, the latter being called only if the former
  doesn't return anything.
- The `createAddress` function is now simplified in that it just uses
  the passed IP to build the address. The passed IP depends on which
  of the two functions just mentioned returned the pod (host IP or pod
  IP).
adamshawvipps pushed a commit to adamshawvipps/linkerd2 that referenced this pull request Sep 18, 2023
mateiidavid pushed a commit that referenced this pull request Sep 20, 2023
mateiidavid added a commit that referenced this pull request Sep 20, 2023
This stable release backports two fixes that address security
vulnerabilities. The proxy's dependency on the webpki library has been updated
to patch [RUSTSEC-2023-0052], a potential CPU usage denial-of-service attack
when accepting a TLS handshake from an untrusted peer. In addition, the CNI and
proxy-init images have been updated to patch [CVE-2023-2603] surfaced in the
runtime image's libcap library. Finally, the release contains a backported fix
for service discovery on endpoints that use hostPorts which could potentially
disrupt connections on pod restarts.

* Control Plane
  * Changed how hostPort lookups are handled in the destination service.
    Previously, when doing service discovery for an endpoint bound on a
    hostPort, the destination service would return the corresponding pod IP. On
    pod restart, this could lead to loss of connectivity on the client's side.
    The destination service now always returns host IPs for service discovery
    on an endpoint that uses hostPorts [#11328]

* Proxy
  * Addressed security vulnerability [RUSTSEC-2023-0052] [#11389]

* CNI
  * Addressed security vulnerability [CVE-2023-2603] in proxy-init and CNI
    plugin [#11348]

[#11328]: #11328
[#11348]: #11348
[#11389]: #11389
[RUSTSEC-2023-0052]: https://rustsec.org/advisories/RUSTSEC-2023-0052.html
[CVE-2023-2603]: GHSA-wp54-pwvg-rqq5

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid mentioned this pull request Sep 20, 2023
mateiidavid added a commit that referenced this pull request Sep 21, 2023
This stable release introduces a fix for service discovery on endpoints that
use hostPorts. Previously, the destination service would return the pod IP
associated with the endpoint which could break connectivity on pod restarts.
Discovery responses have been changed to instead return the host IP. This
release also fixes an issue in the multicluster extension where an empty
`remoteDiscoverySelector` field in the `Link` resource would cause all services
to be exported. Finally, this release addresses two security vulnerabilities,
[CVE-2023-2603] and [RUSTSEC-2023-0052] respectively, and includes numerous
other fixes and enhancements.

* CLI
  * Fixed `linkerd check --proxy` incorrectly checking the proxy version of
    pods in the `completed` state (thanks @mikutas!) ([#11295]; fixes [#11280])
  * Fixed erroneous `skipped` messages when injecting namespaces with `linkerd
    inject` (thanks @mikutas!) ([#10231])

* CNI
  * Addressed security vulnerability [CVE-2023-2603] in proxy-init and CNI
    plugin ([#11296])

* Control Plane
  * Changed how hostPort lookups are handled in the destination service.
    Previously, when doing service discovery for an endpoint bound on a
    hostPort, the destination service would return the corresponding pod IP. On
    pod restart, this could lead to loss of connectivity on the client's side.
    The destination service now always returns host IPs for service discovery
    on an endpoint that uses hostPorts ([#11328])
  * Updated HTTPRoute webhook rule to validate all apiVersions of the resource
    (thanks @mikutas!) ([#11149])

* Helm
  * Removed unnecessary `linkerd.io/helm-release-version` annotation from the
    `linkerd-control-plane` Helm chart (thanks @mikutas!) ([#11329]; fixes
    [#10778])
  * Introduced resource requests/limits for the policy controller resource in
    the control plane helm chart ([#11301])

* Multicluster
  * Fixed an issue where an empty `remoteDiscoverySelector` field in a
    multicluster link would cause all services to be mirrored ([#11309])
  * Removed timeout from `linkerd multicluster gateways` command; when no
    metrics exist the command will return instantly ([#11265])
  * Improved help messaging for `linkerd multicluster link` ([#11265])

* Proxy
  * Addressed security vulnerability [RUSTSEC-2023-0052] in the proxy
    ([#11361])

[CVE-2023-2603]: GHSA-wp54-pwvg-rqq5
[RUSTSEC-2023-0052]: https://rustsec.org/advisories/RUSTSEC-2023-0052.html
[#11295]: #11295
[#11280]: #11280
[#11361]: #11361
[#11329]: #11329
[#10778]: #10778
[#11309]: #11309
[#11296]: #11296
[#11328]: #11328
[#11301]: #11301
[#11265]: #11265
[#11149]: #11149
[#10231]: #10231

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid mentioned this pull request Sep 21, 2023
mateiidavid added a commit that referenced this pull request Sep 25, 2023
* stable-2.14.1

mateiidavid added a commit that referenced this pull request Sep 25, 2023
alpeb added a commit that referenced this pull request Sep 27, 2023
Followup to #11328, based off of `alpeb/hostport-fixup-stopgap`.

Implements a new pod watcher, instantiated along the other ones in the
Destination server. It's generic enough to catch all pod events in the
cluster, so it's up to the subscribers to filter out the ones they're
interested in, and to set up any metrics.

In the Destination server's `subscribeToEndpointProfile` method, we
create a new `HostPortAdaptor` that is subscribed to the pod watcher,
and forwards the pod and protocol updates to the
`endpointProfileTranslator`. Server subscriptions are now handled by this
adaptor and are recycled whenever the pod changes.

A new gauge metric `host_port_subscribers` has been created, tracking
the number of subscribers for a given HostIP+port combination.
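
The watcher/adaptor split described above could be sketched roughly as follows. This is not the actual implementation: `podEvent`, `podListener`, `podWatcher` and `hostPortAdaptor` are simplified, hypothetical types, and metrics are left out. The point is only that the watcher broadcasts every pod event, while each subscriber filters for the HostIP+port it cares about.

```go
package main

import (
	"fmt"
	"sync"
)

// podEvent is a hypothetical, flattened view of a pod add/update/delete.
type podEvent struct {
	hostIP   string
	hostPort uint32
	podName  string
	deleted  bool
}

// podListener is implemented by anything that wants to see pod events.
type podListener interface {
	onPodEvent(ev podEvent)
}

// podWatcher fans every pod event out to all subscribers, unfiltered.
type podWatcher struct {
	mu        sync.Mutex
	listeners []podListener
}

func (w *podWatcher) subscribe(l podListener) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.listeners = append(w.listeners, l)
}

func (w *podWatcher) publish(ev podEvent) {
	// Copy the slice so listeners are invoked without holding the lock.
	w.mu.Lock()
	listeners := append([]podListener(nil), w.listeners...)
	w.mu.Unlock()
	for _, l := range listeners {
		l.onPodEvent(ev)
	}
}

// hostPortAdaptor drops events for other HostIP+port pairs and forwards the
// rest (in the real code, to something like the endpointProfileTranslator).
type hostPortAdaptor struct {
	hostIP   string
	hostPort uint32
	forward  func(ev podEvent)
}

func (a *hostPortAdaptor) onPodEvent(ev podEvent) {
	if ev.hostIP != a.hostIP || ev.hostPort != a.hostPort {
		return
	}
	a.forward(ev)
}

func main() {
	w := &podWatcher{}
	w.subscribe(&hostPortAdaptor{
		hostIP:   "192.168.1.10",
		hostPort: 8080,
		forward: func(ev podEvent) {
			fmt.Printf("pod=%s deleted=%t\n", ev.podName, ev.deleted)
		},
	})
	w.publish(podEvent{hostIP: "192.168.1.10", hostPort: 8080, podName: "web-0", deleted: true})
	w.publish(podEvent{hostIP: "192.168.1.11", hostPort: 8080, podName: "other"}) // filtered out
	w.publish(podEvent{hostIP: "192.168.1.10", hostPort: 8080, podName: "web-1"})
}
```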

## Other Changes

- Moved the `server.createAddress` method into a static function in
  `endpoints_watcher.go`, for better reusability.
- The "Return profile for host port pods" test introduced in #11328 was
  extended to track the ensuing events after a pod is deleted and then
  recreated (:taco: to @adleong for the test).
- Given that test consumes multiple events, we had to change the
  `profileStream` test helper to allow for the `GetProfile` call to
  block. Callers to `profileStream` now need to manually cancel the
  returned stream.
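
As a rough illustration of the last point, a test helper that lets a blocking call run in the background and leaves cancellation to the caller might look like the sketch below. `mockStream`, `startStream` and `runDemo` are hypothetical names; the real `profileStream` helper has a different shape.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// mockStream collects the updates sent by a long-running call.
type mockStream struct {
	mu      sync.Mutex
	updates []string
	cancel  context.CancelFunc
	done    chan struct{}
}

func (s *mockStream) send(u string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.updates = append(s.updates, u)
}

func (s *mockStream) received() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.updates...)
}

// Cancel stops the blocking call and waits for its goroutine to exit.
func (s *mockStream) Cancel() {
	s.cancel()
	<-s.done
}

// startStream runs blockingCall in the background and returns immediately.
// The caller owns the stream and must call Cancel when it is done with it.
func startStream(blockingCall func(ctx context.Context, send func(string))) *mockStream {
	ctx, cancel := context.WithCancel(context.Background())
	s := &mockStream{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(s.done)
		blockingCall(ctx, s.send)
	}()
	return s
}

// runDemo feeds two events through a blocking call, then cancels it.
func runDemo() []string {
	events := make(chan string)
	s := startStream(func(ctx context.Context, send func(string)) {
		for {
			select {
			case <-ctx.Done():
				return
			case e := <-events:
				send(e)
			}
		}
	})
	events <- "pod deleted"
	events <- "pod recreated"
	s.Cancel() // without this the background call would block forever
	return s.received()
}

func main() {
	fmt.Println(runDemo())
}
```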
alpeb added a commit that referenced this pull request Sep 28, 2023
…kup changes (#11334)

Followup to #11328

Implements a new pod watcher, instantiated alongside the other ones in the Destination server. It also watches Servers and carries all the logic from ServerWatcher, which has now been decommissioned.

The `CreateAddress()` function has been moved into a method of the PodWatcher, because we now call it on every update: the pod associated with an ip:port might change, so the Address object has to be regenerated. That function also takes care of capturing opaque protocol info from associated Servers; this is not new, but some of its logic was duplicated in the now-defunct ServerWatcher. `getAnnotatedOpaquePorts()` was also moved for similar reasons.

Other things to note about PodWatcher:

- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event.

Finally, since endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) is no longer a static object, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
mateiidavid pushed a commit that referenced this pull request Oct 26, 2023
…kup changes (#11334)

Followup to #11328
