Add health checks to Kubernetes agent and proxy #59565

Merged
rana merged 1 commit into master from rana/kube-healthchecks-6 on Oct 7, 2025
@rana rana commented Sep 25, 2025

This is part of the Kubernetes health check integration.

Implements core health check logic for Kubernetes.

In this PR:

  • Added functions CheckHealth and GetProtocol to kubeDetails
  • Added a HealthCheckManager field to the Kubernetes TLSServerConfig
  • Kube agent and proxy now instantiate a HealthCheckManager
  • Added a HealthCheckConfigReader to interfaces ReadKubernetesAccessPoint and ReadProxyAccessPoint
  • Added HealthCheckConfig read-only permissions for proxy and kube
  • Added HealthCheckConfig watching for proxy and kube
  • Added functions to KubeServer interface and KubernetesServerV3 struct:
    • GetTargetHealth() TargetHealth
    • SetTargetHealth(h TargetHealth)
    • GetTargetHealthStatus() TargetHealthStatus
    • SetTargetHealthStatus(status TargetHealthStatus)
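
The four accessors listed above could look roughly like the sketch below. It is only an illustration: `kubeServer`, the simplified `TargetHealth` struct, and the status string values are stand-ins, not the real `api/types` definitions. The fallback to `TargetHealthStatusUnknown` follows the behavior described in the final commit message for this PR.

```go
package main

import "fmt"

// TargetHealthStatus and TargetHealth are simplified stand-ins for the
// types in api/types; the string values are assumptions.
type TargetHealthStatus string

const (
	TargetHealthStatusHealthy   TargetHealthStatus = "healthy"
	TargetHealthStatusUnhealthy TargetHealthStatus = "unhealthy"
	TargetHealthStatusUnknown   TargetHealthStatus = "unknown"
)

type TargetHealth struct {
	Status string
}

// kubeServer is a hypothetical stand-in for KubernetesServerV3.
type kubeServer struct {
	health *TargetHealth // nil until a health check has reported
}

// GetTargetHealth returns a copy of the stored health, or a zero value.
func (s *kubeServer) GetTargetHealth() TargetHealth {
	if s.health == nil {
		return TargetHealth{}
	}
	return *s.health
}

// SetTargetHealth stores a copy of h.
func (s *kubeServer) SetTargetHealth(h TargetHealth) {
	s.health = &h
}

// GetTargetHealthStatus reports unknown rather than an empty string
// when no status has been recorded yet.
func (s *kubeServer) GetTargetHealthStatus() TargetHealthStatus {
	if s.health == nil || s.health.Status == "" {
		return TargetHealthStatusUnknown
	}
	return TargetHealthStatus(s.health.Status)
}

// SetTargetHealthStatus allocates the health record on first use.
func (s *kubeServer) SetTargetHealthStatus(status TargetHealthStatus) {
	if s.health == nil {
		s.health = &TargetHealth{}
	}
	s.health.Status = string(status)
}

func main() {
	s := &kubeServer{}
	fmt.Println(s.GetTargetHealthStatus())
	s.SetTargetHealthStatus(TargetHealthStatusHealthy)
	fmt.Println(s.GetTargetHealthStatus())
}
```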

Relates to:

@rana rana added the kubernetes-access, no-changelog, and health-check labels Sep 25, 2025
@rana rana marked this pull request as ready for review September 25, 2025 03:25
@rana rana requested review from atburke, capnspacehook and tigrato and removed request for atburke September 25, 2025 03:29
Comment thread lib/kube/proxy/server.go
@rana rana force-pushed the rana/kube-healthchecks-6 branch 3 times, most recently from d4e9d7d to fe9c775 Compare September 25, 2025 23:20
Comment thread api/types/kubernetes_server.go
Comment thread api/types/kubernetes_server.go
Comment thread lib/kube/proxy/server.go
@rana rana requested a review from espadolini September 26, 2025 21:30
@rana rana force-pushed the rana/kube-healthchecks-6 branch from f2277eb to 3060106 Compare September 26, 2025 21:31
@rana rana requested a review from atburke September 26, 2025 22:17
@tigrato tigrato left a comment

This doesn't support dynamic kube resources, such as the clusters discovered during EKS/AKS/GKE discovery.

Check the registerKubeCluster functions:

func (s *TLSServer) unregisterKubeCluster(ctx context.Context, name string) error {

Comment thread api/types/kubernetes_server.go Outdated
Comment thread lib/kube/proxy/cluster_details.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
@rana rana removed request for atburke and kopiczko September 29, 2025 16:58
Comment thread api/types/kubernetes_server.go Outdated
Comment thread api/types/kubernetes_server.go Outdated
Comment thread lib/kube/proxy/cluster_details.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
@rana rana requested review from rosstimothy and tigrato September 30, 2025 03:13
@rana rana force-pushed the rana/kube-healthchecks-6 branch from af9ff0c to f271107 Compare September 30, 2025 03:22
Comment thread lib/healthcheck/worker_experimental_test.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/kube/proxy/server_test.go Outdated
Comment thread api/types/kubernetes_server.go
Comment thread lib/kube/proxy/cluster_details.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/kube/proxy/watcher.go Outdated
Comment thread lib/service/kubernetes.go
Comment thread lib/service/service.go
@rana rana force-pushed the rana/kube-healthchecks-6 branch from 8015477 to 1ebcbf2 Compare September 30, 2025 19:12
@rana rana force-pushed the rana/kube-healthchecks-6 branch 4 times, most recently from 09fe69e to b5222b9 Compare October 2, 2025 02:51
Comment thread lib/kube/proxy/forwarder_test.go Outdated
@rana rana force-pushed the rana/kube-healthchecks-6 branch 2 times, most recently from 631f87a to 45a3acb Compare October 3, 2025 21:30
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
Comment thread lib/service/kubernetes.go
@rana rana force-pushed the rana/kube-healthchecks-6 branch from 99d6078 to 4d8ca46 Compare October 6, 2025 18:00
@rana rana requested a review from rosstimothy October 6, 2025 18:08
Comment thread lib/healthcheck/manager.go Outdated
Comment thread lib/kube/proxy/server.go Outdated
@rana rana force-pushed the rana/kube-healthchecks-6 branch 2 times, most recently from ac4831a to 85f2a90 Compare October 6, 2025 22:33
Implements core health check logic for Kubernetes.

- Added functions `CheckHealth` and `GetProtocol` to `kubeDetails`
- Added a `HealthCheckManager` field to the Kubernetes `TLSServerConfig`
- Kube agent and proxy now instantiate a `HealthCheckManager`
- Added a `HealthCheckConfigReader` to interfaces `ReadKubernetesAccessPoint` and `ReadProxyAccessPoint`
- Added `HealthCheckConfig` read-only permissions for proxy and kube
- Added `HealthCheckConfig` watching for proxy and kube
- Added functions to `KubeServer` interface and `KubernetesServerV3` struct:
    - `GetTargetHealth() TargetHealth`
    - `SetTargetHealth(h TargetHealth)`
    - `GetTargetHealthStatus() TargetHealthStatus`
    - `SetTargetHealthStatus(status TargetHealthStatus)`

- Health checks now support dynamically discovered Kubernetes clusters. Calls to `startHealthCheck()` and `stopHealthCheck()` were rearranged.
    - Added functions `startHeartbeatAndHealthCheck()` and `stopHeartbeatAndHealthCheck()`
- Moved the call to `HealthCheckManager.Start()` outside of the Kubernetes proxy server, which leaves open the option of reusing a `HealthCheckManager` across multiple proxy services
- Removed `Status` initialization in `KubernetesServerV3` `CheckAndSetDefaults()`
- Added `kubernetesLabelMatchers` to the `healthCheckConfig` struct. Default presets are omitted until the entire kube health check is complete.
- Added Kubernetes matcher checking to `ValidateHealthCheckConfig()`
- Changed `ValidateHealthCheckConfig()` to allow a config whose DB matchers and Kubernetes matchers total zero
- Changed `KubernetesServerV3` `GetTargetHealthStatus()` to return `TargetHealthStatusUnknown` instead of an empty string
- Changed the health check worker's `getTargetHealthTimeout` default from `10s` to `4s`. This potentially reduces the response time of the initial heartbeat polling call to `GetServerInfo()`.
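
As a rough illustration of why pairing heartbeat and health check start/stop helps with dynamically discovered clusters, here is a minimal sketch. Everything except the two function names is hypothetical: the `server` type, the `running` map, and the signatures taking a cluster name are assumptions, not the actual `TLSServer` code.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the Kubernetes TLSServer.
type server struct {
	mu      sync.Mutex
	running map[string]bool // clusters with an active heartbeat and health check
}

func newServer() *server {
	return &server{running: make(map[string]bool)}
}

// startHeartbeatAndHealthCheck is called from the registration path, so
// static and dynamically discovered clusters go through the same code.
func (s *server) startHeartbeatAndHealthCheck(name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.running[name] {
		return fmt.Errorf("cluster %q is already registered", name)
	}
	s.running[name] = true // the real code would start a heartbeat and add a health check target
	return nil
}

// stopHeartbeatAndHealthCheck is the mirror image, called on unregistration.
func (s *server) stopHeartbeatAndHealthCheck(name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.running[name] {
		return fmt.Errorf("cluster %q is not registered", name)
	}
	delete(s.running, name)
	return nil
}

func main() {
	s := newServer()
	fmt.Println(s.startHeartbeatAndHealthCheck("discovered-eks"))
	fmt.Println(s.stopHeartbeatAndHealthCheck("discovered-eks"))
}
```

Keeping both steps in one helper means a cluster cannot end up with a heartbeat but no health check, or the reverse, regardless of how it was registered.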

Part of #58413

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
@rana rana force-pushed the rana/kube-healthchecks-6 branch from 85f2a90 to 66cfad3 Compare October 7, 2025 21:57
@rana rana added this pull request to the merge queue Oct 7, 2025
Merged via the queue into master with commit af29034 Oct 7, 2025
41 checks passed
@rana rana deleted the rana/kube-healthchecks-6 branch October 7, 2025 22:39
rana added a commit that referenced this pull request Oct 23, 2025
@rana rana mentioned this pull request Oct 23, 2025
37 tasks
rana added a commit that referenced this pull request Oct 27, 2025
rana added a commit that referenced this pull request Oct 29, 2025
rana added a commit that referenced this pull request Oct 29, 2025
rana added a commit that referenced this pull request Oct 29, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 29, 2025
* Add protobufs for Kubernetes health checks (#58415)

- Add Kubernetes label matchers to `Matcher` for `HealthCheckConfig`
- Add message `KubernetesServerStatusV3`
- Add `status` field to `KubernetesServerV3`
- Add `target_health` field to `Kube` for UI
- Regenerate Terraform schema and docs for `HealthCheckConfig`
- Add Kubernetes label matchers to Terraform test `TestImportHealthCheckConfig`

Relates to #58413

Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>

* Refactor `healthcheck` for Kubernetes extensibility with `HealthChecker` interface (#59396)

The main intent of refactoring is to provide health check extensibility for Kubernetes while supporting the existing DB health checks. A `HealthChecker` interface is added to support the different health check approaches of DBs and Kubernetes. Existing DB TCP health check logic is moved to a new `TargetDialer` struct.

Changes:
- Added `HealthChecker` interface with two functions:
    - `CheckHealth(ctx context.Context) ([]string, error)`
    - `GetProtocol() types.TargetHealthProtocol`
- Added `TargetDialer` struct which encapsulates existing TCP health check logic
- Changed `Target` struct to use the `HealthChecker` interface
- Changed `worker.checkHealth` to call the new `CheckHealth` function
- Removed a `protocol` field from `healthCheckConfig`
- Added `TargetHealthProtocolHTTP` for use with Kubernetes health checks
- Moved and renamed test `Test_dialEndpoints` to `TestTargetDialer_dialEndpoints`
- Added files `net.go` and `net_test.go` for `TargetDialer`
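
To make the interface concrete, here is a minimal sketch of an HTTP-based implementation. The two method signatures come from the list above; everything else is assumed for illustration: the `httpChecker` type, the `/readyz` path, the `"HTTP"` string value, the fake server helper, and the reading of the returned `[]string` as the endpoints that were checked.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// TargetHealthProtocol stands in for types.TargetHealthProtocol.
type TargetHealthProtocol string

// Only the constant's name comes from the PR; the value is an assumption.
const TargetHealthProtocolHTTP TargetHealthProtocol = "HTTP"

// HealthChecker mirrors the two-function interface described above.
type HealthChecker interface {
	CheckHealth(ctx context.Context) ([]string, error)
	GetProtocol() TargetHealthProtocol
}

// httpChecker is a hypothetical checker that probes one HTTP endpoint.
type httpChecker struct {
	client *http.Client
	url    string
}

// CheckHealth issues a GET and treats any non-200 response as unhealthy.
func (c *httpChecker) CheckHealth(ctx context.Context) ([]string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := c.client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("health endpoint returned %s", resp.Status)
	}
	return []string{c.url}, nil
}

func (c *httpChecker) GetProtocol() TargetHealthProtocol {
	return TargetHealthProtocolHTTP
}

// newFakeAPIServer returns a local test server that always answers with status.
func newFakeAPIServer(status int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
}

// check runs a single health check with a bounded timeout.
func check(hc HealthChecker) error {
	ctx, cancel := context.WithTimeout(context.Background(), 4*time.Second)
	defer cancel()
	_, err := hc.CheckHealth(ctx)
	return err
}

func main() {
	srv := newFakeAPIServer(http.StatusOK)
	defer srv.Close()
	hc := &httpChecker{client: srv.Client(), url: srv.URL + "/readyz"}
	fmt.Println(hc.GetProtocol(), check(hc))
}
```

Because callers such as `check` only see the interface, a TCP dialer and an HTTP checker can be swapped without touching the worker.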

Part of #58413

* Add health checks to Kubernetes agent and proxy (#59565)

* Add UI for Kubernetes health checks (#59929)

Adds health status indicators for Kubernetes clusters on the Resources
page. Unhealthy clusters are highlighted, and clicking them opens a side
panel displaying server information.

Changes include:
- New `KubeServer` protobuf message and `ListKubernetesServers` RPC
- Web and Connect API endpoints for fetching Kubernetes server data
- Health status filtering in `matchAndFilterKubeClusters`
- `TargetHealth` fields added to frontend/backend types
- Updated `StatusInfo.tsx` to display `kube_cluster` data

Part of #58413

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Add health-based connection routing for Kubernetes (#60083)

Kubernetes servers are grouped by health, and dialed in order of healthy, unknown, then unhealthy, with random shuffling within each group for load distribution.

The grouping and shuffling are implemented generically in a separate iterator function, `OrderByTargetHealthStatus()`, for reuse and testability.

Changes:
- Added `healthcheck.OrderByTargetHealthStatus()` function with tests
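
The ordering can be sketched as follows. This is a simplified illustration, not the actual `healthcheck.OrderByTargetHealthStatus()`: it returns a slice instead of an iterator, the `server` type and status string values are stand-ins, and treating any unrecognized status as unknown is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

type TargetHealthStatus string

// Status values are assumptions made for this sketch.
const (
	TargetHealthStatusHealthy   TargetHealthStatus = "healthy"
	TargetHealthStatusUnhealthy TargetHealthStatus = "unhealthy"
	TargetHealthStatusUnknown   TargetHealthStatus = "unknown"
)

// server is a hypothetical stand-in for a Kubernetes server resource.
type server struct {
	Name   string
	Status TargetHealthStatus
}

// orderByTargetHealthStatus groups servers as healthy, unknown, then
// unhealthy, and shuffles within each group to spread load.
// The input slice is left unmodified.
func orderByTargetHealthStatus(servers []server) []server {
	var healthy, unknown, unhealthy []server
	for _, s := range servers {
		switch s.Status {
		case TargetHealthStatusHealthy:
			healthy = append(healthy, s)
		case TargetHealthStatusUnhealthy:
			unhealthy = append(unhealthy, s)
		default: // unknown, empty, or unrecognized
			unknown = append(unknown, s)
		}
	}
	out := make([]server, 0, len(servers))
	for _, group := range [][]server{healthy, unknown, unhealthy} {
		rand.Shuffle(len(group), func(i, j int) {
			group[i], group[j] = group[j], group[i]
		})
		out = append(out, group...)
	}
	return out
}

func main() {
	servers := []server{
		{Name: "a", Status: TargetHealthStatusUnhealthy},
		{Name: "b", Status: TargetHealthStatusHealthy},
		{Name: "c", Status: TargetHealthStatusUnknown},
		{Name: "d", Status: TargetHealthStatusHealthy},
	}
	for _, s := range orderByTargetHealthStatus(servers) {
		fmt.Println(s.Name, s.Status)
	}
}
```

Dialing in this order tries healthy servers first while still falling back to unhealthy ones, so a stale health status cannot make a cluster unreachable.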

Part of #58413

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Allow disabling health check configs (#60447)

Health check configuration can be disabled and enabled through the matcher. Disabling the matcher disables health checks for this configuration's resources including databases, Kubernetes clusters, and any future resources.

Changes:
- Added `disabled` field to health check matcher proto
- Updated matcher selection logic
- Updated unit tests

Part of #58413

* Add documentation for Kubernetes health checks (#60201)

A new Kubernetes-specific health check page is added. The existing Kubernetes troubleshooting documentation page is updated with health check specific error resolutions.

Part of #58413

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Co-authored-by: Gavin Frazar <gavin.frazar@goteleport.com>
Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>

* Enable health checks for Kubernetes with virtual defaults (#60544)

Health checks are enabled for all Kubernetes clusters by default.

The design creates one default health check config per resource type. This eases adoption of health checks, supports existing clusters that already have database health checks, and avoids migrating the backend database. A new Kubernetes-specific `default-kube` health check config is added; the existing database-specific `default` health check config is preserved.

Virtual defaults are implemented by returning health check configs from memory when they don't exist in the backend database. Unlike a prior approach, this does not re-insert default values into the backend after they're deleted.

Virtual defaults are added at the local health check service level and returned from the functions `GetHealthCheckConfig` and `ListHealthCheckConfigs`. Virtual defaults may be written to, updated in, and deleted from the backend. Deleting a virtual default has the net effect of resetting the config to its default settings, so that it matches all resources of that type (db, kube). Virtual defaults are always returned from the health check `get` and `list` functions.

Changes:
- Added `default-kube` health check config specific to Kubernetes only
- Updated local service functions `GetHealthCheckConfig` and `ListHealthCheckConfigs` to return virtual defaults
- Added unit tests
- Updated health check documentation with `default-kube` and info about virtual defaults
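
A minimal sketch of the virtual default idea, under simplifying assumptions: the `config` struct, its `Interval` field and values, the in-memory `backend` map, and the context-free signatures are all hypothetical. Only the config names `default` and `default-kube` and the two function names come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var errNotFound = errors.New("health check config not found")

// config is a simplified stand-in for a health check config resource.
type config struct {
	Name     string
	Interval string // hypothetical setting used to show overrides
}

// virtualDefaults are served from memory when the backend has no stored copy.
var virtualDefaults = map[string]config{
	"default":      {Name: "default", Interval: "30s"},
	"default-kube": {Name: "default-kube", Interval: "30s"},
}

// service is a hypothetical local health check config service.
type service struct {
	backend map[string]config
}

// GetHealthCheckConfig prefers the stored config and falls back to a virtual default.
func (s *service) GetHealthCheckConfig(name string) (config, error) {
	if c, ok := s.backend[name]; ok {
		return c, nil
	}
	if c, ok := virtualDefaults[name]; ok {
		return c, nil
	}
	return config{}, errNotFound
}

// ListHealthCheckConfigs merges stored configs over the virtual defaults.
func (s *service) ListHealthCheckConfigs() []config {
	merged := make(map[string]config, len(virtualDefaults)+len(s.backend))
	for name, c := range virtualDefaults {
		merged[name] = c
	}
	for name, c := range s.backend {
		merged[name] = c
	}
	out := make([]config, 0, len(merged))
	for _, c := range merged {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func (s *service) Upsert(c config) { s.backend[c.Name] = c }

// Delete removes the stored copy; a virtual default then shows through again.
func (s *service) Delete(name string) { delete(s.backend, name) }

func main() {
	s := &service{backend: map[string]config{}}
	s.Upsert(config{Name: "default-kube", Interval: "10s"})
	s.Delete("default-kube")
	c, err := s.GetHealthCheckConfig("default-kube")
	fmt.Println(c.Interval, err)
}
```

Nothing is written to the backend on read, so deleting a default never triggers a re-insert; the delete simply reverts the config to its in-memory settings.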

Part of #58413

Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>

* Add `Resources` function for v18 backport

The generic `Resources` function is backported to the generic `Service` type.
`Resources` is used in a related commit enabling health checks for Kubernetes (#60544).

Part of #58413

* Fix test for terraform health check config (#60640)

Filters virtual defaults to allow explicit config references.

Part of #58413

---------

Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Co-authored-by: Gavin Frazar <gavin.frazar@goteleport.com>
Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>
rhammonds-teleport pushed a commit that referenced this pull request Nov 6, 2025
Labels

health-check, kubernetes-access, no-changelog, size/md

7 participants