KEP-0000: Pod Network Health API #5754
# Pod Network Health API

## Summary

Kubernetes currently lacks a native mechanism to represent basic
pod-to-pod network health signals such as reachability and latency.
This KEP proposes a Kubernetes-native API to express these signals
in a standardized and extensible way.
## Motivation

Network issues are among the most common causes of outages in Kubernetes.
Today, operators rely on ad-hoc scripts, CNI-specific tools, or
external observability systems to diagnose pod-to-pod connectivity issues.

A standardized API enables:

- Faster diagnosis of networking issues
- Vendor-neutral observability
- Better tooling integration
## Goals

- Define a Kubernetes-native abstraction for pod network health
- Represent basic signals such as reachability and latency
- Remain CNI-agnostic and implementation-neutral
- Introduce the API as alpha behind a feature gate
- Avoid requiring complete or full-mesh pod-to-pod coverage
## Non-Goals

- Deep packet inspection
- Mandatory probing behavior
- Automatic remediation
- Replacing service meshes or observability platforms
## User Stories

### Cluster Operator

As a cluster operator, I want to know whether two pods can communicate
so that I can debug outages faster.

### Platform Engineer

As a platform engineer, I want a standard API to surface network health
signals that can be consumed by monitoring systems.
## Proposal

Introduce an alpha Kubernetes API resource that represents observed
network health between a source pod and a target pod.

The API focuses on **representation**, not on how data is collected.
## API Design (High-Level)

The API may include:

- Source pod reference
- Target pod reference
- Reachability status
- Optional latency metrics
- Timestamp of last observation

Exact fields will be refined during review.
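As a rough illustration only — the field names below are assumptions derived from the list above, not a settled schema — the proposed shape could be modeled like this:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PodReference:
    """Reference to a core/v1 Pod by namespace and name."""
    namespace: str
    name: str

@dataclass
class PodNetworkHealth:
    """Hypothetical sketch of the proposed resource's observable fields."""
    source: PodReference            # source pod reference
    target: PodReference            # target pod reference
    reachable: bool                 # reachability status
    latency_millis: Optional[int]   # optional latency metric
    last_observed_time: str         # timestamp of last observation (RFC 3339)

obs = PodNetworkHealth(
    source=PodReference("default", "pod-a"),
    target=PodReference("default", "pod-b"),
    reachable=True,
    latency_millis=3,
    last_observed_time="2026-01-22T10:15:30Z",
)
```

Whether latency belongs in the core schema or in an extensible metrics map is exactly the kind of question to settle during API review.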
## Implementation Details

- Introduced as alpha
- Feature gated
- No default probing required
- Implementations may be controller-based, node-based, or vendor-provided
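Because no default probing is required, the collection side stays pluggable. As a purely hypothetical sketch (the function names and the injected probe callback are invented for illustration, not part of the proposal), a node-based agent might run one bounded probe pass and emit observations shaped like the API:

```python
from typing import Callable, Iterable, List, Tuple

# A probe returns (reachable, latency_millis) for one source -> target pair.
Probe = Callable[[str, str], Tuple[bool, int]]

def observe_once(pairs: Iterable[Tuple[str, str]], probe: Probe) -> List[dict]:
    """Run one probe pass over a bounded, pre-selected set of pod pairs."""
    observations = []
    for source, target in pairs:
        reachable, latency = probe(source, target)
        observations.append({
            "source": source,
            "target": target,
            "reachable": reachable,
            "latencyMillis": latency,
        })
    return observations

# Stub probe standing in for a real TCP/ICMP check performed on the node.
def stub_probe(source: str, target: str) -> Tuple[bool, int]:
    return True, 3

results = observe_once([("default/pod-a", "default/pod-b")], stub_probe)
```

The key design point this sketch reflects is that the pair selection is an input: the API never asks an implementation to enumerate all pairs itself.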
## Why Consider Standardizing This API in Kubernetes?

While pod-to-pod network health can be measured using an external
controller and CRD, the motivation for a core Kubernetes API is to
standardize semantics across clusters, CNIs, and tooling.

Existing tools (for example, kube-latency:
https://github.com/simonswine/kube-latency) demonstrate that connectivity
and latency can be measured externally. However, these tools define
their own schemas and reporting mechanisms, making it difficult for
operators and platforms to build portable integrations.

A core API would define a common contract and vocabulary, while allowing
the actual data collection and implementation to remain pluggable and
external. As a validation step, this proposal is compatible with first
prototyping the design as an external controller + CRD before considering
promotion into core Kubernetes.
> **Reviewer comment (on lines 76–80):** A core API defines not only a contract
> and vocabulary but also semantics, which are enforced with conformance tests.
> That requires a default implementation so the semantics can actually be
> tested. For pod-to-pod network health, the main problem is that there is no
> consistency across dataplanes: some Kubernetes clusters use the Linux kernel
> as the dataplane, while others use DPDK, Open vSwitch, or eBPF-only
> dataplanes. If we want to standardize, we need to ensure the API covers all
> existing projects.
## Intersection with Existing Kubernetes APIs

A key question for this proposal is whether this functionality should
exist as an external CRD or as a core Kubernetes API.

This proposal argues for a core API based on how the resource naturally
intersects with existing Kubernetes objects and control plane behavior.

Specifically, PodNetworkHealth observations are tightly coupled to:

- **Pods** (as first-class core objects being referenced)
- **Nodes** (where observations may originate)
- **NetworkPolicy** debugging and validation workflows
- **kubectl describe / status** style operational diagnostics
- Potential future integration with **Events** and **Conditions**
- Standard tooling that today understands only core APIs

While a CRD can represent similar data, it remains opaque to:

- Native `kubectl` UX and discovery
- Generic controllers and ecosystem tooling
- Future integration with Pod Status, Events, or Conditions
- Consistent behavior across Kubernetes distributions and platforms

If standardized within Kubernetes, such an API could make it easier for
tools and operators to rely on a common vocabulary for reporting
network health observations without depending on tool-specific CRDs
or schemas.

However, this proposal does not assume that this must immediately be a
core API. An external CRD-based prototype is a valid and encouraged
first step to validate usefulness, scalability, and semantics before
considering any form of standardization within Kubernetes.
## Scalability and Semantics Considerations

Any standardized Kubernetes API would imply defined semantics and
eventual conformance. This proposal intentionally limits scope to avoid
implying universal guarantees or required implementations.

In particular:

- The API does not require full pod-to-pod coverage
- It does not mandate probing or measurement strategies
- It does not imply a complete or global view of cluster network health

Full-mesh measurement of pod-to-pod health is infeasible at scale
(O(N²)) and would generate unacceptable dataplane load in large
clusters. As such, any data exposed via this API is expected to be:

- sampled, targeted, or workload-specific
- partial and best-effort
- explicitly bounded in scope
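The arithmetic behind the full-mesh constraint is easy to check, along with a sampled alternative (the `targets_per_pod` sampling scheme is one illustrative strategy, not something the proposal prescribes):

```python
def full_mesh_measurements(num_pods: int) -> int:
    # Ordered source -> target pairs, excluding self-probes: N * (N - 1).
    return num_pods * (num_pods - 1)

def sampled_measurements(num_pods: int, targets_per_pod: int) -> int:
    # Each pod probes only a fixed number of targets: O(N) instead of O(N^2).
    return num_pods * targets_per_pod

print(full_mesh_measurements(5000))   # 24,995,000 pairwise measurements
print(sampled_measurements(5000, 3))  # 15,000 with 3 targets per pod
```

A cluster of just 5,000 pods already implies roughly 25 million measurements per full-mesh pass, which is why the API only represents partial, bounded observations.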
The API represents *reported observations*, not guarantees of reachability
or latency across the entire cluster.

The design must remain compatible with diverse dataplanes, including
kernel-based, DPDK, Open vSwitch, and eBPF-only implementations, without
assuming common probing or traffic interception mechanisms.
```yaml
apiVersion: networking.k8s.io/v1alpha1
kind: PodNetworkHealth
metadata:
  name: pod-a-to-pod-b
spec:
  sourcePodRef:
    namespace: default
    name: pod-a
  targetPodRef:
    namespace: default
    name: pod-b
status:
  reachable: true
  latencyMillis: 3
  lastObservedTime: "2026-01-22T10:15:30Z"
  observedFromNode: worker-node-1
```
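Because observations are best-effort and partial, consumers must treat them as potentially stale. A minimal sketch of how a client might age out an observation, using the `lastObservedTime` field from the example above (the `is_fresh` helper and the 5-minute window are illustrative assumptions, not part of the proposal):

```python
from datetime import datetime, timedelta, timezone

def parse_k8s_time(ts: str) -> datetime:
    # Kubernetes timestamps are RFC 3339; normalize the trailing "Z" so
    # datetime.fromisoformat accepts it on all supported Python versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def is_fresh(last_observed: str, now: datetime, max_age: timedelta) -> bool:
    # An observation older than max_age should not be trusted as current.
    return now - parse_k8s_time(last_observed) <= max_age

now = datetime(2026, 1, 22, 10, 20, 0, tzinfo=timezone.utc)
print(is_fresh("2026-01-22T10:15:30Z", now, timedelta(minutes=5)))  # fresh: 4m30s old
print(is_fresh("2026-01-22T10:15:30Z", now, timedelta(minutes=2)))  # stale
```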
---

This example illustrates how the resource relates to existing core objects:

- `core/v1 Pod`
- `core/v1 Node`
- NetworkPolicy debugging workflows
- Potential future integration with Pod Conditions or Events
- Native `kubectl describe` diagnostics
## Alternatives Considered

- CNI-specific tooling (not portable)
- External observability systems (not Kubernetes-native)
- CLI-only debugging tools (not programmatic)

## Risks and Mitigations

**API stability risk**
Mitigated by alpha status and feature gating.

**Performance impact**
> **Reviewer comment:** This has a very important impact on scalability. The
> Kubernetes network model defines that each pod should be able to talk to any
> other pod in the cluster, so measuring the latency of all pods implies a
> full mesh — an O(N²) problem. In a cluster with just 5,000 pods, that
> requires 24,995,000 measurements. Looking at the traffic generated, things
> get worse still, as we would flood the dataplane.
Mitigated by avoiding mandatory probing.
## Graduation Criteria

### Alpha
- API introduced behind feature gate
- Experimental usage

### Beta
- Feedback from users
- Stable semantics

### GA
- Production usage
- Documented best practices

## References
- SIG-Network discussions (TBD)
---

KEP metadata file:

```yaml
title: Pod Network Health API
kep-number: 0000
authors:
  - Sahichowdary
owning-sig: sig-network
participating-sigs:
  - sig-node
status: provisional
creation-date: 2025-01-24
```
> **Reviewer comment:** It would be good to describe in more detail why this
> makes sense as built-in core Kubernetes functionality versus a standalone
> project. From what is described, it sounds like this could be implemented as
> a controller + CRD outside of Kubernetes. It might even be helpful to build a
> prototype and link it in the proposal.
> **Reviewer comment:** Here is an example found with a quick web search:
> https://github.com/simonswine/kube-latency