diff --git a/keps/prod-readiness/sig-scheduling/4815.yaml b/keps/prod-readiness/sig-scheduling/4815.yaml index 63186f7cb325..f55de6c9c30f 100644 --- a/keps/prod-readiness/sig-scheduling/4815.yaml +++ b/keps/prod-readiness/sig-scheduling/4815.yaml @@ -1,3 +1,5 @@ kep-number: 4815 alpha: approver: "@johnbelamaric" +beta: + approver: "@johnbelamaric" diff --git a/keps/sig-scheduling/4815-dra-partitionable-devices/README.md b/keps/sig-scheduling/4815-dra-partitionable-devices/README.md index b4a0367e6f31..00f4f95aad18 100644 --- a/keps/sig-scheduling/4815-dra-partitionable-devices/README.md +++ b/keps/sig-scheduling/4815-dra-partitionable-devices/README.md @@ -1260,6 +1260,11 @@ extending the production code to implement this enhancement. - ``: `` - `` --> + + Start of v1.32 development cycle (v1.32.0-alpha.1-178-gd9c46d8ecb1): - `k8s.io/dynamic-resource-allocation/cel`: 88.8% @@ -1267,10 +1272,21 @@ Start of v1.32 development cycle (v1.32.0-alpha.1-178-gd9c46d8ecb1): - `k8s.io/kubernetes/pkg/controller/resourceclaim`: 70.0% - `k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources`: 72.9% -We also plan to add unit tests to verify that the theoretical maximum size -of the ResourceSlice resource remains within the size limitations of etcd. As -the resource has become more complex with additional fields, it has become -harder to do simple back-of-the-envelope calculations. +We have integration tests that validate the theoretical maximum size of the +ResourceSlice resource to make sure it remains within the size limitations +of etcd.
+ +Start of v1.36 development cycle (01/23/2026): +- `k8s.io/dynamic-resource-allocation/cel`: 85.2% +- `k8s.io/dynamic-resource-allocation/structured`: 33.3% +- `k8s.io/dynamic-resource-allocation/structured/internal/experimental`: 93.1% +- `k8s.io/dynamic-resource-allocation/structured/internal/incubating`: 92.2% +- `k8s.io/dynamic-resource-allocation/structured/internal/stable`: 67.7% +- `k8s.io/kubernetes/pkg/controller/resourceclaim`: 74.6% +- `k8s.io/kubernetes/pkg/kubelet/cm/dra`: 83.3% +- `k8s.io/kubernetes/pkg/kubelet/cm/dra/plugin`: 83.5% +- `k8s.io/kubernetes/pkg/kubelet/cm/dra/state`: 44.2% +- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources`: 80.0% ##### Integration tests @@ -1282,14 +1298,10 @@ For Beta and GA, add links to added tests together with links to k8s-triage for https://storage.googleapis.com/k8s-triage/index.html --> -The existing [integration tests for kube-scheduler which measure -performance](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf#readme) -will be extended to cover the overhead of running the additional logic to -support the features in this KEP. These also serve as [correctness -tests](https://github.com/kubernetes/kubernetes/commit/cecebe8ea2feee856bc7a62f4c16711ee8a5f5d9) -as part of the normal Kubernetes "integration" jobs which cover [the dynamic -resource -controller](https://github.com/kubernetes/kubernetes/blob/294bde0079a0d56099cf8b8cf558e3ae7230de12/test/integration/scheduler_perf/util.go#L135-L139). +Integration tests to verify performance have been added +[here](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf/dra/partitionabledevices). +These tests also serve as correctness tests, but additional integration tests will +be added to improve coverage. ##### e2e tests @@ -1303,12 +1315,11 @@ https://storage.googleapis.com/k8s-triage/index.html We expect no non-infra related flakes in the last month as a GA graduation criteria. 
--> -End-to-end testing depends on a working resource driver and a container runtime -with CDI support. A [test -driver](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/dra/test-driver) -was developed as part of the overall DRA development effort. We are extending -this test driver to enable support for `PartitionableDevice`s and adding tests to -ensure they are handled by the scheduler as described in this KEP. +E2e tests have been added for the Partitionable Devices feature: + +- source code: https://github.com/kubernetes/kubernetes/blob/b2ac9e206fdd912f35f2ab5b3c5b5243303ba14b/test/e2e/dra/dra.go#L1789-L1867 +- job: https://testgrid.k8s.io/sig-node-dynamic-resource-allocation#ci-kind-dra-all&include-filter-by-regex=DRAPartitionableDevices +- triage: https://storage.googleapis.com/k8s-triage/index.html?test=DRAPartitionableDevices ### Graduation Criteria @@ -1394,7 +1405,7 @@ of the Partitionable Devices API, allocation of devices will fail as described i The scheduler may lose track of what devices it has allocated to what pods. Any pods that had previously allocated devices with the feature enabled will need -to be deleted to ensure they are freed back to their corresponding driver and +to be deleted to ensure they are freed back to their corresponding driver and the accounting for them is updated in the scheduler. ###### Are there any tests for feature enablement/disablement? @@ -1404,16 +1415,12 @@ Kubernetes components themselves. They are written by 3rd-party drivers. However, the scheduler does consume these objects and track information from them in order to make scheduling decisions. -Unit tests in will be written in the scheduler to verify that enabling / +Unit tests exist in the scheduler that verify that enabling / disabling of the DRAPartitionableDevices feature gate is non-disruptive to the scheduler. ### Rollout, Upgrade and Rollback Planning - - ###### How can a rollout or rollback fail? Can it impact already running workloads? 
We are making backwards-incompatible changes to the Partitionable Devices feature @@ -1437,132 +1444,94 @@ with counter sets defined in separate `ResourceSlices`. Drivers and `ResourceSlices` using the Partitionable Devices feature should be removed from the cluster before upgrade/downgrade between 1.34 and 1.35. +For upgrade to 1.36, where the Partitionable Devices feature has been promoted +to beta, workloads will not be impacted unless a driver is running that publishes +ResourceSlices that use the Partitionable Devices feature. If a driver starts +publishing ResourceSlices that use the feature before it has been completely +rolled out, pods can fail to schedule or fail to run +on the nodes. Drivers should only use the feature once it has been fully +rolled out in the cluster. This will not affect running workloads unless they +have to be restarted. + ###### What specific metrics should inform a rollback? - -Will be considered for beta. +One indicator is unexpected restarts of the cluster control plane +components (kube-scheduler, apiserver, kube-controller-manager). + +If the scheduler_pending_pods metric in the kube-scheduler suddenly increases or +remains constant, it can suggest that pods are no longer getting scheduled, which +might be due to a problem with the DRA scheduler plugin. Another measure is an increase +in the number of pods that fail to start, as indicated by the +kubelet_started_containers_errors_total metric. + +In all cases, further analysis of logs and pod events is needed to determine +whether errors are related to this feature. ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? - -Will be considered for beta. +This will be done manually before transition to beta by bringing up a KinD cluster with kubeadm +and changing the feature gate for individual components. + +Roundtripping of API types is covered by unit tests.
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? - -Will be considered for beta. +No ### Monitoring Requirements - -Will be considered for beta. +There will be `ResourceSlices` in the cluster with either: +* the `spec.sharedCounters` field set, meaning that counter sets are defined in at least one of + the resource pools +* the `spec.perDeviceNodeSelection` field set to `true`, meaning that devices within a single + `ResourceSlice` might have different node selectors. -###### How can an operator determine if the feature is in use by workloads? +This means that drivers using the Partitionable Devices feature are running and +that devices are being advertised that rely on the feature. - -Will be considered for beta. +The feature is in use by workloads if any of those devices have been allocated to `ResourceClaims`. ###### How can someone using this feature know that it is working for their instance? - -Will be considered for beta. +- [x] API .status + - Other field: `.status.allocation.devices.results.device` for a ResourceClaim references a device + from a resource pool that has `ResourceSlices` using the Partitionable Devices feature. ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? - -Will be considered for beta. +As with normal scheduling of pods that use ResourceClaims, there is no SLO for scheduling with +partitionable devices. + +Since using the feature means more work is needed to determine if a device can be allocated, +we expect pod scheduling to be slower when this feature is used. Also, this feature is likely +to result in a higher number of devices listed in a resource pool, which in turn likely means +the allocator needs to do more work to select devices. ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - -Will be considered for beta.
+These are the same as for the main DRA feature: + +- [x] Metrics + - Metric name: resourceclaim_controller_create_total + - Metric name: resourceclaim_controller_create_failures_total + - Metric name: resourceclaim_controller_resource_claims + - Metric name: resourceclaim_controller_allocated_resource_claims + - Metric name: workqueue with name="resource_claim" + - Metric name: scheduler_pending_pods ###### Are there any missing metrics that would be useful to have to improve observability of this feature? - -Will be considered for beta. +No ### Dependencies - - ###### Does this feature depend on any specific services running in the cluster? - -Will be considered for beta. +This feature depends on the DRA structured parameters feature being enabled, and on DRA drivers being deployed. +There are no requirements beyond those already needed for DRA structured parameters. Core DRA is locked to on in +1.36, but it can still be disabled through emulation. ### Scalability @@ -1610,45 +1579,28 @@ No. ### Troubleshooting - +The troubleshooting section in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#troubleshooting +still applies. The only additional failure modes come from version skew +in the cluster, and the troubleshooting steps provided through the link above +should be sufficient to determine the cause. ###### How does this feature react if the API server and/or etcd is unavailable? -Will be considered for beta. +See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable. ###### What are other known failure modes? - -Will be considered for beta. +See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#what-are-other-known-failure-modes. ###### What steps should be taken if SLOs are not being met to determine the problem? -Will be considered for beta.
+N/A since this feature does not come with an SLO. ## Implementation History - Kubernetes 1.32: KEP accepted as "implementable". - Kubernetes 1.33: Implemented as an alpha feature. +- Kubernetes 1.36: Partitionable Devices graduates to beta. ## Drawbacks diff --git a/keps/sig-scheduling/4815-dra-partitionable-devices/kep.yaml b/keps/sig-scheduling/4815-dra-partitionable-devices/kep.yaml index 1f9772d6bb09..ec7987e6d66e 100644 --- a/keps/sig-scheduling/4815-dra-partitionable-devices/kep.yaml +++ b/keps/sig-scheduling/4815-dra-partitionable-devices/kep.yaml @@ -12,27 +12,28 @@ creation-date: 2024-09-25 reviewers: - "@pohly" - "@johnbelamaric" - - "@thockin" + - "@liggitt" approvers: - "@mrunalp" # SIG-Node - - "@alculquicondor" # SIG-Scheduling - - "@MaciekPytel" # SIG-Autoscaling - - "@thockin" # API Review + - "@dom4ha" # SIG-Scheduling + - "@jackfrancis" # SIG-Autoscaling + - "@liggitt" # API Review see-also: - "/keps/sig-node/4381-dra-structured-parameters" # The target maturity stage in the current dev cycle for this KEP. -stage: alpha +stage: beta # The most recent milestone for which work toward delivery of this KEP has been # done. This can be the current (upcoming) milestone, if it is being actively # worked on. -latest-milestone: "v1.35" +latest-milestone: "v1.36" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: alpha: "v1.33" + beta: "v1.36" # The following PRR answers are required at alpha release # List the feature gate name and the components for which it must be enabled