kubernetes · k8s-ci-robot · Jun 23, 2022 · Jun 22, 2022 · Jun 23, 2022 · Jun 23, 2022
diff --git a/keps/sig-node/3063-dynamic-resource-allocation/README.md b/keps/sig-node/3063-dynamic-resource-allocation/README.md
@@ -84,6 +84,7 @@ SIG Architecture for cross-cutting KEPs).
   - [Risks and Mitigations](#risks-and-mitigations)
     - [Feature not used](#feature-not-used)
     - [Compromised node](#compromised-node)
+    - [Compromised resource driver plugin](#compromised-resource-driver-plugin)
     - [User permissions and quotas](#user-permissions-and-quotas)
     - [Usability](#usability)
 - [Design Details](#design-details)
@@ -250,10 +251,23 @@ limitations of the current approach for the following use cases:
   containers should be able to use other free resources on the same
   device.
 
-  *Limitation*: Current implementation of the device plugin doesn’t
-  allow one to allocate part of the device because parameters are too limited
-  and Kubernetes doesn't have enough information about the extended
-  resources on a node to decide whether they can be shared.
+  *Limitation*: For example, newer generations of NVIDIA GPUs have a mode of
+  operation called MIG, which allows them to be subdivided into a set of
+  mini-GPUs (called MIG devices) with varying amounts of memory and compute
+  resources provided by each. From a hardware standpoint, configuring a GPU
+  into a set of MIG devices is highly dynamic and creating a MIG device
+  tailored to the resource needs of a particular application is well
+  supported. However, with the current device plugin API, the only way to make
+  use of this feature is to pre-partition a GPU into a set of MIG devices and
+  advertise them to the kubelet in the same way a full / static GPU is
+  advertised. The user must then pick from this set of pre-partitioned MIG
+  devices instead of having one created for them on the fly based on their
+  particular resource constraints. Without the ability to create MIG devices
+  dynamically (i.e. at the time they are requested), the set of pre-defined MIG
+  devices must be carefully tuned to ensure that GPU resources do not go unused
+  because some of the pre-partitioned devices are in low demand. It also puts
+  the burden on the user to pick a particular MIG device type, rather than
+  declaring the resource constraints more abstractly.
 
 - *Optional allocation*: When deploying a workload I’d like to specify
   soft(optional) device requirements. If a device exists and it’s
@@ -563,6 +577,31 @@ driver vendor. Solutions like Akri which establish their own control plane and
 then communicate with Kubernetes through the device plugin API already need to
 address this.
 
+#### Compromised resource driver plugin
+
+This is the result of an attack against the resource driver, whether from a
+container which uses a resource exposed by the driver, from a compromised
+kubelet which interacts with the plugin, or through a successful attack
+against the node which led to root access.
+
+The resource driver plugin only needs read access to objects described in this
+KEP, so compromising it does not interfere with dynamic resource allocation for
+other drivers. It may need write access for [CRDs that communicate or
+coordinate resource
+availability](#implementing-a-plugin-for-node-resources). This could be used to
+attack scheduling involving the driver as outlined in the previous section.
+
+A resource driver may need root access on the node to manage
+hardware. Attacking the driver therefore may lead to root privilege
+escalation. Ideally, driver authors should try to avoid depending on root
+permissions and instead use capabilities or special permissions for the kernel
+APIs that they depend on.
+
+A resource driver may also need privileged access to remote services to manage
+network-attached devices. Resource driver vendors and cluster administrators
+have to consider what the effect of a compromise could be in that case and how
+such privileges could be revoked.
+
 #### User permissions and quotas
 
 Similar to generic ephemeral inline volumes, the [ephemeral resource use
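
The *Limitation* paragraph added in the README hunk above argues that pre-partitioned MIG devices can leave GPU capacity unused when demand does not match the chosen partition sizes. The following is a minimal sketch of that argument, not part of the KEP. It assumes a hypothetical device whose capacity is counted in abstract "slices", and it ignores real hardware placement rules. The function names, partition sizes, and request sizes are all illustrative.

```python
def serve_static(partitions, requests):
    """Serve requests from a fixed set of pre-made partitions.

    Each request takes the smallest free partition that is large enough;
    a request that fits no free partition is rejected.
    """
    free = sorted(partitions)
    served = 0
    for need in requests:
        fit = next((p for p in free if p >= need), None)
        if fit is not None:
            free.remove(fit)
            served += 1
    return served


def serve_dynamic(capacity, requests):
    """Serve requests by carving exactly the requested size on demand.

    Assumption: any free capacity can be combined, which real devices
    may not allow.
    """
    remaining = capacity
    served = 0
    for need in requests:
        if need <= remaining:
            remaining -= need
            served += 1
    return served


# Hypothetical 8-slice device, pre-partitioned as 4 + 2 + 2.
requests = [3, 3, 2]
static_served = serve_static([4, 2, 2], requests)
dynamic_served = serve_dynamic(8, requests)
```

With these illustrative numbers the static layout serves two of the three requests, because the second 3-slice request finds no free partition that fits, while on-demand carving serves all three from the same total capacity.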

diff --git a/keps/sig-node/3063-dynamic-resource-allocation/kep.yaml b/keps/sig-node/3063-dynamic-resource-allocation/kep.yaml
@@ -5,7 +5,8 @@ authors:
 owning-sig: sig-node
 participating-sigs:
   - sig-scheduling
-status: provisional
+  - sig-autoscaling
+status: implementable
 creation-date: 2021-05-17
 reviewers:
   - "@ahg-g"
@@ -28,8 +29,8 @@ latest-milestone: "v1.25"
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
-  beta: "v1.27"
-  stable: "v1.29"
+  beta: "v1.28"
+  stable: "v1.30"
 
 feature-gates:
   - name: DynamicResourceAllocation