Commit f509e16

Update device-plugins.md
1 parent 38e94ee commit f509e16

1 file changed, +76 -51 lines changed

docs/concepts/cluster-administration/device-plugins.md

@@ -3,52 +3,65 @@ approvers:
 title: Device Plugins
 ---
 
-* TOC
-{:toc}
+{% include feature-state-alpha.md %}
 
-__Disclaimer__: Device plugins are in alpha. Its contents may change rapidly.
-
-Starting from 1.8 release, Kubernetes provides a [device plugin framework](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/device-plugin.md)
-for vendors to advertise their resources to Kubelet without changing Kubernetes core code.
+{% capture overview %}
+Starting in version 1.8, Kubernetes provides a [device plugin framework](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/device-plugin.md)
+for vendors to advertise their resources to the kubelet without changing Kubernetes core code.
 Instead of writing custom Kubernetes code, vendors can implement a device plugin that can
 be deployed as a DaemonSet or in bare metal mode. The targeted devices include GPUs,
 High-performance NICs, FPGAs, InfiniBand, and other similar computing resources
 that may require vendor specific initialization and setup.
+{% endcapture %}
+
+{% capture body %}
+
+## Device plugin registration
+
+The device plugins feature is gated by the `DevicePlugins` feature gate and is disabled by default.
+When the device plugins feature is enabled, the kubelet exports a `Registration` gRPC service:
 
-## Overview
-The 1.8 Kubernetes release supports device plugin as an alpha feature that
-is gated by DevicePlugins feature gate and is disabled by default.
-When the DevicePlugins feature is enabled, Kubelet will export a `Registration` gRPC service:
 ```gRPC
 service Registration {
 rpc Register(RegisterRequest) returns (Empty) {}
 }
 ```
-A device plugin can register itself to Kubelet through this gRPC service.
-During the registration, the device plugin needs to send
-* The name of their unix socket
-* The API version against which they were built
-* The `ResourceName` they want to advertise. Here `ResourceName` needs to follow
-the [extended resource naming scheme](https://github.com/kubernetes/kubernetes/pull/48922) as `vendor-domain/resource`.
-E.g., Nvidia GPUs are advertised as `nvidia.com/gpu`
-
-Following a successful registration, the device plugin will send Kubelet the
-list of devices it manages, and Kubelet will be in charge of advertising those
-resources to the API server as part of the Kubelet node status update.
-E.g., after a device plugin registers `vendor-domain/foo` with Kubelet
-and reports two healthy devices on a node, the node status will be updated
+A device plugin can register itself with the kubelet through this gRPC service.
+During the registration, the device plugin needs to send:
+
+* The name of its Unix socket.
+* The API version against which it was built.
+* The `ResourceName` it wants to advertise. Here `ResourceName` needs to follow the
+[extended resource naming scheme](https://github.com/kubernetes/kubernetes/pull/48922)
+as `vendor-domain/resource`.
+For example, an Nvidia GPU is advertised as `nvidia.com/gpu`.
+
+Following a successful registration, the device plugin sends the kubelet the
+list of devices it manages, and the kubelet is then in charge of advertising those
+resources to the API server as part of the kubelet node status update.
+For example, after a device plugin registers `vendor-domain/foo` with the kubelet
+and reports two healthy devices on a node, the node status is updated
 to advertise 2 `vendor-domain/foo`.
-Devices can then be selected using the same process as for OIRs in the pod spec.
-Currently, extended resources are only spported as integer resources and expect
-to always have limits == requests in container resource Spec.
 
-## Device Plugin Implementation
+Then, developers can request devices in a
+[Container](/docs/api-reference/{{page.version}}/#container-v1-core)
+specification by using the same process that is used for
+[opaque integer resources](/docs/tasks/configure-pod-container/opaque-integer-resource/).
+In version 1.8, extended resources are supported only as integer resources and must have
+`limit` equal to `request` in the Container specification.
+
+## Device plugin implementation
 
 The general workflow of a device plugin includes the following steps:
-* Initialization. During this phase, the device plugin performs vendor specific initialization and setup to make sure the devices are in ready state.
-* The plugin starts a gRPC service with a unix socket under host path: `/var/lib/kubelet/device-plugins/` that implements the following interfaces:
-```gRPC
-service DevicePlugin {
+
+* Initialization. During this phase, the device plugin performs vendor-specific
+initialization and setup to make sure the devices are in a ready state.
+
+* The plugin starts a gRPC service, with a Unix socket under host path
+`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
+
+```gRPC
+service DevicePlugin {
 // ListAndWatch returns a stream of List of Devices
 // Whenever a Device state change or a Device disapears, ListAndWatch
 // returns the new list
@@ -58,31 +71,43 @@ service DevicePlugin {
 // Plugin can run device specific operations and instruct Kubelet
 // of the steps to make the Device available in the container
 rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
-}
-```
-* The plugin registers itself with Kubelet through unix socket at host path `/var/lib/kubelet/device-plugins/kubelet.sock`.
-* After successfully registering itself, the device plugin runs in serving mode during which it keeps
-monitoring device health and reports back to Kubelet upon any device state changes.
-It is also responsible for serving Allocate gRPC requests. During Allocate, the device plugin may
-perform device specific preparing operations (e.g., gpu cleanup, QRNG initialization, and etc.).
-If the operations succeed, the device plugin will return an AllocateResponse that contains container
-runtime configurations for accessing the allocated devices that Kubelet will pass to container runtime.
-
-A device plugin is expected to detect Kubelet restarts and re-register itself with the new
-Kubelet instance. Currently, a new Kubelet instance will clean up all the existing unix sockets
+}
+```
+
+* The plugin registers itself with the kubelet through the Unix socket at host
+path `/var/lib/kubelet/device-plugins/kubelet.sock`.
+
+* After successfully registering itself, the device plugin runs in serving mode, during which it keeps
+monitoring device health and reports back to the kubelet upon any device state changes.
+It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
+do device-specific preparation; for example, GPU cleanup or QRNG initialization.
+If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
+runtime configurations for accessing the allocated devices. The kubelet passes this information
+to the container runtime.
+
+A device plugin is expected to detect kubelet restarts and re-register itself with the new
+kubelet instance. In version 1.8, a new kubelet instance cleans up all the existing Unix sockets
 under `/var/lib/kubelet/device-plugins` when it starts. A device plugin can monitor the deletion
-of its unix socket and re-register itself upon such event.
+of its Unix socket and re-register itself upon such an event.
 
-## Device Plugin Deployment
+## Device plugin deployment
 
 A device plugin can be deployed as a DaemonSet or in bare metal mode. Being deployed as a DaemonSet has
-the benefit that Kubernetes can restart the device plugins when they fail.
-Otherwise, extra mechanism is needed to recover from device plugin failures.
-The canonical directory `/var/lib/kubelet/device-plugins` requires priveledge access
-so device plugin needs to run in privileged security context. It also needs to be mounted
-as a volume in device plugin pod spec when running as a DaemonSet.
+the benefit that Kubernetes can restart the device plugin if it fails.
+Otherwise, an extra mechanism is needed to recover from device plugin failures.
+The canonical directory `/var/lib/kubelet/device-plugins` requires privileged access,
+so a device plugin must run in a privileged security context.
+If a device plugin is running as a DaemonSet, `/var/lib/kubelet/device-plugins`
+must be mounted as a
+[Volume](/docs/api-reference/{{page.version}}/#volume-v1-core)
+in the plugin's
+[PodSpec](/docs/api-reference/{{page.version}}/#podspec-v1-core).
 
 ## Examples
 
-For an example device plugin implementation, please check
+For an example device plugin implementation, see
 [nvidia GPU device plugin for COS base OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/nvidia_gpu).
+
+{% endcapture %}
+
+{% include templates/concept.md %}
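
According to the updated registration text, a plugin sends three things: the name of its Unix socket, the API version it was built against, and a `ResourceName` of the form `vendor-domain/resource`. The following is a minimal Go sketch of checking those fields before registering. `registerRequest`, `validate`, and `socketPath` are hypothetical names, not the generated gRPC types. The `v1alpha` version string is a placeholder, and the regular expression only approximates the naming scheme. It is not the validation the kubelet actually performs.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// devicePluginDir is the host path named in the text.
const devicePluginDir = "/var/lib/kubelet/device-plugins"

// registerRequest is a hypothetical Go mirror of what a plugin sends during
// registration; it is NOT the generated gRPC message type.
type registerRequest struct {
	Version      string // API version the plugin was built against
	Endpoint     string // file name of the plugin's Unix socket
	ResourceName string // for example "nvidia.com/gpu"
}

// resourceNameRE approximates `vendor-domain/resource`: a lowercase
// DNS-style domain, exactly one "/", then a resource name.
var resourceNameRE = regexp.MustCompile(`^[a-z0-9]([a-z0-9.-]*[a-z0-9])?/[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$`)

// validate rejects requests that are missing a field or are malformed.
func (r registerRequest) validate() error {
	if r.Version == "" {
		return errors.New("missing API version")
	}
	if r.Endpoint == "" || strings.ContainsRune(r.Endpoint, '/') {
		return fmt.Errorf("endpoint %q must be a bare socket file name", r.Endpoint)
	}
	if !resourceNameRE.MatchString(r.ResourceName) {
		return fmt.Errorf("resource name %q is not of the form vendor-domain/resource", r.ResourceName)
	}
	return nil
}

// socketPath is where this sketch assumes the plugin's socket lives.
func (r registerRequest) socketPath() string {
	return filepath.Join(devicePluginDir, r.Endpoint)
}

func main() {
	// "v1alpha" is a placeholder version string for this sketch.
	req := registerRequest{Version: "v1alpha", Endpoint: "nvidia.sock", ResourceName: "nvidia.com/gpu"}
	if err := req.validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("would register", req.ResourceName, "served at", req.socketPath())
}
```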
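
The updated text says that a plugin that registers `vendor-domain/foo` and reports two healthy devices causes the node to advertise 2 `vendor-domain/foo`. The `ListAndWatch` comment adds that the plugin returns the new list whenever a device changes state. Below is a Go sketch of that bookkeeping. It assumes that each stream message carries the complete device list and that only devices marked healthy are counted. `device`, `advertisedCount`, and `capacityFromStream` are hypothetical names, and a channel stands in for the gRPC stream.

```go
package main

import "fmt"

// Hypothetical health values used by this sketch.
const (
	healthy   = "Healthy"
	unhealthy = "Unhealthy"
)

// device is a hypothetical stand-in for one entry in a ListAndWatch update.
type device struct {
	ID     string
	Health string
}

// advertisedCount returns how many devices in one list are healthy; in this
// sketch that is the amount the kubelet would advertise for the resource.
func advertisedCount(devs []device) int {
	n := 0
	for _, d := range devs {
		if d.Health == healthy {
			n++
		}
	}
	return n
}

// capacityFromStream consumes updates until the channel is closed. Each
// update is a full device list, so it replaces the previous state instead of
// being merged with it.
func capacityFromStream(updates <-chan []device) int {
	count := 0
	for list := range updates {
		count = advertisedCount(list)
	}
	return count
}

func main() {
	updates := make(chan []device, 2)
	updates <- []device{{"dev-0", healthy}, {"dev-1", healthy}}
	updates <- []device{{"dev-0", healthy}, {"dev-1", unhealthy}}
	close(updates)
	fmt.Println(capacityFromStream(updates)) // 1
}
```

Replacing state on every message, rather than applying deltas, matches the comment's wording that `ListAndWatch` "returns the new list".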
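
The 1.8 rule in the updated text says that extended resources are integer-only and that `limit` must equal `request`. A minimal Go check for that rule is sketched below. `containerResources` and `checkExtendedResource` are hypothetical simplifications. Quantities are stored as plain decimal strings, so suffixed values such as `500m`, which real Kubernetes quantities accept, are simply rejected here. A missing request or limit is also treated as an error, without modeling any defaulting.

```go
package main

import (
	"fmt"
	"strconv"
)

// containerResources is a hypothetical, simplified view of a Container's
// resource requests and limits, keyed by resource name.
type containerResources struct {
	Requests map[string]string
	Limits   map[string]string
}

// checkExtendedResource applies the two 1.8 constraints to one extended
// resource: the amount must be a whole number, and limit must equal request.
func checkExtendedResource(r containerResources, name string) error {
	reqStr, hasReq := r.Requests[name]
	limStr, hasLim := r.Limits[name]
	if !hasReq && !hasLim {
		return nil // this container does not ask for the resource
	}
	if !hasReq || !hasLim {
		return fmt.Errorf("%s: sketch requires both request and limit", name)
	}
	req, err := strconv.ParseInt(reqStr, 10, 64)
	if err != nil {
		return fmt.Errorf("%s: request %q is not an integer", name, reqStr)
	}
	lim, err := strconv.ParseInt(limStr, 10, 64)
	if err != nil {
		return fmt.Errorf("%s: limit %q is not an integer", name, limStr)
	}
	if req != lim {
		return fmt.Errorf("%s: limit %d must equal request %d", name, lim, req)
	}
	return nil
}

func main() {
	const res = "vendor-domain/foo"
	c := containerResources{
		Requests: map[string]string{res: "2"},
		Limits:   map[string]string{res: "2"},
	}
	fmt.Println(checkExtendedResource(c, res)) // <nil>
	c.Limits[res] = "3"
	fmt.Println(checkExtendedResource(c, res)) // vendor-domain/foo: limit 3 must equal request 2
}
```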
