1 | 1 | # XeLink sidecar for Intel XPU Manager |
2 | 2 | |
3 | | -Table of Contents |
4 | | - |
5 | | -* [Introduction](#introduction) |
6 | | -* [Modes and Configuration Options](#modes-and-configuration-options) |
7 | | -* [Installation](#installation) |
8 | | - * [Install XPU Manager with the Sidecar](#install-xpu-manager-with-the-sidecar) |
9 | | - * [Install Sidecar to an Existing XPU Manager](#install-sidecar-to-an-existing-xpu-manager) |
10 | | -* [Verify Sidecar Functionality](#verify-sidecar-functionality) |
11 | | -* [Use HTTPS with XPU Manager](#use-https-with-xpu-manager) |
12 | | - |
13 | | -## Introduction |
14 | | - |
15 | | -Intel GPUs can be interconnected via an XeLink. In some workloads it is beneficial to use GPUs that are XeLinked together for optimal performance. XeLink information is provided by [Intel XPU Manager](https://www.github.com/intel/xpumanager) via its metrics API. Xelink sidecar retrieves the information from XPU Manager and stores it on the node under ```/etc/kubernetes/node-feature-discovery/features.d/``` as a feature label file. [NFD](https://github.com/kubernetes-sigs/node-feature-discovery) reads this file and converts it to Kubernetes node labels. These labels are then used by [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling) to make [scheduling decisions](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#multi-gpu-allocation-with-xe-link-connections) for Pods. |
16 | | - |
17 | | -## Modes and Configuration Options |
18 | | - |
19 | | -| Flag | Argument | Default | Meaning | |
20 | | -|:---- |:-------- |:------- |:------- | |
21 | | -| -lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted | |
22 | | -| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) | |
23 | | -| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) | |
24 | | -| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels. i.e. **gpu.intel.com**/xe-links | |
25 | | -| -allow-subdeviceless-links | bool | false | Include xelinks also for devices that do not have subdevices | |
26 | | -| -cert | string | "" | Use HTTPS and verify server's endpoint | |
27 | | - |
28 | | -The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options. |
29 | | - |
30 | | -## Installation |
31 | | - |
32 | | -The following sections detail how to obtain, deploy and test the XPU Manager XeLink sidecar. |
33 | | - |
34 | | -### Pre-built Images |
35 | | - |
36 | | -[Pre-built images](https://hub.docker.com/r/intel/intel-xpumanager-sidecar) |
37 | | -of this component are available on the Docker hub. These images are automatically built and uploaded |
38 | | -to the hub from the latest main branch of this repository. |
39 | | - |
40 | | -Release tagged images of the components are also available on the Docker hub, tagged with their |
41 | | -release version numbers in the format `x.y.z`, corresponding to the branches and releases in this |
42 | | -repository. |
43 | | - |
44 | | -Note: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. |
45 | | - |
46 | | -See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin. |
47 | | - |
48 | | -#### Install XPU Manager with the Sidecar |
49 | | - |
50 | | -Install XPU Manager daemonset with the XeLink sidecar |
51 | | - |
52 | | -```bash |
53 | | -$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http?ref=<RELEASE_VERSION>' |
54 | | -``` |
55 | | - |
56 | | -Please see XPU Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes). |
57 | | - |
58 | | -#### Install Sidecar to an Existing XPU Manager |
59 | | - |
60 | | -Use patch to add sidecar into the XPU Manager daemonset. |
61 | | - |
62 | | -```bash |
63 | | -$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml?ref=<RELEASE_VERSION>' |
64 | | -``` |
65 | | - |
66 | | -NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed. |
67 | | - |
68 | | -### Verify Sidecar Functionality |
69 | | - |
70 | | -You can verify the sidecar's functionality by checking node's xe-links labels: |
71 | | - |
72 | | -```bash |
73 | | -$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}" |
74 | | -master,0.0-1.0_0.1-1.1 |
75 | | -``` |
76 | | - |
77 | | -### Use HTTPS with XPU Manager |
78 | | - |
79 | | -There is an alternative deployment that uses HTTPS instead of HTTP. The reference deployment requires `cert-manager` to provide a certificate for TLS. To deploy: |
80 | | - |
81 | | -```bash |
82 | | -$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/cert-manager?ref=<RELEASE_VERSION>' |
83 | | -``` |
84 | | - |
85 | | -The deployment requests a certificate and key from `cert-manager`. They are then provided to the gunicorn container as secrets and are used in the HTTPS interface. The sidecar container uses the same certificate to verify the server. |
86 | | - |
87 | | -> *NOTE*: The HTTPS deployment uses self-signed certificates. For production use, the certificates should be properly set up. |
88 | | - |
89 | | -<details> |
90 | | -<summary>Enabling HTTPS manually</summary> |
91 | | - |
92 | | -If one doesn't want to use `cert-manager`, the same can be achieved manually by creating certificates with openssl and then adding it to the deployment. The steps are roughly: |
93 | | -1) Create a certificate with [openssl](https://www.linode.com/docs/guides/create-a-self-signed-tls-certificate/) |
94 | | -1) Create a secret from the [certificate & key](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/). |
95 | | -1) Change the deployment: |
96 | | - |
97 | | -* Add certificate and key to gunicorn container: |
98 | | -``` |
99 | | - - command: |
100 | | - - gunicorn |
101 | | -... |
102 | | - - --certfile=/certs/tls.crt |
103 | | - - --keyfile=/certs/tls.key |
104 | | -... |
105 | | - - xpum_rest_main:main() |
106 | | -``` |
107 | | - |
108 | | -* Add secret mounting to the Pod: |
109 | | -``` |
110 | | - containers: |
111 | | - - name: python-exporter |
112 | | - volumeMounts: |
113 | | - - mountPath: /certs |
114 | | - name: certs |
115 | | - readOnly: true |
116 | | - volumes: |
117 | | - - name: certs |
118 | | - secret: |
119 | | - defaultMode: 420 |
120 | | - secretName: xpum-server-cert |
121 | | - ``` |
122 | | - |
123 | | -* Add use-https and cert to sidecar |
124 | | -``` |
125 | | - name: xelink-sidecar |
126 | | - volumeMounts: |
127 | | - - mountPath: /certs |
128 | | - name: certs |
129 | | - readOnly: true |
130 | | - args: |
131 | | -... |
132 | | - - --cert=/certs/tls.crt |
133 | | -... |
134 | | -``` |
135 | | - |
136 | | -</details> |
| 3 | +Use of XeLink sidecar is deprecated as GAS has been deprecated. The sources are left for future use. |