Commit b457cdd

drop xpu-manager sidecar
As GAS was deprecated, xpu-manager sidecar also becomes not needed. Its functionality was to provide info to GAS for scheduling.

Signed-off-by: Tuomas Katila <[email protected]>
1 parent 83f342d commit b457cdd

4 files changed, +1 -142 lines changed

.github/workflows/lib-build.yaml

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ jobs:
  - intel-dsa-plugin
  - intel-iaa-plugin
  - intel-idxd-config-initcontainer
- - intel-xpumanager-sidecar
 
  # # Demo images
  - crypto-perf

.github/workflows/lib-publish.yaml

Lines changed: 0 additions & 1 deletion
@@ -56,7 +56,6 @@ jobs:
  - intel-dsa-plugin
  - intel-iaa-plugin
  - intel-idxd-config-initcontainer
- - intel-xpumanager-sidecar
  steps:
  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
  - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5

README.md

Lines changed: 0 additions & 6 deletions
@@ -196,12 +196,6 @@ The [Device plugins operator README](cmd/operator/README.md) gives the installat
 
 The [Device plugins Operator for OpenShift](https://github.com/intel/intel-technology-enabling-for-openshift) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736).
 
-## XeLink XPU Manager Sidecar
-
-To support interconnected GPUs in Kubernetes, XeLink sidecar is needed.
-
-The [XeLink XPU Manager sidecar README](cmd/xpumanager_sidecar/README.md) gives information how the sidecar functions and how to use it.
-
 ## Intel GPU Level-Zero sidecar
 
 Sidecar uses Level-Zero API to provide additional GPU information for the GPU plugin that it cannot get through sysfs interfaces.

cmd/xpumanager_sidecar/README.md

Lines changed: 1 addition & 134 deletions
@@ -1,136 +1,3 @@
 # XeLink sidecar for Intel XPU Manager
 
-Table of Contents
-
-* [Introduction](#introduction)
-* [Modes and Configuration Options](#modes-and-configuration-options)
-* [Installation](#installation)
-* [Install XPU Manager with the Sidecar](#install-xpu-manager-with-the-sidecar)
-* [Install Sidecar to an Existing XPU Manager](#install-sidecar-to-an-existing-xpu-manager)
-* [Verify Sidecar Functionality](#verify-sidecar-functionality)
-* [Use HTTPS with XPU Manager](#use-https-with-xpu-manager)
-
-## Introduction
-
-Intel GPUs can be interconnected via an XeLink. In some workloads it is beneficial to use GPUs that are XeLinked together for optimal performance. XeLink information is provided by [Intel XPU Manager](https://www.github.com/intel/xpumanager) via its metrics API. Xelink sidecar retrieves the information from XPU Manager and stores it on the node under ```/etc/kubernetes/node-feature-discovery/features.d/``` as a feature label file. [NFD](https://github.com/kubernetes-sigs/node-feature-discovery) reads this file and converts it to Kubernetes node labels. These labels are then used by [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling) to make [scheduling decisions](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#multi-gpu-allocation-with-xe-link-connections) for Pods.
-
-## Modes and Configuration Options
-
-| Flag | Argument | Default | Meaning |
-|:---- |:-------- |:------- |:------- |
-| -lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted |
-| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) |
-| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
-| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels. i.e. **gpu.intel.com**/xe-links |
-| -allow-subdeviceless-links | bool | false | Include xelinks also for devices that do not have subdevices |
-| -cert | string | "" | Use HTTPS and verify server's endpoint |
-
-The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.
-
-## Installation
-
-The following sections detail how to obtain, deploy and test the XPU Manager XeLink sidecar.
-
-### Pre-built Images
-
-[Pre-built images](https://hub.docker.com/r/intel/intel-xpumanager-sidecar)
-of this component are available on the Docker hub. These images are automatically built and uploaded
-to the hub from the latest main branch of this repository.
-
-Release tagged images of the components are also available on the Docker hub, tagged with their
-release version numbers in the format `x.y.z`, corresponding to the branches and releases in this
-repository.
-
-Note: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.
-
-See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin.
-
-#### Install XPU Manager with the Sidecar
-
-Install XPU Manager daemonset with the XeLink sidecar
-
-```bash
-$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http?ref=<RELEASE_VERSION>'
-```
-
-Please see XPU Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).
-
-#### Install Sidecar to an Existing XPU Manager
-
-Use patch to add sidecar into the XPU Manager daemonset.
-
-```bash
-$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml?ref=<RELEASE_VERSION>'
-```
-
-NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
-
-### Verify Sidecar Functionality
-
-You can verify the sidecar's functionality by checking node's xe-links labels:
-
-```bash
-$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}"
-master,0.0-1.0_0.1-1.1
-```
-
-### Use HTTPS with XPU Manager
-
-There is an alternative deployment that uses HTTPS instead of HTTP. The reference deployment requires `cert-manager` to provide a certificate for TLS. To deploy:
-
-```bash
-$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/cert-manager?ref=<RELEASE_VERSION>'
-```
-
-The deployment requests a certificate and key from `cert-manager`. They are then provided to the gunicorn container as secrets and are used in the HTTPS interface. The sidecar container uses the same certificate to verify the server.
-
-> *NOTE*: The HTTPS deployment uses self-signed certificates. For production use, the certificates should be properly set up.
-
-<details>
-<summary>Enabling HTTPS manually</summary>
-
-If one doesn't want to use `cert-manager`, the same can be achieved manually by creating certificates with openssl and then adding it to the deployment. The steps are roughly:
-1) Create a certificate with [openssl](https://www.linode.com/docs/guides/create-a-self-signed-tls-certificate/)
-1) Create a secret from the [certificate & key](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/).
-1) Change the deployment:
-
-* Add certificate and key to gunicorn container:
-```
-- command:
-- gunicorn
-...
-- --certfile=/certs/tls.crt
-- --keyfile=/certs/tls.key
-...
-- xpum_rest_main:main()
-```
-
-* Add secret mounting to the Pod:
-```
-containers:
-- name: python-exporter
-volumeMounts:
-- mountPath: /certs
-name: certs
-readOnly: true
-volumes:
-- name: certs
-secret:
-defaultMode: 420
-secretName: xpum-server-cert
-```
-
-* Add use-https and cert to sidecar
-```
-name: xelink-sidecar
-volumeMounts:
-- mountPath: /certs
-name: certs
-readOnly: true
-args:
-...
-- --cert=/certs/tls.crt
-...
-```
-
-</details>
+Use of XeLink sidecar is deprecated as GAS has been deprecated. The sources are left for future use.
