Adding NUMA/TopologyManager support to gpu device plugin #150

Open
robertdavidsmith opened this issue Oct 21, 2020 · 1 comment

@robertdavidsmith

Hi,

I’m interested in adding TopologyManager/NUMA support to your GPU device plugin.

I believe this involves three steps:

  1. Upgrade container-engine-accelerators/vendor/k8s.io/kubernetes/pkg/kubelet/apis/deviceplugin/v1beta1/api.proto to a newer version that includes TopologyInfo
  2. Find a way of mapping paths under /dev to paths under /sys (for example map /dev/nvidia0 to /sys/devices/pci0000:00/0000:00:05.0)
  3. Read a file such as /sys/devices/pci0000:00/0000:00:05.0/numa_node and return the NUMA node over protobuf (in the TopologyInfo field)

Steps 1 and 3 should be easy enough. Step 2 is harder because of the need to get the PCI ID for a device.

For the device-to-PCI-ID mapping, the options I’m aware of are:

  1. It would be great if we could do the mapping just by looking under /sys. This is easy for disks (just look under /sys/block) but doesn’t appear possible for GPUs (I’d love to be proven wrong here).
  2. Make use of NVML’s nvmlDeviceGetPciInfo function. Making the device plugin use NVML has already been attempted at https://github.com/GoogleCloudPlatform/container-engine-accelerators/pull/52/files, but this PR was never merged. If we could get this PR merged, adding a call to nvmlDeviceGetPciInfo would be trivial.
  3. Run nvidia-smi and parse its output.
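With options 2 and 3, the PCI bus ID typically comes back with an 8-hex-digit domain and uppercase hex (e.g. `00000000:3B:00.0`, as in NVML's `busId` field or `nvidia-smi --query-gpu=pci.bus_id`), while sysfs names devices with a 4-digit domain in lowercase (`0000:3b:00.0`). The sketch below normalises one to the other and builds the `numa_node` path under `/sys/bus/pci/devices`. The input format is an assumption based on typical NVML/nvidia-smi output, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// sysfsPCIAddress converts a PCI bus ID such as "00000000:3B:00.0" (NVML /
// nvidia-smi style) into the sysfs directory name "0000:3b:00.0".
// IDs that already use a 4-digit domain are accepted too.
func sysfsPCIAddress(busID string) (string, error) {
	parts := strings.Split(strings.TrimSpace(busID), ":")
	if len(parts) != 3 {
		return "", fmt.Errorf("unexpected PCI bus ID %q", busID)
	}
	devFn := strings.Split(parts[2], ".")
	if len(devFn) != 2 {
		return "", fmt.Errorf("unexpected device.function in %q", busID)
	}
	domain, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return "", fmt.Errorf("bad domain in %q: %w", busID, err)
	}
	bus, err := strconv.ParseUint(parts[1], 16, 8)
	if err != nil {
		return "", fmt.Errorf("bad bus in %q: %w", busID, err)
	}
	dev, err := strconv.ParseUint(devFn[0], 16, 8)
	if err != nil || dev > 0x1f { // PCI device numbers are 5 bits
		return "", fmt.Errorf("bad device in %q", busID)
	}
	fn, err := strconv.ParseUint(devFn[1], 16, 8)
	if err != nil || fn > 7 { // PCI function numbers are 3 bits
		return "", fmt.Errorf("bad function in %q", busID)
	}
	// sysfs uses lowercase hex, zero-padded as domain:bus:device.function.
	return fmt.Sprintf("%04x:%02x:%02x.%x", domain, bus, dev, fn), nil
}

// numaNodePath returns the sysfs file holding the NUMA node for a device.
func numaNodePath(addr string) string {
	return filepath.Join("/sys/bus/pci/devices", addr, "numa_node")
}

func main() {
	for _, id := range []string{"00000000:3B:00.0", "0000:00:05.0"} {
		addr, err := sysfsPCIAddress(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(numaNodePath(addr))
	}
}
```

Entries under `/sys/bus/pci/devices` are symlinks into `/sys/devices/pci…`, so this avoids having to know the full bridge path.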

What are your thoughts? It would be great to agree a design before I start work on a new PR.

Kind regards,

Rob

@robertdavidsmith
Author

PR here: #165
