Add support for devices with "service create" #1244
@flx42 For the container runtime, devices require special handling (a mknod syscall), so mounts won't work. We'll probably have to add some sort of support for this. (cc @crosbymichael) Ideally, we'd like to be able to schedule over devices, as well.
@stevvooe Already have device support in the runtime, just not exposed in swarm.
This question was raised here: moby/moby#24750
@stevvooe I quickly hacked a solution, it's not too difficult:
Forgot to mention that I can now run GPU containers by mimicking what nvidia-docker does:
@flx42 I took a quick peek and the PR looks like a decent start. I am not sure about representing these as cluster-level resources for container startup. From an orchestration perspective, we have to match these up with announced resources at the node level, which might be okay. Go ahead and file it as a PR.
@stevvooe Yeah, that's the biggest discussion point for sure. In engine-api, devices are resources. But in swarmkit, resources are so far "fungible" objects like CPU shares and memory, with a base value and a limit. A device doesn't really fit that definition: for GPU apps we have devices that must be shared. I decided to initially put devices into resources because there is already a function in swarmkit that creates an engine-api Resources object from a swarm Resources object. I will file a PR soon to continue the discussion.
@flx42 Great! We really aren't planning on following the same resource model as engine-api. Now, I would like to see scheduling of fungible GPUs, but that might be a wholly separate flow, keeping the initial support narrow. Such services would require manual constraint and device assignment, but you still achieve the goal. Let's discuss this in the context of the PR.
Thanks @flx42 - I think GPU is definitely something we want to support medium term. /cc @mgoelzer
Thanks @aluzzardi, PR created, it's quite basic.
The --device option is really important for my use case too. I am trying to use swarm to manage 50 Raspberry Pis to do computer vision, but I need to be able to access /dev/video0 to capture images. Without this option, I'm stuck and have to manage them without swarm, which is painful.
@mlhales We need someone who is willing to work out the issues with `--device`.
We found an easier way for the Blinkt! LED strip: use sysfs. Now we can run Blinkt! in docker swarm mode without privileges.
@StefanScherer is that a proper alternative to using e.g. --device=/dev/mem to access GPIO on a RPi? Would love to see an example if you would care to share :)
@mathiasimmer For the use case with the Blinkt! LED strip there are only eight RGB LEDs, so using sysfs is not time-critical for these few LEDs. If you want to drive hundreds of them you still need faster GPIO access to reach a higher clock rate. But for Blinkt! we have forked the Node.js module and adjusted it in this branch: https://github.com/sealsystems/node-blinkt/tree/sysfs
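For anyone wondering what "using sysfs" looks like in practice: the kernel exposes each GPIO pin as a small directory of files under /sys/class/gpio, so driving a pin is plain file I/O. A minimal sketch in Python (pin numbers and paths follow the legacy sysfs GPIO interface; the `gpio_root` parameter is only there to make the sketch testable, the real path is fixed):

```python
import os

def export_pin(pin, gpio_root="/sys/class/gpio"):
    """Ask the kernel to expose gpio<pin>; a no-op if it is already exported."""
    pin_dir = os.path.join(gpio_root, f"gpio{pin}")
    if not os.path.isdir(pin_dir):
        with open(os.path.join(gpio_root, "export"), "w") as f:
            f.write(str(pin))

def set_direction(pin, direction, gpio_root="/sys/class/gpio"):
    """Configure the pin as "in" or "out"."""
    with open(os.path.join(gpio_root, f"gpio{pin}", "direction"), "w") as f:
        f.write(direction)

def write_pin(pin, value, gpio_root="/sys/class/gpio"):
    """Drive the pin high (truthy value) or low."""
    with open(os.path.join(gpio_root, f"gpio{pin}", "value"), "w") as f:
        f.write("1" if value else "0")
```

Because this is ordinary file I/O, a swarm service only needs the sysfs path available inside the container, not --privileged or --device, which is exactly why this works in swarm mode. The trade-off is speed: every write is a syscall, so high clock rates are out of reach.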
/cc @cyli
@aluzzardi I think we should resurrect the `--device` PR. We can always add logic in the scheduler to prevent device contention in the future.
Attempt to add devices to the container spec and plugin spec here: #1964. I've no objection to the `--device` flag.
@diogomonica I thought profiles mainly covered capabilities, etc?
@cyli well, if we believe "devices" are easy enough to understand for easy user acceptance then we might not need them, but we should look critically at adding anything that allows escalation of privileges of a container to the cmd-line before we have a good way of informing the user of everything the service will need from a security perspective.
Also following this. Very interested in access to character devices (/dev/bus/usb/...) in a docker swarm.
Hey, @allfro
How can I help to finish that feature? |
I really like the workaround from @BretFisher in #1244 (comment), and here is how I adapted it for nodes that require a device:
Putting it all together, your services will have to change from this:

```yaml
services:
  my-service-starter:
    image: docker
    command: 'docker run --name <name> --device /dev/bus/usb -e TOKEN=1234 -p 5000:5000 <image>'
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.labels.device_required == true
```

to this:

```yaml
services:
  my-service-handler:
    image: docker
    command: 'docker-compose -f /docker-compose.yml up'
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/ubuntu/docker-compose.yml:/docker-compose.yml
    deploy:
      placement:
        constraints:
          - node.labels.device_required == true

networks:
  default:
    name: my_network
    driver: overlay
    attachable: true
```

(on manager)

```yaml
services:
  <name>:
    image: <image>
    restart: always
    container_name: <name>
    devices:
      - /dev/bus/usb
    environment:
      - TOKEN=1234
    ports:
      - 5000:5000

networks:
  default:
    name: my_network
    external: true
```

(the mounted docker-compose.yml, on the worker)
Hi @pjalusic, but on the worker node I need to depend on another service from the manager node.
I developed a plugin in the end that allows me to map devices to containers: https://github.com/allfro/device-volume-driver. Hope it helps others. Unfortunately, it only works on systems that use cgroup v1 (alpine). I am looking for some help to develop the cgroup v2 support in the plugin. It works really well and I've used it to containerize X11 desktops that require access to fuse and the VMware graphics devices.
After planning to redo my home server setup with swarm (so I can have multiple nodes), I discovered that this wasn't supported, and I needed it for VAAPI. After looking through things, it seemed to me like this was a plumbing (and developer-hour) problem. Basing things on a previous PR series which added ulimit support to swarm, here is a chain of PRs which add devices in the most boring way; just plumbing it through the API as-is, no special management or API. Just what docker already supported outside of swarm.
I'm sure I've missed something, and I don't quite know how to get everything building together to test this (I typically run things from my package manager's installed docker), but maybe someone is willing to try the above out.
Hi, I'm trying the workaround in #1244 (comment) and it indeed works, but when I remove the stack the handler is successfully removed, while the privileged container in docker-compose.yml continues running and has to be killed manually using docker kill. Any idea what the issue could be?
These docs are referencing the API for services:

```yaml
my-gpu-service:
  ...
  deploy:
    resources:
      reservations:
        generic_resources:
          - discrete_resource_spec:
              kind: "NVIDIA-GPU"
              value: 2
```

This works if you've registered your GPUs in the daemon configuration on each node. For anyone looking for device support for NVIDIA GPUs using Swarm, I did a quick write-up here summarizing two solutions. My write-up was heavily inspired by the original gist I found on the subject here.
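For reference, registering GPUs means advertising them as node generic resources in each node's /etc/docker/daemon.json via the `node-generic-resources` key. A sketch (the GPU UUID prefix is a placeholder for your actual GPU's UUID; with NVIDIA GPUs you also need the nvidia runtime installed, and the `swarm-resource` option enabled in the nvidia-container-runtime config):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-fef8089b"
  ]
}
```

After a daemon restart, `docker node inspect <node>` should show the advertised resource, and the `discrete_resource_spec` reservation above becomes schedulable.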
Also in this same situation. I was using this solution to pass through the iGPU driver to Plex on a Docker swarm host for hardware transcoding: https://pastebin.com/XY7GP18T

I had some new hardware which required running the latest version of Ubuntu to recognise it, but that uses cgroups v2. For the moment I reverted back to cgroups v1 via these instructions to get this working again: https://sleeplessbeastie.eu/2021/09/10/how-to-enable-control-group-v2/

I will experiment with moving to cgroups v2 and a combination of generic resources advertising the iGPU to the service as soon as I have time, via these two hints as outlined by @coltonbh: https://gist.github.com/coltonbh/374c415517dbeb4a6aa92f462b9eb287

If anyone has any idea how to correctly advertise a QuickSync driver to a cgroup v2 Docker swarm it would be highly appreciated. Alternatively, I guess I could migrate to Kubernetes ;)
I'm getting strong "the perfect is the enemy of the good" vibes from this issue. Strongly in favor of just passing through the devices options and letting buyer beware.
I've written this hack and tried it with Plex and it seems to work: https://github.com/allfro/device-mapping-manager. Essentially it runs a privileged container which listens for Docker create events and inspects the mount points. If a mount is within the /dev folder, it walks the mount path for character and block devices and applies the necessary device rules to make the devices available. This doesn't work with fuse yet because the default AppArmor profile blocks mounts (ugh!), but it does work with graphics cards and other devices that don't require operations blocked by Docker's AppArmor profile. It is inspired by the previous comments.
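The core of that trick, mapping a device node found under a mount to a cgroup v1 device rule, is small enough to sketch in Python (function names here are illustrative, not the tool's actual API; the real manager then writes each rule into the container's devices.allow file):

```python
import os
import stat

def device_rule(path):
    """Return a cgroup v1 devices.allow rule ("c MAJOR:MINOR rwm") for a
    character or block device node, or None for anything else."""
    try:
        st = os.stat(path)
    except OSError:
        return None  # broken symlink, vanished node, etc.
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        return None
    return f"{kind} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)} rwm"

def rules_for_mount(root):
    """Walk a bind-mounted /dev subtree and collect the rules to whitelist."""
    rules = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            rule = device_rule(os.path.join(dirpath, name))
            if rule:
                rules.append(rule)
    return rules
```

For example, a container that bind-mounts /dev/dri would end up with rules like `c 226:0 rwm` for its render and card nodes. This only covers cgroup v1; under cgroup v2 device access is mediated by an eBPF program rather than a writable control file, which is why the v2 port is harder.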
@reisholmes check this out: https://github.com/allfro/device-mapping-manager
Any ideas on how to make this work for cgroups v2?
This did the trick for me for running linuxserver/plex with hardware acceleration using Intel QuickSync with Docker Swarm.
Hi @cdalvaro! I might be a massive idiot, but I'm going to ask anyway: how did you make this work? I still get stuck with this issue. Did you clone the repo and build the image yourself locally? Something else?
Hi @demaniak! It's really straightforward! I followed these steps: allfro/device-mapping-manager#12 (comment)

Create a service deployed on all machines with devices you want to allow in your services:

```yaml
services:
  device-mapping-manager:
    image: alpinelinux/docker-cli
    entrypoint: docker
    command: |
      run
      --rm
      -i
      --name device-manager
      --privileged
      --cgroupns=host
      --pid=host
      --userns=host
      -v /sys:/host/sys
      -v /var/run/docker.sock:/var/run/docker.sock
      ghcr.io/allfro/allfro/device-mapping-manager:sha-0651661
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      mode: global
      restart_policy:
        condition: any
      placement:
        constraints:
          - node.labels.dmm == true
```

I added the constraint `node.labels.dmm == true` so it only runs on nodes that actually have devices to share. This service will look for your containers (already created and newly created), and devices mounted as volumes will be added to the allowed list of the container. For example:

```yaml
services:
  plex:
    image: ghcr.io/linuxserver/plex:latest
    container_name: plex
    hostname: "{{.Node.Hostname}}"
    networks:
      - traefik
    ports:
      - "32400:32400/tcp"
    volumes:
      - /dev/dri:/dev/dri # Hardware acceleration for Intel GPUs with Docker Swarm via DMM
```

Hope this can help you!
Thank you @cdalvaro, gonna try it out a bit later, but this looks like it should work!
Absolutely amazing, thanks @cdalvaro

```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
```

And NOT using

```yaml
version: '3.8'
services:
  device-mapping-manager:
    image: alpinelinux/docker-cli
    entrypoint: docker
    command: |
      run
      --rm
      -i
      --name device-manager
      --privileged
      --cgroupns=host
      --pid=host
      --userns=host
      -v /sys:/host/sys
      -v /var/run/docker.sock:/var/run/docker.sock
      ghcr.io/allfro/allfro/device-mapping-manager:sha-0651661
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      mode: global
      restart_policy:
        condition: any
```
Last I checked @skjnldsv, read-only doesn't work on the Docker socket, and it is still writable even when mounted with `:ro`.
Initially reported: moby/moby#24865, but I realized it actually belongs here. Feel free to close the other one if you want. Content of the original issue copied below.

Related: #1030

Currently, it's not possible to add devices with `docker service create`; there is no equivalent for `docker run --device=/dev/foo`.

I'm an author of nvidia-docker with @3XX0, and we need to add device files (the GPUs) and volumes to the starting containers in order to enable GPU apps as services. See the discussion here: moby/moby#23917 (comment) (summarized below).

We figured out how to add a volume provided by a volume plugin, but there is no solution for devices. @cpuguy83 and @justincormack suggested using `--mount type=bind`, but it doesn't seem to work; it's probably like doing a mknod but without the proper device cgroup whitelisting.

It's probably equivalent to this:

Whereas the following works (invalid arg is normal, but no permission error):