NVIDIA GPU support #23917
@cpuguy83 we briefly discussed that with you during DockerCon 16.
ping @mlaventure @crosbymichael wdyt?
Worth noting that opencontainers/runtime-spec#483 might affect this.
@3XX0 I guess another option would be to have a patched version of […]. The image detection is a similar issue we have for multiarch in general, and there are flags on Hub, but they are not well exposed yet. This might also work for the driver version; let me find the spec.
Yes, I thought about it too, but I would rather not have to track upstream […]. Thanks, it would be greatly appreciated.
I just saw that #24750 was closed and redirected here. Related: NVIDIA/nvidia-docker#141
@3XX0 @flx42 I am the original author of #24750. I am not sure this question is appropriate to post here; if not, please forgive me. If I just want to implement […]
@flx42 I think you may be able to bind mount in device nodes with […]
@cpuguy83 thanks!
@justincormack @cpuguy83 Yes, in NVIDIA/nvidia-docker#141 I figured out I can mount the user-level driver files like this:
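(The command snippet that followed was lost in extraction; below is a hedged reconstruction of that kind of bind mount. The driver volume path and version reflect nvidia-docker 1.0's usual layout but are assumptions here, and the image/command are placeholders.)

```sh
# Hypothetical sketch: bind-mount the user-level driver files that
# nvidia-docker 1.0 gathers under a host volume directory into the service.
docker service create --name cuda-test \
  --mount type=bind,source=/var/lib/nvidia-docker/volumes/nvidia_driver/367.57,target=/usr/local/nvidia,readonly \
  nvidia/cuda sleep infinity
```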
But, unless I'm missing something, you can't bind mount a device; it seems to be like a […]
It's probably similar to doing something like this:
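(The example that followed was lost; this is a hedged illustration of my reading of it: a plain bind mount puts the node in the container's filesystem, but the device cgroup still blocks access.)

```sh
# Hypothetical illustration: a -v bind mount of a device node does not grant
# device access, so opening it fails even though the node is visible.
docker run --rm -v /dev/nvidiactl:/dev/nvidiactl ubuntu \
  sh -c 'ls -l /dev/nvidiactl; echo test > /dev/nvidiactl'
# expected: the node shows up, but the write fails with "Operation not permitted"
```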
Whereas the following works (well, invalid arg is normal):
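(Again the original snippet is missing; this is a hedged reconstruction of the working variant, where --device whitelists the node in the device cgroup so it can actually be opened.)

```sh
# Hypothetical reconstruction: with --device the node is reachable, and the
# bogus write fails with "Invalid argument" instead of a permission error,
# which is the "invalid arg is normal" mentioned above.
docker run --rm --device /dev/nvidiactl ubuntu \
  sh -c 'echo test > /dev/nvidiactl'
```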
@NanXiao regarding your question, please look at NVIDIA/nvidia-docker#141
@flx42 ah yes, that would be an issue. Can you create a separate issue to track the specific problem that you can't have a device in a service?
@justincormack created #24865, thank you!
A few comments:
The way we handled this for multi-arch is by explicitly introducing the "arch" field into the image. I would suggest that we introduce an "accelerator" field to address not only GPUs but also, in the future, FPGAs and other accelerators. On the compatibility check, I would implement it such that it is optional. A lot of applications can run with or without GPUs: if GPUs are there they will take advantage of them, but if they are not they will just run in CPU-only mode. If we make the driver check optional, it will be easy to accommodate this requirement.
Any update here? Really looking forward to a standard way to use accelerators in containers :)
See also kubernetes/kubernetes#19049. k8s is going to release a new version with GPU support. Swarm is very good for our system (k8s has things we don't need). However, GPU is definitely a key feature, and if Swarm doesn't have any clear plan for it we have to go with k8s :D
Hey guys, would really love to use a GPU-supported swarm. This issue is still open, so I guess it's not clear whether this will be on the roadmap or not? Any news on this topic?
We would love support. It is unfortunately a very complex issue. I know someone was looking at a simpler way, will see if it was viable. Contributions are welcome; it really needs a detailed design proposal of how to resolve the issues.
thx for the update @justincormack
What are the low-level steps to give a container access to the GPUs? I was looking through your codebase and saw that there are devices that must be added, and volumes, but I was not sure what the volumes were being used for. Is that for the libs? Other than placing the devices inside, are there any other settings that need to be applied?
It's actually tricky; we explain most of it here, but there are a lot of corner cases to think about. We've been working on a library to abstract all these things; the idea is to integrate it as a runc prestart hook. We have most of it working now and will publish it soon. The only issue with this approach is that we have to fork runc and rely on Docker […]. In the long term, we should probably think about a native GPU integration leveraging something like […]
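(For readers unfamiliar with OCI hooks, here is a minimal, hedged sketch of what the prestart-hook wiring looks like at the runc level; the hook binary name is made up, and only the `hooks.prestart` structure comes from the OCI runtime spec.)

```sh
# A prestart hook is declared in the bundle's config.json; runc executes it
# after the namespaces are set up but before the container process starts,
# which is where GPU devices and driver mounts can be injected.
#
# config.json excerpt (hook path is hypothetical):
#   "hooks": {
#     "prestart": [ { "path": "/usr/local/bin/nvidia-gpu-hook" } ]
#   }
#
# runc then runs the hook automatically when creating the container:
runc run --bundle ./bundle gpu-test
```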
@3XX0 if you want, we can work on the runc changes together. I think it makes sense to have some support for GPUs at that level, and it will make implementations much cleaner at the higher levels.
It should be pretty straightforward, the only things that need to be done are: […]
Once 1) is fixed and we have an option to add custom hooks from within Docker (e.g. […])
@3XX0 why use hooks at all? Can you not populate everything in the spec itself to add the devices, add bind mounts, and give the correct permissions on the device level?
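(For reference, a hedged sketch of what "populate everything in the spec itself" would mean; the field names follow the OCI runtime spec, while the paths, driver version, and device numbers are illustrative.)

```sh
# Generate a default OCI config, then extend it by hand with the GPU pieces.
runc spec   # writes ./config.json with defaults
#
# Pieces to add to config.json (values are illustrative):
#   "mounts": [ { "destination": "/usr/local/nvidia", "type": "bind",
#                 "source": "/var/lib/nvidia-docker/volumes/nvidia_driver/367.57",
#                 "options": ["rbind", "ro"] } ]
#   "linux": {
#     "devices":   [ { "path": "/dev/nvidiactl", "type": "c", "major": 195, "minor": 255 } ],
#     "resources": { "devices": [ { "allow": true, "type": "c",
#                                   "major": 195, "minor": 255, "access": "rwm" } ] }
#   }
```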
@3XX0 if we have a […]
Our solution is runtime-agnostic and hooks are perfect for that.
Right now we use an environment variable with a list of comma-separated IDs and/or UUIDs (similar to nvidia-docker […])
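(A hedged illustration of that selection interface; the variable name is my assumption since the comment doesn't spell it out, and the image/command are placeholders.)

```sh
# Hypothetical usage: pick GPUs by index or by UUID through an environment
# variable that the runtime/hook reads when setting up the container.
docker run --rm -e NVIDIA_VISIBLE_DEVICES=0,1 nvidia/cuda nvidia-smi
docker run --rm -e NVIDIA_VISIBLE_DEVICES=GPU-<uuid> nvidia/cuda nvidia-smi
```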
@crosbymichael We also hit some limitations with Docker; for example, a lot of GPU images need large shmsize and memlock limits (our drivers need those). Not sure how to address that at the image level (Docker is not even relying on the OCI spec for /dev/shm). The workaround is to configure everything at the daemon level once we have #29492 fixed, but it's far from ideal.
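(For context, these are the per-container knobs that exist for the two limits; the values and image are placeholders, and the point above is precisely that they cannot be expressed at the image level.)

```sh
# What users currently have to pass by hand for GPU workloads that need a
# large /dev/shm and unlimited locked memory (values are illustrative).
docker run --rm --shm-size=1g --ulimit memlock=-1 nvidia/cuda nvidia-smi
```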
You can open separate issues for shm limits etc, and we can resolve those individually.
@justincormack @crosbymichael Do you have a timeline on the containerd 1.0 release and integration in Docker? Right now the only option we have is to integrate at the runc level given that the containerd "runtime" is hardcoded. I would rather do it with containerd 1.0 if Docker were to support it.
@3XX0 containerd 1.0 is now integrated into Docker.
@3XX0 thanks for your excellent work!
Can this container use GPU B? By your comments above, it should not.
And I did not notice any difference between those two containers.
@WanLinghao Inside the […]
@flx42 I have tried three kinds of commands […]. Then I went into their shells respectively and executed […]. First container and third container: […]
Second container: […]
As you can see, commands 1 and 3 make no difference.
Ah yes, nvidia-docker (version 1.0) will be a passthrough to […]. So, try again with […]
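(A hedged guess at the kind of command being suggested, based on nvidia-docker 1.0's NV_GPU selection mentioned elsewhere in this thread; the exact commands in the lost snippet are unknown.)

```sh
# Hypothetical: with nvidia-docker 1.0, GPU isolation is driven by NV_GPU,
# so only the listed device should be visible inside the container.
NV_GPU=0 nvidia-docker run --rm nvidia/cuda nvidia-smi
```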
Any news on this? Would ❤️ a LinuxKit distro with nvidia-docker onboard 🙌✨
Trying to revive interest in supporting OCI hooks here: #36987
Docker 19.03 has […]
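(Presumably this refers to the native GPU support that shipped in Docker 19.03; a hedged example of that flag, with the image as a placeholder and the NVIDIA container toolkit assumed on the host.)

```sh
# Docker 19.03's --gpus flag: expose all GPUs, or a specific set of devices.
docker run --rm --gpus all nvidia/cuda nvidia-smi
docker run --rm --gpus '"device=0,1"' nvidia/cuda nvidia-smi
```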
Hello, author of nvidia-docker here.
As many of you may know, we recently released our NVIDIA Docker project in our effort to enable GPUs in containerized applications (mostly Deep Learning). This project currently consists of two parts: […]
More information on this here
While it has been working great so far, now that Docker 1.12 is coming out with a configurable runtime and complete OCI support, we would like to move away from this approach (which is admittedly hacky) and work on something that is better integrated with Docker.
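(The "configurable runtime" mentioned here is the daemon-level runtime registration introduced in 1.12; a hedged sketch of how an alternative runtime gets wired in, with the binary path as an assumption.)

```sh
# Register an alternative OCI runtime with the daemon (path is hypothetical);
# equivalently this goes under "runtimes" in /etc/docker/daemon.json.
dockerd --add-runtime nvidia=/usr/bin/nvidia-container-runtime

# Then select it per container:
docker run --rm --runtime=nvidia nvidia/cuda nvidia-smi
```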
The way I see it would be to provide a prestart OCI hook which would effectively trigger our implementation and configure the cgroups/namespaces correctly.
However, there are several things we need to solve first, specifically:
- Currently, we are using a special label `com.nvidia.volumes.needed`, but it is not exported as an OCI annotation (see "Clarify how OCI configuration ("config.json") will be handled" #21324).
- Currently, we are using an environment variable `NV_GPU` […]
- Currently, we are using a special label `XXX_VERSION` […]

All of the above could be solved using environment variables, but I'm not particularly fond of this idea (e.g. `docker run -e NVIDIA_GPU=0,1 nvidia/cuda`).

So is there a way to pass runtime/hook parameters from the docker command line, and if not, would it be worth adding? (e.g. `--runtime-opt`)