
[RFE] Support for OCI hooks [2018 edition] #36987

Status: Open · Labels: area/runtime, kind/feature

flx42 (Contributor) commented May 2, 2018

This subject has been mentioned multiple times already, starting back in 2015, but I would like to revive this discussion.

Current status

Today, Docker supports swapping runc for another OCI-compliant runtime:
https://docs.docker.com/engine/reference/commandline/dockerd/#docker-runtime-execution-options

However, this approach is not composable: you can't combine multiple custom runtimes.
If I want to customize the behavior of runc for Docker today, I need to fork runc, add my modifications, and ship the whole binary.

Proposal

For this reason, I'm asking the moby community to (re)consider exposing standard OCI hooks through the "docker run" API. Those hooks are defined in the OCI spec:
https://github.com/opencontainers/runtime-spec/blob/master/config.md#posix-platform-hooks

As a reference, the OCI runtime spec defines when those hooks must be called:
https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#lifecycle
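For illustration, this is roughly what hook entries look like in a container's OCI config.json per that spec (the paths and arguments below are made up):

{
	"hooks": {
		"prestart": [
			{
				"path": "/usr/local/bin/my-prestart-hook",
				"args": ["my-prestart-hook", "--debug"],
				"env": ["LOG_LEVEL=info"],
				"timeout": 30
			}
		],
		"poststop": [
			{
				"path": "/usr/local/bin/my-cleanup-hook"
			}
		]
	}
}

Per the spec, the runtime invokes each hook with the container state (id, pid, bundle path) as JSON on stdin, and args follows execv semantics, so args[0] is the program name.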

Hooks could be declared at the docker daemon level, similarly to what we have today for OCI runtimes:

{
	"hooks": {
		"prestart": {
			"my-hook": {
				"path": "/usr/local/bin/my-prestart-hook",
				"args": ["--debug"]
			},
			...
		},
		"poststart": {
			...
		}
	}
}

Similar to what's proposed here: #28837

As mentioned in #33375 (comment), we could use it the following way:

docker run --exec-opt prestart=hook1,hook2

Or, if you want to avoid users having to remember the type of your hook:

docker run --exec-opt hooks=hook1,hook2

The exec-opt option already exists at the daemon level, so it would make sense to reuse it.
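For context, exec-opts can already be set in the daemon's daemon.json; under this proposal a hooks entry (hypothetical, reusing the hook names declared above) could sit next to existing options such as native.cgroupdriver:

{
	"exec-opts": [
		"native.cgroupdriver=systemd",
		"hooks=my-hook,my-other-hook"
	]
}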

Related discussions: the 2015 proposal, the 2016 attempt to revive it, #28837, and #33375 (all referenced in this issue).

With @3XX0, I work at NVIDIA on enabling GPU applications to run in containers, mostly for machine learning and HPC applications.

Next steps

If core moby maintainers agree on this RFE, I can submit a design doc so we can start discussing the details.

cc fest:

@3XX0 @RenaudWasTaken: NVIDIA colleagues
@rhatdan: for the initial proposal in 2015
@WeiZhang555: for trying to revive the idea in 2016
@crosbymichael @stevvooe: since you looked at libnvidia-container recently, maybe you are preparing something at the containerd level directly?
@mrunalp: since we discussed this recently
@cmluciano: since you filed an issue against our runtime, asking for other options

crosbymichael (Contributor) commented May 7, 2018

@flx42 Thanks for pinging me on this. I have been looking into what it takes to get NVIDIA GPUs into containers, and I actually just opened a PR today adding support to containerd.

containerd/containerd#2330

We don't use your hooks, though, and opted to configure the spec directly with the devices, cgroups, and bind mounts for the drivers, which may be a good thing for you if you are tired of supporting the hooks for Docker integration. This will allow us to build out the API in a type-safe way and not have to worry about env vars anymore. Let me know if you think we are going in the right direction for the low-level components in containerd, and then we can start building this out into Docker as an alternate way of fixing the issue above. I'm pretty confident that we can support everything with spec configurations and bind mounts (e.g., write a custom app profile in the container's bundle and bind mount it in place of the tmp-plus-writing-the-file-inside-the-container approach).
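For readers following along, that spec-driven approach amounts to filling in standard OCI config.json fields instead of calling a hook; a rough sketch (the device majors/minors and library paths are illustrative, not what the containerd PR actually emits):

{
	"mounts": [
		{
			"destination": "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
			"type": "bind",
			"source": "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
			"options": ["ro", "rbind"]
		}
	],
	"linux": {
		"devices": [
			{ "path": "/dev/nvidiactl", "type": "c", "major": 195, "minor": 255 },
			{ "path": "/dev/nvidia0", "type": "c", "major": 195, "minor": 0 }
		],
		"resources": {
			"devices": [
				{ "allow": true, "type": "c", "major": 195, "access": "rwm" }
			]
		}
	}
}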

flx42 (Contributor, Author) commented May 7, 2018

Hi @crosbymichael,

Exciting to see the current work towards improving support for the use cases we care about! Welcome to the HPC/ML containers party :)

This request is not only about GPU support in Docker; there are other use cases.
I believe you will agree that certain operations require access to the container's namespaces, and that you might need access to the runtime namespaces and the container namespaces at the same time.
Networking is a prime example: some operations must be run inside the network namespace of the container. Other operations might need to act on the contents of the container filesystem, or deal with LSMs.
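To make the network-namespace case concrete, here is a minimal prestart hook sketch (not NVIDIA's actual hook; it assumes jq and nsenter are installed on the host). Per the runtime spec, the hook receives the container state as JSON on stdin and runs in the runtime namespace, so it can enter the container's namespaces via the PID:

	#!/bin/sh
	# Minimal prestart hook sketch: read the OCI container state from stdin,
	# then run a command inside the container's network namespace.
	set -eu

	# stdin carries {"ociVersion": ..., "id": ..., "pid": ..., "bundle": ...}
	pid=$(jq -r '.pid')

	# Illustrative action: bring up the loopback interface in the container's netns.
	nsenter --target "$pid" --net ip link set lo up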

We believe that properly supporting our GPUs in a container requires a prestart hook; there are many complicated interactions with our driver stack that need to be taken into account.
Supporting additional features such as MPS or Vulkan in the future will require additional logic.

Now, this is our opinion; it doesn't preclude you from adding another implementation in your product.
Our prestart hook is what we want to support and maintain as a product for OCI-compatible runtimes. It allows us to have a single implementation we can use behind runc, Docker, Kubernetes (via CRI-O), and soon LXC.
It's not sustainable if we need to implement and support a different solution in each container stack. I hope you understand our point of view.
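As an illustration of how small that integration surface is, injecting such a hook into a container's config.json amounts to something like this (the exact path and arguments of the NVIDIA hook may differ; treat them as an approximation):

{
	"hooks": {
		"prestart": [
			{
				"path": "/usr/bin/nvidia-container-runtime-hook",
				"args": ["nvidia-container-runtime-hook", "prestart"]
			}
		]
	}
}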

Totally agree that the interface based on environment variables is not structured, but that's the only element that really traverses the stack from the (image, user spec) pair to the runtime. On the other hand, there are many flavors of annotations/labels, but they are usually not passed down to the container runtime.

We can discuss the details of enabling support for NVIDIA GPUs in containers further in your PR, but what do you think of this proposal for enabling other use cases?

ChristianKniep commented May 8, 2018

@flx42 thanks for the input, and I agree with your statement that GPUs are just the poster-child use case for kernel-bypassing devices. That said, I also agree with @crosbymichael that hooks have their drawbacks too; handing over responsibility to external scripts sounds scary even to me (and I am in general fearless).
I know the use case pretty well, and we will work towards a solution that puts this discussion to rest once and for all (to be modest). :)

flx42 (Contributor, Author) commented May 8, 2018

> thanks for the input, and I agree with your statement that GPUs are just the poster-child use case for kernel-bypassing devices

Maybe we don't have the same definition of "kernel-bypassing". But we do have a kernel module, i.e. our driver is not userspace. Maybe you're thinking of DPDK?

> That said, I also agree with @crosbymichael that hooks have their drawbacks too; handing over responsibility to external scripts sounds scary even to me (and I am in general fearless).
> I know the use case pretty well, and we will work towards a solution that puts this discussion to rest once and for all (to be modest). :)

My point is that it's already possible today to register new runtimes in place of runc; see the daemon configuration sketch below. Adding the capability to inject hooks is similar, and just a small step further towards supporting the OCI ecosystem.
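For instance, registering a custom runtime in daemon.json is already supported today (the runtime name and binary path below are made up):

{
	"runtimes": {
		"my-runtime": {
			"path": "/usr/local/bin/my-runc-fork",
			"runtimeArgs": ["--debug"]
		}
	}
}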

If you're afraid of external binaries, it's a little too late :) But for your own product this concern is understandable; it makes sense to want to avoid relying on a third-party tool. Still, I reiterate that we do need this RFE :)

mathstuf commented

Any progress here?

flx42 (Contributor, Author) commented Sep 30, 2018 via email

mathstuf commented

Ugh, how unfortunate. I just did something similar to what NVIDIA is doing now: adding a "systemd" runtime so I can use the relevant OCI hooks.

basvandijk commented

I have a host running many docker containers. The services running in each docker container need to log structured messages to journald via its sd-journal API.

It would be great if these structured messages ended up in the host's journald, so that I could query them using journalctl --machine <container> and have a single log forwarder (like journalbeat) sending all logs to my central log database. It would also be nice if the containers showed up in machinectl.

If docker had support for hooks, I could run hooks like oci-register-machine and oci-systemd-hook to integrate the containers nicely with my host's systemd; see the sketch below.
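Concretely, that would amount to hook entries along these lines (the install paths follow the Fedora/CRI-O convention for these projects, but treat them as assumptions):

{
	"hooks": {
		"prestart": [
			{
				"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
				"args": ["oci-systemd-hook", "prestart"]
			},
			{
				"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
				"args": ["oci-register-machine", "prestart"]
			}
		],
		"poststop": [
			{
				"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
				"args": ["oci-register-machine", "poststop"]
			}
		]
	}
}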

Any chance a patch like #17021 can be reconsidered? It seems that PR was closed because a broken user-supplied hook could break docker. That's a valid concern, but can't we address it by declaring hooks to be for advanced users only? For example, we could state in the documentation that using hooks is discouraged and at your own risk.

thaJeztah added the area/runtime and kind/feature labels on Jun 4, 2019