
[RFE] Support for OCI hooks [2018 edition] #36987

Status: Open · Labels: area/runtime, kind/feature

flx42 (Contributor) commented May 2, 2018

This subject has been mentioned multiple times already, starting back in 2015, but I would like to revive this discussion.

Current status

Today, Docker supports swapping runc for another OCI-compliant runtime:
https://docs.docker.com/engine/reference/commandline/dockerd/#docker-runtime-execution-options

However, this approach is not composable: you can't combine multiple custom runtimes.
If I want to customize the behavior of runc for Docker today, I need to fork runc, add my modifications, and ship the whole binary.

Proposal

For this reason, I'm asking the moby community to (re)consider exposing standard OCI hooks through the "docker run" API. Those hooks are defined in the OCI spec:
https://github.com/opencontainers/runtime-spec/blob/master/config.md#posix-platform-hooks

As a reference, the OCI runtime spec defines when those hooks must be called:
https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#lifecycle
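For illustration, this is roughly what hook entries look like in a container's OCI config.json per that spec (the paths and arguments below are made up):

{
	"hooks": {
		"prestart": [
			{
				"path": "/usr/local/bin/my-prestart-hook",
				"args": ["my-prestart-hook", "--debug"],
				"env": ["LOG_LEVEL=info"],
				"timeout": 30
			}
		],
		"poststop": [
			{
				"path": "/usr/local/bin/my-cleanup-hook"
			}
		]
	}
}

Per the spec, the runtime invokes each hook with the container state (id, pid, bundle path) as JSON on stdin, and args follows execv semantics, so args[0] is the program name.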

Hooks could be declared at the docker daemon level, similarly to what we have today for OCI runtimes:

{
	"hooks": {
		"prestart": {
			"my-hook": {
				"path": "/usr/local/bin/my-prestart-hook",
				"args": ["--debug"]
			},
			...
		},
		"poststart": {
			...
		}
	}
}

Similar to what's proposed here: #28837

As mentioned in #33375 (comment), we could use it the following way:

docker run --exec-opt prestart=hook1,hook2

Or, if you want to avoid users having to remember the type of your hook:

docker run --exec-opt hooks=hook1,hook2

The exec-opt option already exists at the daemon level, so it would make sense to reuse it.
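For context, exec-opts can already be set in the daemon's daemon.json; under this proposal a hooks entry (hypothetical, reusing the hook names declared above) could sit next to existing options such as native.cgroupdriver:

{
	"exec-opts": [
		"native.cgroupdriver=systemd",
		"hooks=my-hook,my-other-hook"
	]
}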

Related discussions: the 2015 proposal, the 2016 attempt to revive it, #28837, and #33375 (all referenced in this issue).

With @3XX0, I work at NVIDIA on enabling GPU applications to run in containers, mostly for machine learning and HPC applications.

Next steps

If core moby maintainers agree on this RFE, I can submit a design doc so we can start discussing the details.

cc fest:

@3XX0 @RenaudWasTaken: NVIDIA colleagues
@rhatdan: for the initial proposal in 2015
@WeiZhang555: for trying to revive the idea in 2016
@crosbymichael @stevvooe: since you looked at libnvidia-container recently, maybe you are preparing something at the containerd level directly?
@mrunalp: since we discussed this recently
@cmluciano: since you filed an issue against our runtime, asking for other options

crosbymichael (Contributor) commented May 7, 2018

@flx42 Thanks for pinging me on this. I have been looking into what it takes to get NVIDIA GPUs into containers, and I actually just opened a PR today adding support to containerd.

containerd/containerd#2330

We don't use your hooks, though, and opted to configure the spec directly with the devices, cgroups, and bind mounts for the drivers, which may be a good thing for you if you are tired of supporting the hooks for Docker integration. This will allow us to build out the API in a type-safe way and not have to worry about env vars anymore. Let me know if you think we are going in the right direction for the low-level components in containerd, and then we can start building this out into Docker as an alternate way of fixing the issue above. I'm pretty confident that we can support everything with spec configurations and bind mounts (e.g., write a custom app profile in the container's bundle and bind mount it in place of the tmp-plus-writing-the-file-inside-the-container approach).
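For readers following along, that spec-driven approach amounts to filling in standard OCI config.json fields instead of calling a hook; a rough sketch (the device majors/minors and library paths are illustrative, not what the containerd PR actually emits):

{
	"mounts": [
		{
			"destination": "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
			"type": "bind",
			"source": "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
			"options": ["ro", "rbind"]
		}
	],
	"linux": {
		"devices": [
			{ "path": "/dev/nvidiactl", "type": "c", "major": 195, "minor": 255 },
			{ "path": "/dev/nvidia0", "type": "c", "major": 195, "minor": 0 }
		],
		"resources": {
			"devices": [
				{ "allow": true, "type": "c", "major": 195, "access": "rwm" }
			]
		}
	}
}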

flx42 (Contributor, Author) commented May 7, 2018

Hi @crosbymichael,

Exciting to see the current work towards improving support for the use cases we care about! Welcome to the HPC/ML containers party :)

This request is not only about GPU support in Docker; there are other use cases.
I believe you will agree that certain operations require access to the container's namespaces, and that you might need access to the runtime namespaces and the container namespaces at the same time.
Networking is a prime example: some operations must be run inside the network namespace of the container. Other operations might need to act on the contents of the container filesystem, or deal with LSMs.
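To make the network-namespace case concrete, here is a minimal prestart hook sketch (not NVIDIA's actual hook; it assumes jq and nsenter are installed on the host). Per the runtime spec, the hook receives the container state as JSON on stdin and runs in the runtime namespace, so it can enter the container's namespaces via the PID:

	#!/bin/sh
	# Minimal prestart hook sketch: read the OCI container state from stdin,
	# then run a command inside the container's network namespace.
	set -eu

	# stdin carries {"ociVersion": ..., "id": ..., "pid": ..., "bundle": ...}
	pid=$(jq -r '.pid')

	# Illustrative action: bring up the loopback interface in the container's netns.
	nsenter --target "$pid" --net ip link set lo up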

We believe that properly supporting our GPUs in a container requires a prestart hook; there are many complicated interactions with our driver stack that need to be taken into account.
Supporting additional features such as MPS or Vulkan in the future will require additional logic.

Now, this is our opinion; it doesn't preclude you from adding another implementation in your product.
Our prestart hook is what we want to support and maintain as a product for OCI-compatible runtimes. It allows us to have a single implementation we can use behind runc, Docker, Kubernetes (via CRI-O), and soon LXC.
It's not sustainable if we need to implement and support a different solution in each container stack. I hope you understand our point of view.
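As an illustration of how small that integration surface is, injecting such a hook into a container's config.json amounts to something like this (the exact path and arguments of the NVIDIA hook may differ; treat them as an approximation):

{
	"hooks": {
		"prestart": [
			{
				"path": "/usr/bin/nvidia-container-runtime-hook",
				"args": ["nvidia-container-runtime-hook", "prestart"]
			}
		]
	}
}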

Totally agree that the interface based on environment variables is not structured, but that's the only element that really traverses the stack from the (image, user spec) pair to the runtime. On the other hand, there are many flavors of annotations/labels, but they are usually not passed down to the container runtime.

We can discuss the details of enabling support for NVIDIA GPUs in containers further in your PR, but what do you think of this proposal for enabling other use cases?

ChristianKniep commented May 8, 2018

@flx42 thanks for the input, and I agree with your statement that GPUs are just the poster-child use case for kernel-bypassing devices. That said, I also agree with @crosbymichael that hooks have their drawbacks too; handing over responsibility to external scripts sounds scary even to me (and I am in general fearless).
I know the use case pretty well, and we will work towards a solution that puts this discussion to rest once and for all (to be modest). :)

flx42 (Contributor, Author) commented May 8, 2018

> thanks for the input, and I agree with your statement that GPUs are just the poster-child use case for kernel-bypassing devices

Maybe we don't have the same definition of "kernel-bypassing". But we do have a kernel module, i.e. our driver is not userspace. Maybe you're thinking of DPDK?

> That said, I also agree with @crosbymichael that hooks have their drawbacks too; handing over responsibility to external scripts sounds scary even to me (and I am in general fearless).
> I know the use case pretty well, and we will work towards a solution that puts this discussion to rest once and for all (to be modest). :)

My point is that it's already possible today to register new runtimes in place of runc; see the daemon configuration sketch below. Adding the capability to inject hooks is similar, and just a small step further towards supporting the OCI ecosystem.
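For instance, registering a custom runtime in daemon.json is already supported today (the runtime name and binary path below are made up):

{
	"runtimes": {
		"my-runtime": {
			"path": "/usr/local/bin/my-runc-fork",
			"runtimeArgs": ["--debug"]
		}
	}
}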

If you're afraid of external binaries, it's a little too late :) But for your own product this concern is understandable; it makes sense to want to avoid relying on a third-party tool. Still, I reiterate that we do need this RFE :)

mathstuf commented

Any progress here?

flx42 (Contributor, Author) commented Sep 30, 2018 via email

mathstuf commented

Ugh, how unfortunate. I just did something similar to what NVIDIA is doing now: adding a "systemd" runtime so I can use the relevant OCI hooks.

basvandijk commented

I have a host running many docker containers. The services running in each docker container need to log structured messages to journald via its sd-journal API.

It would be great if these structured messages ended up in the host's journald, so that I could query them using journalctl --machine <container> and have a single log forwarder (like journalbeat) sending all logs to my central log database. It would also be nice if the containers showed up in machinectl.

If docker had support for hooks, I could run hooks like oci-register-machine and oci-systemd-hook to integrate the containers nicely with my host's systemd; see the sketch below.
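Concretely, that would amount to hook entries along these lines (the install paths follow the Fedora/CRI-O convention for these projects, but treat them as assumptions):

{
	"hooks": {
		"prestart": [
			{
				"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
				"args": ["oci-systemd-hook", "prestart"]
			},
			{
				"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
				"args": ["oci-register-machine", "prestart"]
			}
		],
		"poststop": [
			{
				"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
				"args": ["oci-register-machine", "poststop"]
			}
		]
	}
}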

Any chance a patch like #17021 can be reconsidered? It seems that PR was closed because a broken user-supplied hook could break docker. That's a valid concern, but can't we address it by declaring hooks to be for advanced users only? For example, we could state in the documentation that using hooks is discouraged and at your own risk.

thaJeztah added the area/runtime and kind/feature labels on Jun 4, 2019