Proposal: Device Support #2682
Comments
While I cannot claim to know anything about the implementation or protocols, I can say that this is a desperately needed feature for any sort of IoT development for which current solutions (however clever) are insufficient. +1 due to that. The user interface that's proposed also seems fairly intuitive. My question is, would this then support docker-compose files? |
I don't have a design for compose support, but I imagine it would be straightforward. You would just include

    version: '3'
    services:
      iot:
        ports:
          - "5000:5000"
        volumes:
          - .:/datastore
        devices:
          - target: sensor
            path: /dev/sensor

The only open question is whether a compose file should also be able to define device classes and devices per node. That's a better question for the compose team, after we've passed this phase of design. |
@dperny I like the plan, would be great to see this! |
@dperny This would cover our needs for using hardware security modules in containers. I cannot find anything wrong in the proposal. |
I'm... kind of a doofus? And totally forgot that swarm supports Generic Resource constraints, design doc here: https://github.com/docker/swarmkit/blob/master/design/generic_resources.md This work, which everyone seems to have forgotten even happened, handles the difficulty of managing which resources are in use on which nodes and by which tasks, which is the more complicated part of this proposal. However, there is a big problem with the generic resources: the resource availability is decoupled at the data model from the way the resource is used. Essentially, you can keep track of which and how many resources a node has, but not how to actually make use of those resources. This is an explicit non-goal of the Generic Resource design. Quote,
This implies that tasks should be responsible for requisitioning their own resources at run time. However, this is impossible for devices. A task, from within a container, cannot attach devices after it has started. So the task has an awareness of what resources are available to it, but no actual way to make use of them. This basically explains why nobody uses this feature; the only way to do so would be to create tasks mounting the docker socket that spawn new containers. The executor will have to be aware of how devices are accessed for devices to work. The responsibility for putting those devices into the task will have to live entirely within the agent. I'll need to rewrite this proposal to accommodate this existing GenericResource feature, so we don't have two overlapping features with different but similar purposes. |
I'm poking at how to leverage the existing GenericResource code, and it's honestly not that sensible. The use case is too different. The amount of mogrification to the GenericResource concept that one would have to do is untenable. Honestly... GenericResource isn't a super sensible implementation anyway. It totally decouples a task's resource demands from the actual use of resources, which is a serious problem. If a Task reserves a resource, but does not have any way to use it, the resource is wasted. However, if a Task specifies how to use a resource, but no such reservation was made, then the Task will fail in strange ways. I think, despite the slight duplication of efforts, the use case for actually using devices is sufficiently different to warrant a separate design. |
Updated the design document to include section on GenericResource |
@dperny I would love to see this implemented! This would allow us to properly use hardware security modules (HSM), which are required by our application, in swarm mode. |
@dperny Any update on progress for those of us who are eagerly waiting? |
Yes, I'm gonna do it, I just keep getting pulled away on other things internally. But it's gonna happen. Soon™. |
I Swoon for Soon™
(Sent by email in reply to Drew Erny's comment of Wed, Aug 22, 2018, 1:00 PM.)
|
Adds the protocol buffers needed to support allocation of devices in swarmkit. Part of the proposal in moby#2682. Signed-off-by: Drew Erny <[email protected]>
@dperny Any updates or timeline? Thanks! |
Is there any progress on this issue? |
Seems not ^^ |
i had a bunch of free time for a little while, and then it rapidly became not a bunch of free time, and now i'm doing other things. i'm really sorry, i started promising this a year ago and i feel The Guilt over not delivering on it. |
@dperny Please do not feel any guilt. No matter how snarky the comments from people like @flopon are (and I am sure he didn't mean to put any pressure on you), without throwing loads of money towards you there is no right to expect any progress. Please do not ever feel bad for not delivering on a ticket on an (mostly) OSS project. Your work is highly appreciated and please do not let any comments get your motivation down! |
i mean, i am having loads of money thrown at me, it's just being thrown at me to work on other features. |
Is the devices functionality already available in swarm mode? If not, how could I hack it? I would like to access /dev/video on a PC and pin a USB port on the Raspberry Pi. |
@dperny - Any update on this? Really hoping this is on the roadmap for swarm? |
I understand you may not want to hear yet another "any updates?" from me... Yet, with this issue running for years now, it would really be nice to hear something like "yea, we will deliver it in Q1 2021" or whatever is the plan for it or "no, we're not implementing this, swarm is not for you if you need to access devices". It'd really help making informed decisions. |
Okay, now I realize the magnitude of the problem, the project is apparently silently deprecated, looking at the support. I'm gonna start migrating to Kubernetes and would advise others to consider doing the same. |
I really don't think the project is deprecated at all |
@prologic What makes you think so? 😲 |
@prologic I admit, that I didn't expect docker engine getting updates in 2021, but still according to changelog link you sent and also according to Github there is no change for docker engine in 20.10.5. |
Just because it doesn't see any new hot 'n shiny new features doesn't mean it's dead 😀 -- Just because Kubernetes is getting all the attention also doesn't mean Docker is dead or somehow worth less. In fact AFAIK Kubernetes still uses Docker as the default container engine anyway. 🤷♂️ |
Well, Docker is now considered deprecated in Kubernetes and will switch to unsupported in one of the next versions this year. I am using Swarm on my cluster of Raspberry Pis at home as I love its simplicity, and it is also not a resource hog as Kubernetes is for these small SBCs. |
I dunno @johny-mnemonic I think the reason for Docker's deprecation in Kubernetes is more "political" than "technical". But what would I know 🤷♂️ 😂 OTOH Docker is Open Source Software. Anyone is welcome to contribute to it. There is absolutely no reason Docker as a software, platform, set of libraries (whatever) should die -- But this happens all the time in open-source. I guess it's human nature? 🤔 |
Yeah, I don't know either. |
There are no updates on this issue? @dperny the last message from you in this thread is over a year ago. maybe something has changed? |
Volume support wrapping up soon, this is probably next on the list (but might not be). |
@dperny Where can I follow the volume support development? |
This issue was updated six months ago, could you tell me is the problem solved now please? I urgently need to mount the device in swarm mode. |
@se7enXF If you need it urgently, you can use workaround and use a wrapper service which runs your container: https://serverfault.com/a/1089792 |
@djmaze Thank you for your advice. In reality, I need to use compose-file to start the service, so the method you propose is not applicable. |
@se7enXF And I guess this doesn't count as using a compose file?

    version: "3.7"
    services:
      app:
        image: docker
        command: docker run --rm --device <DEVICE> <IMAGE> <ARGS>
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro

|
I've created a docker plugin for this but unfortunately I do not know how to make it work for cgroup v2 yet. If you are using a cgroup v1 OS like alpine to run docker swarm then this plugin may work for you https://github.com/allfro/device-volume-driver. Help is appreciated if anyone knows how to manipulate cgroup v2 EBPF device controller programs. |
I sent a chain of PRs which implement something like this at #3106 (see also #1244 (comment)). But, I didn't go so far as to actually do everything in this proposal, instead opting for the plain plumbing which is done for other options. I personally feel like this is fine. |
I would also like just a basic implementation to start. Just something that I can work with. I accept the risks associated with that. To me it's better to have a simple version and let others guide the focus of development than to go fully featured in one go. You may find that half of the implementation goes unused. |
everybody ready for this christmas present :) |
... |
This is a rough overview of a proposed design for device support in Swarm. This is a possible implementation of #1244. The objective is to implement, in a way sensible to the cluster, support for devices. Please note that this is not yet on the road map; this is an early-stage proposal.
For community members, even if you don't or haven't contributed directly to swarmkit:
Does this meet or exceed the community's needs for device support? Is the UI flexible, ergonomic, and easy to use? Feel free to leave a comment explaining what is good and bad about this proposal.
Overview
Devices will be added as a first-class feature of swarm. The user will be able to define device classes, to which devices belong. The user will be able to register devices on specific nodes, indicating which class the device belongs to and the path at which the device is located. The user can then specify the device classes that a task needs in order to execute, and the swarmkit scheduler will assign a device to the task and place the task on the node with that device.
Goals
The goal of this proposal is to implement the most basic device-aware scheduling system, so that swarmkit can fully support devices in a clustered environment.
Non-Goals
Non-goals of this proposal are to support things like security profiles or permissions. Additionally, though the device management workflow presented in this PR is a bit onerous and requires manual registration of devices, implementing automatic device detection and registration is out of scope.
Detailed Design
Data Model
The basic data model of devices is as follows:
Devices are host-specific resources, but different devices on the same or different hosts may possibly be treated as interchangeable or equivalent. For example, many nodes in the cluster could possibly be attached to some GPU. Though the actual GPU on different nodes may be different, and there may even be more than one GPU per node, their functionality is equivalent, and any of these nodes is an equally suitable candidate for scheduling. Further, some devices should only be used by one task in the cluster, whereas others can be shared between as many tasks as needed.
Device classes are the object that represents the top-level concept of a device. Tasks can only specify devices in terms of device classes they desire. The specific device chosen is the prerogative of the swarmkit scheduler.
The individual devices available are a property of the node. A node may have as many devices specified as necessary. In keeping with the security pattern of not trusting workers, devices are always registered through the swarmkit manager, never self-reported or self-discovered.
Task Specs will include a list of device classes and options desired, including where in the task’s file system to place the device. Tasks must be prepared to accept any device in the class as equivalent. When a task is created, it will have the full run-time device parameter included in the object.
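As a rough illustration of the data model described above, here is a minimal Python sketch. The field names (name, shared, device_class, path) and the candidate_nodes helper are assumptions made for this example; they are not taken from swarmkit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DeviceClass:
    """Cluster-wide name for a group of interchangeable devices."""
    name: str
    shared: bool = False  # True if many tasks may use one device at once


@dataclass(frozen=True)
class Device:
    """A concrete device, registered on one node through a manager."""
    device_class: str  # name of the DeviceClass it belongs to
    path: str          # location of the device on the host


@dataclass
class Node:
    name: str
    devices: List[Device] = field(default_factory=list)


def candidate_nodes(nodes: List[Node], wanted_class: str) -> List[str]:
    """Return the names of nodes holding at least one device of the class."""
    return [
        n.name for n in nodes
        if any(d.device_class == wanted_class for d in n.devices)
    ]


nodes = [
    Node("node-a", [Device("gpu", "/dev/gpu0"), Device("gpu", "/dev/gpu1")]),
    Node("node-b", [Device("gpu", "/dev/accel0")]),
    Node("node-c", []),
]
print(candidate_nodes(nodes, "gpu"))
```

The task only names the class; the different GPU paths on node-a and node-b are treated as equivalent, which is the interchangeability the data model relies on.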
User Interface
Adding devices will introduce a new command and subcommands for the management of devices. The first command, and the biggest change, will be to add new subcommands to manage device classes:
The add command adds a new device class to the swarm:
The ls command will allow listing all available device classes.
The inspect command will allow showing full information about device classes, as well as allowing the user to include all devices currently registered belonging to a device class.
The remove command is similar to all other rm commands, and its usage is obvious, with the caveat that removal of a device class will be disallowed if a device of that class is in use by a task. There is no update command, as device classes will not be treated as updateable.
To manage particular devices on nodes, the existing node update command will receive new flags:
Similar to other options like ports and volumes, devices will accept both short- and long-form versions.
The short form will take the format path:class, where path is the path of the device on the host, and class is the device class to register it with, as such:
The long form of the command allows specifying these options independently, and allows future expansion of options for devices (such as host-specific cgroup options):
The device rm option for node update acts as expected, but will disallow removing a device that is in use.
Services would also support new flags. Service create will have a new option, --device, with both a long form and a short form. The short form will be the reciprocal of the --device flag on the node, taking the form class:path. It will also optionally support a third rwm field, mirroring the --device flag on docker run. The long form will take discrete arguments, and allow the user to specify cgroup options as supported in th
The short form, for mounting a GPU:
Services would also support a long form of the command:
Note: the long form of the command could possibly support further cgroup options, as allowed in the docker REST API for container creation.
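To make the two short forms concrete, here is a small Python sketch of how they might be parsed: path:class for a node, and class:path with an optional rwm field for a service. The function and type names are hypothetical, and defaulting the permissions to rwm is an assumption borrowed from the behaviour of docker run --device, not something this proposal specifies.

```python
from typing import NamedTuple


class NodeDevice(NamedTuple):
    path: str          # device path on the host
    device_class: str  # class the device is registered under


class ServiceDevice(NamedTuple):
    device_class: str  # class the task asks for
    path: str          # where the device appears inside the container
    permissions: str   # subset of "rwm"


def parse_node_device(value: str) -> NodeDevice:
    """Parse the node short form path:class."""
    path, sep, device_class = value.rpartition(":")
    if not sep or not path or not device_class:
        raise ValueError(f"expected path:class, got {value!r}")
    return NodeDevice(path, device_class)


def parse_service_device(value: str) -> ServiceDevice:
    """Parse the service short form class:path[:rwm]."""
    parts = value.split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected class:path[:rwm], got {value!r}")
    # Assumption: fall back to full permissions when the field is omitted.
    permissions = parts[2] if len(parts) == 3 else "rwm"
    if not set(permissions) <= set("rwm"):
        raise ValueError(f"invalid permissions {permissions!r}")
    return ServiceDevice(parts[0], parts[1], permissions)


print(parse_node_device("/dev/nvidia0:gpu"))
print(parse_service_device("gpu:/dev/gpu:r"))
```

Keeping the two forms reciprocal means the host path never appears in a service definition; only the class and the in-container path do.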
Service update would include --device-add and --device-rm flags. The --device-add syntax will be equivalent to the --device flag of create. Because a task may have more than one device of a class mounted into its running container, --device-rm would require both the class and path of the device to disambiguate the specific device that is to be removed.
REST API
The Docker engine REST API would require a new set of endpoints to accommodate the concept of device classes. These endpoints would return the JSON representation of the objects described in the example Protocol Buffers. These endpoints would be as follows:
Protocol Buffers
In swarm, protocol buffers define the internal API and object structure.
The DeviceClass proto will form a new top-level type, like a Network or a Service. It will have an ID and a name.
The Device proto is included as a repeated field on Node specs. It defines a particular available device belonging to a class.
The DeviceAttachmentSpec is a repeated field found in the TaskSpec proto, and defines the devices that a task should be attached to.
The DeviceAttachment is a repeated field on Tasks which defines specifically the run-time parameters of a device attachment for a particular task.
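The relationship between the last two types can be sketched in Python: a DeviceAttachmentSpec says what the task wants, and a DeviceAttachment records what it actually got. Only the type names come from the proposal; the fields (class_id, path, target, host_path) and the resolve helper are assumptions for illustration and do not reproduce the actual proto definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    class_id: str  # ID of the DeviceClass this device belongs to
    path: str      # path of the device on the node


@dataclass(frozen=True)
class DeviceAttachmentSpec:
    class_id: str  # the class the task asks for
    target: str    # path inside the task's file system


@dataclass(frozen=True)
class DeviceAttachment:
    host_path: str  # concrete device chosen by the scheduler
    target: str     # copied from the spec


def resolve(spec: DeviceAttachmentSpec, chosen: Device) -> DeviceAttachment:
    """Turn a request for a class into run-time parameters for one device."""
    if chosen.class_id != spec.class_id:
        raise ValueError("device does not belong to the requested class")
    return DeviceAttachment(host_path=chosen.path, target=spec.target)


spec = DeviceAttachmentSpec(class_id="class-gpu", target="/dev/gpu")
print(resolve(spec, Device(class_id="class-gpu", path="/dev/nvidia0")))
```

The spec never names a host path, so the same TaskSpec can be satisfied by different devices on different nodes.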
Swarmkit Implementation
The device allocator will be implemented as a sub-component of the Scheduler. It will be created when a scheduler is created, and keep track of the available devices in the cluster. Scheduling for available devices forms part of the constraint-solving portion of the scheduler.
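A minimal Python sketch of the bookkeeping such an allocator might do is shown here. The class and method names are hypothetical, and a real implementation would sit inside the scheduler's constraint solving rather than stand alone.

```python
from typing import Dict, List, Optional, Tuple

DeviceKey = Tuple[str, str]  # (node name, device path on that node)


class DeviceAllocator:
    """Tracks which devices are free, per class, across the cluster."""

    def __init__(self) -> None:
        self._devices: Dict[str, List[DeviceKey]] = {}  # class -> devices
        self._shared: Dict[str, bool] = {}               # class -> shareable?
        self._in_use: Dict[DeviceKey, int] = {}          # device -> task count

    def add_class(self, name: str, shared: bool = False) -> None:
        self._devices.setdefault(name, [])
        self._shared[name] = shared

    def add_device(self, class_name: str, node: str, path: str) -> None:
        self._devices[class_name].append((node, path))

    def allocate(self, class_name: str) -> Optional[DeviceKey]:
        """Return a device of the class, or None if none is available."""
        for device in self._devices.get(class_name, []):
            if self._shared[class_name] or self._in_use.get(device, 0) == 0:
                self._in_use[device] = self._in_use.get(device, 0) + 1
                return device
        return None

    def release(self, device: DeviceKey) -> None:
        self._in_use[device] -= 1


alloc = DeviceAllocator()
alloc.add_class("hsm")  # exclusive: one task per device
alloc.add_device("hsm", "node-a", "/dev/hsm0")
old_task = alloc.allocate("hsm")
new_task = alloc.allocate("hsm")  # replacement task started before the old one stops
print(old_task, new_task)
```

With a single exclusive device, the second allocation finds nothing until the first is released, which is the situation a start-first update runs into.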
Task updates present a difficulty for devices. If devices in the class can be shared between tasks (marked --shared), then there is no problem. However, the start-first update strategy would fail unless at least one device in the class is free, so that the new task can start with a fresh device, allowing the old task to shut down and free its in-use device. There is no easy solution for this, I think. We should instead document thoroughly that using start-first with devices may cause trouble.
Error Handling
Because of the nature of distrusting the workers, it is difficult or impossible for swarm to “prove” that a given device exists on a node, or performs as the user expects. Swarm will therefore make no attempts to verify the correctness of provided user data. If a device is mistakenly assigned to the wrong class, or if it does not exist at all, the task is expected to fail to start. It should enter a terminal state of FAILED and should include an error message explaining that the errant device is at fault.
Notably, in this proposal, there will be no attempt to “downweight” or otherwise attempt to avoid a node with a failing device. This functionality may come later, but not as part of this proposal.
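The failure behaviour described in this section can be sketched as follows. This is only an illustration in Python: TaskStatus and start_task are hypothetical names, and checking that the path exists stands in for whatever error the container runtime would actually report.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class TaskStatus:
    state: str
    message: str = ""


def start_task(host_device_paths: List[str]) -> TaskStatus:
    """Fail the task, naming the device, if any assigned device is missing."""
    for path in host_device_paths:
        if not os.path.exists(path):
            return TaskStatus("FAILED", f"device {path} not found on node")
    return TaskStatus("RUNNING")


print(start_task(["/nonexistent/device-for-example"]))
```

Nothing here feeds back into scheduling: the failed node is not avoided afterwards, matching the decision not to downweight nodes in this proposal.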
Security
It must be understood that once on the host, swarmkit has no control over how a task uses devices. If improperly used, devices can be an extreme security hole for swarm tasks. For example, mounting block devices may allow read or write access to all of their contents. If the host’s primary block device were mounted into a task, that task could have full access to the host filesystem.
About Generic Resources
Swarmkit currently includes a feature called “Generic Resources”, which serves to allow scheduling based on kinds of resources. The design doc for Generic Resources [2] outlines their use, which overlaps with the use case of this proposal. Specifically, Generic Resource already keeps track of resources which are available and in use on a cluster.
However, GenericResource has a notable deficiency: it lacks context about the runtime usage of a particular reserved resource. Essentially, a task is only informed of a resource at runtime, and the swarmkit worker has no way to know how to make use of a particular resource, which makes the feature quite useless.
The obvious solution would be to include in the TaskSpec instructions for how to make use of a resource. However, this keeps the information about how to use a resource separate from the information about what resource is required. A TaskSpec might, for example, request 3 GPUs in its ResourceReservations but, in a hypothetical Devices field in its ContainerSpec, only use 2 of them, leaving 1 wasted. Or, alternatively, a TaskSpec might include instructions for mounting an audio device, but not include a reservation for one. This means that run-time checks would be needed to make sure that the requested resources match the run-time instructions for using resources. Instead, this proposal uses the type system to make this kind of mismatch impossible to express.
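The difference can be sketched in Python. The first function is the kind of run-time check the decoupled approach would need; the second shows that when one object carries both the request and the mount instruction, reservations are derived rather than declared, so they cannot disagree. All names and fields here are assumptions for illustration, apart from DeviceAttachmentSpec, which only borrows the proposal's type name.

```python
from dataclasses import dataclass
from typing import Dict, List


def mismatches(reserved: Dict[str, int], used: Dict[str, int]) -> List[str]:
    """Run-time check needed when reservation and usage are declared apart."""
    problems = []
    for kind in sorted(set(reserved) | set(used)):
        r, u = reserved.get(kind, 0), used.get(kind, 0)
        if r > u:
            problems.append(f"{kind}: {r - u} reserved but unused")
        elif u > r:
            problems.append(f"{kind}: {u - r} used without reservation")
    return problems


@dataclass(frozen=True)
class DeviceAttachmentSpec:
    """One entry is both the reservation and the mount instruction."""
    device_class: str
    target: str


def reservations(specs: List[DeviceAttachmentSpec]) -> Dict[str, int]:
    """Derive per-class counts from the specs themselves."""
    counts: Dict[str, int] = {}
    for spec in specs:
        counts[spec.device_class] = counts.get(spec.device_class, 0) + 1
    return counts


# The two failure cases from the text: a wasted GPU and an unreserved audio device.
print(mismatches({"gpu": 3}, {"gpu": 2, "audio": 1}))
print(reservations([DeviceAttachmentSpec("gpu", "/dev/gpu0"),
                    DeviceAttachmentSpec("gpu", "/dev/gpu1")]))
```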
Additionally, we cannot simply annotate or augment the GenericResource type in the task resource reservations, because the same type is shared between the TaskSpec (requested resources), the Task itself (assigned resources), and the Node (available resources). The same type is used to express which resources are available, which resources are assigned, and which resources are requested. However, these types all serve different purposes. Available resources don’t need to be aware of how they should be used by a task and requested resources can’t be aware of what resource will be assigned. This means that fields on the GenericResource would either mean different things in different places, or there would only be a subset of fields in use on any given object.
[2] https://github.com/docker/swarmkit/blob/de950a7/design/generic_resources.md