[proposal] Support abstract resource #594

Closed
dongluochen opened this issue May 13, 2016 · 10 comments

@dongluochen
Contributor

dongluochen commented May 13, 2016

Currently we only support CPU and memory as resource types. In docker/swarm, users have been asking for support for arbitrary resources. Docker/swarm added containerslots to limit the number of containers as a mitigation, but it falls short of supporting multi-category resource allocation. Here is an example: docker-archive/classicswarm#2223.

It'd be great to allow users to specify any resource: bandwidth, GPU count, disk space, or artificial ones. A resource could be scalar, like 0.25 CPU, or discrete, like GPU-1 (a GPU can't be divided). Swarm can handle these resources without knowing their physical meaning; CPU and memory are just two instances of a resource.

@vieux provided a pointer to the Mesos resource description:
http://mesos.apache.org/documentation/latest/attributes-resources/

@stevvooe
Contributor

I think this works well in Mesos due to its resource offer mechanism.

Here is a code sketch for matching requirements with provisions or offers, abstracting the actual space of the resource (i.e., set, list, range):

import "fmt"

type ResourceOffer interface {
  Satisfies(ResourceRequirement) error
  Reserve(ResourceRequirement)
}

type Node interface {
  Provides() []ResourceOffer
  Allocate(requests []AllocationRequest) error
}

type Task interface {
  Requires() []ResourceRequirement
}

type AllocationRequest struct {
  Requirement ResourceRequirement
  Offer       ResourceOffer
}

var requests []AllocationRequest
// see if a node can satisfy every requirement of the task
for _, requirement := range task.Requires() {
  var match ResourceOffer
  for _, offer := range node.Provides() {
    if err := offer.Satisfies(requirement); err == nil {
      match = offer // requirement, offer pair found
      break
    }
  }
  if match == nil {
    /* no offer satisfies it: skip node, log, etc. */
    return fmt.Errorf("requirement %v not satisfied", requirement)
  }
  requests = append(requests, AllocationRequest{requirement, match})
}

// applies each request such that the requirement is "subtracted" from offer.
if err := node.Allocate(requests); err != nil {
  /* allocation failed */
}

// Node.Allocate
for _, request := range requests {
  request.Offer.Reserve(request.Requirement)
}

Ideally, this is how the allocator component would work.

The above example also assumes only nodes provision resources, but we can probably parameterize the provisioner set through the requirement or let the requirement match the resource itself.
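To make the matching loop concrete, here is a minimal runnable sketch assuming every offer is a single scalar quantity of one named resource kind. The `Requirement`, `Offer`, `Match`, and `Allocate` names are hypothetical illustrations, not swarmkit's API. A node is rejected only when no offer satisfies a requirement, and nothing is reserved until every requirement has been matched.

```go
package main

import "fmt"

// Requirement asks for Amount units of the resource called Kind (hypothetical type).
type Requirement struct {
	Kind   string
	Amount float64
}

// Offer is a scalar quantity of one resource kind that a node provides (hypothetical type).
type Offer struct {
	Kind      string
	Available float64
}

// Satisfies returns nil if the offer has the right kind and enough left.
func (o *Offer) Satisfies(r Requirement) error {
	if o.Kind != r.Kind {
		return fmt.Errorf("offer is %q, requirement is %q", o.Kind, r.Kind)
	}
	if o.Available < r.Amount {
		return fmt.Errorf("%s: need %g, have %g", r.Kind, r.Amount, o.Available)
	}
	return nil
}

// Reserve "subtracts" the requirement from the offer.
func (o *Offer) Reserve(r Requirement) { o.Available -= r.Amount }

type AllocationRequest struct {
	Requirement Requirement
	Offer       *Offer
}

// Match pairs each requirement with the first offer that satisfies it,
// failing only if some requirement has no satisfying offer at all.
func Match(reqs []Requirement, offers []*Offer) ([]AllocationRequest, error) {
	var requests []AllocationRequest
	for _, r := range reqs {
		var match *Offer
		for _, o := range offers {
			if o.Satisfies(r) == nil {
				match = o
				break
			}
		}
		if match == nil {
			return nil, fmt.Errorf("no offer satisfies %s=%g", r.Kind, r.Amount)
		}
		requests = append(requests, AllocationRequest{r, match})
	}
	return requests, nil
}

// Allocate applies every request; call it only after Match succeeded.
func Allocate(requests []AllocationRequest) {
	for _, req := range requests {
		req.Offer.Reserve(req.Requirement)
	}
}

func main() {
	node := []*Offer{{Kind: "cpu", Available: 4}, {Kind: "memory", Available: 8e9}}
	task := []Requirement{{Kind: "cpu", Amount: 0.25}, {Kind: "memory", Amount: 2e9}}
	requests, err := Match(task, node)
	if err != nil {
		fmt.Println("skip node:", err)
		return
	}
	Allocate(requests)
	fmt.Println(node[0].Available, node[1].Available) // 3.75 6e+09
}
```

One simplification to note: if a task lists two requirements of the same kind, both can match the same offer, and the combined reservation may exceed what is available. A real allocator would check each requirement against the amount still unclaimed by earlier matches.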

@runshenzhu
Contributor

runshenzhu commented May 31, 2016

I'm going to implement an abstract resource filter in swarmkit. The idea is borrowed from Mesos; here are the design details:

  • Resources are described as either scalar type or set type. For example, CPU and memory resources could be scalar, while exclusively accessed resources, such as GPUs, could be described as sets.
  • Resource declaration: when an agent boots, it autodetects some predefined resources, such as CPU and memory. Users can also explicitly declare which resources a node provides by using a --resources flag. For example, --resources='cpus:24;gpus:{GTX-1080}' declares 24 CPUs and 1 GPU on this node.
  • Allocating and freeing resources:
    • When creating a service, the client can request resources by specifying a resources section in the YAML file. The scheduler then uses this requirement as a resource filter to find the best candidate node and allocates the corresponding resources from that node.
    • When a task finishes or is terminated, the resources it used are returned to the node.
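The declaration and allocate/free steps above can be sketched as follows. This assumes the proposed `--resources` syntax: entries separated by `;`, each `name:value`, where the value is either a number (scalar) or `{...}` (set). Comma-separated set members are an assumption, and `NodeResources`, `ParseResources`, `Allocate`, and `Free` are hypothetical names. Only scalar allocation is shown, matching the "scalar first" plan.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// NodeResources holds what a node declares: scalar amounts and named set members.
type NodeResources struct {
	Scalars map[string]float64
	Sets    map[string][]string
}

// ParseResources parses a declaration such as "cpus:24;gpus:{GTX-1080}".
func ParseResources(s string) (NodeResources, error) {
	res := NodeResources{Scalars: map[string]float64{}, Sets: map[string][]string{}}
	for _, entry := range strings.Split(s, ";") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		name, value, ok := strings.Cut(entry, ":")
		if !ok {
			return res, fmt.Errorf("missing ':' in %q", entry)
		}
		if len(value) >= 2 && strings.HasPrefix(value, "{") && strings.HasSuffix(value, "}") {
			// Set resource: assumed comma-separated member IDs.
			var members []string
			for _, m := range strings.Split(value[1:len(value)-1], ",") {
				if m = strings.TrimSpace(m); m != "" {
					members = append(members, m)
				}
			}
			res.Sets[name] = members
			continue
		}
		n, err := strconv.ParseFloat(value, 64)
		if err != nil {
			return res, fmt.Errorf("bad scalar in %q: %v", entry, err)
		}
		res.Scalars[name] = n
	}
	return res, nil
}

// Allocate subtracts a task's scalar request only if the node can cover all of it.
func (r *NodeResources) Allocate(req map[string]float64) error {
	for name, amount := range req {
		if r.Scalars[name] < amount {
			return fmt.Errorf("%s: need %g, have %g", name, amount, r.Scalars[name])
		}
	}
	for name, amount := range req {
		r.Scalars[name] -= amount
	}
	return nil
}

// Free gives a finished or terminated task's resources back to the node.
func (r *NodeResources) Free(req map[string]float64) {
	for name, amount := range req {
		r.Scalars[name] += amount
	}
}

func main() {
	node, err := ParseResources("cpus:24;gpus:{GTX-1080}")
	if err != nil {
		panic(err)
	}
	fmt.Println(node.Scalars["cpus"], node.Sets["gpus"]) // 24 [GTX-1080]

	task := map[string]float64{"cpus": 2.5}
	if err := node.Allocate(task); err != nil {
		panic(err)
	}
	fmt.Println(node.Scalars["cpus"]) // 21.5
	node.Free(task)
	fmt.Println(node.Scalars["cpus"]) // 24
}
```

Checking every requested resource before subtracting any keeps a failed allocation from leaving the node partially reserved.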

To simplify the design and implementation, the first step is to implement only scalar resources, because the set type could be handled as a special case of scalar. Then, if needed, set types (and range types) will be supported.

@dongluochen @stevvooe Please help review it.

@dongluochen
Contributor Author

Thanks @runshenzhu. I don't think the set type can be handled as a scalar, but implementing scalar first is good. We need some design for collecting resources from engine defaults (CPU/memory/disk) and from resource labels (GPU, etc.).

cc @aluzzardi

@dongluochen
Contributor Author

Case study: NVIDIA GPU support moby/moby#23917

@NanXiao
Contributor

NanXiao commented Jul 19, 2016

@dongluochen I posted moby/moby#24750 earlier, but I think this issue may be more appropriate to discuss here:

I am considering using Docker Swarm to orchestrate a Swarm cluster that supports GPUs. As I understand it, to implement this, Docker needs to encapsulate GPU resources like CPU and memory (e.g., add GPU to the Cluster.Engine structure) and pass them to Docker Swarm when the Swarm manager issues the manage command. The Swarm manager and the Docker Engine API would both need corresponding modifications to interpret and use GPU resources (e.g., add GPU to the node.Node structure).

My questions are as follows:

(1) Do my statements above make sense, or are there important technical issues that I'm overlooking?

(2) What is the current status of GPU support in the Docker Swarm community?

Thanks very much in advance!

@dongluochen
Contributor Author

@NanXiao Docker support for GPUs is under discussion in docker/docker#23917. I don't see a problem with your statement, but there are technical problems that still need solutions. The abstract resource proposal should accommodate GPU as a resource type so that a Swarm cluster can manage it. It depends on Docker's GPU support.

@NanXiao
Contributor

NanXiao commented Jul 20, 2016

@dongluochen Thanks very much for your reply! So, as I understand it, the Swarm project won't make the first move until Docker supports GPUs. Is that right? Thanks!

@dongluochen
Contributor Author

@NanXiao This proposal (abstract resources) can be independent of GPU support. We don't have a fixed plan for it.

@cheyang
Contributor

cheyang commented Aug 10, 2016

@runshenzhu,
I agree with @dongluochen that the scalar type can't handle the set type, and the set type is a very important case to consider.

Take GPUs as an example: suppose there are two GPUs (gpu0 and gpu1) on the same node. The end user only needs to ask for one GPU, but the scheduler must know which GPU has been assigned, and ask the provisioner to provision the next task with the remaining one.

My point is that scalar is good, but it may not be very helpful in real cases. Could you please also consider set as a key feature? Thanks very much.
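The GPU case can be sketched as a set resource that tracks individual IDs and which task holds each one, something a bare scalar count ("1 GPU left") cannot express. `SetResource`, `NewSetResource`, `Allocate`, and `Free` are hypothetical names, and picking the lowest free ID is an arbitrary placeholder policy.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// SetResource tracks individually named units (e.g. GPU IDs) of one kind.
type SetResource struct {
	free     map[string]bool     // IDs not assigned to any task
	assigned map[string][]string // task ID -> IDs it holds
}

func NewSetResource(ids ...string) *SetResource {
	s := &SetResource{free: map[string]bool{}, assigned: map[string][]string{}}
	for _, id := range ids {
		s.free[id] = true
	}
	return s
}

// Allocate picks n free IDs for a task and records which ones it got,
// so the provisioner can be told exactly which device to expose.
func (s *SetResource) Allocate(taskID string, n int) ([]string, error) {
	if n < 0 || len(s.free) < n {
		return nil, errors.New("not enough free units")
	}
	ids := make([]string, 0, len(s.free))
	for id := range s.free {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic placeholder policy: lowest IDs first
	picked := ids[:n]
	for _, id := range picked {
		delete(s.free, id)
	}
	s.assigned[taskID] = append(s.assigned[taskID], picked...)
	return picked, nil
}

// Free returns every unit held by a finished or terminated task.
func (s *SetResource) Free(taskID string) {
	for _, id := range s.assigned[taskID] {
		s.free[id] = true
	}
	delete(s.assigned, taskID)
}

func main() {
	gpus := NewSetResource("gpu0", "gpu1")
	a, _ := gpus.Allocate("task-a", 1)
	b, _ := gpus.Allocate("task-b", 1)
	fmt.Println(a, b) // [gpu0] [gpu1]
	gpus.Free("task-a")
	c, _ := gpus.Allocate("task-c", 1)
	fmt.Println(c) // [gpu0]
}
```

Because the scheduler keeps the task-to-ID mapping, it can hand the provisioner a concrete device for each task and reclaim that exact device when the task ends.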

@aaronlehmann
Collaborator

Implemented in #2090.
