
Support Tagging when scheduling the worker load #32

Open
sharpener6 opened this issue Oct 10, 2024 · 1 comment
Labels: enhancement (New feature or request)
@sharpener6 (Collaborator)

  • Workers should send their supported tags when they connect
  • Clients will send each task along with its required tags
  • The scheduler will only schedule a task to workers that support its tags
@rafa-be (Collaborator) commented Dec 13, 2024

Tags make balancing tasks slightly more complex, as they add constraints on how tasks can be moved from one worker to another.

My favored way to implement tags would be similar to Dask's resources: workers announce support for multiple string-based tags (e.g. "high-memory", "linux" or "GPU"), and tasks are declared with the set of tags they require to run. Tags thus represent features provided by workers and required by tasks.

A worker would be allowed to run a task if task.tags.issubset(worker.tags).
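For illustration, here is a minimal sketch of that check, using hypothetical Worker and Task types (not the actual Scaler classes):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Worker:
    name: str
    tags: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Task:
    name: str
    tags: frozenset[str] = field(default_factory=frozenset)


def can_run(worker: Worker, task: Task) -> bool:
    # A worker is compatible when it provides every tag the task requires.
    return task.tags.issubset(worker.tags)


assert can_run(Worker("w1", frozenset({"Linux", "GPU"})), Task("t1", frozenset({"GPU"})))
assert not can_run(Worker("w2", frozenset({"Linux"})), Task("t2", frozenset({"GPU"})))
```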

Balancing tasks like these is more complex than without tags.

For example, consider this cluster's state:

  • Worker 1

    • Supported tags: {Linux, GPU}
    • Tasks:
      • Task 1: {Linux}
  • Worker 2

    • Supported tags: {Linux}
    • Tasks: None
  • Worker 3:

    • Supported tags: {GPU}
    • Tasks:
      • Task 2: {GPU}
      • Task 3: {GPU}

Here the optimal way of balancing the cluster would be to move Task 1 to Worker 2 and then Task 3 to Worker 1.

With tags, balancing cannot be done by considering only two individual workers at a time (Worker 3 and Worker 2 in the example); the whole cluster (including Worker 1) must be considered in the balancing computation.
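To make the constraint concrete, the example can be replayed with the compatibility check sketched above (same hypothetical Worker/Task types): the direct move from the busiest worker to the idle one is illegal, while the two-step chain is legal.

```python
w1 = Worker("worker-1", frozenset({"Linux", "GPU"}))  # currently runs Task 1 {Linux}
w2 = Worker("worker-2", frozenset({"Linux"}))         # idle
w3 = Worker("worker-3", frozenset({"GPU"}))           # currently runs Tasks 2 and 3 {GPU}

task_1 = Task("task-1", frozenset({"Linux"}))
task_3 = Task("task-3", frozenset({"GPU"}))

# Pairwise balancing (busiest Worker 3 -> idle Worker 2) finds no legal move:
assert not can_run(w2, task_3)

# The optimal plan needs a third worker: moving Task 1 to Worker 2 frees
# Worker 1, which then makes moving Task 3 to Worker 1 legal.
assert can_run(w2, task_1)
assert can_run(w1, task_3)
```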

Balancing algorithms exist that can find this optimal assignment (see the assignment problem), but they are complex and slow, and they might also cause a lot of messages to be propagated through the cluster.

Instead, I propose we use a simpler but less optimal algorithm:

  1. We select the worker with the highest number of tasks;
  2. We try to remove one task from this worker by moving it to a worker that has fewer than the average number of tasks;
    • If we cannot find a compatible worker, we stop considering this worker in step 1.
  3. We repeat step 1 until either:
    • all workers have reached the average number of tasks;
    • or we cannot find a worker to move tasks from.

The worst case for this algorithm seems to be O(|workers| • |tasks|), while the average should be closer to O(|tasks|).
If no tags are used in the cluster, the worst case stays at O(|tasks|), like the current balancing algorithm.
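For reference, here is a rough sketch of how I read that greedy pass, reusing the hypothetical Worker/Task/can_run types from above (this is not the actual Scaler implementation):

```python
def rebalance(queues: dict[Worker, list[Task]]) -> list[tuple[Task, Worker, Worker]]:
    """Greedy pass: repeatedly move one task from the busiest worker to an
    under-loaded, tag-compatible worker. Returns the list of moves made."""
    average = sum(len(tasks) for tasks in queues.values()) / len(queues)
    moves: list[tuple[Task, Worker, Worker]] = []
    candidates = set(queues)  # workers we may still take tasks from

    while candidates:
        # 1. Select the worker with the highest number of tasks.
        source = max(candidates, key=lambda w: len(queues[w]))
        if len(queues[source]) <= average:
            break  # every worker is at or below the average: we are done.

        # 2. Try to move one of its tasks to a worker below the average load.
        for task in queues[source]:
            target = next(
                (w for w in queues if len(queues[w]) < average and can_run(w, task)),
                None,
            )
            if target is not None:
                queues[source].remove(task)
                queues[target].append(task)
                moves.append((task, source, target))
                break
        else:
            # No compatible worker for any of its tasks: stop considering it.
            candidates.discard(source)

    return moves
```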

Dask uses a similar balancing algorithm.

I'm almost done with implementing this algorithm. It's actually not that complex.


@1597463007 suggested that we might get rid of this complexity by only allowing a single tag per worker, but allowing multiple tags on tasks.

A worker would be allowed to run a task if worker.tag in task.tags.

It's functionally equivalent, as workers could de facto advertise multiple features or requirements with a single tag (e.g. "Linux+GPU").
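Sketched with the same hypothetical types as above, the check for this variant collapses to a membership test:

```python
def can_run_single_tag(worker_tag: str, task_tags: frozenset[str]) -> bool:
    # The task must explicitly list the worker's (possibly combined) tag.
    return worker_tag in task_tags


# A worker with both features advertises the combined tag "Linux+GPU";
# a task must then enumerate every worker tag it can accept.
assert can_run_single_tag("Linux+GPU", frozenset({"GPU", "Linux+GPU"}))
assert not can_run_single_tag("Linux", frozenset({"GPU", "Linux+GPU"}))
```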

Sadly, it does not solve the global balancing complexity, for example:

  • Worker 1

    • Supported tag: Linux+GPU
    • Tasks:
      • Task 1: {Linux}
  • Worker 2

    • Supported tag: Linux
    • Tasks: None
  • Worker 3:

    • Supported tag: GPU
    • Tasks:
      • Task 2: {GPU, Linux+GPU}
      • Task 3: {GPU, Linux+GPU}

As in the first example, the whole cluster must be taken into account for optimal balancing, so this variant is not simpler than the approach I suggested above. Looking up compatible workers would be faster, but this gain would be offset by the increased number of tags.

Allowing only a single tag per worker and per task would prevent the worst-case inefficiency of the algorithm, but the use cases of the tagging feature would be greatly reduced.
