
Support Tagging when scheduling the worker load #32

Open
sharpener6 opened this issue Oct 10, 2024 · 1 comment
Labels: enhancement (New feature or request)
@sharpener6 (Collaborator)

  • Workers should send their supported tags when they connect
  • Clients will send each task along with its required tags
  • The scheduler will only schedule a task to workers that support its tags
@rafa-be (Collaborator) commented Dec 13, 2024

Tags make balancing tasks slightly more complex, as they add constraints on how tasks can be moved from one worker to another.

My favored way to implement tags would be similar to Dask's resources: workers announce support for multiple string-based tags (e.g. "high-memory", "linux" or "GPU"), and tasks are declared with the set of tags they require to run. Tags thus represent features provided by workers and required by tasks.

A worker would be allowed to run a task if task.tags.issubset(worker.tags).
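For illustration, here is a minimal sketch of that check, using hypothetical Worker and Task types (not the actual Scaler classes):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Worker:
    name: str
    tags: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Task:
    name: str
    tags: frozenset[str] = field(default_factory=frozenset)


def can_run(worker: Worker, task: Task) -> bool:
    # A worker is compatible when it provides every tag the task requires.
    return task.tags.issubset(worker.tags)


assert can_run(Worker("w1", frozenset({"Linux", "GPU"})), Task("t1", frozenset({"GPU"})))
assert not can_run(Worker("w2", frozenset({"Linux"})), Task("t2", frozenset({"GPU"})))
```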

Balancing tasks like these is more complex than without tags.

For example, consider this cluster's state:

  • Worker 1

    • Supported tags: {Linux, GPU}
    • Tasks:
      • Task 1: {Linux}
  • Worker 2

    • Supported tags: {Linux}
    • Tasks: None
  • Worker 3:

    • Supported tags: {GPU}
    • Tasks:
      • Task 2: {GPU}
      • Task 3: {GPU}

Here the optimal way of balancing the cluster would be to move Task 1 to Worker 2 and then Task 3 to Worker 1.

With tags, balancing cannot be done by considering only two individual workers at a time (Worker 3 and Worker 2 in the example); the whole cluster (including Worker 1) must be considered in the balancing computation.
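To make the constraint concrete, the example can be replayed with the compatibility check sketched above (same hypothetical Worker/Task types): the direct move from the busiest worker to the idle one is illegal, while the two-step chain is legal.

```python
w1 = Worker("worker-1", frozenset({"Linux", "GPU"}))  # currently runs Task 1 {Linux}
w2 = Worker("worker-2", frozenset({"Linux"}))         # idle
w3 = Worker("worker-3", frozenset({"GPU"}))           # currently runs Tasks 2 and 3 {GPU}

task_1 = Task("task-1", frozenset({"Linux"}))
task_3 = Task("task-3", frozenset({"GPU"}))

# Pairwise balancing (busiest Worker 3 -> idle Worker 2) finds no legal move:
assert not can_run(w2, task_3)

# The optimal plan needs a third worker: moving Task 1 to Worker 2 frees
# Worker 1, which then makes moving Task 3 to Worker 1 legal.
assert can_run(w2, task_1)
assert can_run(w1, task_3)
```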

Balancing algorithms exist that can find this optimal assignment (see the assignment problem), but they are complex and slow, and they might also cause a lot of messages to be propagated through the cluster.

Instead, I propose we use a simpler but less optimal algorithm:

  1. We select the worker with the highest number of tasks;
  2. We try to remove one task from this worker by moving it to a worker that has fewer than the average number of tasks;
    • If we cannot find a compatible worker, we stop considering this worker in step 1.
  3. We repeat step 1 until either:
    • all workers have reached the average number of tasks;
    • or we cannot find a worker to move tasks from.

The worst case for this algorithm seems to be O(|workers| • |tasks|), while the average should be closer to O(|tasks|).
If no tags are used in the cluster, the worst case stays at O(|tasks|), like the current balancing algorithm.
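For reference, here is a rough sketch of how I read that greedy pass, reusing the hypothetical Worker/Task/can_run types from above (this is not the actual Scaler implementation):

```python
def rebalance(queues: dict[Worker, list[Task]]) -> list[tuple[Task, Worker, Worker]]:
    """Greedy pass: repeatedly move one task from the busiest worker to an
    under-loaded, tag-compatible worker. Returns the list of moves made."""
    average = sum(len(tasks) for tasks in queues.values()) / len(queues)
    moves: list[tuple[Task, Worker, Worker]] = []
    candidates = set(queues)  # workers we may still take tasks from

    while candidates:
        # 1. Select the worker with the highest number of tasks.
        source = max(candidates, key=lambda w: len(queues[w]))
        if len(queues[source]) <= average:
            break  # every worker is at or below the average: we are done.

        # 2. Try to move one of its tasks to a worker below the average load.
        for task in queues[source]:
            target = next(
                (w for w in queues if len(queues[w]) < average and can_run(w, task)),
                None,
            )
            if target is not None:
                queues[source].remove(task)
                queues[target].append(task)
                moves.append((task, source, target))
                break
        else:
            # No compatible worker for any of its tasks: stop considering it.
            candidates.discard(source)

    return moves
```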

Dask uses a similar balancing algorithm.

I'm almost done with implementing this algorithm. It's actually not that complex.


@1597463007 suggested that we might get rid of this complexity by only allowing a single tag per worker, but allowing multiple tags on tasks.

A worker would be allowed to run a task if worker.tag in task.tags.

It's functionally equivalent, as workers could de facto advertise multiple features or requirements with a single tag (e.g. "Linux+GPU").
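Sketched with the same hypothetical types as above, the check for this variant collapses to a membership test:

```python
def can_run_single_tag(worker_tag: str, task_tags: frozenset[str]) -> bool:
    # The task must explicitly list the worker's (possibly combined) tag.
    return worker_tag in task_tags


# A worker with both features advertises the combined tag "Linux+GPU";
# a task must then enumerate every worker tag it can accept.
assert can_run_single_tag("Linux+GPU", frozenset({"GPU", "Linux+GPU"}))
assert not can_run_single_tag("Linux", frozenset({"GPU", "Linux+GPU"}))
```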

Sadly, it does not solve the global balancing complexity, for example:

  • Worker 1

    • Supported tag: Linux+GPU
    • Tasks:
      • Task 1: {Linux}
  • Worker 2

    • Supported tag: Linux
    • Tasks: None
  • Worker 3:

    • Supported tag: GPU
    • Tasks:
      • Task 2: {GPU, Linux+GPU}
      • Task 3: {GPU, Linux+GPU}

As in the first example, the whole cluster must be taken into account for optimal balancing, so this variant is not simpler than the approach I suggested above. Looking up compatible workers would be faster, but this gain would be offset by the increased number of tags.

Allowing only a single tag per worker and per task would prevent the worst-case inefficiency of the algorithm, but the use cases of the tagging feature would be greatly reduced.
