Support Tagging when scheduling the worker load #32
Tags make balancing tasks slightly more complex, as they add constraints on how tasks can be moved from one worker to another.

My favored way to implement tags would be similar to Dask's resources: workers announce support for multiple string-based tags (e.g. "high-memory", "linux" or "GPU"), and tasks are declared with the set of tags they require to run. Tags would thus represent features provided by workers and required by tasks. A worker would be allowed to run a task if it provides every tag the task requires, i.e. if the task's tags are a subset of the worker's tags.

Balancing tasks like these is more complex than without tags. For example, consider this cluster's state:
Here the optimal way of balancing the cluster would be to move Task 1 to Worker 2 and then Task 3 to Worker 1. Once tags are added, balancing can no longer be done by considering only two individual workers (Worker 3 and Worker 2 in the example). The whole cluster (including Worker 1) must be taken into account in the balancing computation.

Algorithms exist that can find this optimal balancing (see the assignment problem), but they are complex and slow. They might also cause a lot of messages to be propagated through the cluster. Instead, I propose we use a simpler but less optimal algorithm:
The worst case for this algorithm seems to be O(|workers| • |tasks|), while the average case should be closer to O(|tasks|). Dask uses a similar balancing algorithm. I'm almost done implementing it, and it's actually not that complex.

@1597463007 suggested that we might get rid of this complexity by allowing only a single tag per worker while still allowing multiple tags on tasks. A worker would be allowed to run a task if its single tag is one of the task's tags. This is functionally equivalent, as workers could de facto advertise multiple features or requirements with a single tag (e.g. "Linux+GPU"). Sadly, it does not solve the global balancing complexity, for example:
As in the first example, the whole cluster must be taken into account for optimal balancing. Thus it is not simpler than the approach I suggested above. Looking up compatible workers would be faster, but this gain would be offset by the increased number of tags.

Allowing only a single tag per worker and per task would prevent the worst-case inefficiency of the algorithm, but the use cases of the tagging feature would be greatly reduced.
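To make the two eligibility rules discussed above concrete, here is a minimal Python sketch. It is only an illustration, not the project's implementation: the function names, the `WORKERS` mapping and the worker names are hypothetical, and the single-tag rule is my reading of the suggestion (the worker's one tag must appear among the task's tags).

```python
from typing import Dict, Iterable, List, Set


def can_run_multi(worker_tags: Iterable[str], task_tags: Iterable[str]) -> bool:
    """Multi-tag scheme: the worker must provide every tag the task requires."""
    return set(task_tags) <= set(worker_tags)


def can_run_single(worker_tag: str, task_tags: Iterable[str]) -> bool:
    """Single-tag scheme (assumed reading): the worker's only tag must be
    one of the tags listed by the task."""
    return worker_tag in set(task_tags)


def compatible_workers(workers: Dict[str, Set[str]], task_tags: Iterable[str]) -> List[str]:
    """Return the sorted names of all workers allowed to run the task
    under the multi-tag scheme."""
    required = set(task_tags)
    return sorted(name for name, tags in workers.items() if required <= tags)


# Hypothetical cluster, for illustration only.
WORKERS: Dict[str, Set[str]] = {
    "worker-1": {"linux"},
    "worker-2": {"linux", "GPU"},
    "worker-3": {"linux", "GPU", "high-memory"},
}
```

With this hypothetical cluster, a task requiring `{"GPU"}` can only be placed on `worker-2` or `worker-3`, while an untagged task can go anywhere. Under the single-tag scheme, the same GPU task would have to list every combined tag that satisfies it (e.g. `"linux+GPU"` and `"linux+GPU+high-memory"`), which is where the increased number of tags comes from.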