
Design and Discussion for One-off Tasks/Jobs/Whatever #2852

Open
dperny opened this issue May 3, 2019 · 8 comments

Comments

@dperny
Collaborator

dperny commented May 3, 2019

A highly requested feature for Swarm Mode is the ability to run one-off operations of some kind. However, the scope of exactly what users need from these one-off jobs is too broad to make satisfactory progress on at the moment. Therefore, this issue is for design, discussion, and sharing of use cases, in order to pin down exactly what features users need.

The goal is to converge on a simple but powerful service mode that accomplishes one-off operations for the large majority of users while not compromising on Swarm's promises of simplicity.

In technical terms, currently, swarm Tasks can enter several different terminal states. Relevant among these is the COMPLETED state, which a Task enters if it exits with exit code zero. Therefore, it should be possible to create a new service mode (in addition to replicated or global) which has special handling for this terminal state in order to run one-off tasks. This would require a new Orchestrator component to handle this new service mode.

Some examples of open questions are below. This is a non-exhaustive list, and you should feel free to bring up anything else in this space.

  • Should one-off tasks be only one task per service, or should you be able to schedule more than one concurrent task as part of the same job?
  • What happens if a one-off task fails? Should it be rescheduled and retried? If there are multiple tasks in a job, what happens if some fail and some succeed? What happens if a task persistently fails? Should there be a failure threshold for when we stop trying?
  • Is the existing docker service CLI command adequate to express the desired behaviors, or is a new CLI command needed? What should it look like?
  • What kind of workloads do you want to run? What do you currently do to solve those use cases?
  • Is cron-style periodic scheduling support a necessary part of this for you?

The discussion in this issue will lead to a full design document for community review before we start building anything.

@mshirley

mshirley commented May 7, 2019

I come from a Python background, and what I would love to see is something similar to Celery, which has a lot of the functionality that would be nice:

https://docs.celeryproject.org/en/latest/getting-started/introduction.html

• Should one-off tasks be only one task per service, or should you be able to schedule more than one concurrent task as part of the same job?

I think that each task should be a standalone entity which is scheduled and put on the queue. If you want concurrent tasks as part of a job, perhaps you can simply tag each task with a parent job and abstract the linking of tasks to a job to a higher level.

• What happens if a one-off task fails? Should it be rescheduled and retried? If there are multiple tasks in a job, what happens if some fail and some succeed? What happens if a task persistently fails? Should there be a failure threshold for when we stop trying?

If a single task fails, there should be an option to retry within a certain configurable limit in time or count. It would be nice to have the option to configure the job as a whole to be marked as failed if more than n tasks fail, but the status of every task should be reported.

• Is the existing docker service CLI command adequate to express the desired behaviors, or is a new CLI command needed? What should it look like?

The existing CLI would be fine for me.

• What kind of workloads do you want to run? What do you currently do to solve those use cases?

Any periodic or on-demand command; basically anything you can think of, long-running and short-running.

• Cron-like jobs that need to be run asynchronously on a daily or hourly schedule.
• Submitting jobs to other systems based on programmatic input, such as reading from a Kafka queue and executing a task every time a specific message comes across. This could be used for alerting, or simply for kicking off a job in another system when a certain condition is met.

• Is cron-style periodic scheduling support a necessary part of this for you?

yes

@ohnotnow

ohnotnow commented May 7, 2019

My use-case for one-off tasks is mostly the problem of running DB migrations (or similar "update something once that the new code needs" operations). So I'd be really, really delighted by this. Especially (yes, feature creep!) if there was something we could tag like k8s "init containers".

At the moment we have a special part of our entrypoint scripts that pretty much does migrate_db(); while true; do sleep 86400; done - which always feels a bit hacky.

@olljanat
Contributor

Btw, Portainer (which IMO is the best open source management tool for swarm) recently introduced their cron-type scheduler. It works by scheduling one-off containers to run those tasks.

You can try it on their public demo instance at http://demo.portainer.io/ (log in with the username admin and the password tryportainer). Just enable "Enable host management features" from Settings and you will see the Scheduler -> Host jobs view.

I can see that one-off service support in swarmkit would allow them to improve it to schedule jobs which, for example, run once on all nodes with certain labels, etc.

@usbrandon

We wanted to use swarm as a fabric of connected hosts that could take on the various ETL and automation jobs we have. We wanted to schedule them, like we do now in cron, and be able to capture the log output. It is also important that Swarm detect which host has enough free resources to run the job/container. Successful jobs should end and clean themselves up. Failed jobs should stick around in some way so that we can study them to see what went wrong, but they should not get in the way of the next scheduled run.

@markbirbeck

I don't know if this helps with specifying this functionality, but from the implementation standpoint I split the problem in two: first, the ability to run a one-off job on a swarm, which requires it to be specified as a service, and second, the ability to schedule jobs.

I've only tackled the first part so far, for which I developed 'Docker Job':

https://github.com/markbirbeck/docker-job

"The docker-job command-line application launches an image by wrapping it in a service and running it. Options are available to determine whether to output the logs when the job has completed, run more than one replica at the same time, and so on."

@dperny
Collaborator Author

dperny commented Jul 1, 2019

I've opened a proposal for this, which is at moby/moby#39447. PTAL if you're interested in this feature, and confirm that the proposal meets your needs as a user.

@jnovack

jnovack commented Oct 27, 2019

Can you provide some examples (actual or contrived) of jobs ("community"-use, "enterprise"-use) that one would submit?

I'm just not that worldly, so my frame-of-reference is limited (I'm a low-impact swarm-user, home/hobby/small-business area; so I don't understand "enterprise-level" requirements); or would the intention be that a third-party product pick up scheduling (much like portainer provides UI management; Docker makes the backend, someone can make the front-end, like alexellis/jaas).

(1) What kind of job(s) would you want to create, on-demand, that wouldn't already be a running service sitting idle, waiting for its next call?

(2) How would you envision these jobs to be created, and separately, to be run (on-demand), if not for the scheduler? (Through the third-party front-end which uses the API?)

Is one-of-the-potentially-many use-cases to be a CI/CD task-runner? Am I thinking about this right? Some third-party scheduler additionally has a queue, and then just sends it over to swarm to execute and capture output/return status?

Help me want this. :)

Sidebar: Having the discussion here to not clog moby/moby#39447.

@cblomart

Reading the thread, it feels like too much is sometimes being asked of jobs.

When thinking of jobs I mainly think about being able to:

  • run a one-off task when requested
  • run tasks on schedules; this includes equivalents of "at" and "cron"

One of the main use cases I have is CI/CD integration:
Most of the existing tools (Jenkins/GitLab runners/Drone/...) have a way to integrate with Docker/Swarm. Most of the time they add an agent per swarm node and run containers directly on the node, so a dependent job (i.e. a build of a git repo) will happen on only one node. I think this can limit scalability and parallelism.
Swarm could be there to handle load balancing (scheduling) while the CI/CD tool still handles the job orchestration.

I don't know the inner workings of FaaS (function as a service), but I can imagine the same principle applying there.

Going from there:

  • docker logs in a swarm: access the logs of a task from the swarm without needing to address the specific node
  • docker exec in a swarm: same principle as the previous point, but for execution. Some CI/CD platforms work along the lines of: start a container and do things in it
