runner: static runners accept multiple jobs in parallel #3300
Conversation
This modifies `internal/runner` to support accepting multiple jobs in parallel. Not much work here since we always designed the runner struct from the beginning to support this, so there are no data races.

This modifies `internal/cli` so that non-ODR runners run in parallel mode by default. It doesn't make sense for ODR runners to have any parallelism since they always run exactly one job. Non-ODR runners typically ONLY launch ODR tasks, which are highly IO-bound, so we default concurrency to a multiple above the CPU count.

This is a necessary prerequisite for pipelines, since they'll likely perform blocking jobs on the static runners to "watch" tasks. Today, tasks are launched and stopped but not watched, so this is not an issue.
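For context, a minimal sketch of the shape described above. The names (`defaultConcurrency`, `acceptParallel`) and the x3 multiplier are assumptions for illustration, not the actual waypoint code:

```go
// Sketch only: an approximation of "multiple accept loops, sized above CPU
// count for static runners", not the real internal/runner implementation.
package sketch

import (
	"context"
	"runtime"
	"sync"
)

// defaultConcurrency picks the worker count: ODR runners always run exactly
// one job; static runners mostly launch IO-bound ODR tasks, so oversubscribe
// relative to CPU count (the multiplier here is illustrative).
func defaultConcurrency(odr bool) int {
	if odr {
		return 1
	}
	return runtime.NumCPU() * 3
}

// acceptParallel starts count goroutines that each accept jobs in a loop
// until the context is canceled or an accept call fails.
func acceptParallel(ctx context.Context, count int, acceptOne func(context.Context) error) {
	var wg sync.WaitGroup
	wg.Add(count)
	for i := 0; i < count; i++ {
		go func() {
			defer wg.Done()
			for ctx.Err() == nil {
				if err := acceptOne(ctx); err != nil {
					return
				}
			}
		}()
	}
	wg.Wait()
}
```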
Wohoo!
Looks great!
Looks good, just one question about the semantics of canceling in the worker goroutine.
wg.Add(count)
for i := 0; i < count; i++ {
    go func() {
        defer cancel()
Just saying this out loud, should the exit of ANY of the goroutines cause them all to exit?
So actually, yes, I think this needs to do slightly better, although it's edge case-y. Here is the thought:

1. If the user cancels the context, then this is effectively a no-op and all of the goroutines are canceled anyway. No issue.
2. If a goroutine has an error, it's likely the error will impact all of them, because there is considerable retry logic already in each `Accept` call -- including reconnection -- so if it actually errors, it is likely unrecoverable. So we DO want to exit all the goroutines.

However, for #2, right now we're canceling the context, which just causes a cascade effect to cancel ASAP. I think we can do better by letting each existing job try to finish gracefully, and then saying "don't accept any more jobs thereafter."
I'll work on this Monday.
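For illustration, a minimal sketch of the graceful variant described above. The names and the `atomic.Bool` flag are assumptions, not the actual runner implementation; the point is that a failing worker signals the others to stop accepting new jobs rather than canceling their in-flight work:

```go
// Sketch only (assumed names): when one worker hits an unrecoverable error,
// flip a "stop accepting" flag so the other workers finish their current job
// and then return, instead of canceling the context and interrupting jobs.
package sketch

import (
	"context"
	"sync"
	"sync/atomic"
)

func acceptUntilBroken(ctx context.Context, count int, acceptOne func(context.Context) error) {
	var stop atomic.Bool // set when any worker decides accepting should stop

	var wg sync.WaitGroup
	wg.Add(count)
	for i := 0; i < count; i++ {
		go func() {
			defer wg.Done()
			for ctx.Err() == nil && !stop.Load() {
				// acceptOne blocks until a single job has been accepted and
				// run to completion, so returning here never abandons a job.
				if err := acceptOne(ctx); err != nil {
					stop.Store(true)
					return
				}
			}
		}()
	}
	wg.Wait()
}
```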
Actually, I think this is okay for now. It's a bit non-trivial to get this fix in, and looking at the possible reasons for a return from `AcceptMany`, I do think things are really broken if they exit, so canceling all is okay for now. We can improve this later. I've added a TODO to note it.