feature: add label-based runner targeting for operations #3037

xiaolin-ninja · 2022-02-22T14:45:04Z

Now users can also target runners by runner labels. In the Waypoint configs, they can specify:

runner {
  target_labels = {
    "env" = {
        "default" = "test"
        "prod" = "prod"
    }[workspace.name]
  }
}

or

app {
  runner {
    target_labels = {
      "env" = {
        "default" = "test"
        "prod" = "prod"
      }[workspace.name]
    }
  }
}

It also adds a "labels" column in runner list.

❯ waypoint-dev runner list
      ID      |   STATE    |  KIND  |  LABELS   | LAST REGISTERED  
--------------+------------+--------+-----------+------------------
  test        | preadopted | remote | env:test  | 1 second ago

Some notes on implementation choices:

Profile takes precedence over labels, so if someone has both a runner profile and label defined, we ignore the labels. I also chose to ignore the project-level runner stanza completely if app-level runner configs are defined, instead of parsing and piecing together labels and profiles. We could do that... but the current product doesn't support targeting distinct runners for individual apps anyway, so we could cross that bridge when we get there? It also felt a little too "nice". heh.

I also opted to put the runner look-up and targeting logic on the client-side instead of server-side (differing from the RFC); i.e. looking up a list of all runners and finding the one with the right labels, as opposed to adding a "by label" lookup in the db. Because.... the infrastructure is already there and it's easy. But listing all the runners might bite us in the butt a few years down the line with database load... but also how many runners can a project have anyway?! I was also debating whether labels are something we want to store in the db, since they feel ephemeral (as in the English word)... but we are storing adoption state, which might have the same degree of permanence. So anyway I'm happy to take the server-side approach if we feel it's better.

briancain

Looks pretty good overall! Only had a few comments before trying this out 😄

internal/client/job.go

internal/server/singleprocess/service_job.go

briancain · 2022-03-04T20:21:25Z

internal/client/runner_test.go

+		remote, err := remoteOpPreferred(ctx, client, project, runnerCfgs, log)
+		require.Nil(err)
+		require.True(remote)
+	})


I think we should have a good test around checking that TargetLabels is a proper subset of the runner labels

We can do the positive case with a few key,vals in each, and then a bad case for when it's not a subset. That should cover it I think!

briancain · 2022-03-04T20:25:16Z

internal/client/job.go

+							if val, ok := r.Labels[k]; ok && v == val {
+								n -= 1
+							}
+						}


@XX and I paired on this - We can simplify this check for looking at when TargetLabels has to be a subset of Runners with a bool

mitchellh

Commentary was done in Zoom.

mitchellh · 2022-03-04T20:02:48Z

internal/client/job.go

+					}
+					targeted := false
+					for _, r := range runners.Runners {
+						n := len(configRunner.TargetLabels)


Minor feedback: The decreasing n approach works, though it feels slightly less clear than doing something like the following, which is a more typical pattern we see for this kind of check.

matches := true for k, v := range configRunner.TargetLabels { if runner-doesnt-have-label { matches = false break } }

mitchellh · 2022-03-04T20:07:52Z

internal/client/job.go

+									},
+								},
+							}
+							targeted = true


Should you break after this?

Bigger picture: if I'm understanding this logic right, it'll always select the last matching runner. Is that correct? And I think ListRunners doesn't guarantee any ordering (including random) so this could cause all label-targeted to always pick the exact same runner. Guaranteeing some sort of loaded fairness is a hard problem and we should probably solve this by just randomizing the order.

mitchellh · 2022-03-04T20:13:05Z

internal/client/job.go

@@ -202,6 +202,7 @@ func (c *Project) setupLocalJobSystem(ctx context.Context) (isLocal bool, newCtx
 // The receiver must be careful to not block sending to mon as it will
 // block the job state processing loop.
 func (c *Project) doJobMonitored(ctx context.Context, job *pb.Job, ui terminal.UI, monCh chan pb.Job_State) (*pb.Job_Result, error) {


This causes the "scheduling" logic to leak into the client side. I don't think we should put this here. Historically, all scheduling for jobs happens on the server and I think we should keep it that way.

I saw your message in the PR of why you did this, I'm not worried about server load, I'm worried about this being logically incorrect in relatively simple cases since we can reasonably expect jobs to be queued and for runners to come in and out.

I also think that while this was an "easy" approach, it complicates the overall logic because we've now split scheduling to two locations rather than a centralized location. It's also now split between time and space. Before, scheduling happened at the same time in the same place in the code. Now it happens in two places in the code at two different times.

See the examples below.

Example bad case using T to represent time:

T=0: user submits a new job, this searches for matching labels, targets a runner A.

T=1 The job gets queued because A is busy.

T=2: A disappears (for whatever reason: crash, operator takes it down, etc.)

T=3: Job remains queued because A is gone.

T=4: Operator introduces runner B, C, etc. with matching labels.

T=infinity: Job remains queued forever because it never reschedules.

Example bad case #2:

T=0: user submits a new job, this finds runner A with matching labels.

T=1: The job gets queued because A is busy.

T=2: Runner A is moved and changes labels, it no longer matches. (either maliciously or purposefully)

T=3: Job runs on runner A, which broke a constraint. At best, the job works anyways. At worst, this was a big security issue.

The solution to this is to introduce a new Ref type that refs by label, not by name. See Ref.Runner which already supports a couple ways. This should add labels as well.

xiaolin-ninja · 2022-03-04T21:22:34Z

After discussion, client-side targeting is not the right approach, especially because it wouldn't work with polling. Closing this PR and making 3 separate incremental PRs instead for a server-side implementation.

github-actions bot added the core label Feb 22, 2022

vercel bot temporarily deployed to Preview February 26, 2022 01:44 Inactive

vercel bot temporarily deployed to Preview February 26, 2022 02:25 Inactive

github-actions bot added plugin plugin/docker labels Feb 26, 2022

xiaolin-ninja force-pushed the targetable-runners-2 branch from 5d88b16 to 558feba Compare February 26, 2022 02:31

vercel bot temporarily deployed to Preview February 26, 2022 02:31 Inactive

vercel bot temporarily deployed to Preview February 26, 2022 02:35 Inactive

xiaolin-ninja added cli and removed plugin/docker labels Feb 26, 2022

xiaolin-ninja requested a review from mitchellh February 26, 2022 02:36

xiaolin-ninja added this to the 0.8.0 milestone Feb 26, 2022

xiaolin-ninja marked this pull request as ready for review February 26, 2022 02:58

xiaolin-ninja requested a review from a team February 28, 2022 17:24

briancain reviewed Feb 28, 2022

View reviewed changes

internal/client/job.go Outdated Show resolved Hide resolved

internal/server/singleprocess/service_job.go Outdated Show resolved Hide resolved

vercel bot temporarily deployed to Preview February 28, 2022 21:32 Inactive

github-actions bot added the plugin/docker label Feb 28, 2022

xiaolin-ninja added 9 commits February 28, 2022 14:35

add labels to runner list

227fba1

add remote if labels present

196eb4a

wip client-side target by target_labels

33a05cf

target by runner label

6403618

format

975430c

remove code that doesn't belong

ffa6d49

changelog

80edce7

fix comment

b21127d

fix error

bb312fa

xiaolin-ninja force-pushed the targetable-runners-2 branch from 8b125b0 to bb312fa Compare February 28, 2022 22:49

vercel bot temporarily deployed to Preview February 28, 2022 22:49 Inactive

xiaolin-ninja removed the plugin/docker label Mar 2, 2022

add targeting for individual labels

e188064

vercel bot temporarily deployed to Preview March 4, 2022 01:01 Inactive

github-actions bot added the plugin/docker label Mar 4, 2022

Merge branch 'main' into targetable-runners-2

cd278c0

vercel bot deployed to Preview March 4, 2022 01:03 View deployment

format and better error messages

4e99556

vercel bot temporarily deployed to Preview March 4, 2022 16:45 Inactive

xiaolin-ninja removed the plugin/docker label Mar 4, 2022

briancain self-requested a review March 4, 2022 19:50

briancain reviewed Mar 4, 2022

View reviewed changes

mitchellh suggested changes Mar 4, 2022

View reviewed changes

xiaolin-ninja closed this Mar 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature: add label-based runner targeting for operations #3037

feature: add label-based runner targeting for operations #3037

xiaolin-ninja commented Feb 22, 2022 •

edited

Loading

briancain left a comment

briancain Mar 4, 2022

briancain Mar 4, 2022

briancain Mar 4, 2022

mitchellh left a comment

mitchellh Mar 4, 2022

mitchellh Mar 4, 2022

mitchellh Mar 4, 2022

xiaolin-ninja commented Mar 4, 2022

feature: add label-based runner targeting for operations #3037

feature: add label-based runner targeting for operations #3037

Conversation

xiaolin-ninja commented Feb 22, 2022 • edited Loading

briancain left a comment

Choose a reason for hiding this comment

briancain Mar 4, 2022

Choose a reason for hiding this comment

briancain Mar 4, 2022

Choose a reason for hiding this comment

briancain Mar 4, 2022

Choose a reason for hiding this comment

mitchellh left a comment

Choose a reason for hiding this comment

mitchellh Mar 4, 2022

Choose a reason for hiding this comment

mitchellh Mar 4, 2022

Choose a reason for hiding this comment

mitchellh Mar 4, 2022

Choose a reason for hiding this comment

xiaolin-ninja commented Mar 4, 2022

xiaolin-ninja commented Feb 22, 2022 •

edited

Loading