-
Notifications
You must be signed in to change notification settings - Fork 330
Display the runner type during CLI operations #2795
Conversation
@@ -337,7 +337,7 @@ func (s *service) onDemandRunnerStartJob( | |||
|
|||
// Arguments for the runner image. Waypoint is ALWAYS assumed to be | |||
// the entrypoint for ODR images. | |||
args := []string{"runner", "agent", "-vv", "-id", runnerId, "-odr"} | |||
args := []string{"runner", "agent", "-vv", "-id", runnerId, "-odr", "-odr-profile-id", od.Id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this way of setting metadata on the runner - it feels very indirect. It would feel much cleaner to me if we called a server api here-ish to create a new "pending" runner
record of type odr, then just start the runner with the predetermined ID and no other flags. Then, when the runner registered itself to the server, it would just set its state from "pending" to "active", or something.
It also feels a little weird from a security perspective - fresh runners have the opportunity to lie about their profile id and odr-ness if we rely on them to tell the server.
If and when it's time to add the next runner flag like this, I think we should revisit this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add some sort of runner "adoption" or "activation" step for some future work we have planned for exactly the reason you stated. Ideally: runners can relatively easy ATTEMPT to register with the server, but a human (or machine via more privileged API) has to "adopt" the runner to activate it. I think we can do this using some sort of certificate exchange mechanism (so if the runner restarts or something it can prove its adoption for a period of time). I have various thoughts here.
For now, I think this is okay.
304e14b
to
52c1d5d
Compare
internal/cli/base.go
Outdated
@@ -471,12 +471,7 @@ func remoteOpPreferred(ctx context.Context, client pb.WaypointClient, project *p | |||
} | |||
hasRemoteRunner := false | |||
for _, runner := range runnersResp.Runners { | |||
if !runner.Odr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great side-effect of this change! We can be more confident now about deciding if a remote op is likely to work.
940ad21
to
efce813
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code paths look generally good! I can merge this next week when you're out if you'd like to wait for @evanphx's feedback.
if err != nil { | ||
c.UI.Output("Failed to inspect the runner (id %q) assigned for this operation: %s", assignedRunner.Id, err, terminal.WithErrorStyle()) | ||
} | ||
switch runnerType := runner.Kind.(type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this because it's like self-documenting of the different runners in our runner system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One critical thing, the rest are just things to think about. Looks wonderful. Great job!
internal/cli/base.go
Outdated
// that as a remote runner, and this will return a false positive. | ||
|
||
// Also note that this is designed to run before se start our own CLI runner. | ||
if _, ok := runner.Kind.(*pb.Runner_Remote_); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always done v, ok := ... ; ok && v != nil
. What you did is PROBABLY FINE, but talking out loud I've always wondered if this is totally fine haha. You can keep it how it is, just thinking out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ha interesting - i assume v == nil
when runner.Kind
has never been set - I think it's OK here because nil will still bypass this block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you can do runner.Kind = (*pb.Runner_Remote_)(nil)
to make runner.Kind != nil
but once you cast it is == nil
. I can’t think of any time this would happen without someone acting weird though. (This is the interface vs typed nil)
5c8d341
to
a6a53c0
Compare
a6a53c0
to
14e2c04
Compare
14e2c04
to
5948abc
Compare
We don't expect users to ever set this when the use the `runner agent` command, even if they build their own custom images. It's only used by us internally.
5948abc
to
8b2c504
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 👍🏻
Per discussion with Izaak last week, I'm going to merge this in since it has all the necessary approvals and tests are passing! 🎉 |
Waypoint will soon automatically choose if operations will occur locally or remotely. Currently though, users have no visibility into where their operations take place. If there is some kind of error during the operation, users will find it valuable to know where that error occurred. For example, if it's a permissions error, they'll need to either update their local credentials providers, or go modify an ODR profile, depending on where the job ran.
With this change, the runner proto gets a new field,
kind
, which is either local, odr, or remote. Runners will know their own type on startup, and communicate that type back to the server when they first register.When the CLI creates a job, it watches the job events to see when job's
assigned_runner
ref is set. It then uses that runner ID to query the server for the complete runner proto, and displays the runner location.How to verify
Run a local build. Note the line
Performing operation locally
Turn on a remote runner (but don't set a default or project runner profile), and run the op again with
-local-false
to force the job to execute on the "static" remote runner.Note the line
Performing this operation on a remote runner with id "01FP0N64TDN3QN4D0M83RGRPD4"
Set up an ODR profile.
Note the line
Performing operation on docker with runner profile test
Future considerations
If an ODR operation fails, there's a good chance there will be some useful context in the ODR pod's logs, or it's env vars, or platform-specific setup. Even with these changes, there is no good way to figure out which odr instance (i.e. k8s pod) in particular executed the op. The best thing i've found is to compare launch timestamps, or grep the logs of all odr pods for strings I expect to see.
I don't see any obvious path to get that context.
Hacky idea: We could do something hacky, like have the task launcher plugin set some
WAYPOINT_TASK_INSTANCE_NAME
env var, and teach the runner to discover that and report it back to the server.Better (but harder) idea: We could have the task plugin report DeclaredResources via an outparamater (similar to how
status
works), and save those on a newtask
protobuf, and then tie thattask
id back to therunner
proto somehow. I think this would probably require an RFC.If anyone has any better ideas, i'm all ears.