Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Display the runner type during CLI operations #2795

Merged
merged 10 commits into from
Dec 13, 2021
Merged

Conversation

izaaklauer
Copy link
Contributor

@izaaklauer izaaklauer commented Dec 3, 2021

Waypoint will soon automatically choose if operations will occur locally or remotely. Currently though, users have no visibility into where their operations take place. If there is some kind of error during the operation, users will find it valuable to know where that error occurred. For example, if it's a permissions error, they'll need to either update their local credentials providers, or go modify an ODR profile, depending on where the job ran.

With this change, the runner proto gets a new field, kind, which is either local, odr, or remote. Runners will know their own type on startup, and communicate that type back to the server when they first register.

When the CLI creates a job, it watches the job events to see when job's assigned_runner ref is set. It then uses that runner ID to query the server for the complete runner proto, and displays the runner location.

How to verify

Run a local build. Note the line Performing operation locally

$ waypoint build -local

» Building web...
Performing operation locally
✓ Initializing Docker client...
✓ Building image...
 │ Step 1/3 : FROM nginx:stable
 │  ---> c8d03f6b8b91
 │ Step 2/3 : COPY ./public/ /var/www
 │  ---> Using cache

Turn on a remote runner (but don't set a default or project runner profile), and run the op again with -local-false to force the job to execute on the "static" remote runner.
Note the line Performing this operation on a remote runner with id "01FP0N64TDN3QN4D0M83RGRPD4"

$ waypoint build -local=false

» Building learn-waypoint-lambda...
  Performing this operation on a remote runner with id "01FP0N64TDN3QN4D0M83RGRPD4"

» Cloning data from Git
  URL: https://github.com/hashicorp/waypoint-examples
  Ref: us-east-1

Set up an ODR profile.
Note the line Performing operation on docker with runner profile test

$ ~/dev/waypoint/waypoint build -local=false

» Building learn-waypoint-lambda...
  Performing operation on docker with runner profile test

» Cloning data from Git
  URL: https://github.com/hashicorp/waypoint-examples
  Ref: us-east-1
...

Future considerations

If an ODR operation fails, there's a good chance there will be some useful context in the ODR pod's logs, or it's env vars, or platform-specific setup. Even with these changes, there is no good way to figure out which odr instance (i.e. k8s pod) in particular executed the op. The best thing i've found is to compare launch timestamps, or grep the logs of all odr pods for strings I expect to see.

I don't see any obvious path to get that context.

Hacky idea: We could do something hacky, like have the task launcher plugin set some WAYPOINT_TASK_INSTANCE_NAME env var, and teach the runner to discover that and report it back to the server.

Better (but harder) idea: We could have the task plugin report DeclaredResources via an outparamater (similar to how status works), and save those on a new task protobuf, and then tie that task id back to the runner proto somehow. I think this would probably require an RFC.

If anyone has any better ideas, i'm all ears.

@@ -337,7 +337,7 @@ func (s *service) onDemandRunnerStartJob(

// Arguments for the runner image. Waypoint is ALWAYS assumed to be
// the entrypoint for ODR images.
args := []string{"runner", "agent", "-vv", "-id", runnerId, "-odr"}
args := []string{"runner", "agent", "-vv", "-id", runnerId, "-odr", "-odr-profile-id", od.Id}
Copy link
Contributor Author

@izaaklauer izaaklauer Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this way of setting metadata on the runner - it feels very indirect. It would feel much cleaner to me if we called a server api here-ish to create a new "pending" runner record of type odr, then just start the runner with the predetermined ID and no other flags. Then, when the runner registered itself to the server, it would just set its state from "pending" to "active", or something.

It also feels a little weird from a security perspective - fresh runners have the opportunity to lie about their profile id and odr-ness if we rely on them to tell the server.

If and when it's time to add the next runner flag like this, I think we should revisit this pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add some sort of runner "adoption" or "activation" step for some future work we have planned for exactly the reason you stated. Ideally: runners can relatively easy ATTEMPT to register with the server, but a human (or machine via more privileged API) has to "adopt" the runner to activate it. I think we can do this using some sort of certificate exchange mechanism (so if the runner restarts or something it can prove its adoption for a period of time). I have various thoughts here.

For now, I think this is okay.

@@ -471,12 +471,7 @@ func remoteOpPreferred(ctx context.Context, client pb.WaypointClient, project *p
}
hasRemoteRunner := false
for _, runner := range runnersResp.Runners {
if !runner.Odr {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great side-effect of this change! We can be more confident now about deciding if a remote op is likely to work.

@izaaklauer izaaklauer force-pushed the store-runner-location branch from 940ad21 to efce813 Compare December 3, 2021 18:16
@izaaklauer izaaklauer marked this pull request as ready for review December 3, 2021 18:20
@izaaklauer izaaklauer changed the title DRAFT: display the runner type during CLI operations Display the runner type during CLI operations Dec 3, 2021
@izaaklauer izaaklauer requested a review from a team December 3, 2021 18:20
@izaaklauer izaaklauer requested a review from evanphx December 7, 2021 16:19
Copy link
Contributor

@krantzinator krantzinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code paths look generally good! I can merge this next week when you're out if you'd like to wait for @evanphx's feedback.

internal/cli/runner_agent.go Show resolved Hide resolved
if err != nil {
c.UI.Output("Failed to inspect the runner (id %q) assigned for this operation: %s", assignedRunner.Id, err, terminal.WithErrorStyle())
}
switch runnerType := runner.Kind.(type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this because it's like self-documenting of the different runners in our runner system.

Copy link
Contributor

@mitchellh mitchellh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One critical thing, the rest are just things to think about. Looks wonderful. Great job!

// that as a remote runner, and this will return a false positive.

// Also note that this is designed to run before se start our own CLI runner.
if _, ok := runner.Kind.(*pb.Runner_Remote_); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always done v, ok := ... ; ok && v != nil. What you did is PROBABLY FINE, but talking out loud I've always wondered if this is totally fine haha. You can keep it how it is, just thinking out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ha interesting - i assume v == nil when runner.Kind has never been set - I think it's OK here because nil will still bypass this block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can do runner.Kind = (*pb.Runner_Remote_)(nil) to make runner.Kind != nil but once you cast it is == nil. I can’t think of any time this would happen without someone acting weird though. (This is the interface vs typed nil)

internal/client/job.go Outdated Show resolved Hide resolved
internal/client/job.go Outdated Show resolved Hide resolved
internal/server/proto/server.proto Outdated Show resolved Hide resolved
internal/server/proto/server.proto Outdated Show resolved Hide resolved
internal/server/proto/server.proto Show resolved Hide resolved
Copy link
Member

@briancain briancain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍🏻

@izaaklauer izaaklauer removed the request for review from evanphx December 10, 2021 13:38
@krantzinator
Copy link
Contributor

Per discussion with Izaak last week, I'm going to merge this in since it has all the necessary approvals and tests are passing! 🎉

@krantzinator krantzinator added this to the 0.7.0 milestone Dec 13, 2021
@krantzinator krantzinator merged commit a9db39e into main Dec 13, 2021
@krantzinator krantzinator deleted the store-runner-location branch December 13, 2021 16:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants