This repository was archived by the owner on Jan 8, 2024. It is now read-only.

"waypoint destroy" doesn't work on incomplete deployments #533

Open
akin-ozer opened this issue Oct 16, 2020 · 6 comments
Labels: bug, core

@akin-ozer
Contributor

Describe the bug
"waypoint destroy" command doesn't destroy resources created by "waypoint up" if waypoint up didn't successfully completed.
Steps to Reproduce
Create a new deployment and interrupt it after the Kubernetes deployment has been created. Then run "waypoint destroy": the command reports success, but the deployment is still there.
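
For concreteness, a minimal reproduction from the CLI looks roughly like this (the kubectl check is my own addition for illustration):

$ waypoint up               # interrupt with Ctrl-C once the Kubernetes deployment is created
^C
$ waypoint destroy          # reports success...
$ kubectl get deployments   # ...but the deployment is still listed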

Expected behavior
"waypoint destroy" should clear resources whether deployment is successful or not.

@mitchellh
Contributor

Thanks, yes, this is something we know about and need to resolve for 0.2.

@krantzinator
Contributor

Some logs from the deployment destroy command on unsuccessful deployments:

$ waypoint-dev deployment destroy -vvv
...
2021-02-05T13:17:35.147-0500 [DEBUG] waypoint: will operate on app: name=example-nodejs
2021-02-05T13:17:35.563-0500 [DEBUG] waypoint: no release found to exclude any deployments

» 3 deployments will be destroyed.
2021-02-05T13:17:35.564-0500 [TRACE] waypoint: stopping signal listeners and cancelling the context

@krantzinator krantzinator self-assigned this Feb 9, 2021
@krantzinator krantzinator removed their assignment Mar 1, 2021
@krantzinator krantzinator removed this from the 0.2.x milestone Mar 26, 2021
@acaloiaro
Contributor

acaloiaro commented Feb 3, 2022

Hi @mitchellh, since this was reported in 2020, I'm curious whether this issue has seen any progress.

The issue affects more than "interrupted" deploys; it also affects, for example, periodic Nomad jobspecs.

Example output from waypoint destroy -vvv:

2022-02-03T15:13:03.701-0800 [ERROR] waypoint.runner.app.main.deploy: Unable to destroy the Deployment as the proto message Deployment returned from the plugin's DeployFunc is nil. This situation occurs when the deployment process is interrupted by the user.: id=01FV0SA2F8FXP1THFRA3S64ZY8 job_id=01FV0XSMKEYHQE2NX8CFV24018 job_op=*gen.Job_Destroy deployment="application:{application:"main" project:"example-periodic-app"} workspace:{workspace:"default"} sequence:1 id:"01FV0SA2F8FXP1THFRA3S64ZY8" generation:{id:"7039c32d5c485746b74b5c7981bda846" initial_sequence:1} state:DESTROYED status:{state:ERROR error:{code:2 message:"No evaluation with id "" found"} start_time:{seconds:1643925277 nanos:57538401} complete_time:{seconds:1643925281 nanos:638490103}} component:{type:PLATFORM name:"nomad-jobspec"} artifact_id:"01FV0S9PP12SE1Y2JTV6S26JG7" labels:{key:"waypoint/workspace" value:"default"} job_id:"01FV0XSMKEYHQE2NX8CFV24018" has_entrypoint_config:true preload:{deploy_url:"internally-subtle-flamingo--v1.waypoint.run"}"
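
For reference, the deploy stanza in play here is of roughly this shape (the app name and jobspec path are illustrative, not taken from the report above):

app "main" {
  deploy {
    use "nomad-jobspec" {
      # A periodic (cron-style) Nomad job registers without producing an
      # immediate evaluation, which appears to be how the deploy fails with
      # `No evaluation with id "" found` and leaves a nil Deployment proto.
      jobspec = templatefile("${path.app}/app.nomad.tpl")
    }
  }
}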

@xiaolin-ninja
Contributor

This issue is connected to #2089 and #2636, and will be prioritized.

@paladin-devops
Contributor

Summarizing the desired behavior based on comments in #2089, #2636, and this issue: if a deployment fails or is canceled, waypoint destroy should not only delete the deployment from the list of deployments (to match the behavior of a destroy operation on a successful deployment), but also attempt to execute the destroy operation for the given plugin.

My take on this is that if a deployment failed/was canceled, the attempt to destroy the resources is "best-effort" and not guaranteed. I think it's worth warning the user about this scenario as well.
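
As a sketch, that warning could look something like this (hypothetical output, not current Waypoint behavior):

$ waypoint destroy
» 1 deployment will be destroyed.
⚠ Deployment v17 never completed successfully; destroying its resources is
  best-effort and may leave artifacts behind.
✓ Running deployment destroy v17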

Also of note, I have (so far) only been able to reproduce this with a local runner. Canceling a deployment with on-demand runners, and then attempting to delete it after the fact, has so far resulted in a successful destroy operation in my testing. Will keep pushing it though!

cc/@briancain @izaaklauer

@izaaklauer
Contributor

Looks like this is still a problem. Here are two paths to reproduce it: the steps below, plus the helm plugin case noted at the end of this comment.

Steps to reproduce:

  • Initialize the kubernetes/go waypoint example
  • Run waypoint deploy, and press Ctrl-C at ⠇ Waiting on deployment to become available: requested=1 running=0 ready=0. This gives you a failed deployment according to Waypoint, but leaves a Deployment object behind in Kubernetes
  • Run waypoint deploy twice more. On the second run, observe waypoint claim to have pruned the failed deployment:
» Pruning old deployments...
  Deployment: 01H6PBX2RRYX2S1AYTXFA73G67 (v17)
✓ Running deployment destroy v17

In fact, the deployment is still present in the cluster, and the k8s platform plugin's destroy function is not invoked.

I spooled this up in the debugger, and found we're silently exiting here:

if op.Deployment.Deployment == nil {
    log.Error("Unable to destroy the Deployment as the proto message Deployment returned from the plugin's DeployFunc is nil. This situation occurs when the deployment process is interrupted by the user.", "deployment", op.Deployment)
    return nil, nil // Fail silently for now, this will be fixed in v0.2
}
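
A minimal sketch of the best-effort behavior @paladin-devops described above, assuming a hypothetical tryPluginDestroy helper (the real call into the platform plugin would differ): instead of returning early, warn and still attempt the plugin's destroy with whatever partial state exists.

if op.Deployment.Deployment == nil {
    // The plugin's DeployFunc never returned a proto message, e.g. because
    // the deploy was interrupted or the plugin errored partway through.
    log.Warn("deployment proto is nil; attempting best-effort destroy",
        "deployment", op.Deployment)

    // Hypothetical helper: invoke the platform plugin's destroy with the
    // partial state we do have, treating failure as non-fatal so the record
    // can still be removed from the deployment list.
    if err := tryPluginDestroy(ctx, log, op.Deployment); err != nil {
        log.Warn("best-effort destroy of incomplete deployment failed", "err", err)
    }
    return nil, nil
}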

It's not logging that message, so I suspect there's also a problem with that logger.

I've also observed this happen when the helm plugin returns an error and there is no user ctrl-c event.
