Support clean shutdown #320

Closed · teosoft123 opened this issue Oct 15, 2018 · 5 comments · Fixed by #1051
Labels: feature (New functionality/enhancement)

@teosoft123

It would be great to implement a clean shutdown mode that handles some signal and waits for all child processes to complete before actually shutting down.

This is closely related to upgrades in situations where the Atlantis server code is cleanly separated from its data, for example running the atlantis server in a Docker container with the atlantis user's home directory mounted as a Docker volume.

Tracking child processes could be done nicely with cgroups, but I would discourage that implementation because it has no close analog in the Docker/k8s world, as far as I know.
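
A minimal sketch of what such signal-based clean shutdown could look like in Go, assuming a hypothetical `Drainer` that tracks in-flight Terraform runs (none of these names come from the Atlantis codebase):

```go
// A hypothetical sketch, not Atlantis code: catch SIGTERM/SIGINT and wait
// for in-flight terraform runs before exiting.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// Drainer tracks in-flight terraform runs.
type Drainer struct {
	wg sync.WaitGroup
}

// StartOp registers an in-flight operation; the returned func marks it done.
func (d *Drainer) StartOp() func() {
	d.wg.Add(1)
	return d.wg.Done
}

// Wait blocks until every registered operation has completed.
func (d *Drainer) Wait() { d.wg.Wait() }

func main() {
	d := &Drainer{}

	// ... start the webhook/event handlers here; each would wrap its work
	// with: done := d.StartOp(); defer done()

	// On SIGTERM/SIGINT, stop accepting new work and wait for the child
	// terraform processes to finish before really exiting.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
	<-sigCh

	fmt.Println("draining: waiting for in-flight terraform runs to finish")
	d.Wait()
	fmt.Println("all runs complete, shutting down")
}
```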

@lkysow added the feature (New functionality/enhancement) label on Apr 4, 2019
@atheiman

We run Atlantis in AWS Fargate. To upgrade Atlantis or push out configuration changes, we first block ingress to the Fargate task at its AWS ELB. Then, after some time, we assume all Atlantis Terraform processes have completed and recreate the Fargate task with the new configuration or container image tag using local Terraform.

But we don't reliably know when all the Terraform tasks are complete. I think this could be improved by adding an API endpoint that returns the count of Terraform processes currently running.

Then our upgrade process would be:

  1. Restrict ingress to the IP address where I am running Terraform
  2. Poll that API endpoint for the current count of Terraform processes and wait for the count to drop to 0 (a sketch of such a check follows this list)
  3. Safely terraform apply the Atlantis upgrade
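
A hedged sketch of what that status endpoint and the polling step (2) could look like, assuming a hypothetical in-flight counter; the path, JSON field name, and function names are made up for illustration:

```go
// Package status is a hypothetical sketch; Atlantis does not necessarily
// expose such an endpoint today.
package status

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// inFlight would be incremented/decremented around every terraform
// plan/apply that Atlantis executes.
var inFlight int64

// StatusHandler returns the current count, e.g. mounted at GET /status.
func StatusHandler(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string]int64{
		"in_progress_operations": atomic.LoadInt64(&inFlight),
	})
}

// WaitForZero is what step 2 of the upgrade process could do: poll the
// endpoint until no terraform processes remain.
func WaitForZero(url string) error {
	for {
		resp, err := http.Get(url)
		if err != nil {
			return err
		}
		var body map[string]int64
		err = json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if err != nil {
			return err
		}
		if body["in_progress_operations"] == 0 {
			return nil
		}
		fmt.Printf("%d operations still running, waiting...\n", body["in_progress_operations"])
		time.Sleep(10 * time.Second)
	}
}
```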

@lkysow
Member

lkysow commented Aug 26, 2019

The work needed to know how many TF processes are running is the same as the work to properly pass a context through to everything and then keep the Atlantis process running until all the TF processes have stopped, so I'm not sure we need an API endpoint.
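
For illustration, a rough sketch of the context threading described here, with a hypothetical `RunTerraform` helper; tracking each run this way is what would let Atlantis both report a count and wait for everything to finish on shutdown:

```go
// Package runner is a hypothetical sketch of threading a context through to
// the terraform child processes, not code from the Atlantis repository.
package runner

import (
	"context"
	"os/exec"
	"sync"
)

// inFlight tracks every terraform command currently executing.
var inFlight sync.WaitGroup

// RunTerraform executes one terraform command and registers it so that a
// shutdown path can wait for it; the same bookkeeping would also make a
// "how many are running" count trivial to expose.
func RunTerraform(ctx context.Context, dir string, args ...string) ([]byte, error) {
	inFlight.Add(1)
	defer inFlight.Done()

	cmd := exec.CommandContext(ctx, "terraform", args...)
	cmd.Dir = dir
	return cmd.CombinedOutput()
}

// WaitForRuns blocks until every tracked terraform command has exited,
// which is what would keep the Atlantis process alive during shutdown.
func WaitForRuns() { inFlight.Wait() }
```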

@atheiman

atheiman commented Aug 26, 2019

Yeah, that would be fine as long as terraform apply-ing the Fargate task changes can wait for the clean shutdown to happen; we use https://github.com/terraform-aws-modules/terraform-aws-atlantis

I could see this being a problem if a clean shutdown has to wait an hour for a long Terraform process to finish.

@lkysow
Member

lkysow commented Aug 26, 2019

Hmm, it looks like there's a 2m max (https://forums.aws.amazon.com/thread.jspa?messageID=907417) so that wouldn't necessarily work. Maybe an API endpoint like /drain or something would be necessary.

@benoit74
Contributor

Hi,
I'm working on this (implementing a drain).
I need it because we are deploying Atlantis with Atlantis in a K8s cluster, so we need RollingUpgrades + clean pod termination.
As you suggested, I'm implementing a drain endpoint, with a POST to start the drain and a GET to check its completion.
I will probably also implement an operation like "atlantis shutdown" which calls this endpoint locally and waits for the drain to complete, so that it can be used in the preStop hook on K8s. I will probably have a working prototype before the end of the week.
We will battle-test it ASAP on our cluster.
This means I will also propose a chart update for RollingUpgrades + a preStop hook in the lifecycle.
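
For reference, a rough sketch of what such a POST-to-start / GET-to-poll drain endpoint might look like; the types, paths, and JSON fields here are hypothetical and not taken from the eventual implementation in #1051:

```go
// Package drain is a hypothetical sketch of a drain endpoint, not the code
// that eventually landed.
package drain

import (
	"encoding/json"
	"net/http"
	"sync"
)

// Drainer tracks in-flight operations and whether a drain has been requested.
type Drainer struct {
	mu       sync.Mutex
	draining bool
	ongoing  int
}

// StartDrain flips the server into drain mode so new operations are refused.
func (d *Drainer) StartDrain() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.draining = true
}

// Status reports whether draining has started and whether it has completed.
func (d *Drainer) Status() (started, completed bool, ongoing int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.draining, d.draining && d.ongoing == 0, d.ongoing
}

// Handler serves both verbs on one path: POST starts the drain, GET (or the
// POST response itself) reports progress.
func Handler(d *Drainer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPost {
			d.StartDrain()
		}
		started, completed, ongoing := d.Status()
		json.NewEncoder(w).Encode(map[string]interface{}{
			"drain_started":      started,
			"drain_completed":    completed,
			"ongoing_operations": ongoing,
		})
	}
}
```

An "atlantis shutdown"-style command or a K8s preStop hook could then POST to the path once and keep GETing it until `drain_completed` is true.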
