-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change App#teardown! to delete apps immediately #195
Conversation
46d9e70
to
60e0412
Compare
- Breaking change: Delete apps on teardown. Previously hatchet would delete apps lazily to help with debugging. This behavior allowed developers to inspect logs and `heroku run bash` in the event of an unexpected failure. In practice, it is rarely needed and causes accounts to retain apps indefinitely. Previously there was no cost to retaining applications, but now `basic` applications incur a charge. Change details: - The application teardown process now deletes applications directly. - To skip destroying applications on teardown, set `HEROKU_DEBUG_EXPENSIVE=1`. This env var will cause `App#teardown!` to skip deletion so you can introspect why one failed. - When hatchet needs a new application, it will first delete all applications that are created at least `HATCHET_ALIVE_TTL_MINUTES` ago. If it cannot delete any applications and the account is already at `HATCHET_APP_LIMIT`, it will sleep and try again later. - Introduce `--older-than` flag to `hatchet destroy` CLI command. For example, the command `hatchet destroy --older-than=7`will remove any apps older than the provided value (in minutes). GUS-W-12027262
60e0412
to
cdb49d2
Compare
David had a good idea: to add a chron task to auto delete old apps every day (etc.) in case there's a problem with teardown. |
@schneems: FWIW, this PR has no reviewers set. Do you want the team to take a look? I don't feel competent enough with Ruby (and this code-base) to quickly review this at this point. But 👍🏻 for the change itself! :) |
@Malax I'm in need of a review. |
def over_limit? | ||
hatchet_app_count > hatchet_app_limit | ||
# Destroys apps that are older than the given argument (expecting integer minutes) | ||
def destroy_older_apps(minutes: TTL_MINUTES, force_refresh: @apps.empty?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no mutex on this method (only its parent), which means if it's called directly (as happens on app startup), then all Hatchet processes will be spamming the Heroku API at once. In heroku-buildpack-python's case, this means 60 processes (well minus the one that's actually doing the delete) all getting 404 app not found, and draining the API rate limit.
I think the mutex needs to be moved from clean_old_or_sleep
to destroy_older_apps
?
Also, it seems the destroyed apps never get removed from @apps
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example log showing 10,000 lines of attempts to delete apps, before I killed the CI job:
https://github.com/heroku/heroku-buildpack-python/actions/runs/4284904818/jobs/7462502820#step:5:10031
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schneems So even with locking, it seems that on-startup we're still going to have 60 calls to the Heroku API to fetch the apps list when processes=60. One way to avoid that, would be to remove the cleanup on startup entirely, and instead have users run the hatchet destroy --older-than
command themselves elsewhere in the CI run (which would only be run from one process). Thoughts?
heroku run bash
in the event of an unexpected failure. In practice it's rarely needed and means accounts retain a lot of Heroku apps. Previously there was no cost to retaining applications, but nowbasic
applications incur a charge. Change details:HEROKU_DEBUG_EXPENSIVE=1
. This will keep your applications around so that you can introspect why one failed.HATCHET_ALIVE_TTL_MINUTES
ago. If it cannot delete any applications and the account is already atHATCHET_APP_LIMIT
it will sleep and try again later.hatchet destroy --older-than=7
. This will remove any apps older than the provided value (in minutes)GUS-W-12027262