Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change App#teardown! to delete apps immediately #195

Merged
merged 1 commit into from
Feb 24, 2023
Merged

Conversation

schneems
Copy link
Contributor

@schneems schneems commented Feb 9, 2023

  • Breaking change: Delete apps on teardown. Previously hatchet would delete apps lazilly to help with debugging. This allowed developers to inspect logs and heroku run bash in the event of an unexpected failure. In practice it's rarely needed and means accounts retain a lot of Heroku apps. Previously there was no cost to retaining applications, but now basic applications incur a charge. Change details:
    • The application teardown process now deletes applications directly.
    • To skip destroying applications on teardown set HEROKU_DEBUG_EXPENSIVE=1. This will keep your applications around so that you can introspect why one failed.
    • When Hatchet needs a new application it will first delete all applications that are created at least HATCHET_ALIVE_TTL_MINUTES ago. If it cannot delete any applications and the account is already at HATCHET_APP_LIMIT it will sleep and try again later.
    • The hatchet cli has a new flag for destroy hatchet destroy --older-than=7. This will remove any apps older than the provided value (in minutes)

GUS-W-12027262

- Breaking change: Delete apps on teardown. Previously hatchet would delete apps lazily to help with debugging. This behavior allowed developers to inspect logs and `heroku run bash` in the event of an unexpected failure. In practice, it is rarely needed and causes accounts to retain apps indefinitely. Previously there was no cost to retaining applications, but now `basic` applications incur a charge. Change details:
  - The application teardown process now deletes applications directly.
  - To skip destroying applications on teardown, set `HEROKU_DEBUG_EXPENSIVE=1`. This env var will cause `App#teardown!` to skip deletion so you can introspect why one failed.
  - When hatchet needs a new application, it will first delete all applications that are created at least `HATCHET_ALIVE_TTL_MINUTES` ago. If it cannot delete any applications and the account is already at `HATCHET_APP_LIMIT`, it will sleep and try again later.
  - Introduce `--older-than` flag to `hatchet destroy` CLI command. For example, the command `hatchet destroy --older-than=7`will remove any apps older than the provided value (in minutes).

GUS-W-12027262
@schneems schneems marked this pull request as ready for review February 9, 2023 23:23
@schneems
Copy link
Contributor Author

David had a good idea: to add a chron task to auto delete old apps every day (etc.) in case there's a problem with teardown.

@Malax
Copy link
Member

Malax commented Feb 21, 2023

@schneems: FWIW, this PR has no reviewers set. Do you want the team to take a look? I don't feel competent enough with Ruby (and this code-base) to quickly review this at this point. But 👍🏻 for the change itself! :)

@schneems schneems requested a review from a team February 23, 2023 20:37
@schneems
Copy link
Contributor Author

@Malax I'm in need of a review.

@schneems schneems merged commit d0a96b0 into main Feb 24, 2023
@schneems schneems deleted the schneems/ya-basic branch February 24, 2023 18:02
def over_limit?
hatchet_app_count > hatchet_app_limit
# Destroys apps that are older than the given argument (expecting integer minutes)
def destroy_older_apps(minutes: TTL_MINUTES, force_refresh: @apps.empty?)
Copy link
Member

@edmorley edmorley Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no mutex on this method (only its parent), which means if it's called directly (as happens on app startup), then all Hatchet processes will be spamming the Heroku API at once. In heroku-buildpack-python's case, this means 60 processes (well minus the one that's actually doing the delete) all getting 404 app not found, and draining the API rate limit.

I think the mutex needs to be moved from clean_old_or_sleep to destroy_older_apps?

Also, it seems the destroyed apps never get removed from @apps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example log showing 10,000 lines of attempts to delete apps, before I killed the CI job:
https://github.com/heroku/heroku-buildpack-python/actions/runs/4284904818/jobs/7462502820#step:5:10031

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schneems So even with locking, it seems that on-startup we're still going to have 60 calls to the Heroku API to fetch the apps list when processes=60. One way to avoid that, would be to remove the cleanup on startup entirely, and instead have users run the hatchet destroy --older-than command themselves elsewhere in the CI run (which would only be run from one process). Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issues resulting from Reaper "duplicate destroy attempted"
3 participants