Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move health checks into Docker #219

Merged
merged 1 commit into from
Apr 28, 2023
Merged

Move health checks into Docker #219

merged 1 commit into from
Apr 28, 2023

Conversation

kevinmcconnell
Copy link
Collaborator

Replaces our current host-based HTTP healthchecks with Docker healthchecks, and adds a new healthcheck.cmd config option that can be used to define a custom health check command. Also removes Traefik's healthchecks, since they are no longer necessary (Traefik watches the Docker health change events).

When deploying a container that has a healthcheck defined, we wait for it to report a healthy status before stopping the old container that it replaces. Containers that don't have a healthcheck defined continue to wait for MRSK.config.readiness_delay.

There are some pros and cons to using Docker healthchecks rather than checking from the host. The main advantages are:

  • Supports non-HTTP checks, app-specific check scripts provided by a container, etc.
  • When booting a container, allows MRSK to wait for it to be healthy before shutting down the old container it replaces. This should be safer than relying on a timeout after start.
  • Containers with healthchecks won't become active in Traefik until they reach a healthy state, which prevents any traffic from being routed to them before they are ready.
  • Health status becomes visible in other tooling, like docker ps output, mrsk app details, etc.

The main disadvantage is that containers are now required to provide some way to check their health. MRSK's default check assumes that curl is available in the container which, while common, won't always be the case.

Comment on lines +18 to +19
app = MRSK.app(role: role)
auditor = MRSK.auditor(role: role)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were starting to say MRSK.app(role: role) etc a lot in this method.

Comment on lines +21 to 23
def status
pipe container_id, xargs(docker(:inspect, "--format", DOCKER_HEALTH_STATUS_FORMAT))
end
Copy link
Collaborator Author

@kevinmcconnell kevinmcconnell Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Healthcheck and App both define a status command to fetch container status. It's a bit same-y, but the container name/id differs in each, and letting them figure that out themselves ended up looking clearer.

Comment on lines -78 to -79
"traefik.http.services.#{traefik_service}.loadbalancer.healthcheck.path" => config.healthcheck["path"],
"traefik.http.services.#{traefik_service}.loadbalancer.healthcheck.interval" => "1s",
Copy link
Collaborator Author

@kevinmcconnell kevinmcconnell Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need Traefik to healthcheck any more, since Docker is doing it. Also this wouldn't work for any custom cmd anyway.

@@ -0,0 +1,39 @@
class Mrsk::Utils::HealthcheckPoller
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named this HealthcheckPoller rather than something like Healthcheck, since the actual healthcheck is provided for us by Docker. This class just needs to know how to wait for the status to reach something runnable.

Copy link
Member

@intrip intrip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic 👏👏

README.md Outdated Show resolved Hide resolved
Replaces our current host-based HTTP healthchecks with Docker
healthchecks, and adds a new `healthcheck.cmd` config option that can be
used to define a custom health check command. Also removes Traefik's
healthchecks, since they are no longer necessary.

When deploying a container that has a healthcheck defined, we wait for
it to report a healthy status before stopping the old container that it
replaces. Containers that don't have a healthcheck defined continue to
wait for `MRSK.config.readiness_delay`.

There are some pros and cons to using Docker healthchecks rather than
checking from the host. The main advantages are:

- Supports non-HTTP checks, and app-specific check scripts provided by a
  container.
- When booting a container, allows MRSK to wait for a container to be
  healthy before shutting down the old container it replaces. This
  should be safer than relying on a timeout.
- Containers with healthchecks won't be active in Traefik until they
  reach a healthy state, which prevents any traffic from being routed to
  them before they are ready.

The main _disadvantage_ is that containers are now required to provide
some way to check their health. Our default check assumes that `curl` is
available in the container which, while common, won't always be the
case.
@tbuehlmann
Copy link
Contributor

Oh this seems nice.

@dhh dhh merged commit 4fa6a6c into basecamp:main Apr 28, 2023
@jeremy jeremy deleted the docker-health-checks branch May 1, 2023 17:31
@amerine
Copy link

amerine commented May 8, 2023

Excellent change!!!

duyanton added a commit to duyanton/rails that referenced this pull request May 12, 2023
So the container can work with mrsk healthcheck.

See: basecamp/kamal#219
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants