Merged
Changes from 2 commits
20 changes: 12 additions & 8 deletions howto/production.md
@@ -174,17 +174,21 @@ Set `terminationGracePeriodSeconds` to at least `SHUTDOWN_DURATION_IN_SECONDS` t

## Graceful shutdown tuning

-ColdBrew's shutdown sequence (bounded by `SHUTDOWN_DURATION_IN_SECONDS`, default 15s):
+Kubernetes first sends `SIGTERM`, then ColdBrew begins its in-process shutdown sequence, which is bounded by `SHUTDOWN_DURATION_IN_SECONDS` (default 15s):

-1. Receive SIGTERM from Kubernetes
+1. `PreStop(ctx)` on `CBPreStopper` services — deregister from service discovery, flush buffers
2. `FailCheck(true)` on `CBGracefulStopper` services — `/readycheck` starts failing
3. Wait `GRPC_GRACEFUL_DURATION_IN_SECONDS` (default: 7s, included in shutdown timeout) for the load balancer to drain
-4. Shutdown admin server if configured (`ADMIN_PORT`)
-5. Shutdown HTTP server (stop accepting new requests)
-6. `GracefulStop()` gRPC server (finish in-flight RPCs, reject new ones)
-7. Force-stop gRPC server if graceful shutdown didn't complete in time
-8. Call `Stop()` on `CBStopper` services — close database pools, flush metrics, drain message producers
-9. Exit
+4. Cancel worker context, wait for workers to exit
+5. Shutdown admin server if configured (`ADMIN_PORT`)
+6. Shutdown HTTP server (stop accepting new requests)
+7. `GracefulStop()` gRPC server (finish in-flight RPCs, reject new ones)
+8. Force-stop gRPC server if graceful shutdown didn't complete in time
+9. Call `Stop()` on `CBStopper` services — close database pools, flush metrics, drain message producers
+10. `PostStop(ctx)` on `CBPostStopper` services — final cleanup, audit log close
+11. Exit
+
+See [Shutdown Lifecycle](/howto/signals) for the full interface table and [Readiness Patterns](/howto/readiness) for combining workers with health checks.

Tune these values based on your service:
