
Implementing readiness probe for Kubernetes #253

Open
nept opened this issue Oct 9, 2024 · 4 comments

@nept

nept commented Oct 9, 2024

Hi,

We're considering replacing Puma with Falcon in our Rails app running on Kubernetes, but we're struggling to implement effective readiness probes. Unlike Puma, where thread availability is easy to check, Falcon's use of fibers makes this more challenging.

We're thinking of using the availability of ActiveRecord connections as the signal for Falcon's readiness probe. However, we're unsure how efficient and reliable that would be. Do you have any best practices for implementing it?

Additionally, we're interested in learning how other users typically handle resource allocation (CPU/memory) and ActiveRecord pool management in their production environments.

Thanks

@ioquatix
Member

ioquatix commented Oct 10, 2024

We already have a readiness interface as part of async-container which by default supports systemd and a few other mechanisms:

https://github.com/socketry/async-container/blob/main/lib/async/container/notify/client.rb

I'm wondering what the best way to expose this to Kubernetes would be. Maybe a command?
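For context, the systemd mechanism mentioned above follows the sd_notify protocol: the service sends a datagram such as `READY=1` to the Unix socket named in the `NOTIFY_SOCKET` environment variable. The sketch below shows only that wire format in plain Ruby. It is not async-container's code, `notify_ready` is a hypothetical helper name, and it skips abstract-namespace sockets (names starting with `@`) to stay small.

```ruby
require "socket"

# Hypothetical helper illustrating the sd_notify wire format: send a single
# datagram (by default "READY=1") to the Unix socket named by NOTIFY_SOCKET.
# This is not async-container's implementation.
def notify_ready(path = ENV["NOTIFY_SOCKET"], message = "READY=1")
  # No notify socket configured (e.g. not running under systemd): nothing to do.
  return false if path.nil? || path.empty?
  # Abstract-namespace sockets ("@name") are skipped to keep the sketch small.
  return false if path.start_with?("@")

  socket = Socket.new(:UNIX, :DGRAM)
  begin
    # Datagram sockets need no connection; address the message directly.
    socket.send(message, 0, Socket.sockaddr_un(path))
  ensure
    socket.close
  end
  true
end
```

Kubernetes probes (exec, HTTP, TCP, gRPC) do not read this socket, so some bridge would be needed to surface the same signal to the kubelet.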

Regarding memory usage and pool size, that will be highly application specific. In general, a Rails app uses around 500 MiB per server instance, and the number of database connections will depend on the number of clients you are hoping to serve simultaneously.

The new ActiveRecord code path in Rails 7.2+ reduces contention on the connection pool. If you can measure the P95 response time of your application and give me a rough idea of how many clients you want to serve, we can figure out the appropriate pool size.
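One common way to turn those two numbers into a pool size is Little's law (concurrency ≈ arrival rate × time in the system). This is an assumption about the calculation, not necessarily the method ioquatix has in mind. The sketch uses P95 latency in place of the mean, which deliberately over-provisions, and assumes one pool per Falcon process; `estimate_pool_size` and its parameters are hypothetical.

```ruby
# Hypothetical helper: size a per-process ActiveRecord pool with Little's law,
# L = λW (concurrency = arrival rate x time spent in the system).
# Using P95 latency for W, rather than the mean, deliberately over-provisions.
def estimate_pool_size(requests_per_second:, p95_seconds:, processes: 1)
  raise ArgumentError, "processes must be positive" unless processes.positive?

  # Expected number of requests in flight across the whole deployment.
  concurrent = requests_per_second * p95_seconds
  # Each Falcon process is assumed to have its own pool; round up per process.
  (concurrent / processes.to_f).ceil
end
```

For example, 200 requests/s at a P95 of 0.25 s gives about 50 concurrent requests, or 13 connections per process across 4 processes. Since Rails 7.2+ reduces contention on the pool, the result is likely a rough upper bound rather than a target.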

@nept
Author

nept commented Oct 11, 2024

Thank you for your input @ioquatix.

Based on your feedback, we've considered two options for exposing connection pool availability to Kubernetes as a readiness signal.
Both options would expose an HTTP endpoint that checks whether any connections are still available in the pool. We could either write a command that calls this endpoint or just let Kubernetes call it directly.
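The "command" variant could look like the sketch below: a small Ruby helper for a Kubernetes exec probe, where exit status 0 means ready and anything else means not ready. The `/health` path, the one-second timeout, and the `probe` helper name are all assumptions.

```ruby
require "net/http"
require "uri"

# Hypothetical exec-probe helper: request the health endpoint and map the
# result to a process exit code (0 = ready, 1 = not ready), which is how
# Kubernetes interprets exec probes.
def probe(url, timeout: 1)
  uri = URI(url)
  # Passing nil as the proxy address disables proxy lookup from environment
  # variables; the probe always talks to the local pod directly.
  http = Net::HTTP.new(uri.host, uri.port, nil)
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.start { |connection| connection.get(uri.request_uri) }
  response.is_a?(Net::HTTPSuccess) ? 0 : 1
rescue StandardError
  # Refused connections, timeouts and malformed responses all count as "not ready".
  1
end
```

A script wrapping this could end with `exit(probe(ARGV.fetch(0)))`. Note that an exec probe boots a new Ruby process on every check, which is likely to cost noticeably more than letting the kubelet call the endpoint directly with an `httpGet` probe.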

Option A: Leveraging async-container

While async-container offers a readiness interface, it's unclear how we could directly expose the connection pool through it. Could you please elaborate on how we might achieve this?

Option B: Custom Health Check Endpoint

Given our experience with Puma, a custom health check endpoint seems like a straightforward approach. We could extend our existing Puma health check endpoint to include connection pool checks for Falcon. This would provide a consistent and well-understood mechanism.
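A minimal version of such an endpoint, written as Rack middleware, might look like this. `HealthCheck`, the `/health` path and the `check:` callable are hypothetical names; the readiness logic is passed in so it can be whatever Option B ends up checking. Answering the path in middleware, before the request reaches the router, keeps the probe cheap.

```ruby
# Hypothetical Rack middleware: answers the health path itself and passes
# every other request through. `check` is any callable returning true/false.
class HealthCheck
  def initialize(app, path: "/health", check:)
    @app = app
    @path = path
    @check = check
  end

  def call(env)
    return @app.call(env) unless env["PATH_INFO"] == @path

    ready = begin
      @check.call
    rescue StandardError
      false # A check that raises counts as "not ready".
    end

    # 200 keeps the pod in rotation; 503 makes the readiness probe fail.
    status = ready ? 200 : 503
    [status, { "content-type" => "text/plain" }, [ready ? "OK" : "KO"]]
  end
end
```

In a Rails app it could be registered with something like `config.middleware.insert_before 0, HealthCheck, check: -> { ... }`; a plain controller action would also work.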

Questions:

Do you see any significant drawbacks to using a custom health check endpoint?
We're eager to hear your thoughts on these options and any additional recommendations you may have.

@ioquatix
Member

I think option B makes more sense if you want control over exactly what kind of things you are checking as healthy. The async-container readiness protocol indicates when Falcon has bound to the socket and will accept incoming connections, but that's different from "the server will meaningfully respond to those requests". Having an HTTP endpoint for this would be the easiest option.

Can you elaborate on exactly what things you want to check the health of? e.g. ActiveRecord?

@nept
Author

nept commented Oct 14, 2024

Awesome. Option B is the one we'd like to focus on.

Can you elaborate on exactly what things you want to check the health of? e.g. ActiveRecord?

We want to return OK/KO based on the ActiveRecord pool stats.
If there are 10 connections in the pool and all 10 are in use, we return KO, so that traffic is balanced across the pods that are still in the ready state.

This check would be used for the readiness probe only.
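The rule described above can be sketched against the hash returned by `ActiveRecord::Base.connection_pool.stat` (keys such as `:size`, `:busy`, `:dead`, `:idle` and `:waiting`). `pool_ready?` and its `min_free` parameter are hypothetical, and plain hashes stand in for the real stats so the sketch runs without Rails. Treating any queued waiter as "not ready" is an extra assumption beyond the 10-of-10 rule.

```ruby
# Hypothetical readiness rule. `stat` is shaped like the hash returned by
# ActiveRecord::ConnectionAdapters::ConnectionPool#stat; plain hashes are
# used here so the sketch runs without Rails.
def pool_ready?(stat, min_free: 1)
  # Connections that could still be handed out: the pool's maximum size minus
  # those checked out by live owners (busy) or by dead ones (dead).
  free = stat.fetch(:size) - stat.fetch(:busy) - stat.fetch(:dead, 0)
  # Anything already queued for a connection means the pool is saturated.
  free >= min_free && stat.fetch(:waiting, 0).zero?
end
```

In the app this would be called as `pool_ready?(ActiveRecord::Base.connection_pool.stat)` from the health endpoint. Because `busy` is sampled at a single instant, it may help to rely on the probe's `failureThreshold` so that several consecutive failures are needed before the pod is marked unready.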


2 participants