Query frontend readiness #2733
All Cortex components have a /ready handler. I don't quite understand why the frontend needs its own. Would it make sense to extend the /ready handler with a frontend check, similar to the ingester? (I guess I'd better read the proposal first.)
The current readiness handler returns 200 if all of the Cortex modules have successfully started and are running. This is normally fine, but the query frontend is not actually ready to receive requests until a querier has established a gRPC connection to it. Depending on settings, this could take tens of seconds. In discussions surrounding the original proposal it was decided that an additional handler was more appropriate than modifying the behavior of the original, but we can revisit this conversation.
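To make the gap concrete, here is a minimal sketch of a frontend-specific readiness check that stays at 503 until at least one querier is attached. The names (frontendState, querierConnected, frontendReady, probeStatus) are hypothetical and are not taken from the Cortex codebase; the real frontend tracks querier connections through its gRPC streams.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// frontendState is a hypothetical stand-in for the query frontend. It only
// tracks how many queriers currently hold an open connection.
type frontendState struct {
	connected int32
}

// querierConnected and querierDisconnected would be called when a querier
// connection opens or closes (assumption: one call per connection).
func (f *frontendState) querierConnected()    { atomic.AddInt32(&f.connected, 1) }
func (f *frontendState) querierDisconnected() { atomic.AddInt32(&f.connected, -1) }

// frontendReady returns 200 only while at least one querier is attached,
// and 503 otherwise.
func (f *frontendState) frontendReady(w http.ResponseWriter, _ *http.Request) {
	if atomic.LoadInt32(&f.connected) < 1 {
		http.Error(w, "no queriers connected", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probeStatus issues an in-memory GET against a handler and returns the
// status code it wrote.
func probeStatus(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}
```

With this shape, probeStatus(f.frontendReady) reports 503 on a fresh frontendState and switches to 200 after the first querierConnected call.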
By using a separate handler, we risk sending requests to a Cortex instance which already has a querier connected to it, but is otherwise still initializing. If initialization of other modules fails and Cortex exits, queued requests will be dropped. The benefit of using a separate handler is that the query-frontend may start serving requests faster, while e.g. the WAL is being replayed or blocks are being downloaded (both of which can take minutes). I am not sure which is better. I will review the previous discussion to educate myself.
Thanks @joe-elliott for working on it! LGTM. I just have a few little requests:
- Could you rebase master, please?
- Could you add a CHANGELOG entry, please?
- What do you think about adding /ready and /query-frontend/ready to docs/apis.md and explaining the use case for /query-frontend/ready?
I would be in favour of re-using the existing handler and adding the check to it.
If this is the case for single-binary, I am not sure how much benefit we'll have by getting the frontend ready but not the other components.
Signed-off-by: Joe Elliott <[email protected]>
Done!
In the single binary scenario the current readiness endpoint works well for insertion into a gossip ring or readiness for ingestion, but not for reads. The idea behind this endpoint is that a query path load balancer or k8s service could use this endpoint to delay querying the new shard until the query-frontend has attached queriers.
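As a rough illustration of the consumer side described above, the sketch below polls a readiness endpoint before treating a shard as queryable. pollReady and httpProbe are hypothetical helpers, not Cortex or Kubernetes APIs; in Kubernetes this role would normally be played by a readiness probe on the pod rather than by hand-written code.

```go
package main

import (
	"net/http"
	"time"
)

// pollReady calls probe up to maxAttempts times, sleeping interval between
// attempts. It returns true as soon as probe reports HTTP 200, and false if
// every attempt fails or returns another status.
func pollReady(probe func() (int, error), maxAttempts int, interval time.Duration) bool {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if code, err := probe(); err == nil && code == http.StatusOK {
			return true
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return false
}

// httpProbe returns a probe that issues a GET against url and reports the
// response status code. The URL to use is deployment-specific.
func httpProbe(client *http.Client, url string) func() (int, error) {
	return func() (int, error) {
		resp, err := client.Get(url)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		return resp.StatusCode, nil
	}
}
```

A caller would pass httpProbe(http.DefaultClient, readyURL) as the probe, where readyURL is whichever readiness endpoint the deployment exposes, and only start routing queries to the shard once pollReady returns true.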
Discussion is whether to extend existing |
Also, in single-binary mode, I assume that the querier running in the same binary "connects" to the query-frontend quickly.
You could configure the querier to use localhost. In this case each query-frontend would have a single querier attached. However, I believe that the recommended configuration would be exactly as we have it now: the querier component of each single binary will be configured to use a DNS entry for discovery of the frontend component of all the other single binaries. In this case the connection lag would be no different than in microservices mode.
Co-authored-by: Marco Pracucci <[email protected]> Signed-off-by: Joe Elliott <[email protected]>
@joe-elliott I think Peter and Goutham are right. We don't need a dedicated readiness endpoint. When we brainstormed the proposal, we weren't aware of the existence of publishNotReadyAddresses, but thanks to it we can just change the query-frontend readiness probe endpoint to also ensure that at least 1 querier is connected (we also customise the readiness for ingesters, see pkg/cortex/cortex.go:391). What's your take?
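The single-endpoint approach suggested here can be sketched as one /ready handler that runs a list of checks and fails on the first error. This is an illustrative sketch only, not the code in pkg/cortex/cortex.go; readinessCheck, newReadyHandler, modulesRunning, queriersAttached and statusOf are hypothetical names, and the plain pointers stand in for state that real code would read in a goroutine-safe way.

```go
package main

import (
	"errors"
	"net/http"
	"net/http/httptest"
)

// readinessCheck is a hypothetical per-component check; nil means ready.
type readinessCheck func() error

// newReadyHandler builds a single /ready handler that returns 503 on the
// first failing check and 200 only when every check passes.
func newReadyHandler(checks ...readinessCheck) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		for _, check := range checks {
			if err := check(); err != nil {
				http.Error(w, "not ready: "+err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

// modulesRunning models the existing behavior: all modules have started.
// The pointer is for illustration only and is not goroutine-safe.
func modulesRunning(allRunning *bool) readinessCheck {
	return func() error {
		if !*allRunning {
			return errors.New("some modules are not running")
		}
		return nil
	}
}

// queriersAttached models the extra frontend condition: at least one
// querier is connected. Also not goroutine-safe.
func queriersAttached(count *int) readinessCheck {
	return func() error {
		if *count < 1 {
			return errors.New("no queriers connected to the query-frontend")
		}
		return nil
	}
}

// statusOf issues an in-memory GET against a handler and returns the
// status code it wrote.
func statusOf(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}
```

Keeping the checks as an ordered list means a component that does not run the frontend can simply omit the querier check, so its readiness behavior is unchanged.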
Ok, I think this is in a good place now. As discussed we settled on modifying the behavior of the I will also be submitting a PR to this repo: https://github.com/grafana/cortex-jsonnet to handle the k8s configuration changes. Deploying these changes will need to be coordinated. Also: Should we do more to highlight this change than the changelog and additional doc? I'm concerned this can have unexpected impacts when people roll it out.
Looks good, thank you!
Thanks @joe-elliott for addressing my feedback. LGTM (modulo a couple of nits)
Thanks for patiently addressing our feedback!
What this PR does:
/ready to return 200 only if the frontend is ready to receive requests

Which issue(s) this PR fixes:
Fixes #None
Addresses this accepted proposal: https://github.com/cortexproject/cortex/blob/master/docs/proposals/scalable-query-frontend.md#querier-discovery-lag

Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]