Query frontend readiness #2733

joe-elliott · 2020-06-16T19:56:07Z

What this PR does:

Adjusts the behavior of the frontend readiness handler at /ready to return 200 only if the frontend is ready to receive requests
- Note that this also impacts the single binary
Adds appropriate unit/integration tests
Adds documentation on scaling the query frontend
Minor updates/fixes to scalable frontend proposal

Which issue(s) this PR fixes:
Fixes #None

Addresses this accepted proposal: https://github.com/cortexproject/cortex/blob/master/docs/proposals/scalable-query-frontend.md#querier-discovery-lag

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pstibrany · 2020-06-16T20:28:49Z

All Cortex components have a /ready handler. I don't quite understand why the frontend needs its own. Would it make sense to extend the /ready handler with a frontend check, similar to the ingester?

(I guess I better read the proposal first)

joe-elliott · 2020-06-16T20:48:55Z

The current readiness handler returns 200 if all of the Cortex modules have successfully started and are running. This is normally fine, but the query frontend is not actually ready to receive requests until a querier has established a gRPC connection to it. Depending on settings this could take tens of seconds.

In discussions surrounding the original proposal it was decided that an additional handler was more appropriate than modifying the behavior of the original, but we can revisit this conversation.

pstibrany · 2020-06-16T21:03:41Z

> In discussions surrounding the original proposal it was decided that an additional handler was more appropriate than modifying the behavior of the original, but we can revisit this conversation.

By using a separate handler, we risk sending requests to a Cortex instance that already has a querier connected to it but is otherwise still initializing. If initialization of other modules fails and Cortex exits, queued requests will be dropped.

The benefit of using a separate handler is that the query-frontend may start serving requests sooner, e.g. while the WAL is being replayed or blocks are being downloaded (both of which can take minutes).

I am not sure what is better. I will review previous discussion to educate myself.

pracucci

Thanks @joe-elliott for working on it! LGTM. I just have a few little requests:

Could you rebase on master, please?
Could you add a CHANGELOG entry, please?
What do you think about adding /ready and /query-frontend/ready to docs/apis.md and explaining the use case for /query-frontend/ready?

gouthamve · 2020-06-17T12:52:41Z

I would be in favour of re-using the existing handler and adding the check to it.

> The benefit of using a separate handler is that the query-frontend may start serving requests sooner, e.g. while the WAL is being replayed or blocks are being downloaded (both of which can take minutes).

If this is the case for single-binary, I am not sure how much benefit we'll have by getting the frontend ready but not the other components.

joe-elliott · 2020-06-17T19:03:34Z

> Thanks @joe-elliott for working on it! LGTM. I just have a few little requests:

Done!

> If this is the case for single-binary, I am not sure how much benefit we'll have by getting the frontend ready but not the other components.

In the single binary scenario the current readiness endpoint works well for insertion into a gossip ring or readiness for ingestion, but not for reads. The idea behind this endpoint is that a query path load balancer or k8s service could use this endpoint to delay querying the new shard until the query-frontend has attached queriers.

pstibrany · 2020-06-17T19:10:54Z

> In the single binary scenario the current readiness endpoint works well for insertion into a gossip ring or readiness for ingestion, but not for reads. The idea behind this endpoint is that a query path load balancer or k8s service could use this endpoint to delay querying the new shard until the query-frontend has attached queriers.

The discussion is whether to extend the existing /ready handler with a check for the query-frontend, or to create a new handler. I'm not convinced that a new handler is necessary, but I don't have a strong opinion on it.

pstibrany · 2020-06-17T19:12:42Z

Also, in single-binary mode, I assume that the querier running in the same binary "connects" to the query-frontend quickly.

joe-elliott · 2020-06-17T19:24:25Z

> Also, in single-binary mode, I assume that the querier running in the same binary "connects" to the query-frontend quickly.

You could configure the querier to use localhost. In this case each query-frontend would have a single querier attached.

However, I believe that the recommended configuration would be exactly as we have it now: The querier component of each single binary will be configured to use a DNS entry for discovery of the frontend component of all the other single binaries. In this case the connection lag would be no different than microservices mode.
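
To make the source of the connection lag concrete: with DNS-based discovery, a querier only learns about a new frontend the next time it re-resolves the DNS entry. The sketch below shows one way such a refresh step could compare the known and freshly resolved address sets. The function name and addresses are hypothetical, and this is not taken from the Cortex querier code.

```go
package main

import (
	"fmt"
	"sort"
)

// diffAddrs compares the frontend addresses a querier already knows with a
// fresh DNS result, returning which ones to connect to and which to drop.
func diffAddrs(known, resolved []string) (added, removed []string) {
	seen := make(map[string]bool, len(known))
	for _, a := range known {
		seen[a] = true
	}
	fresh := make(map[string]bool, len(resolved))
	for _, a := range resolved {
		// Skip duplicates in the DNS answer and addresses already known.
		if !fresh[a] && !seen[a] {
			added = append(added, a)
		}
		fresh[a] = true
	}
	for _, a := range known {
		if !fresh[a] {
			removed = append(removed, a)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	known := []string{"10.0.0.1:9095", "10.0.0.2:9095"}
	resolved := []string{"10.0.0.2:9095", "10.0.0.3:9095"}

	added, removed := diffAddrs(known, resolved)
	fmt.Println("connect to:", added)        // connect to: [10.0.0.3:9095]
	fmt.Println("disconnect from:", removed) // disconnect from: [10.0.0.1:9095]
}
```

Until a refresh like this runs, a newly started frontend has no queriers attached, which is the window the readiness check is meant to cover.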

pracucci

@joe-elliott I think Peter and Goutham are right. We don't need a dedicated readiness endpoint. When we brainstormed the proposal, we weren't aware that publishNotReadyAddresses existed, but thanks to it we can just change the query-frontend readiness probe endpoint to also ensure that at least 1 querier is connected (we also customise readiness for ingesters, see pkg/cortex/cortex.go:391). What's your take?

joe-elliott · 2020-06-30T15:42:07Z

Ok, I think this is in a good place now. As discussed we settled on modifying the behavior of the /ready endpoint.

I will also be submitting a PR to this repo: https://github.com/grafana/cortex-jsonnet to handle the k8s configuration changes. Deploying these changes will need to be coordinated.

Also: Should we do more to highlight this change than the changelog and additional doc? I'm concerned this could have unexpected impacts when people roll it out.

pstibrany

Looks good, thank you!

pracucci

Thanks @joe-elliott for addressing my feedback. LGTM (modulo a couple of nits)

pstibrany

Thanks for patiently addressing our feedback!

pull-request-size bot added the size/L label Jun 16, 2020

joe-elliott marked this pull request as ready for review June 16, 2020 20:21

pracucci approved these changes Jun 17, 2020

joe-elliott added 8 commits June 17, 2020 14:04

Added basic functionality

be7218d

Signed-off-by: Joe Elliott <[email protected]>

scalable frontend proposal cleanup

de2a613

Signed-off-by: Joe Elliott <[email protected]>

Added readyForRequests test

3fdff32

Signed-off-by: Joe Elliott <[email protected]>

Added frontend ready tests

a807302

Signed-off-by: Joe Elliott <[email protected]>

Added integration tests

df88d96

Signed-off-by: Joe Elliott <[email protected]>

Return better http body

d31f22d

Signed-off-by: Joe Elliott <[email protected]>

Added PR to frontend doc

42cafa0

Signed-off-by: Joe Elliott <[email protected]>

Added scalable frontend docs

1309aba

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott force-pushed the query-frontend-readiness branch from 2f3e77b to 1309aba Compare June 17, 2020 18:04

Added entry to api.md

3607204

Signed-off-by: Joe Elliott <[email protected]>

pracucci reviewed Jun 19, 2020

docs/apis.md (outdated, resolved)

Update docs/apis.md

67dfcfe

Co-authored-by: Marco Pracucci <[email protected]> Signed-off-by: Joe Elliott <[email protected]>

joe-elliott force-pushed the query-frontend-readiness branch from 1640287 to 67dfcfe Compare June 19, 2020 19:15

pracucci self-requested a review June 22, 2020 07:16

pracucci reviewed Jun 22, 2020

Merge branch 'master' into query-frontend-readiness

8450c98

pracucci reviewed Jun 29, 2020

pkg/api/api.go (outdated, resolved)

pracucci reviewed Jun 29, 2020

CHANGELOG.md (outdated, resolved)

joe-elliott marked this pull request as draft June 29, 2020 15:34

joe-elliott added 2 commits June 29, 2020 15:09

/query-frontend/ready => /ready

158f424

Signed-off-by: Joe Elliott <[email protected]>

Merge branch 'master' into query-frontend-readiness

65af481

pracucci reviewed Jun 30, 2020

docs/operations/scalable-query-frontend.md (outdated, resolved)

pracucci reviewed Jun 30, 2020

docs/operations/scalable-query-frontend.md (outdated, resolved)

pracucci reviewed Jun 30, 2020

pkg/querier/frontend/frontend.go (outdated, resolved)

integration/query_frontend_test.go (outdated, resolved)

joe-elliott force-pushed the query-frontend-readiness branch from ee14105 to 8fdf0f8 Compare June 30, 2020 14:36

joe-elliott and others added 3 commits June 30, 2020 10:52

Update CHANGELOG.md

ecee207

Co-authored-by: Marco Pracucci <[email protected]> Signed-off-by: Joe Elliott <[email protected]>

Cleaned up tests

9d69ad3

Signed-off-by: Joe Elliott <[email protected]>

Reworked integration test to work with new /ready mechanics

e8a3557

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott force-pushed the query-frontend-readiness branch from 8fdf0f8 to e8a3557 Compare June 30, 2020 14:52

joe-elliott added 3 commits June 30, 2020 10:55

Merge branch 'master' into query-frontend-readiness

c5c2d32

Corrected flag

7a2c47f

Signed-off-by: Joe Elliott <[email protected]>

Added comment on latency

685e82b

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott marked this pull request as ready for review June 30, 2020 15:42

pstibrany reviewed Jun 30, 2020

pkg/querier/frontend/frontend.go (outdated, resolved)

pkg/querier/frontend/frontend.go (outdated, resolved)

pkg/querier/frontend/frontend_test.go (outdated, resolved)

joe-elliott added 2 commits June 30, 2020 12:38

removed unnecessary test method

ade7543

Signed-off-by: Joe Elliott <[email protected]>

Swapped to uber atomic

abfc7e2

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott mentioned this pull request Jun 30, 2020

Added publish not ready addresses grafana/cortex-jsonnet#118

Merged

pracucci approved these changes Jul 1, 2020

pkg/querier/frontend/frontend.go (outdated, resolved)

pkg/querier/frontend/frontend.go (outdated, resolved)

joe-elliott added 2 commits July 1, 2020 10:51

Improved return body

79d5742

Signed-off-by: Joe Elliott <[email protected]>

warn => info

6cf0789

Signed-off-by: Joe Elliott <[email protected]>

pstibrany approved these changes Jul 1, 2020

pstibrany merged commit 76ac8a9 into cortexproject:master Jul 1, 2020
