Skip to content

Enhance /debug/health to be usable as a liveness probe#3139

Merged
sougou merged 4 commits intovitessio:masterfrom
HubSpot:debug_health
Sep 6, 2017
Merged

Enhance /debug/health to be usable as a liveness probe#3139
sougou merged 4 commits intovitessio:masterfrom
HubSpot:debug_health

Conversation

@bbeaudreault
Copy link
Copy Markdown
Contributor

Only report health for serving types, namely master, replica, batch. Fix api to return 500 if in error

This should be useful for those in kubernetes or who otherwise want to use a probe to determine whether the vttablet process is healthy. This is distinct from /healthz, which is used for load balancers and returns unhealthy in more cases.

Open question: should we also whitelist the StateTransitioning state? Maybe not, as this should probably be called with a heuristic that allows for a certain number of failures, like many probe implementations do.

@sougou @adkhare

@bbeaudreault bbeaudreault changed the title Enhance /debug/health to be usable as a readiness probe Enhance /debug/health to be usable as a liveness probe Sep 6, 2017
@bbeaudreault
Copy link
Copy Markdown
Contributor Author

I've tested this out and it works as expected in our environment. As I anticipated in my original description, there is a period of 5s or so where the state is transitioning and it returns an error. We could whitelist that state, but I'm just tuning my probes to be able to ride over it.

tabletType := tsv.target.TabletType
tsv.mu.Unlock()
switch tabletType {
case topodatapb.TabletType_MASTER, topodatapb.TabletType_REPLICA, topodatapb.TabletType_BATCH:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be future-proof, let's add experimental, which is considered a serving type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. And rdonly also.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BATCH and RDONLY are the same value -- the switch complains if i add that. I'll do experimintal tho

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course :). Just experimental then.

@bbeaudreault
Copy link
Copy Markdown
Contributor Author

added experimental. i think shard 1 is never going to go green, it's been failing all morning for multiple PRs

@sougou
Copy link
Copy Markdown
Contributor

sougou commented Sep 6, 2017

LGTM

Approved with PullApprove

@sougou sougou merged commit b75a83e into vitessio:master Sep 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants