Skip to content

helm: update liveness probes#3584

Merged
enisoc merged 5 commits intovitessio:masterfrom
derekperkins:mysql-liveness-probe
Jan 27, 2018
Merged

helm: update liveness probes#3584
enisoc merged 5 commits intovitessio:masterfrom
derekperkins:mysql-liveness-probe

Conversation

@derekperkins
Copy link
Copy Markdown
Member

@derekperkins derekperkins commented Jan 23, 2018

Uses mysqladmin ping
cc @enisoc

@derekperkins derekperkins mentioned this pull request Jan 23, 2018
22 tasks
@derekperkins derekperkins force-pushed the mysql-liveness-probe branch 2 times, most recently from 395342e to f8c5ab6 Compare January 24, 2018 05:25
@derekperkins
Copy link
Copy Markdown
Member Author

I found these /debug/health endpoints, which seem to work nicely for the most part, but in testing, the vttablet health check started failing when I hadn't run an InitShardMaster yet. I'm not sure if that's what we want or not, but now that I have everything running and set up, it seems to be working nicely. I'm going to try my import again to see if it causes another crash loop.

@derekperkins derekperkins changed the title helm: add liveness probe to mysql container helm: update liveness probes Jan 24, 2018
@enisoc
Copy link
Copy Markdown
Member

enisoc commented Jan 25, 2018

As you found, /debug/health can fail for reasons that don't indicate a stuck process, so we shouldn't use it for a livenessProbe. That should make a good readinessProbe for vitess processes though.

For now, it looks like /debug/status may be our best bet for a livenessProbe, since it's the simplest in terms of dependencies. Ideally we should add a handler like /debug/no-op that's hard-coded to immediately return 200 OK, and use that for the livenessProbe.

@derekperkins
Copy link
Copy Markdown
Member Author

@enisoc I switched /debug/health to be a readiness probe and used /debug/status for the liveness probe, except for vttablet. /debug/health checks the topology, but there is no master on initial deployment, and you can’t initialize one because the pod isn’t ready.

I think this should be ready to merge.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, maybe it's more dangerous than helpful to have a livenessProbe on mysqld since it's got a lot more in-memory state than the rest of our processes (you lose a lot on restart). What do you think about having only the readinessProbe here? It seems like a safer default.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably a good choice

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to add /debug/health as a readinessProbe on vttablet if we set podManagementPolicy: Parallel on the StatefulSet. That makes sense anyway because vttablets don't need to be started in any particular order, so there's no need to wait.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good choice, but I don't think it solves the problem I encountered. I wasn't able to run InitShardMaster because it never reported as ready. I'll give it another try and report back whether or not it works.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pod being Unready shouldn't prevent InitShardMaster from working, as long as all the Pods are running (which requires Parallel).

MySQL keeps a lot of state in memory, so we don’t want to lose it unnecessarily
@derekperkins
Copy link
Copy Markdown
Member Author

Ok, I addressed your points and it's working great for me now.

@enisoc
Copy link
Copy Markdown
Member

enisoc commented Jan 27, 2018

LGTM

Approved with PullApprove

@enisoc enisoc merged commit 83866b3 into vitessio:master Jan 27, 2018
@derekperkins derekperkins deleted the mysql-liveness-probe branch January 27, 2018 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants