
helm: reparent on vttablet preStop hook #4103

Merged
sougou merged 5 commits into vitessio:master from derekperkins:lifecycle
Nov 4, 2018

Conversation

@derekperkins
Member

Attempt to reparent when vttablet stops. I have a couple questions:

  1. Should we try to detect whether the current tablet is healthy? If not, we may want to do an EmergencyReparent, or just let Orchestrator take over.
  2. The docs say that this hook could possibly get called more than once. Is that something we need to worry about?

cc @enisoc @acharis

@derekperkins derekperkins requested a review from enisoc July 25, 2018 06:58
@derekperkins derekperkins force-pushed the lifecycle branch 2 times, most recently from e78d0d3 to 2eebbdb on July 25, 2018
@hmcgonig
Contributor

Currently if a stop hook fails, kubernetes will allow the resource to be terminated.

With this in mind, ideally you would have some retry logic in your stop hook so that it will keep retrying until it actually reparents (or maybe some other form of error handling). For example, on reparent failure you could sleep infinity and curl out to some form of alerting so an admin can intervene manually, etc.
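The retry-or-alert idea might be sketched like this; `VTCTLD_ADDR`, `ALERT_URL`, `KEYSPACE`, `SHARD`, and the exact `vtctlclient` flags here are illustrative assumptions, not what any chart actually ships:

```shell
#!/bin/sh
# Sketch only: keep retrying the reparent, and if it never succeeds, page a
# human and block so Kubernetes doesn't tear the pod down before someone
# intervenes. All names/flags below are assumptions for illustration.

reparent_or_alert() {
  attempts=0
  while ! vtctlclient -server "$VTCTLD_ADDR" PlannedReparentShard "$KEYSPACE/$SHARD"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 5 ]; then
      # give up retrying: alert an admin, then block for manual intervention
      curl -s -X POST "$ALERT_URL" -d "reparent failed on $HOSTNAME" || true
      sleep infinity
    fi
    sleep 5
  done
}
```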

There have also been a few scenarios where we've seen reparents cause vttablet to crash, leaving the cluster without a master (luckily orchestrator saves us in this case).

I wrote our internal stop hook logic and had to think about all this at one point, so I'm very interested to hear others' thoughts on the right course of action for both of the above cases.

@sougou
Contributor

sougou commented Oct 28, 2018

This has been lingering. Can we work on moving this forward?

@derekperkins
Member Author

Yeah, I'm tackling this after backups.

@derekperkins derekperkins force-pushed the lifecycle branch 2 times, most recently from 0da00e1 to dd45a38 on October 28, 2018
@derekperkins
Member Author

@hmcgonig Here are more detailed thoughts on those two scenarios.

1 - Should we try to detect if the current tablet is healthy? If not, we may want to do an EmergencyReparent

> With this in mind, ideally you would have some retry logic in your stop hook so that it will keep retrying until it actually reparents (or maybe some other form of error handling). For example, on reparent failure you could sleep infinity and curl out to some form of alerting so an admin can intervene manually, etc.

There needs to be some kind of default retry count, because the termination grace period starts right away, with a max of 600 seconds.

"The behavior is similar for a PreStop hook. If the hook hangs during execution, the Pod phase stays in a Terminating state and is killed after terminationGracePeriodSeconds of pod ends. If a PostStart or PreStop hook fails, it kills the Container."
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/

"If the preStop hook is still running after the grace period expires, processes in the Pod are sent the TERM signal with a small (2 second) extended grace period."
https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

Maybe retry 3 times with a 10 second delay between each, and if all 3 attempts fail, then run EmergencyReparent? The PlannedReparent command is essentially handling the healthy case here.
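That policy could be sketched roughly as follows; the addresses, environment variables, and flag spellings are assumptions, not the chart's real script:

```shell
#!/bin/sh
# Sketch of the proposed policy: a few PlannedReparentShard attempts with a
# delay between them, then fall back to EmergencyReparentShard. VTCTLD_ADDR,
# KEYSPACE, SHARD, and the vtctlclient invocations are assumptions.

reparent() {
  max=${1:-3} delay=${2:-10}
  n=1
  while [ "$n" -le "$max" ]; do
    if vtctlclient -server "$VTCTLD_ADDR" PlannedReparentShard "$KEYSPACE/$SHARD"; then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  # all planned attempts failed; the tablet is probably unhealthy, so force it
  vtctlclient -server "$VTCTLD_ADDR" EmergencyReparentShard "$KEYSPACE/$SHARD"
}
```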

2 - hook could possibly get called more than once

Going through the implementation, we really don't need to be worried about it. In the rare instance that it did get called twice, then either:

  • the first call succeeded, and so when the script runs again, it will exit early because the current tablet is no longer the master
  • the call failed / never happened, in which case we should be able to try again
  • the PlannedReparent is in progress, but hasn't completed yet. I'm assuming that Vitess already handles this race condition, so it doesn't seem important to handle that edge case here.
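The exit-early case in the first bullet can be guarded with a simple check at the top of the hook. The `GetTablet` invocation and the JSON field matched below are assumptions about `vtctlclient`'s output, for illustration only:

```shell
#!/bin/sh
# Sketch: make the hook idempotent by bailing out unless this tablet is
# currently the master. The GetTablet output format / grep pattern is an
# assumption, not verified against the actual chart.

is_master() {
  vtctlclient -server "$VTCTLD_ADDR" GetTablet "$TABLET_ALIAS" | grep -q '"type": *"MASTER"'
}

run_hook() {
  if ! is_master; then
    echo "not master, nothing to reparent"
    return 0
  fi
  # ...reparent logic goes here...
}
```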

@derekperkins
Member Author

> There have also been a few scenarios where we've seen reparents cause vttablet to crash, leaving the cluster without a master (luckily orchestrator saves us in this case).

@hmcgonig Which container do you have the preStop hook running in? In this PR, I have it on vttablet, but it's feeling like a race condition between this running and the mysql container stopping. Maybe there's a race condition that is causing vttablet to crash in your case?

@hmcgonig
Contributor

We run it in vttablet. And you're definitely right, there is a race there: you'll want to add a stop hook to the mysql container that polls the vttablet /debug/vars endpoint until it becomes unreachable. That way you can guarantee process shutdown ordering.
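A mysql-container preStop hook along those lines might look like this sketch; the port number and the wget flags are assumptions:

```shell
#!/bin/sh
# Sketch: block MySQL shutdown until vttablet has exited, by polling its
# /debug/vars endpoint. Port 15002 and the busybox-style wget flags are
# assumptions for illustration.

wait_for_vttablet_exit() {
  url=${1:-http://127.0.0.1:15002/debug/vars}
  # while the endpoint still answers, vttablet is alive; keep waiting
  while wget -q -O /dev/null "$url"; do
    sleep 1
  done
}
```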

@derekperkins
Member Author

@hmcgonig @acharis @leoxlin I've implemented the changes we discussed on our call and they really work great.

  • Since I'm not running the mysqlctld container directly, but instead copying Vitess files into a user-specified image, I ended up using the busybox binary to do the blocking HTTP call to /debug/vars.
  • I left a TODO for doing more advanced health checking before reparenting that I'll tackle in a future PR.
  • I'm currently only deleting the tablet record in the preStop hook if it is the master. Should I also delete the record on non-masters?

I cleaned up and rebased my commits, so if you think it looks good, I think it's ready to get merged. Thanks so much for your help and suggestions! It would have taken me a lot longer to figure out how to handle the race condition, and I probably wouldn't have gotten to deleting the tablet record.

Signed-off-by: Derek Perkins <derek@derekperkins.com>
Contributor

@hmcgonig left a comment


I don't know a ton about the current vitess docker image composition, but I only had a couple of minor comments that could potentially speed up build times (assuming you're building in a way that allows layer caching). Otherwise, LGTM!

this should allow for more build layer caching

Signed-off-by: Derek Perkins <derek@derekperkins.com>
@derekperkins
Copy link
Copy Markdown
Member Author

@hmcgonig Those are good build suggestions for the Dockerfiles. I haven't worried too much about build speed yet since they're being auto-generated on Docker Hub. I added that to this PR.

@sougou this should be ready to merge now.

@sougou sougou merged commit bd3aed8 into vitessio:master Nov 4, 2018
@derekperkins derekperkins deleted the lifecycle branch November 9, 2018 18:47