Drain Cleaner causes cluster downtime on EKS #9932

Answered by scholzj
Manicben asked this question in Q&A

I guess it might be best to start addressing these one by one ...

EKS uses the default k8s pod eviction timeout of 5 mins. We assume that after this timeout, the node will be forcefully shut down

I think this is definitely an issue, especially if ZooKeeper and Kafka pods run on the same node and are evicted in parallel while the nodes are drained one by one. The rolling update triggered by the Drain Cleaner normally takes 0-120 seconds to start. It then rolls the ZooKeeper pod first and only afterwards the Kafka pod. So I don't think finishing within 5 minutes can be guaranteed, especially if the pod might need to wait for some data to resync before being rolled, would need to pull …
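
To make the timing concern concrete, here is a rough sketch of the budget arithmetic. The 5-minute timeout and the 0-120 second start delay come from the discussion above; the per-pod roll durations are purely hypothetical numbers chosen for illustration, and the function names are made up for this example (they are not part of Strimzi or the Drain Cleaner).

```python
# Timeout quoted in the question: 5 minutes before the node is assumed to be shut down.
EVICTION_TIMEOUT_S = 5 * 60


def total_roll_seconds(start_delay_s, zookeeper_roll_s, kafka_roll_s):
    """Pods on the same node are rolled one after another: ZooKeeper first, then Kafka."""
    return start_delay_s + zookeeper_roll_s + kafka_roll_s


def fits_in_timeout(start_delay_s, zookeeper_roll_s, kafka_roll_s,
                    timeout_s=EVICTION_TIMEOUT_S):
    """Return True if the whole sequence finishes before the timeout."""
    return total_roll_seconds(start_delay_s, zookeeper_roll_s, kafka_roll_s) <= timeout_s


# Hypothetical roll durations (assumptions, not measurements).
lucky = fits_in_timeout(start_delay_s=0, zookeeper_roll_s=60, kafka_roll_s=90)
unlucky = fits_in_timeout(start_delay_s=120, zookeeper_roll_s=90, kafka_roll_s=120)

print("rolling update starts immediately:", lucky)
print("rolling update starts after 120 s:", unlucky)
```

With these assumed durations, the same pair of pods fits inside the window when the rolling update starts right away but not when it starts at the upper end of the 0-120 second range, and that is before accounting for any resync wait.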

Answer selected by Manicben