Make it possible to reactivate a draining worker node #24444
losipiuk merged 2 commits into trinodb:master
Did you consider using StateMachine as the implementation building block here, as we do in many places? E.g. QueryStateMachine.
I would make a low-level method like this one private and expose specific transitionTo... methods publicly.
Regarding making the low-level method private and exposing specific transitionTo... methods publicly: this will require adding a switch ... case in io.trino.server.ServerInfoResource#updateState. Do you think it brings us any value?
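For illustration only, the switch mentioned above could look roughly like the sketch below. All names here (`RequestedState`, `NodeLifecycle`, `StateDispatcher`) are hypothetical and are not taken from the Trino source; the sketch only shows the dispatch that a resource method would need once the public surface consists of specific transitionTo... methods.

```java
// Hypothetical names throughout; this is a sketch, not the Trino implementation.
enum RequestedState { ACTIVE, DRAINING, SHUTTING_DOWN }

// Stand-in for a manager that exposes only specific transition methods.
interface NodeLifecycle {
    void transitionToActive();
    void transitionToDraining();
    void transitionToShuttingDown();
}

final class StateDispatcher {
    private StateDispatcher() {}

    // Maps the state requested by the caller onto one specific public method.
    static void dispatch(RequestedState requested, NodeLifecycle lifecycle) {
        switch (requested) {
            case ACTIVE -> lifecycle.transitionToActive();
            case DRAINING -> lifecycle.transitionToDraining();
            case SHUTTING_DOWN -> lifecycle.transitionToShuttingDown();
        }
    }
}
```

The trade-off is the one raised in the comment: the switch adds a few lines to the resource, while the manager no longer accepts arbitrary target states.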
Regarding the StateMachine: I have not seen it before. It is an interesting idea, but it would require me to learn how to use it and to rewrite NodeStateManager.
Is it much better or easier to use StateMachine instead of the hand-written code in NodeStateManager? Can it be done later as a refactoring task?
The thing is, the code presented here is already in production environments and is well tested and battle-proven. Any changes would require careful analysis and testing.
StateMachine is kinda trivial. It is just a building block wrapping an atomic reference to the state with a nicer interface. It also adds tooling for attaching state change listeners, but you do not need that here I guess.
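To make the "wrapping an atomic" idea concrete, here is a minimal sketch of such a building block. `AtomicStateHolder` and `Phase` are hypothetical names, and this is not Trino's StateMachine class, which offers more (for example the state change listeners mentioned above). It only assumes `java.util.concurrent.atomic.AtomicReference` from the JDK.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical two-value state used only for this example.
enum Phase { ACTIVE, DRAINING }

// Hypothetical minimal holder: an AtomicReference with a narrower interface.
final class AtomicStateHolder<T> {
    private final AtomicReference<T> state;

    AtomicStateHolder(T initial) {
        this.state = new AtomicReference<>(initial);
    }

    T get() {
        return state.get();
    }

    // Moves to newState only if the current state is still `expected`.
    // AtomicReference compares by reference, which is fine for enum constants.
    boolean compareAndSet(T expected, T newState) {
        return state.compareAndSet(expected, newState);
    }
}
```

A call such as `holder.compareAndSet(Phase.ACTIVE, Phase.DRAINING)` succeeds once; a second identical call returns false because the state has already moved on.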
A refactor - rename to prepare for adding new logic.
Adds new node states to enable full control over shutdown and reactivation of workers.
- state: DRAINING - a reversible shutdown,
- state: DRAINED - all tasks are finished, server can be safely and quickly stopped. Can still go back to ACTIVE.
92b0165 to 80fee59
Couple of questions: Does this show up in the UI or CLI or somewhere?
Personally I think we should add for these states somewhere..
I do not think it is visible anywhere outside of internal communication between coordinator and workers (
Thank you for your questions. These are all valid concerns.
Are we showing
It is not a breaking change. To observe a new state you need to explicitly force the transition by calling a worker API. Anyone who executes the transition to the new DRAINING state should be aware that the new state can be observed by any tool that uses coordinator or worker APIs for observing node state or for lifecycle management.
No idea, do we have such docs? So far @losipiuk thinks we do not need to update any docs.
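As a rough illustration of "explicitly forcing the transition by calling a worker API", the sketch below builds (but does not send) such a request with the JDK HTTP client. Everything specific in it is an assumption and not stated in this conversation: the `/v1/info/state` path and the JSON string body follow the existing graceful shutdown call, while the host, port, user header value, and the `DrainRequestSketch` class are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; endpoint path, header and body format are assumptions.
final class DrainRequestSketch {
    private DrainRequestSketch() {}

    // Builds a PUT request asking the worker to move to targetState,
    // for example "DRAINING", "ACTIVE" or "SHUTTING_DOWN".
    static HttpRequest buildStateRequest(String workerBaseUrl, String targetState) {
        return HttpRequest.newBuilder()
                .uri(URI.create(workerBaseUrl + "/v1/info/state"))
                .header("Content-Type", "application/json")
                .header("X-Trino-User", "admin") // placeholder user
                .PUT(HttpRequest.BodyPublishers.ofString("\"" + targetState + "\""))
                .build();
    }
}
```

Sending it would be a matter of passing the request to `HttpClient.newHttpClient().send(...)` against a real worker; a tool that never issues such a request should not see the new states.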
Description
Adds new node states to enable full control over shutdown and reactivation of workers.
For backwards compatibility reasons, and to cooperate with shutdown initiated manually or by Kubernetes, two new states are added: `DRAINING` and `DRAINED`. The state `SHUTTING_DOWN` is left as is, without changes. With the two new states it is now possible to have fine-grained control over worker shutdown. A Trino worker can go from DRAINING or DRAINED back to ACTIVE.

Any tool (for example a Kubernetes operator) can now initiate the draining phase before executing a shutdown. The worker immediately goes to the DRAINING state and goes through almost all the steps that the graceful shutdown requires. During the DRAINING state the worker can transition back to ACTIVE if requested to do so. Upon finishing all the tasks it does not terminate the thread pools and connections (as SHUTTING_DOWN does), but transitions to the `DRAINED` state. In this state it can be safely and quickly terminated by being requested to go to `SHUTTING_DOWN`, or it can transition back to ACTIVE (also by request).

Below is a state transition diagram (with sources in plantuml):

Additional context and related issues
It solves issue #9976 and is similar to previous PRs (but has a smaller scope, is simpler, and does not add new worker-management logic to the coordinator):
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: