ocfs2: fix cluster hang after a node dies
When a node dies, the other live nodes have to choose a new master for each
existing lock resource mastered by the dead node.

In the ocfs2/dlm implementation, this is done by the function
dlm_move_lockres_to_recovery_list, which marks those lock resources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which the DLM
later changes each lock resource's master.

So without invoking dlm_move_lockres_to_recovery_list, no new master will be
chosen once DLM recovery completes, since no lock resource can be
found through the ::resource list.
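The mechanism described above can be modeled with a small sketch. Everything in it is a hypothetical simplification, not ocfs2 code: struct lockres, struct reco_list, LOCK_RES_RECOVERING, move_to_recovery_list() and finish_recovery() are invented names, and the model assumes a single new master (the recovery master) takes over every queued resource. It shows why a resource that was never queued keeps its dead owner after recovery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the ocfs2 data structures. */
#define LOCK_RES_RECOVERING 0x1u

struct lockres {
	unsigned int state;        /* flag bits */
	unsigned char owner;       /* node number of the current master */
	struct lockres *next_reco; /* link in the recovery list */
};

struct reco_list {
	struct lockres *head;
	struct lockres *tail;
};

/* Mark res as recovering and append it to the recovery list.
 * This mirrors the role of dlm_move_lockres_to_recovery_list,
 * not its actual implementation. */
static void move_to_recovery_list(struct reco_list *reco, struct lockres *res)
{
	assert(reco != NULL && res != NULL);
	res->state |= LOCK_RES_RECOVERING;
	res->next_reco = NULL;
	if (reco->tail)
		reco->tail->next_reco = res;
	else
		reco->head = res;
	reco->tail = res;
}

/* After recovery, every queued resource gets new_master and is unmarked.
 * Resources that were never queued are not visited at all, so they
 * keep pointing at the dead owner. Returns how many were remastered. */
static int finish_recovery(struct reco_list *reco, unsigned char new_master)
{
	int remastered = 0;
	struct lockres *res = reco->head;

	while (res) {
		struct lockres *next = res->next_reco;

		res->owner = new_master;
		res->state &= ~LOCK_RES_RECOVERING;
		res->next_reco = NULL;
		remastered++;
		res = next;
	}
	reco->head = NULL;
	reco->tail = NULL;
	return remastered;
}
```

In this model, a resource skipped by move_to_recovery_list() is exactly the hang described in the commit: finish_recovery() never sees it, so its owner field still names the dead node.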

What's worse, if DLM_LOCK_RES_RECOVERING is not set on lock
resources mastered by the dead node, synchronization among nodes
breaks.
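A hypothetical sketch of why the flag matters for synchronization, assuming a node checks the RECOVERING bit before acting on a lock request and tracks live nodes in a 64-bit mask (so node numbers must be below 64). check_lock_request() and its status codes are illustrative only; the real DLM request paths are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag and status values, not the ocfs2 definitions. */
#define LOCK_RES_RECOVERING 0x1u

enum req_status { REQ_OK, REQ_RETRY_RECOVERING, REQ_BAD_MASTER };

struct lockres {
	unsigned int state;  /* flag bits */
	unsigned char owner; /* node number of the master, assumed < 64 */
};

/* live_mask: bit n set means node n is believed alive. */
static enum req_status check_lock_request(const struct lockres *res,
					  unsigned long long live_mask)
{
	assert(res != NULL && res->owner < 64);
	if (res->state & LOCK_RES_RECOVERING)
		return REQ_RETRY_RECOVERING; /* wait until remastering completes */
	if (!(live_mask & (1ULL << res->owner)))
		return REQ_BAD_MASTER; /* unmarked resource still names a dead master */
	return REQ_OK;
}
```

With the flag set, every node backs off until a new master exists; without it, nodes proceed independently against a master that no longer exists.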

So invoke dlm_move_lockres_to_recovery_list again.

Fixes: ee8f7fc ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
Signed-off-by: Changwei Ge <[email protected]>
Reported-by: Vitaly Mayatskikh <[email protected]>
Tested-by: Vitaly Mayatskikh <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Changwei Ge authored and torvalds committed Nov 16, 2017
1 parent 98d6c09 commit 1c01967
Showing 1 changed file with 1 addition and 0 deletions: fs/ocfs2/dlm/dlmrecovery.c

@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 					dlm_lockres_put(res);
 					continue;
 				}
+				dlm_move_lockres_to_recovery_list(dlm, res);
 			} else if (res->owner == dlm->node_num) {
 				dlm_free_dead_locks(dlm, res, dead_node);
 				__dlm_lockres_calc_usage(dlm, res);
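The decision the patched branch makes for each lock resource can be sketched as follows. This is a simplified, hypothetical model of the logic visible in the hunk, not the kernel function: cleanup_one(), the flag values and the action enum are invented, the purge and free steps are reduced to return codes, and the branch for resources owned by the dead node is inferred from the context lines around the added call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, not the ocfs2 definitions. */
#define LOCK_RES_RECOVERING   0x1u
#define LOCK_RES_DROPPING_REF 0x2u

enum cleanup_action { ACT_NONE, ACT_PURGE, ACT_RECOVER, ACT_FREE_DEAD_LOCKS };

struct lockres {
	unsigned int state;  /* flag bits */
	unsigned char owner; /* node number of the current master */
};

static enum cleanup_action cleanup_one(struct lockres *res,
				       unsigned char dead_node,
				       unsigned char this_node)
{
	assert(res != NULL);
	if (res->owner == dead_node) {
		/* This node was already dropping its reference: purge, skip recovery. */
		if (res->state & LOCK_RES_DROPPING_REF)
			return ACT_PURGE;
		/* The step restored by this commit: mark (and, in the real
		 * code, queue) the resource so a new master gets chosen. */
		res->state |= LOCK_RES_RECOVERING;
		return ACT_RECOVER;
	} else if (res->owner == this_node) {
		/* We are the master: just drop the dead node's locks. */
		return ACT_FREE_DEAD_LOCKS;
	}
	return ACT_NONE;
}
```

Before the fix, the ACT_RECOVER path fell through without marking the resource, which in this model leaves it neither purged nor recovering.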