pimd: Prevent t_join_timer thread from being canceled multiple times #17812
Issue:
We stop all PIM timers during instance shutdown. If the next hop changes at the same time (ZEBRA_NEXTHOP_UPDATE), it triggers an RPF update to the upstream next hop based on the new update received from Zebra.
This leads to stopping an already stopped timer.
Fix:
Ensure that join_timer_stop is not called on an already canceled thread.
signed-off-by: Rajesh Varatharaj<[email protected]>
@@ -332,7 +332,14 @@ static void join_timer_stop(struct pim_upstream *up)
{
	struct pim_neighbor *nbr = NULL;

	EVENT_OFF(up->t_join_timer);
	if (up->t_join_timer) {
I'm not usually a fan of testing these pointers in application code, since there's no locking done. The actual event lib call does use a lock, so why is this code a problem? It's not unusual to see code that uses the "OFF" or "cancel" APIs like this.
I understand that the event library cancel logic includes locking, and it is common to see APIs like "OFF" or "cancel" used directly.
The issue here is a race condition during PIM instance shutdown combined with RPF updates. Without this check, we can see crashes when attempting to cancel a previously canceled thread.
Another approach I am considering is to call thread_cancel only when the thread has not already been cancelled:
#define THREAD_OFF(thread) \
do { \
if ((thread) && (thread)->master != NULL) { \
thread_cancel(&(thread)); \
} \
} while (0)
Isn't that a universal solution? I need your opinion.
I ... don't think I understand the race condition. Can you say what it is that goes wrong? I mean, if the "up" object is still valid, then cancelling a task should be safe. If the object isn't valid, all bets are off, and a minor change in this path isn't going to be a real fix?
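To illustrate the point about a valid "up" object, here is a minimal sketch of the usual cancel-through-the-owner's-pointer pattern, where the cancel call clears the pointer so a second call is a no-op. All `toy_*` names are hypothetical and this is not FRR's event library (which also takes a lock internally); it only assumes that cancellation receives the address of the owner's pointer and resets it.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a scheduled timer; not FRR's struct event. */
struct toy_event {
	int id;
};

/* Hypothetical owner object that holds the pointer to its timer. */
struct toy_upstream {
	struct toy_event *t_join_timer;
};

/* Counts cancellations that actually released a timer. */
static int toy_cancel_count;

/* Allocate a timer; returns NULL if allocation fails. */
static struct toy_event *toy_event_add(int id)
{
	struct toy_event *ev = malloc(sizeof(*ev));

	if (ev != NULL)
		ev->id = id;
	return ev;
}

/*
 * Cancel through the owner's pointer and clear it. Because the pointer
 * is reset to NULL, calling this again on the same field does nothing.
 */
static void toy_event_cancel(struct toy_event **ev)
{
	if (ev == NULL || *ev == NULL)
		return;
	free(*ev);
	*ev = NULL;
	toy_cancel_count++;
}

/* Assumed shape of an "OFF" style wrapper macro. */
#define TOY_EVENT_OFF(ev) toy_event_cancel(&(ev))

/*
 * Cancel the same timer twice, as a shutdown path followed by an RPF
 * update path might. Returns how many cancellations released a timer
 * during this call.
 */
static int toy_demo_double_cancel(void)
{
	struct toy_upstream up = { .t_join_timer = toy_event_add(1) };
	int before = toy_cancel_count;

	TOY_EVENT_OFF(up.t_join_timer); /* first caller stops the timer */
	TOY_EVENT_OFF(up.t_join_timer); /* second caller sees NULL: no-op */

	return toy_cancel_count - before;
}
```

In this sketch the second cancel is harmless only because `up` itself is still alive. If the owner had already been freed, reading `up.t_join_timer` would be the bug, and no check inside the stop function would make that safe.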
@mjstapp, let me go back and dig up some more info. In the meantime, closing this.