-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the signal handling to avoid crashes during stop #26679
base: 4.10.2
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work @pereyra-m
src/wazuh_modules/main.c
Outdated
|
||
static int flag_foreground = 0; // Running in foreground. | ||
|
||
static bool wm_sig_received = false; // Flag to indicate if a signal is being handled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be defined inside the void wm_handler(int signum)
function to reduce the scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right.
src/wazuh_modules/main.c
Outdated
// Main function | ||
|
||
int main(int argc, char **argv) | ||
{ | ||
// Ignore signals until the modules have started | ||
wm_signals_ignore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Performed some testing over this
If the module initialization takes too long or it hungs we are still able to kill the process, I don't know if this desirable or not but its working as intended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to comments here. Ignoring the signals also avoids the Termination
of the process when it receives a signal during the starting phase, but it might change the current behavior too much.
The real fix is in the mutex, I can revert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed solution is on the right track! However, after reviewing the trace, I would suggest considering some alternative approaches.
One concern from the trace is that the wm_handler
exit routine is executed by the Vuln Detector thread, while ideally, it should be handled by the main thread.
Additionally, the use of sleep(1)
raises some concerns. This approach could lead to inconsistent behavior across different systems—on some, it might be too long, while on others, it could be insufficient. It appears that a sleep function is being used to mitigate a race condition, which is not an ideal solution.
Given that the modules seem to have a flag for handling graceful shutdowns, I would recommend exploring the replacement of wm_signals_ignore
with a per-thread signal mask, ensuring proper signal handling across different threads.
Cheers!
Hi @vikman90 Thank you for the review.
I don't understand why this happens. Can you share the backtrace? I think that setting the signal handling in the main thread shouldn't have this behavior.
I don't like it either, but I thought that if we had some sleeps at the start of other modules, it wouldn't be that bad. I agree that a real coordination of all the modules is much better, because we could know the exact moment when all of them are ready. But this issue isn't reproduced naturally in the product, so I preferred to make fewer changes and not to have a major rework.
I proposed the |
I've uploaded another proposal for your consideration. Given that making each module start routine notify the main thread that is ready is a big change, I moved the sleep to the stop section. I try to cause the less impact possible here, but without this, the SIGTERM signal happens too fast and the module crashes. Now I don't affect the time it takes the module to start. Also, I've removed the signal ignore method at startup to have fewer changes in the PR. Consider that sometimes, the process is terminated because the signal is received during a pre-existent sleep and it isn't ignored anymore; but this isn't a crash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The segfault prevented with this approach, in a normal situation it shouldn't happen anymore
// We wait in case the modules haven't completed the initialization | ||
sleep(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know this is not the best approach, but is the best we can do without inquiring about a big refactor regarding the initialization and stopping process of the modules
The stop-and-start interfaces we currently have are not prepared for this kind of behaviour
Description
To avoid the crash of the
wazuh-modulesd
during the shutdown, some improvements were added:sleep()
calls.Update
The original proposal described above might have many changes. A simplified one is under analysis, where the sleep occurs at the stop method and no sig-ignore is configured at startup.
Tests
The reproduction steps now don't cause crashes.