-
Notifications
You must be signed in to change notification settings - Fork 1.9k
lib: execute pthread_join only when tid != 0 in flb_stop. #9428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Phillip Whelan <phillip.whelan@chronosphere.io>
|
@pwhelan can we get more details on why CentOS and Redhat don't conform ? I cannot find a reference about it. It seems to me that the proper approach would be to get the return of pthread_join(3) and print the errors if any:
a simple flb_errno() can do the job since that is set in errno |
That would be great, but pthread_join crashes on RHEL and CentOS before it can even return when passing the first argument as zero. |
|
@pwhelan what are the steps to make it crash ? |
You need to configure a calyptia fleet with an invalid configuration, ie: with a non-existent plugin: Reproducing it is hard under other circumstances. |
|
can we get a minimal repro ? it seems this needs hot reload with a speacial crafted config, something we can repro with curl ? |
You need to run fluent-bit under the CentOS/8 container, modify the configuration file once it is running and then send it SIGHUP. |
|
closing in favor of #9432 |
Summary
This PR wraps the call to
pthread_joininflb_stopin a conditional fortid != 0.Calling
pthread_joinwithtid== 0 is undefined. Technically it should returnESRCHbut CentOS and RHEL do not seem to conform.The following crash occurs:
Debugging the actual code in
__pthread_timedjoin_exwe can see it is in fact aNULLdereference:Root cause analysis
The value of
tidis derived fromctx->config->workerhere:The call to
pthread_joinchecksctx->status, which is not changed whenctx->config->workeris set:The value of
ctx->statuscan be set to the following values which are defined in flb_lib.h:The places where
ctx->statusis set do not correlate with the value ofctx->config->workerat all:fluent-bit/src/flb_lib.c
Line 168 in f36c956
fluent-bit/src/flb_lib.c
Line 680 in f36c956
fluent-bit/src/flb_lib.c
Line 726 in f36c956
fluent-bit/src/flb_lib.c
Line 732 in f36c956
fluent-bit/src/flb_lib.c
Line 741 in f36c956
In fact we might want to just check the value of
ctx->config->workerand notctx->statusbefore joining it. There is a correlation between both values but I do not see it as being strong enough to depend upon one or the other.The code that calls
pthread_joinbehind that conditional is also meant for extraordinary or erroneous circumstances and therefore being defensive to me does not seem to be a bad policy.References
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.