Filesystem based chunk storage results in "chunk_io_locked" exception followed by fluentbit process termination #4598

@Sabari-Arunkumar-ML

Version: 1.7.5
Environment: Ubuntu (containerized, k8s)

We run under high load in production, and our pipeline tails multiple files and uses several rewrite_tag filters.
When new log files appear in the k8s cluster over time, we restart fluentbit (SIGTERM, followed by starting a new process).

Fluentbit crashes sporadically.

The following GDB backtrace was captured from one of the crash dump files:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fe965090921 in __GI_abort () at abort.c:79
#2 0x0000000000436a72 in flb_signal_handler (signal=11) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/fluent-bit.c:514
#3 <signal handler called>
#4 0x000000000072e3ee in cio_chunk_is_locked (ch=0x36) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/lib/chunkio/src/cio_chunk.c:343
#5 0x0000000000478b7c in input_chunk_get (tag=0x7fe960446f30 "klog", tag_len=4, in=0x7fe9603f2a80, chunk_size=368, set_down=0x7fe96504df08) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:630
#6 0x0000000000479121 in flb_input_chunk_append_raw (in=0x7fe9603f2a80, tag=0x7fe960446f30 "klog", tag_len=4, buf=0x7fe96001b9d0, buf_size=368) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:865
#7 0x000000000048cf47 in in_emitter_add_record (tag=0x7fe960446680 "klog", tag_len=4, buf_data=0x7fe96624901f <error: Cannot access memory at address 0x7fe96624901f>, buf_size=368, in=0x7fe9603f2a80) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_emitter/emitter.c:117
#8 0x0000000000523635 in process_record (tag=0x7fe960452b90 "kubelet", tag_len=7, map=..., buf=0x7fe96624901f, buf_size=368, keep=0x7fe96504e160, ctx=0x7fe9603f2270) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:324
#9 0x000000000052378b in cb_rewrite_tag_filter (data=0x7fe96624901f, bytes=368, tag=0x7fe960452b90 "kubelet", tag_len=7, out_buf=0x7fe96504e1f8, out_bytes=0x7fe96504e1e8, f_ins=0x1320a40, filter_context=0x7fe9603f2270, config=0x128e290)
at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:375
#10 0x000000000044cc0c in flb_filter_do (ic=0x7fe96043ee70, data=0x7fe960018ce0, bytes=371, tag=0x7fe96043ef00 "kubelet", tag_len=7, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_filter.c:118
#11 0x00000000004792ee in flb_input_chunk_append_raw (in=0x7fe960404f50, tag=0x7fe96043ef00 "kubelet", tag_len=7, buf=0x7fe960018ce0, buf_size=371) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:911
#12 0x000000000048cf47 in in_emitter_add_record (tag=0x7fe9604520c0 "kubelet", tag_len=7, buf_data=0x7fe96045c5d0 "\222\327", buf_size=371, in=0x7fe960404f50) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_emitter/emitter.c:117
#13 0x0000000000523635 in process_record (tag=0x7fe96042ede0 "syslog", tag_len=6, map=..., buf=0x7fe96045c5d0, buf_size=371, keep=0x7fe96504e510, ctx=0x7fe960404740) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:324
#14 0x000000000052378b in cb_rewrite_tag_filter (data=0x7fe96045c5d0, bytes=371, tag=0x7fe96042ede0 "syslog", tag_len=6, out_buf=0x7fe96504e5a8, out_bytes=0x7fe96504e598, f_ins=0x1322050, filter_context=0x7fe960404740, config=0x128e290)
at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/filter_rewrite_tag/rewrite_tag.c:375
#15 0x000000000044cc0c in flb_filter_do (ic=0x7fe9604519d0, data=0x7fe960013630, bytes=177, tag=0x7fe96029a320 "syslog", tag_len=6, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_filter.c:118
#16 0x00000000004792ee in flb_input_chunk_append_raw (in=0x12c30e0, tag=0x7fe96029a320 "syslog", tag_len=6, buf=0x7fe960013630, buf_size=177) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input_chunk.c:911
#17 0x0000000000491487 in process_content (file=0x7fe9602a11b0, bytes=0x7fe96504e858) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail_file.c:367
#18 0x000000000049316b in flb_tail_file_chunk (file=0x7fe9602a11b0) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail_file.c:994
#19 0x000000000048d9ba in in_tail_collect_event (file=0x7fe9602a11b0, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail.c:261
#20 0x0000000000498277 in tail_fs_event (ins=0x12c30e0, config=0x128e290, in_context=0x7fe96029dec0) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/plugins/in_tail/tail_fs_inotify.c:268
#21 0x000000000044c6cd in flb_input_collector_fd (fd=215, config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_input.c:1004
#22 0x000000000045c5d8 in flb_engine_handle_event (config=0x128e290, mask=1, fd=215) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_engine.c:363
#23 flb_engine_start (config=0x128e290) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_engine.c:624
#24 0x00000000004422db in flb_lib_worker (data=0x128e260) at /home/ec2-user/sabari/fb_v_1.7.5/fluent-bit/src/flb_lib.c:493
#25 0x00007fe965e0a6db in start_thread (arg=0x7fe96504f700) at pthread_create.c:463

Note: We never hit this scenario when we were not using filesystem-based storage.
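
For anyone trying to reproduce this, the setup described above might look roughly like the sketch below. It is a hypothetical configuration, not the one from our deployment. Only the tag chain (syslog, then kubelet, then klog) comes from the backtrace. The paths, regex rules, emitter names, and the choice of filesystem storage for the emitters are placeholders and assumptions.

```
[SERVICE]
    # Placeholder path; setting storage.path enables filesystem buffering
    storage.path           /var/log/flb-storage/

[INPUT]
    Name                   tail
    Tag                    syslog
    # Placeholder path
    Path                   /var/log/syslog
    storage.type           filesystem

[FILTER]
    Name                   rewrite_tag
    Match                  syslog
    # Rule format: $KEY REGEX NEW_TAG KEEP (the regex here is a placeholder)
    Rule                   $log kubelet kubelet false
    Emitter_Name           emit_kubelet
    # Assumption: the emitter also buffers on the filesystem
    Emitter_Storage.type   filesystem

[FILTER]
    Name                   rewrite_tag
    Match                  kubelet
    Rule                   $log klog klog false
    Emitter_Name           emit_klog
    Emitter_Storage.type   filesystem

[OUTPUT]
    Name                   null
    Match                  *
```

With a chain like this, a record read by in_tail passes through two rewrite_tag filters, and each one re-enters flb_input_chunk_append_raw through its emitter. That matches the nesting in frames #5 to #16.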


Labels: Stale, waiting-for-user (Waiting for more information, tests or requested changes)
