Bug Report
Describe the bug
It might be related to corrupted files in storage, or just to a large backlog there. On start, Fluent Bit tries to load all the files and crashes with a segfault soon afterwards.
To Reproduce
[2020/02/12 23:24:15] [error] [storage] format check failed: systemd.2/8-1581461952.525678621.flb
[2020/02/12 23:24:15] [error] [storage] format check failed: systemd.2/8-1581467643.662417426.flb
[2020/02/12 23:24:25] [error] [storage] format check failed: tail.0/163-1581549864.900113196.flb
[2020/02/12 23:24:25] [error] [storage] [cio file] cannot map chunk: tail.0/163-1581549864.900113196.flb
[2020/02/12 23:24:25] [error] [storage] format check failed: tail.0/163-1581549865.134681941.flb
[2020/02/12 23:24:25] [error] [storage] [cio file] cannot map chunk: tail.0/163-1581549865.134681941.flb
[engine] caught signal (SIGSEGV)
#0 0x5636a8cf8f0e in cio_file_st_get_meta_len() at lib/chunkio/include/chunkio/cio_file_st.h:72
#1 0x5636a8cf8f42 in cio_file_st_get_content() at lib/chunkio/include/chunkio/cio_file_st.h:93
#2 0x5636a8cf93f1 in cio_chunk_get_content() at lib/chunkio/src/cio_chunk.c:193
#3 0x5636a8a964f5 in flb_input_chunk_flush() at src/flb_input_chunk.c:550
#4 0x5636a8a7c823 in flb_engine_dispatch() at src/flb_engine_dispatch.c:146
#5 0x5636a8a79b22 in flb_engine_flush() at src/flb_engine.c:85
#6 0x5636a8a7b318 in flb_engine_handle_event() at src/flb_engine.c:247
#7 0x5636a8a7b318 in flb_engine_start() at src/flb_engine.c:489
#8 0x5636a89ec813 in main() at src/fluent-bit.c:853
#9 0x7f75cf5ecb96 in ???() at ???:0
#10 0x5636a89ea9d9 in ???() at ???:0
#11 0xffffffffffffffff in ???() at ???:0
Aborted (core dumped)
- Steps to reproduce the problem:
In one case we had 1.2 GB of files stuck in storage, and the crash reproduced consistently. After I moved the files out of the storage directory and ran again, there was no crash; when I moved them back, it also didn't seem to crash anymore. It may be hard to reproduce on demand, but it happens regularly in our production environment.
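For context on where the segfault comes from: the trace above ends in cio_file_st_get_meta_len(), which reads a 2-byte metadata length out of the mmap'ed chunk file. If a chunk on disk is truncated or otherwise shorter than the fixed header, that read can walk off the end of the mapping. Below is a minimal, self-contained C sketch of that failure mode and the kind of bounds check that would avoid it; the 24-byte header size, the chunk_get_meta_len() name, and the big-endian layout are assumptions for illustration, not the actual chunkio code.

```c
/*
 * Hypothetical sketch only -- not the chunkio implementation.
 * Illustrates how reading the 2-byte metadata length from a truncated
 * chunk file can run past the end of the mapping, and the bounds check
 * that avoids it. CHUNK_HEADER_SIZE and the function name are assumed.
 */
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define CHUNK_HEADER_SIZE 24  /* assumed fixed header before the content area */

/* Return -1 for files too small to contain a full header instead of
 * dereferencing past the end of the mapping. */
static int chunk_get_meta_len(const unsigned char *map, size_t file_size,
                              uint16_t *meta_len)
{
    if (file_size < CHUNK_HEADER_SIZE) {
        return -1;  /* truncated/corrupted chunk: caller should skip it */
    }
    /* assumed big-endian 2-byte length stored at the end of the header */
    *meta_len = (uint16_t) ((map[CHUNK_HEADER_SIZE - 2] << 8) |
                            map[CHUNK_HEADER_SIZE - 1]);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <chunk.flb>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        close(fd);
        return 1;
    }

    unsigned char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    uint16_t meta_len;
    if (chunk_get_meta_len(map, (size_t) st.st_size, &meta_len) != 0) {
        fprintf(stderr, "chunk too small (%lld bytes), skipping\n",
                (long long) st.st_size);
    }
    else {
        printf("metadata length: %u\n", meta_len);
    }

    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```

Running this against one of the .flb files named in the log above would show whether the chunk is shorter than the assumed header, which matches the "format check failed" / "cannot map chunk" errors.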
Expected behavior
No crashes
Screenshots
Your Environment
- Version used: 1.3.2
- Configuration:
td-agent-bit.conf: |
[SERVICE]
    # Rely on the supervisor service (e.g. kubelet) to restart
    # the fluentbit daemon when the configuration changes.
    Config_Watch              on
    # Given we run in a container, stay in the foreground.
    Daemon                    off
    Flush                     1
    HTTP_Server               on
    HTTP_Listen               0.0.0.0
    HTTP_Port                 2020
    Log_Level                 warning
    Parsers_File              parsers.conf
    storage.path              /var/lib/fluentbit/storage/
    storage.sync              full
    storage.checksum          on
    # from https://github.com/fluent/fluent-bit/issues/1362#issuecomment-500166931
    storage.backlog.mem_limit 100M
Problematic input (one of 3):
[INPUT]
    Name              tail
    Tag               kube.<namespace_name>.<pod_name>.<container_name>.<docker_id>
    Tag_Regex         (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/lib/fluentbit/input_tail_kube.db
    Docker_Mode       On
    Mem_Buf_Limit     50MB
    Buffer_Chunk_Size 1MB
    Buffer_Max_Size   1MB
    Skip_Long_Lines   On
    storage.type      filesystem
    Refresh_Interval  10
- Environment name and version (e.g. Kubernetes? What version?): k8s
- Server type and version:
- Operating System and version: Ubuntu 18.04.3 LTS (Bionic Beaver)
- Filters and plugins:
Additional context
Ideally we would see no crashes. It also seems that after any crash the storage files end up corrupted and can't be loaded by subsequent runs.