out_file: Conflicts occur when appending to the same file with multiple workers #3805

Closed
daipom opened this issue Jun 30, 2022 · 1 comment
Labels
bug Something isn't working

daipom commented Jun 30, 2022

Describe the bug

out_file can corrupt the output file when append is true and workers is greater than 1.

  • The full reproduction steps are given below.
  • The procedure uses flush_at_shutdown to make the reproduction easier.
  • However, the corruption should occur on normal writes as well.

We should implement a proper lock mechanism, as we already do for append false.

To Reproduce

fluentd.conf

<system>
  workers 8
</system>

<source>
  @type dummy
  tag test.log
  rate 100
  dummy {"message": "This is the test message"}
</source>

<match test.**>
  @type file
  path /test/fluentd/log/test
  append true
  <buffer>
    @type "file"
    path /test/fluentd/buffer
    flush_mode interval
    flush_interval 60m
    flush_at_shutdown true
  </buffer>
</match>

Script to check the output file

check.py

import json
import sys

all_line_counts = 0
broken_line_counts = 0

for line in sys.stdin:
    try:
        # Each line is "<time> <tag> <JSON record>"; parse only the record.
        json.loads(line.strip().split(maxsplit=2)[2])
    except Exception:
        # A missing field or invalid JSON means the line was corrupted.
        broken_line_counts += 1
        print(all_line_counts, line)
    all_line_counts += 1

print(f"{broken_line_counts} / {all_line_counts} lines are broken.")

Operation

  1. Run Fluentd with the config above
  2. Wait a few seconds
  3. Stop Fluentd (this flushes all buffers)
  4. Check the output file: $ python3 check.py < {the output filepath}

Some lines in the output file are then broken:

4272 2022-06-30T17:01:37+09:00	test.log	{"message":"T2022-06-30T17:01:17+09:00	test.log	{"message":"This is the test message"}

4493 2022-06-30T17:01:19+09:00	testhis is the test message"}

5157 2022-06-30T17:01:21+09:00	test.log	{"message":"This is the t-06-30T17:01:39+09:00	test.log	{"message":"This is the test message"}

5223 .log	{"message":"This is the test message"}

5666 2022-06-30T17:01est message"}

6109 2022-06-30T17:01:26+09:00	test.log	{"message"::23+09:00	test.log	{"message":"This is the test message"}

6773 2022-06-30T17:01:19+09:00	test"This is the test message"}

7437 2022-06-30T17:01:21+09:00	test.log	{"message":"This is the t22-06-30T17:01:28+09:00	test.log	{"message":"This is the test message"}

8101 2022-06-30T17:01og	{"message":"This is the test message"}

8765 2022-06-30T17:01:26+09:00	test.log	{"message":t message"}

9208 2022-06-30T17:01:37+09:00	test.log	{"message":"T"This is the test message"}

10094 2022-06-30T17:01:30+09:00	test.lhis is the test message"}

10382 og	{"message":"This is the test message"}

10603 2022-06-30T17:01:32+09:00	test.log	{"message":"This is the tes-06-30T17:01:39+09:00	test.log	{"message":"This is the test message"}

10669 t message"}

12285 2022-06-30T17:01:19+09:00	test:24+09:00	test.log	{"message":"This is the test message"}

12506 2022-06-30T17:01:26+09:00	test.log	{"message":.log	{"message":"This is the test message"}

12727 2022-06-30T17:01:21+09:00	test.log	{"message":"This is the t"This is the test message"}

12949 20est message"}

13392 2022-06-30T17:01:30+09:00	test.l:24+09:00	test.log	{"message":"This is the test message"}

13613 2022-06-30T17:01:26+09:00	test.log	{"message":og	{"message":"This is the test message"}

13834 2022-06-30T17:01:32+09:00	test.log	{"message":"This is the tes"This is the test message"}

14056 20t message"}

14499 2022-06-30T17:01:30+09:00	test.l5+09:00	test.log	{"message":"This is the test message"}

14720 2022-06-30T17:01:37+09:00	test.log	{"message":"Tog	{"message":"This is the test message"}

15163 2022t message"}

26 / 18240 lines are broken.
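
The broken lines above look like fragments of well-formed lines spliced together. Here is a minimal sketch of one way that pattern could arise, assuming (this is not confirmed in this report) that each worker's flush reaches the file as several separate writes that alternate with another worker's writes. The names `split_pieces` and `interleave` and the piece size are hypothetical, chosen only for illustration; they are not Fluentd internals.

```python
from itertools import zip_longest


def split_pieces(chunk, size):
    """Cut a chunk into consecutive pieces of at most `size` characters."""
    return [chunk[i:i + size] for i in range(0, len(chunk), size)]


def interleave(chunk_a, chunk_b, size):
    """Alternate pieces of two chunks, as two unsynchronized writers might."""
    out = []
    pieces_a = split_pieces(chunk_a, size)
    pieces_b = split_pieces(chunk_b, size)
    for a, b in zip_longest(pieces_a, pieces_b, fillvalue=""):
        out.append(a)
        out.append(b)
    return "".join(out)


if __name__ == "__main__":
    # One well-formed out_file line: time, tag and record separated by tabs.
    line = '2022-06-30T17:01:37+09:00\ttest.log\t{"message":"This is the test message"}\n'
    # Two workers each flush three lines; pieces of 50 characters do not
    # line up with line boundaries, so lines get spliced together.
    for mixed_line in interleave(line * 3, line * 3, size=50).splitlines():
        print(mixed_line)
```

For example, `interleave("abcd", "WXYZ", 2)` returns `"abWXcdYZ"`: no characters are lost, but neither writer's data stays contiguous.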

Expected behavior

out_file writes all lines correctly,
or rejects this configuration during validation.

Your Environment

- Fluentd version: fluentd 1.15.0
- Operating system: Ubuntu 20.04.4 LTS


$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal


- Kernel version: 5.13.0-44-generic

Your Configuration

See the `To Reproduce` section.

Your Error Log

None.

Additional context

  • One workaround is to use "worker_id" to make the output path unique for each worker, but this is arguably suboptimal.
  • We should ensure exclusive access by implementing file locking.
  • I'm going to fix this problem.
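
The file-locking idea can be sketched as follows. This is an illustration in Python, not Fluentd's Ruby implementation, and it makes several assumptions: a Unix-like system where `fcntl.flock` is available, every writer going through the same helper (the lock is advisory, so a process that skips it is not blocked), and a helper name, `append_chunk_locked`, that is hypothetical.

```python
import fcntl
import os


def append_chunk_locked(path, chunk):
    """Append `chunk` (bytes) to `path` while holding an exclusive lock."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # Block until no other process holds the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        view = memoryview(chunk)
        # os.write may write fewer bytes than requested, so loop until
        # the whole chunk is written; the lock keeps the pieces contiguous.
        while view:
            written = os.write(fd, view)
            view = view[written:]
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

Each worker process would call `append_chunk_locked(path, data)` once per flushed chunk. Holding the lock for the whole chunk, rather than per write call, is what keeps one worker's lines from being split by another's.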
ashie added the bug label on Jun 30, 2022

daipom commented Jul 1, 2022

This problem is fixed by #3808.
Thanks for the fix!
