Buffer's chunk_limit_size not working as expected on Windows platform #2713
Comments
Enabling Fluentd debug logs and restarting the Fluentd service on Windows also confirms the creation of those 1020 small buffer chunks of 0-1 KB each (all created within just 1-2 seconds), until Fluentd fails to create a new buffer chunk because of a system error:

```
Stop creating buffer files: error = Too many open files @ rb_sysopen - C:\opt\td-agent\buffer/buffer.b598db0aeee94993f44c0493f7a979962.log" location="C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.7.4/lib/fluent/plugin/buffer/file_chunk.rb:291:in `rescue in create_new_chunk'"
```
I was in the same situation.

@cosmo0920 Could you check this issue?
I've reproduced this issue with the S3 plugin and minio instead of AWS S3:

```
<source>
  @type tail
  path "C:/inetpub/logs/LogFiles/W3SVC1/*.log"
  pos_file "C:\\inetpub\\td-agent-iis.pos"
  format /^(?<timestamp>\d+-\d+-\d+\s\d+:\d+:\d+)\s(?<s_ip>[^\s]*)\s(?<cs_method>[^\s]*)\s(?<cs_uri_stem>[^\s]*)\s(?<cs_uri_query>[^\s]*)\s(?<s_port>[^\s]*)\s(?<cs_username>[^\s]*)\s(?<c_ip>[^\s]*)\s(?<cs_user_agent>[^\s]*)\s(?<cs_referer>[^\s]*)\s(?<sc_status>[^\s]*)\s(?<sc_substatus>[^\s]*)\s(?<sc_win32_status>[^\s]*)\s(?<time_taken>[^\s]*)?$/
  tag xxxx
  keep_time_key true
  time_key timestamp
  time_format %Y-%m-%d %H:%M:%S
  read_from_head true
  limit_recently_modified 3d
</source>

<filter xxxx>
  @type record_modifier
  <record>
    xxxxxx xxxxxx
    xxxxxx xxxxxx
    time ${record['timestamp']}
  </record>
</filter>

<match xxxx>
  @type s3
  aws_key_id minioadmin
  aws_sec_key minioadmin
  s3_bucket fluentd
  s3_region us-east-1
  s3_endpoint http://192.168.10.19:9000/
  path xxxx
  s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension}
  time_slice_format %Y-%m-%d/%H #%Y%m%d%H
  time_format %Y-%m-%dT%H:%M:%S.%L%z
  format json
  time_key time
  include_time_key true
  force_path_style true
  <buffer tag,time>
    @type file
    path C:/opt/td-agent/buffer
    timekey 3600 # 1 hour partition
    chunk_limit_size 64MB
    flush_mode interval
    flush_interval 30s
    flush_thread_count 4
    flush_at_shutdown true
  </buffer>
</match>
```
I've confirmed that this PR (#2560) introduced this issue on Windows.
Because this data differs unexpectedly on Windows. This unexpected difference caused a flood of buffer files due to a wrong metadata comparison. The object_id-based `hash` method inside the Metadata struct should only be used on non-Windows environments. The instability of this object_id value was monitored as follows:

```
{:timekey_object_id=>36247560}
{:timekey_object_id=>36247560}
{:timekey_object_id=>38199640}
{:timekey_object_id=>38199640}
{:timekey_object_id=>38199640}
{:timekey_object_id=>38240520}
{:timekey_object_id=>38240520}
{:timekey_object_id=>38240520}
{:timekey_object_id=>38277560}
{:timekey_object_id=>38277560}
{:timekey_object_id=>38277560}
{:timekey_object_id=>38314220}
{:timekey_object_id=>38314220}
{:timekey_object_id=>38314220}
{:timekey_object_id=>40539060}
{:timekey_object_id=>40539060}
{:timekey_object_id=>40539060}
{:timekey_object_id=>40598620}
{:timekey_object_id=>40598620}
{:timekey_object_id=>40598620}
{:timekey_object_id=>40764680}
{:timekey_object_id=>40764680}
{:timekey_object_id=>40764680}
{:timekey_object_id=>40613600}
{:timekey_object_id=>40613600}
{:timekey_object_id=>40613600}
{:timekey_object_id=>40741660}
{:timekey_object_id=>40741660}
{:timekey_object_id=>40741660}
{:timekey_object_id=>40895360}
{:timekey_object_id=>40895360}
{:timekey_object_id=>40895360}
{:timekey_object_id=>40926520}
<snip>
```

Signed-off-by: Hiroshi Hatake <[email protected]>
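To illustrate the mechanism, here is a minimal sketch (not Fluentd's actual Metadata class) of how an object_id-based `hash` can split equal timekeys into separate buffer chunks. It assumes the timekey is an Integer large enough to be heap-allocated, which the monitored values above suggest happens on Windows; `2**62` forces that on 64-bit CRuby:

```ruby
# Minimal sketch, not Fluentd's actual code: a struct that hashes on
# timekey.object_id instead of the timekey's value.
Meta = Struct.new(:timekey) do
  def hash; timekey.object_id; end          # problematic: identity, not value
  def eql?(other); hash == other.hash; end  # comparison inherits the problem
end

# Each Meta.new(2**62) allocates a fresh heap Integer with a new object_id,
# so a Hash keyed by Meta opens a new "chunk" per event instead of reusing one.
chunks = Hash.new { |h, k| h[k] = [] }
3.times { chunks[Meta.new(2**62)] << "event" }
puts chunks.size  # => 3 (expected 1): one tiny chunk per event
```

With a value-based `hash` (e.g. `timekey.hash`), all three inserts would land in a single entry, which matches the intended one-chunk-per-timekey behavior.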
Don't use timekey.object_id for Metadata instance comparison on Windows. Fix #2713
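A hedged sketch of the direction this fix takes, not the actual patch: keep the fast identity-based comparison where object_id is stable, and fall back to value-based hashing on Windows. The `WINDOWS` guard here is a stand-in for whatever platform check the codebase uses:

```ruby
# Sketch only; see the actual commit for the real change. WINDOWS stands in
# for the platform check (fluentd has its own helper for this).
WINDOWS = !!(RUBY_PLATFORM =~ /mswin|mingw/)

Meta = Struct.new(:timekey, :tag, :variables) do
  if WINDOWS
    def hash; timekey.hash; end       # value-based: stable across allocations
  else
    def hash; timekey.object_id; end  # identity-based fast path
  end
  # Note: eql? must stay consistent with whichever hash is chosen.
end
```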
We are trying to use Fluentd on Windows for log collection, but it seems that the buffer section's `chunk_limit_size` is not working on Windows. Even though the `chunk_limit_size` value is defined as high as 64 MB, Fluentd creates a lot of tiny chunk files of 0-1 KB each, with just one log line in each. Because of this, the Windows file descriptor limit (for a process) is reached, and Fluentd then fails with error logs saying it is not able to write a new chunk file and failed to flush the buffer.
Steps to reproduce the behavior:

1. [Windows] Create a sample log file with a significant number of log lines (say 100k). Below is a sample from the IIS access log file used for testing; a hypothetical generator is sketched after this list.
2. Install td-agent (td-agent-3.5.1-0-x64.msi), configure it using the test config below, and then start the Fluentd service.
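For step 1, a hypothetical generator script (the path and field values are placeholders, not from the original report; the fields are laid out to match the tail `format` regex in the config above):

```ruby
# Hypothetical repro helper: write 100k IIS-style access log lines whose
# space-separated fields match the <source> format regex above.
require "time"

File.open("C:/inetpub/logs/LogFiles/W3SVC1/sample.log", "w") do |f|
  base = Time.now
  100_000.times do |i|
    ts = (base - i).strftime("%Y-%m-%d %H:%M:%S")
    # timestamp s_ip cs_method cs_uri_stem cs_uri_query s_port cs_username
    # c_ip cs_user_agent cs_referer sc_status sc_substatus sc_win32_status time_taken
    f.puts "#{ts} 10.0.0.1 GET /index.html - 80 - 192.168.1.#{i % 254 + 1} Mozilla/5.0 - 200 0 0 15"
  end
end
```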
Expected behavior:
Fluentd should not create a lot of tiny chunk files of 0-1 KB; rather, it should respect the defined `chunk_limit_size` value of 64 MB.

Testing environment:
Test config:
Error Log:
Within 1-2 seconds of starting the Fluentd service, the buffer folder suddenly explodes with 2040 files (buffer.xx.log and buffer.xx.log.meta pairs).
Each buffer.xx.log contains only one log line event, and the td-agent log starts showing warnings stating "Can't create new buffer file".
Additional context:
Using the same config inside a fluent/fluentd:v1.7-debian-1 Debian container works fine on both Linux and Windows 10 Pro (with Linux container support).
The problem happens only when running Fluentd directly as a service on Windows, or when using the same config inside a fluent Windows container on Windows Server 2016 (which can't run Linux containers).
So I suspect there is something wrong specifically with the Fluentd packaging on Windows. Please help!