Fluentd retains excessive amounts of memory after handling traffic peaks #1657

joell · 2017-08-05T10:31:06Z

When setting up a simple, 2-part Fluentd (TCP -> forwarder) -> (forwardee -> disk configuration) and giving it 5 million JSON objects to process all at once, resident-set memory consumption jumps from an inital 30MB to between 200-450MB, and does not come back down after computation is complete. This is observed using version 2.3.5-1.el7 of the TD Agent RPM package running on CentOS 7. (The version of Fluentd in that package is 0.12.36.)

Steps to reproduce:

# define and start a simple TCP forwarder
$ cat > test-in.conf <<'EOF'
<source>
  @type  tcp
  tag    testing
  format json
  port   10130
</source>

<match **>
  @type forward
  require_ack_response true
  flush_interval 10s

  <server>
    host 127.0.0.1
    port 10131
  </server>
</match>
EOF
$ td-agent --no-supervisor -c test-in.conf &

# define and start a simple forwadee that logs to disk
$ cat > test-out.conf <<'EOF'
<source>
   @type forward
   port  10131
</source>

<match **>
  @type              file
  format             json
  path               "/tmp/fluent-test"
  flush_interval     10s
  buffer_chunk_limit 16m
  compress           gzip
</match>
EOF
$ td-agent --no-supervisor -c test-out.conf &

# observe initial memory consumption
$ ps -o pid,vsz,rss,cmd | grep '[t]d-'
 4254 431724 30544 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-out.conf
 4259 498816 30052 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-in.conf

# pump a bunch of data through it
$ seq 1 5000000 | sed 's/.*/{"num": &, "filler": "this is filler text to make the event larger"}/' > /dev/tcp/localhost/10130 &

# wait for all data to flush to disk and CPU to return to idle
$ watch -n1 'ls -lt /tmp/fluent-test.* | head -5'

# observe final memory consumption
$ ps -o pid,rss,cmd | grep '[t]d-'
 4338 628248 288972 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-out.conf
 4343 1023868 461616 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-in.conf

As you can see from the RSS numbers, each td-agent process started out around 30MB and they ended at ~290MB and ~460MB, respectively. Neither process will release that memory if you wait a while. (In the real-world staging system we initially discovered this on, memory consumption of the test-out.conf-equivalent configuration reached over 3GB, and the test-in.conf-equivalent was a Fluent Bit instance exhibiting a recently-fixed duplication bug.)

Reviewing a Fluentd-related kubernetes issue during our own diagnostics, we noticed that the behavior we observed seemed similar to the Fluentd behavior described there when built without jemalloc. This led us to check if the td-agent binary we were using was in fact linked with jemalloc. According to the FAQ, jemalloc is used when building the Treasure Data RPMs, and though we found jemalloc libraries installed on the system, we couldn't find any existence of jemalloc in the process running in memory. Specifically, we tried the following things:

# given a td-agent process with PID 4343...

# no jemalloc shared library mentioned in the memory mapping
$ pmap 4343 | grep jemalloc
# ... so it doesn't look like it's dynamically linked

# grab the entire memory space and search it for references to jemalloc
$ gcore 4343
$ strings core.4343 | fgrep jemalloc

# ... but if it were statically linked, you'd expect to find some of these strings
$ strings /opt/td-agent/embedded/lib/libjemalloc.a | fgrep jemalloc

In short, this leads us to wonder... are the binaries invoked by td-agent actually linked with jemalloc? If they are not, is the old memory fragmentation problem that jemalloc solved what we are observing here? (And if they aren't, am I raising this issue in the wrong place, and if so where should I raise it?)

The text was updated successfully, but these errors were encountered:

repeatedly · 2017-08-07T21:59:06Z

Did you try this setting?

https://docs.fluentd.org/v0.12/articles/performance-tuning#reduce-memory-usage

joell · 2017-08-09T16:16:36Z

Yes, even with constraints on the oldobject factor, the problem persists. In fact, even with more draconion restrictions on the garbage collector the problem persists, e.g.:

$ RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9 \
  RUBY_GC_HEAP_GROWTH_FACTOR=1.05         \
  RUBY_GC_MALLOC_LIMIT_MAX=16777216       \
  RUBY_GC_OLDMALLOC_LIMIT_MAX=16777216    \
  td-agent --no-supervisor -c test.conf &

Coupling that output td-agent process with an input of ...

$ td-agent-bit -i tcp://127.0.0.1:10130 -t test -o forward://127.0.0.1:10131 &
$ seq 1 3000 | sed 's/.*/{"num": &, "filler": "this is filler text to make the event larger"}/' > /dev/tcp/localhost/10130 &

... sees the td-agent resident-set memory jump to around ~650MB and hold there, even long after all data has been processed. (Note: I'm using Fluent Bit 0.11.15 here, which has a nasty data duplication problem which is useful in stressing td-agent with input in this test.)

Honestly, it seems like the real issue here is the Ruby 2.1 garbage collector; as far as I can tell, it never releases memory to the OS that it allocates, thereby damning any process that has an even momentary need for a large quantity of memory to hold on to that memory for the rest of its lifetime. (Please correct me if I'm wrong here.)

If the Ruby gc issue cannot be fixed by some form of additional configuration, then perhaps Fluentd could use some type of backpressure mechanism to avoid ingesting input faster than it can truly process it and avoid accumulating large queues in memory?

kladiv · 2019-06-27T10:41:07Z

+1

ganmacs · 2019-12-23T03:03:04Z

it's a stale issue. I'm closing. if you have any problem, updating fluentd and ruby might help it.

joell · 2019-12-23T17:25:08Z

I have re-tested this issue using the latest td-agent 3.5.1 RPM package for RHEL 7. That package includes Fluentd 1.7.4 and embeds Ruby 2.4.0. The problem still remains exactly as it was originally reported.

Nothing has been fixed, despite the intervening 2 years. The issue should be re-opened.

P.S. You as the Fluentd developers and project administrators collectively have the right to run your project however you see fit. However, making claims of Fluentd's suitability for production use and its performance on your website are not consistent with this kind of callous disregard for the validity of an easily reproducible issue that has serious impacts on Fluentd's production suitability. I gave very simple steps to reproduce this issue, and it only took me a few minutes to download the latest TD-Agent RPM, install it, and copy and paste the commands from my original report to see that the outcome remained the same. You could have trivially done the same. The fact that you could not be bothered to do so but instead chose to try to bury and ignore this problem speaks volumes, especially as another user indicated as recently as June 27 that it is still affecting people. If you truly feel that is the appropriate response, then you should also remove the false claims of performance and production-suitability from the Fluentd website.

ganmacs · 2019-12-24T02:11:12Z

The problem still remains exactly as it was originally reported.

Thank you re-test it. then I should re-open this issue.

cede87 · 2020-07-02T12:08:32Z

I'm having the same issue using td-agent 3.8.0 RPM package for Amazon2. That package includes Fluentd 1.11.1 and embeds Ruby 2.4.10. Any news here? Work in progress? Our fluentD is in production with 4 aggregators ingesting at the same time into ElasticSearch. So far stable but the memory is normalizing always close to 100% leaving just 250-300 MB free which I think doesn't have any sense... I don't know what else to test... I'm checking this since weeks, changing many configurations and different versions without any clue. Even I tried to adjust the GC variables as @joell did in the past but the behaviour is the same.

You can see here how a new aggregator doesn't release memory until be close to 100%. The unique thing we improve adding a new aggregator is reduce a little bit the CPU usage.

Our buffer size (I think is in sync with this memory issue) as I read in other threads with memory buffer the behaviour is different...

Total network I/O (when the peacks are related also with the amount of memory needed)

Please help, thanks in advance.

joell · 2020-07-02T15:43:25Z

@cede87: As I noted in my comment on April 9, 2017 the underlying issue here appears to be in the default Ruby garbage collector and memory allocator.

The Fluentd developers could directly avoid this by applying a backpressure mechanism or spooling incoming data to disk instead of in memory.

Alternatively, the Ruby memory allocation issue can be indirectly avoided by replacing or manipulating the Ruby memory allocator. One method is to replace the allocator with jemalloc (though different versions are reported to be more effective than others); this approach was documented as being done by the Fluentd devs, but as I noted in the original issue text it doesn't look like jemalloc is actually used in the build that produces the RPM. Another method would be to try to manipulate the allocator's behavior through things like the MALLOC_ARENA_MAX environment variable.

A summary of the underlying problem and some of the techniques you might be able to try -- including going so far as patching the Ruby garbage collector yourself -- can be found in this article.

Best of luck.

cede87 · 2020-07-03T14:00:00Z

@joell thanks for the quick reply. I did the same checks you did in the past and I could verify that we are using jemalloc with our RPM installation. So if I'm not mistaken we are not able to use MALLOC_ARENA_MAX environment variable... even so we are suffering the same problems. Any suggestion?

[root@x ~]# pmap 4057 | grep jemalloc
00007fba1a234000 292K r-x-- libjemalloc.so.2
00007fba1a27d000 2048K ----- libjemalloc.so.2
00007fba1a47d000 8K r---- libjemalloc.so.2
00007fba1a47f000 4K rw--- libjemalloc.so.2

Thanks!

joell · 2020-07-03T14:27:07Z

@cede87: The presence of libjemalloc.so.2 indicates you might be running jemalloc 5 instead of jemalloc 3. During a discussion about making jemalloc the default allocator for Ruby, it was noted:

jemalloc 3.6 is slow but space efficient. jemalloc 5.1 is faster but almost as bad with space as untuned glibc.

Glancing at package content listings online, it looks like you would see libjemalloc.so.1 if you were using the jemalloc 3 series, which demonstrates the small heap sizes. You might consider rebuilding Fluentd with jemalloc 3 instead.

cede87 · 2020-07-06T13:20:41Z

@joell many many thanks for your suggestions. I was able to change jemalloc version from 5.x to 3.6.0 using td-agent 3.8.0 in one server (just to test). Notice the difference.

So I can confirm the following:

td-agent 3.8.0 is using jemalloc 5.x
jemalloc 5.x is faster but almost as bad with space as untuned glibc.
downgrade jemalloc to 3.6.0 solved the memory issues.

I think FluentD developers should take a look on this.
Thanks again,

Adhira-Deogade · 2020-08-27T10:14:29Z

Would appreciate a solution, the memory usage has peaked and isn't falling down at all.

cede87 · 2020-08-28T07:11:49Z

Hi @Adhira-Deogade please follow my notes to be able to downgrade jemalloc version.

yum groupinstall "Development Tools"
wget https://github.com/jemalloc/jemalloc/releases/download/3.6.0/jemalloc-3.6.0.tar.bz2
bunzip2 jemalloc-3.6.0.tar.bz2
tar xvf jemalloc-3.6.0.tar
cd jemalloc-3.6.0
./configure
make
make install
create symbolic links: ln -s file link

Note: First delete the files in: cd /opt/td-agent/embedded/lib
rm libjemalloc.a libjemalloc_pic.a libjemalloc.so.2 libjemalloc.so
ln -s /usr/local/lib/libjemalloc.a /opt/td-agent/embedded/lib/libjemalloc.a
ln -s /usr/local/lib/libjemalloc_pic.a /opt/td-agent/embedded/lib/libjemalloc.pic.a
ln -s /usr/local/lib/libjemalloc.so.1 /opt/td-agent/embedded/lib/libjemalloc.so.2
ln -s libjemalloc.so.2 libjemalloc.so

Note: If you do ls you should see these symbolic links:
/opt/td-agent/embedded/lib/libjemalloc.a -> /usr/local/lib/libjemalloc.a
/opt/td-agent/embedded/lib/libjemalloc.pic.a -> /usr/local/lib/libjemalloc_pic.a
/opt/td-agent/embedded/lib/libjemalloc.so -> libjemalloc.so.2
/opt/td-agent/embedded/lib/libjemalloc.so.2 -> /usr/local/lib/libjemalloc.so.1

Restart td-agent
Verify
ps -aux | grep td-agent -> Copy the ID of one process
pmap ID | grep jemalloc -> Must be libjemalloc.so.1
[root@ip-10-103-149-184 lib]# pmap 28103 | grep jemalloc
00007ff35f4f9000 276K r-x-- libjemalloc.scd o.1
00007ff35f53e000 2048K ----- libjemalloc.so.1
00007ff35f73e000 8K r---- libjemalloc.so.1
00007ff35f740000 4K rw--- libjemalloc.so.1

I hope it helps you,
Daniel

github-actions · 2021-01-26T10:10:55Z

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

joell · 2021-01-28T18:08:15Z

This issue should remain open until it is resolved. It has continued to affect people since it was reported in 2017, and the current best workarounds are laborious and invasive.

github-actions · 2021-04-30T10:02:12Z

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

joell · 2021-04-30T13:52:37Z

I repeat:

This issue should remain open until it is resolved. It has continued to affect people since it was reported in 2017, and the current best workarounds are laborious and invasive.

cede87 · 2021-05-04T07:10:00Z

This issue is still affecting us. The unique solution is changing the jemalloc version. Currently is what we are doing even in production environments. Which is not the best solution.

ashie · 2021-05-05T14:41:04Z

Thank you for notifying it.

# no jemalloc shared library mentioned in the memory mapping
$ pmap 4343 | grep jemalloc
# ... so it doesn't look like it's dynamically linked

td-agent loads jemalloc by LD_PRELOAD and this environment variable is set only by init script or systemd unit file.
Launching td-agent command manually doesn't set it, this is the why you didn't see jemalloc in pmap.

https://github.com/treasure-data/omnibus-td-agent/blob/b0a421e49b123c0c77ae897fbd1f782bb226a9ed/templates/etc/init.d/rpm/td-agent#L186-L188

if [ -f "${TD_AGENT_HOME}/embedded/lib/libjemalloc.so" ]; then
  export LD_PRELOAD="${TD_AGENT_HOME}/embedded/lib/libjemalloc.so"
fi

https://github.com/treasure-data/omnibus-td-agent/blob/master/templates/etc/systemd/td-agent.service.erb#L11

Environment=LD_PRELOAD=<%= install_path %>/embedded/lib/libjemalloc.so

td-agent 3.8.0 is using jemalloc 5.x

td-agent 3 uses jemalloc 4.5.0, not 5.x

https://github.com/treasure-data/omnibus-td-agent/blob/b0a421e49b123c0c77ae897fbd1f782bb226a9ed/config/projects/td-agent3.rb#L24

override :jemalloc, :version => '4.5.0'

ashie · 2021-05-05T15:06:48Z

Hmm, I confirmed that jemalloc-3.6.0 consumes fewer memory than jemalloc 5.2.1 (td-agent 4.1.0's default) in this case:

jemalloc 3.6.0:

  27672 450484 231048 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-in.conf --under-supervisor
  27685 364460 139124 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-out.conf --under-supervisor

jemalloc 5.2.1:

  28148 554264 255248 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-in.conf --under-supervisor
  28161 472848 187796 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent -c test-out.conf --under-supervisor

You can switch jemalloc version easily by LD_PRELOAD:

$ wget https://github.com/jemalloc/jemalloc/releases/download/3.6.0/jemalloc-3.6.0.tar.bz2
$ tar xvf jemalloc-3.6.0.tar.bz2
$ cd jemalloc-3.6.0
$ ./configure --prefix=/opt/jemalloc-3.6.0
$ make
$ sudo make install
$ LD_PRELOAD=/opt/jemalloc-3.6.0/lib/libjemalloc.so td-agent

But I'm not sure it's always efficient and worth to replace.

joell · 2021-05-11T14:29:38Z

@ashie: Thank you for looking into this issue and confirming yourself what the community has been reporting.

Regarding the efficiency of jemalloc 3.x vs 5.x, the common trend I've read is that 5.x may be a bit faster. However, as Fleuntd is both advertised for production use and frequently used in production environments, I would argue that stability is more important than speed here. We have run into issues using Fluentd in production where its memory consumption has grown to the point where it has actively hampered other more important production services on a host. Ultimately, we've had to move away from Fluentd for certain applications because of this bug.

For the sake of ensuring system stability, I would argue for making jemalloc 3.x the default allocator. If people need greater performance and are confident their use case will not trigger this memory consumption issue, they could use jemalloc 5.x instead via LD_PRELOAD.

I urge that the default Fluentd configuration prioritize stability over performance.

ashie · 2021-05-12T01:50:18Z

Thanks for your opinion. I've opened an issue for td-agent: fluent/fluent-package-builder#305

weakcamel · 2021-11-03T07:49:41Z

I can confirm that the issue is still very much present and valid on 4.2.0 td-agent / Ubuntu Bionic. We're using it to report on Artifactory stats as part of Jfrog monitoring platform so configuration is pretty much default.

Fluentd's memory usage is creeping up a lot so as a workaround we've applied cgroup limits to it (50% of 128GB RAM utilization of the host which is a massive 64 GB). It took only a few hours after restarting td-agent service for it to be OOM killed:

Nov 02 18:55:57 host.example.com kernel: ruby invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Nov 02 18:55:57 host.example.com kernel: CPU: 38 PID: 16039 Comm: ruby Tainted: P           O      5.4.0-77-generic #86~18.04.1-Ubuntu
Nov 02 18:55:57 host.example.com kernel: Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.4.2 01/09/2017
Nov 02 18:55:57 host.example.com kernel: Call Trace:
Nov 02 18:55:57 host.example.com kernel:  dump_stack+0x6d/0x8b
Nov 02 18:55:57 host.example.com kernel:  dump_header+0x4f/0x200
Nov 02 18:55:57 host.example.com kernel:  oom_kill_process+0xe6/0x120
Nov 02 18:55:57 host.example.com kernel:  out_of_memory+0x109/0x510
Nov 02 18:55:57 host.example.com kernel:  mem_cgroup_out_of_memory+0xbb/0xd0
Nov 02 18:55:57 host.example.com kernel:  try_charge+0x79a/0x7d0
Nov 02 18:55:57 host.example.com kernel:  ? __alloc_pages_nodemask+0x153/0x320
Nov 02 18:55:57 host.example.com kernel:  mem_cgroup_try_charge+0x75/0x190
Nov 02 18:55:57 host.example.com kernel:  mem_cgroup_try_charge_delay+0x22/0x50
Nov 02 18:55:57 host.example.com kernel:  __handle_mm_fault+0x8d5/0x1270
Nov 02 18:55:57 host.example.com kernel:  ? __switch_to_asm+0x40/0x70
Nov 02 18:55:57 host.example.com kernel:  handle_mm_fault+0xcb/0x210
Nov 02 18:55:57 host.example.com kernel:  __do_page_fault+0x2a1/0x4d0
Nov 02 18:55:57 host.example.com kernel:  do_page_fault+0x2c/0xe0
Nov 02 18:55:57 host.example.com kernel:  page_fault+0x34/0x40
Nov 02 18:55:57 host.example.com kernel: RIP: 0033:0x7f1f16426b23
Nov 02 18:55:57 host.example.com kernel: Code: fe 6f 64 16 e0 c5 fe 6f 6c 16 c0 c5 fe 6f 74 16 a0 c5 fe 6f 7c 16 80 c5 fe 7f 07 c5 fe 7f 4f 20 c5 fe 7f 57 40 c5 fe 7f 5f 60 <c5> fe 7f 64 17 e0 c5
Nov 02 18:55:57 host.example.com kernel: RSP: 002b:00007f0bafa270c8 EFLAGS: 00010202
Nov 02 18:55:57 host.example.com kernel: RAX: 00007f098866df61 RBX: 00007f1ed0df8f90 RCX: 000000000426df61
Nov 02 18:55:57 host.example.com kernel: RDX: 00000000000000fc RSI: 00007f1ee3a3e800 RDI: 00007f098866df61
Nov 02 18:55:57 host.example.com kernel: RBP: 00000000071c2454 R08: 00007f0984400000 R09: ffffffffffffffff
Nov 02 18:55:57 host.example.com kernel: R10: 00007f1f14c5e100 R11: 00007f09885b4192 R12: 00007f1ee3a3e800
Nov 02 18:55:57 host.example.com kernel: R13: 00000000000000fc R14: 0000000000000001 R15: 000000000426e05d
Nov 02 18:55:57 host.example.com kernel: memory: usage 65894512kB, limit 65894512kB, failcnt 29724
Nov 02 18:55:57 host.example.com kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Nov 02 18:55:57 host.example.com kernel: kmem: usage 138544kB, limit 9007199254740988kB, failcnt 0
Nov 02 18:55:57 host.example.com kernel: Memory cgroup stats for /system.slice/td-agent.service:
Nov 02 18:55:57 host.example.com kernel: anon 67332968448
                                                  file 0
                                                  kernel_stack 626688
                                                  slab 3518464
                                                  sock 0
                                                  shmem 0
                                                  file_mapped 0
                                                  file_dirty 0
                                                  file_writeback 811008
                                                  anon_thp 0
                                                  inactive_anon 3250925568
                                                  active_anon 64081309696
                                                  inactive_file 0
                                                  active_file 0
                                                  unevictable 0
                                                  slab_reclaimable 557056
                                                  slab_unreclaimable 2961408
                                                  pgfault 378416214
                                                  pgmajfault 0
                                                  workingset_refault 0
                                                  workingset_activate 0
                                                  workingset_nodereclaim 0
                                                  pgrefill 946474
                                                  pgscan 243287
                                                  pgsteal 91570
                                                  pgactivate 60357
                                                  pgdeactivate 946408
                                                  pglazyfree 0
                                                  pglazyfreed 0
                                                  thp_fault_alloc 0
                                                  thp_collapse_alloc 0
Nov 02 18:55:57 host.example.com kernel: Tasks state (memory values in pages):
Nov 02 18:55:57 host.example.com kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 02 18:55:57 host.example.com kernel: [  15976]     0 15976    59526     3287   462848     6900             0 fluentd
Nov 02 18:55:57 host.example.com kernel: [  15983]     0 15983 22606458 16437845 134709248    84397             0 ruby
Nov 02 18:55:57 host.example.com kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0-1,oom_memcg=/system.slice/td-agent.service,task_memcg=/system.slice/td-agent.
Nov 02 18:55:57 host.example.com kernel: Memory cgroup out of memory: Killed process 15983 (ruby) total-vm:90425832kB, anon-rss:65744072kB, file-rss:7308kB, shmem-rss:0kB, UID:0 pgtables:131552kB
Nov 02 18:55:59 host.example.com kernel: oom_reaper: reaped process 15983 (ruby), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

swordfish444 · 2022-06-06T15:19:00Z

@ashie What is the latest status on this issue? This seems like a HUGE flaw in fluentd. The issue you linked fluent/fluent-package-builder#305 was closed without action taken.

We are facing fluentd OOM issues in production. Please advise.

torbenaa · 2022-12-15T22:27:38Z

Same issue with td-agent 4.4.2 ( fluentd 1.15.3 ) on ubi image ( rhel8 ).

I cannot believe this is not fixed - surely not production ready :(

yangjiel · 2023-04-28T02:18:12Z

I can confirm this behavior on td-agent 4.4.1 fluentd 1.13.3 (c328422) as well.

The memory seems is not dynamically deallocated. Initial startup without traffic, memory consumption is low. After a large traffic then turn to no traffic, memory is still staying at the highest point.

yangjiel · 2023-05-12T02:09:14Z

This issue may relate to #4174

daipom · 2023-07-06T16:49:27Z

Regarding the case where ignore_same_log_interval is the cause, it will be fixed by the following issue and PR.
Thanks @yangjiel !

Watson1978 · 2025-02-28T05:06:26Z

Seems that the huge memory has been used in memory buffer.
https://docs.fluentd.org/buffer/memory

The memory buffer has been used implicitly with out_forward plugin.

I added the parameters to tune the memory buffer:

## test-in.conf
<source>
  @type tcp
  tag testing
  format json
  port 10130
</source>
<match **>
  @type forward
  require_ack_response true
  flush_interval 10s
  <server>
    host 127.0.0.1
    port 10131
  </server>

  ## Tune the buffer parameters
  <buffer tag>
    flush_interval 10s
    chunk_limit_size 1m
    total_limit_size 25m
  </buffer>
</match>

Then, it can reduce the memory usage.

test data generator

require "socket"
require "json"

i = 0

Thread.new do
  s = TCPSocket.open("localhost", 10130)
  data = {"message": "this is filler text to make the event larger"}.to_json

  loop do
    puts "[#{i}]"
    s.puts data
    i += 1
  end

  s.close
end

sleep 60

ganmacs closed this as completed Dec 23, 2019

ganmacs reopened this Dec 24, 2019

github-actions bot added the stale label Jan 26, 2021

github-actions bot removed the stale label Jan 29, 2021

github-actions bot added the stale label Apr 30, 2021

github-actions bot removed the stale label May 1, 2021

ashie added feature request *Deprecated Label* Use enhancement label in general Non fluentd issue labels May 5, 2021

ashie added the bug Something isn't working label May 6, 2021

ashie mentioned this issue May 11, 2021

Collecting logs often leads to memory leaks #3223

Closed

ashie mentioned this issue May 12, 2021

Consider replacing (or adding) jemalloc with v3.6.0 fluent/fluent-package-builder#305

Closed

ashie mentioned this issue Jun 21, 2021

worker-0 randomly SIGKILLed #3413

Closed

ashie mentioned this issue Jul 16, 2021

worker-0 is getting SIGKILLed instead of nicely reloading upon issuing gracefulReload #3341

Closed

ashie added the memory label Mar 15, 2022

yangjiel mentioned this issue May 12, 2023

Large Memory Consumption when ignore_same_log_interval #4174

Closed

ashie mentioned this issue Jul 19, 2023

Try to use Jemalloc 5.3.0 fluent/fluent-package-builder#523

Closed

kenhys added enhancement Feature request or improve operations and removed feature request *Deprecated Label* Use enhancement label in general labels Jul 31, 2024

Watson1978 self-assigned this Feb 20, 2025

Watson1978 linked a pull request Mar 4, 2025 that will close this issue

memory_chunk: clear internal memory use in buffer string #4845

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fluentd retains excessive amounts of memory after handling traffic peaks #1657

Fluentd retains excessive amounts of memory after handling traffic peaks #1657

joell commented Aug 5, 2017

repeatedly commented Aug 7, 2017

joell commented Aug 9, 2017

kladiv commented Jun 27, 2019

ganmacs commented Dec 23, 2019

joell commented Dec 23, 2019

ganmacs commented Dec 24, 2019

cede87 commented Jul 2, 2020

joell commented Jul 2, 2020

cede87 commented Jul 3, 2020

joell commented Jul 3, 2020

cede87 commented Jul 6, 2020

Adhira-Deogade commented Aug 27, 2020

cede87 commented Aug 28, 2020 •

edited

Loading

github-actions bot commented Jan 26, 2021

joell commented Jan 28, 2021

github-actions bot commented Apr 30, 2021

joell commented Apr 30, 2021

cede87 commented May 4, 2021

ashie commented May 5, 2021

ashie commented May 5, 2021

joell commented May 11, 2021

ashie commented May 12, 2021

weakcamel commented Nov 3, 2021 •

edited

Loading

swordfish444 commented Jun 6, 2022

torbenaa commented Dec 15, 2022

yangjiel commented Apr 28, 2023

yangjiel commented May 12, 2023

daipom commented Jul 6, 2023

Watson1978 commented Feb 28, 2025 •

edited

Loading

Fluentd retains excessive amounts of memory after handling traffic peaks #1657

Fluentd retains excessive amounts of memory after handling traffic peaks #1657

Comments

joell commented Aug 5, 2017

repeatedly commented Aug 7, 2017

joell commented Aug 9, 2017

kladiv commented Jun 27, 2019

ganmacs commented Dec 23, 2019

joell commented Dec 23, 2019

ganmacs commented Dec 24, 2019

cede87 commented Jul 2, 2020

joell commented Jul 2, 2020

cede87 commented Jul 3, 2020

joell commented Jul 3, 2020

cede87 commented Jul 6, 2020

Adhira-Deogade commented Aug 27, 2020

cede87 commented Aug 28, 2020 • edited Loading

github-actions bot commented Jan 26, 2021

joell commented Jan 28, 2021

github-actions bot commented Apr 30, 2021

joell commented Apr 30, 2021

cede87 commented May 4, 2021

ashie commented May 5, 2021

ashie commented May 5, 2021

joell commented May 11, 2021

ashie commented May 12, 2021

weakcamel commented Nov 3, 2021 • edited Loading

swordfish444 commented Jun 6, 2022

torbenaa commented Dec 15, 2022

yangjiel commented Apr 28, 2023

yangjiel commented May 12, 2023

daipom commented Jul 6, 2023

Watson1978 commented Feb 28, 2025 • edited Loading

cede87 commented Aug 28, 2020 •

edited

Loading

weakcamel commented Nov 3, 2021 •

edited

Loading

Watson1978 commented Feb 28, 2025 •

edited

Loading