
Memory leaks when running fluentd in TLS mode #2086

Closed
samitpal opened this issue Jul 24, 2018 · 2 comments

samitpal commented Jul 24, 2018

  • Fluentd version: 1.1
  • Environment information : Ubuntu 16.01
  • Fluentd configuration used below

---- fluentd config -----

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

<source>
  @type forward
  @id forward
  port 24225
  bind 0.0.0.0

  @log_level debug
  <security>
    self_hostname myhost
    shared_key veryverysecret
  </security>
  <transport tls>
    version TLSv1_2
    ca_path  /home/ubuntu/client_auth/certs/ca.crt.pem
    cert_path  /home/ubuntu/client_auth/certs/server.crt.pem
    private_key_path  /home/ubuntu/client_auth/private/server.key.pem
    client_cert_auth true
  </transport>
</source>

 <filter logs.**>
     @type record_transformer
     @id record_transformer
     enable_ruby true
     <record>
         # the timestamp field has to be named @timestamp for kibana to work properly.
         @timestamp ${time.strftime('%Y %b %d %H:%M:%S')}
     </record>
 </filter>

<match logs.**>
  @type copy
  @id copy
  <store>
   @type stdout
   @id stdout
  </store>
</match>
  • When I run fluentd with the above config, every time I hit port 24225 with the nc command as follows
    $ nc -vz localhost 24225
    I can see server connections piling up under _server_connections. I verify that using the following command:
    $ curl -s "localhost:24220/api/plugins.json?debug=1"|jq .|grep -A 5 "_server_connections"
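The grep-based check above can also be scripted. A minimal Ruby sketch, with an illustrative payload (the exact shape of the debug=1 JSON depends on the fluentd version, so the fixture below is only an approximation of what monitor_agent returns):

```ruby
require 'json'

# Recursively count entries under any "_server_connections" key in a
# monitor_agent payload, wherever it appears in the structure.
def count_server_connections(node)
  case node
  when Hash
    node.sum do |key, value|
      if key == '_server_connections' && value.is_a?(Array)
        value.size
      else
        count_server_connections(value)
      end
    end
  when Array
    node.sum { |item| count_server_connections(item) }
  else
    0
  end
end

# Illustrative fixture, not real monitor_agent output.
payload = JSON.parse(<<~JSON)
  {
    "plugins": [
      {
        "plugin_id": "forward",
        "type": "forward",
        "instance_variables": {
          "_server_connections": ["conn1", "conn2", "conn3"]
        }
      }
    ]
  }
JSON

puts count_server_connections(payload)  # => 3
```

In the leaking setup, this count keeps growing with every health-check probe even though the peers disconnected long ago.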

We use TCP health checks on our AWS NLB load balancers, which open TCP connections much like what I did manually above. I think this results in a lot of dangling TCP connections, which in turn leaks memory.
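An NLB TCP health check is essentially a connect followed by an immediate close, with no payload and no TLS handshake. A self-contained Ruby sketch of that probe against a throwaway local listener (the ephemeral port here stands in for fluentd's forward input on 24225):

```ruby
require 'socket'

# Stand-in listener on an ephemeral port; in the real setup this is
# fluentd's forward input behind the NLB.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# What `nc -vz` (and an NLB TCP health check) does: connect, then close
# immediately without sending any data or starting a TLS handshake.
probe = TCPSocket.new('127.0.0.1', port)
probe.close

# The server still sees the completed connection even though the peer
# is already gone; reading yields an immediate EOF.
conn = server.accept
data = conn.read            # "" : peer closed without sending anything
conn.close
server.close

puts data.empty? ? 'peer closed without sending data' : 'got data'
```

The server side accepts a connection that is already half-dead on arrival, which is exactly the close-timing window described in the fix below.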

repeatedly (Member) commented Jul 25, 2018

The _server_connections leak is caused by a close-timing issue: a connection can be closed (and removed from the registry) before the connect callback has registered it, so the later registration leaves a stale entry behind. The following patch fixes the problem.

diff --git a/lib/fluent/plugin_helper/server.rb b/lib/fluent/plugin_helper/server.rb
index bd9bb92..2a1435f 100644
--- a/lib/fluent/plugin_helper/server.rb
+++ b/lib/fluent/plugin_helper/server.rb
@@ -213,8 +213,10 @@ module Fluent
         socket_option_setter.call(sock)
         close_callback = ->(conn){ @_server_mutex.synchronize{ @_server_connections.delete(conn) } }
         server = Coolio::TCPServer.new(sock, nil, EventHandler::TCPServer, socket_option_setter, close_callback, @log, @under_plugin_development, block) do |conn|
-          @_server_mutex.synchronize do
-            @_server_connections << conn
+          unless conn.closing
+            @_server_mutex.synchronize do
+              @_server_connections << conn
+            end
           end
         end
         server.listen(backlog) if backlog
@@ -227,8 +229,10 @@ module Fluent
         socket_option_setter.call(sock)
         close_callback = ->(conn){ @_server_mutex.synchronize{ @_server_connections.delete(conn) } }
         server = Coolio::TCPServer.new(sock, nil, EventHandler::TLSServer, context, socket_option_setter, close_callback, @log, @under_plugin_development, block) do |conn|
-          @_server_mutex.synchronize do
-            @_server_connections << conn
+          unless conn.closing
+            @_server_mutex.synchronize do
+              @_server_connections << conn
+            end
           end
         end
         server.listen(backlog) if backlog
@@ -538,6 +542,8 @@ module Fluent
         end
 
         class TCPServer < Coolio::TCPSocket
+          attr_reader :closing
+
           def initialize(sock, socket_option_setter, close_callback, log, under_plugin_development, connect_callback)
             raise ArgumentError, "socket must be a TCPSocket: sock=#{sock}" unless sock.is_a?(TCPSocket)
 
@@ -618,6 +624,8 @@ module Fluent
         end
 
         class TLSServer < Coolio::Socket
+          attr_reader :closing
+
           # It can't use Coolio::TCPSocket, because Coolio::TCPSocket checks that underlying socket (1st argument of super) is TCPSocket.
           def initialize(sock, context, socket_option_setter, close_callback, log, under_plugin_development, connect_callback)
             raise ArgumentError, "socket must be a TCPSocket: sock=#{sock}" unless sock.is_a?(TCPSocket)

samitpal (Author) commented
Thanks! I will wait for the change to make its way to a future release.

repeatedly added a commit that referenced this issue Jul 27, 2018
server-helper: Fix connection leak by close timing issue. fix #2086