Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

out_forward unrecoverable error when upgrading from 1.6.3 to 1.7.1 #2626

Closed
stonith opened this issue Sep 20, 2019 · 2 comments · Fixed by #2635
Closed

out_forward unrecoverable error when upgrading from 1.6.3 to 1.7.1 #2626

stonith opened this issue Sep 20, 2019 · 2 comments · Fixed by #2635
Assignees
Labels
bug Something isn't working

Comments

@stonith
Copy link

stonith commented Sep 20, 2019

Describe the bug
After upgrading to 1.7.1 from 1.6.3 most of our forwarders throw an out_forward unrecoverable error and never recovers. 1.6.3 does throw a similar error but recovers.

To Reproduce
Upgrade to 1.7.1 from 1.6.3

Expected behavior
out_forward server health checking should recover

Your Environment

  • Fluentd or td-agent version: 1.6.3 -> 1.7.1
  • Operating system: debian buster
  • Kernel version: 4.14.119+

If you hit the problem with older fluentd version, try latest version first.

Your Configuration

  <source>
    @type monitor_agent
    bind "0.0.0.0"
    port 24220
  </source>
  <source>
    @type prometheus
  </source>
  <source>
    @type prometheus_output_monitor
  </source>
  <system>
    log_level info
  </system>
  <match fluent.**>
    @type null
  </match>
  <source>
    @type tail
    format syslog
    path "/var/log/startupscript.log"
    pos_file "/var/log/fluentd/startupscript.log.pos"
    tag "system.startupscript"
    <parse>
      @type syslog
    </parse>
  </source>
  <source>
    @type tail
    format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
    path "/var/log/docker.log"
    pos_file "/var/log/fluentd/docker.log.pos"
    tag "system.docker"
    <parse>
      @type regexp
      expression ^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?
    </parse>
  </source>
  <source>
    @type tail
    format none
    path "/var/log/etcd.log"
    pos_file "/var/log/fluentd/etcd.log.pos"
    tag "system.etcd"
    <parse>
      @type none
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/kubelet.log"
    pos_file "/var/log/fluentd/kubelet.log.pos"
    tag "system.kubelet"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/kube-proxy.log"
    pos_file "/var/log/fluentd/kube-proxy.log.pos"
    tag "system.kube-proxy"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/kube-apiserver.log"
    pos_file "/var/log/fluentd/kube-apiserver.log.pos"
    tag "system.kube-apiserver"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/kube-controller-manager.log"
    pos_file "/var/log/fluentd/kube-controller-manager.log.pos"
    tag "system.kube-controller-manager"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/kube-scheduler.log"
    pos_file "/var/log/fluentd/kube-scheduler.log.pos"
    tag "system.kube-scheduler"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/rescheduler.log"
    pos_file "/var/log/fluentd/rescheduler.log.pos"
    tag "system.rescheduler"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/glbc.log"
    pos_file "/var/log/fluentd/glbc.log.pos"
    tag "system.glbc"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @type tail
    format multiline
    multiline_flush_interval 5s
    format_firstline /^\w\d{4}/
    format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    time_format %m%d %H:%M:%S.%N
    path "/var/log/cluster-autoscaler.log"
    pos_file "/var/log/fluentd/cluster-autoscaler.log.pos"
    tag "system.cluster-autoscaler"
    <parse>
      time_format %m%d %H:%M:%S.%N
      format_firstline /^\w\d{4}/
      @type multiline
      format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
    </parse>
  </source>
  <source>
    @id systemd-cri
    @type systemd
    matches [{"_SYSTEMD_UNIT":"docker.service"}]
    read_from_head true
    tag "system.container-runtime"
    <storage>
      @type "local"
      persistent false
      path "/var/log/fluentd/journald-container-runtime.pos"
    </storage>
    <entry>
      fields_lowercase true
      fields_strip_underscores true
    </entry>
  </source>
  <source>
    @id systemd-kubelet
    @type systemd
    matches [{"_SYSTEMD_UNIT":"kubelet.service"}]
    read_from_head true
    tag "system.kubelet"
    <storage>
      @type "local"
      persistent false
      path "/var/log/fluentd/journald-kubelet.pos"
    </storage>
    <entry>
      fields_lowercase true
      fields_strip_underscores true
    </entry>
  </source>
  <source>
    @id systemd-npd
    @type systemd
    matches [{"_SYSTEMD_UNIT":"node-problem-detector.service"}]
    read_from_head true
    tag "system.node-problem-detector"
    <storage>
      @type "local"
      persistent false
      path "/var/log/fluentd/journald-node-problem-detector.pos"
    </storage>
    <entry>
      fields_lowercase true
      fields_strip_underscores true
    </entry>
  </source>
  <filter **>
    @type record_modifier
    char_encoding "utf-8"
  </filter>
  <filter system.**>
    @type record_modifier
    remove_keys "cmdline,hostname,boot_id,machine_id,systemd_invocation_id"
    <record>
      kube_cluster tier2
      host gke-tier2-us-east1-cluster-default-1-2d973341-gvhb
      location gcp-us-east1
      source ${tag_parts[1]}
      sourcetype fluentd
    </record>
  </filter>
  <match system.**>
    @type forward
    @id forward
    compress gzip
    ignore_network_errors_at_startup true
    keepalive true
    keepalive_timeout 60s
    heartbeat_type udp
    <server>
      host "fluentd-aggregator"
      port 24224
    </server>
    <server>
      host "fluentd-failover-publisher"
      port 24224
      standby
    </server>
    <buffer>
      @type "file"
      path "/var/log/fluentd/fluentd/kubernetes.system-forwarder.buffer"
      chunk_limit_size 1M
      total_limit_size 10240M
      flush_interval 5s
      retry_max_interval 30
      retry_forever true
      flush_thread_count 4
      overflow_action block
      compress gzip
    </buffer>
  </match>
</ROOT>

Your Error Log

2019-09-20 15:09:12 +0000 [warn]: #0 [forward] detached forwarding server '' host="fluentd-failover-publisher" port=24224 phi=16.049416432170446 phi_threshold=16
2019-09-20 15:10:48 +0000 [warn]: #0 [forward] got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `empty?' for nil:NilClass"
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:116:in `pick_socket'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:35:in `block in checkout_or'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:34:in `synchronize'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:34:in `checkout_or'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/connection_manager.rb:85:in `connect_keepalive'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/connection_manager.rb:42:in `connect'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:732:in `connect'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:606:in `send_data'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `block in write'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:46:in `block in select_healthy_node'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `times'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `select_healthy_node'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `write'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2019-09-20 15:10:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-09-20 15:10:48 +0000 [warn]: #0 [forward] bad chunk is moved to /tmp/fluent/backup/worker0/forward/592fd795b36962a1f96a893d2330a0e7.log
2019-09-20 15:10:58 +0000 [warn]: #0 [forward] got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `empty?' for nil:NilClass"
  2019-09-20 15:10:58 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:10:58 +0000 [warn]: #0 [forward] bad chunk is moved to /tmp/fluent/backup/worker0/forward/592fd79f3cebb05a54d945f8b57b26cc.log
2019-09-20 15:11:08 +0000 [warn]: #0 [forward] got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `empty?' for nil:NilClass"
  2019-09-20 15:11:08 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:11:08 +0000 [warn]: #0 [forward] bad chunk is moved to /tmp/fluent/backup/worker0/forward/592fd7a8c651134b99d70995594bd46e.log
2019-09-20 15:11:18 +0000 [warn]: #0 [forward] got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `empty?' for nil:NilClass"
  2019-09-20 15:11:18 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:11:18 +0000 [warn]: #0 [forward] bad chunk is moved to /tmp/fluent/backup/worker0/forward/592fd7b24fdc2878850e8b9b96e67862.log
2019-09-20 15:11:28 +0000 [warn]: #0 [forward] got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `empty?' for nil:NilClass"
  2019-09-20 15:11:28 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:11:28 +0000 [warn]: #0 [forward] bad chunk is moved to /tmp/fluent/backup/worker0/forward/592fd7bbd92ec73541629192d82eaa2e.log
2019-09-20 15:11:38 +0000 [warn]: #0 [forward] got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `empty?' for nil:NilClass"
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:116:in `pick_socket'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:35:in `block in checkout_or'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:34:in `synchronize'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/socket_cache.rb:34:in `checkout_or'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/connection_manager.rb:85:in `connect_keepalive'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/connection_manager.rb:42:in `connect'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:732:in `connect'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:606:in `send_data'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `block in write'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:46:in `block in select_healthy_node'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `times'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `select_healthy_node'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `write'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2019-09-20 15:11:38 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-09-20 15:11:38 +0000 [warn]: #0 [forward] bad chunk is moved to /tmp/fluent/backup/worker0/forward/592fd7c563d25ac2a0c56a7971cb5d30.log
2019-09-20 15:11:39 +0000 [warn]: #0 [forward] detached forwarding server '' host="fluentd-aggregator" port=24224 hard_timeout=true
2019-09-20 15:11:48 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-09-20 15:11:49 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:11:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:55:in `select_healthy_node'
  2019-09-20 15:11:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `write'
  2019-09-20 15:11:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 15:11:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2019-09-20 15:11:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2019-09-20 15:11:48 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-09-20 15:11:49 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-09-20 15:11:51 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:11:49 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:55:in `select_healthy_node'
  2019-09-20 15:11:49 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `write'
  2019-09-20 15:11:49 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 15:11:49 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2019-09-20 15:11:49 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2019-09-20 15:11:49 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-09-20 15:11:51 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-09-20 15:11:52 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:11:51 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:11:52 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-09-20 15:11:56 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:11:52 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:55:in `select_healthy_node'
  2019-09-20 15:11:52 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `write'
  2019-09-20 15:11:52 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 15:11:52 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2019-09-20 15:11:52 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2019-09-20 15:11:52 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-09-20 15:11:56 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-09-20 15:12:03 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:11:56 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:12:03 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-09-20 15:12:19 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:12:03 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward/load_balancer.rb:55:in `select_healthy_node'
  2019-09-20 15:12:03 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/out_forward.rb:336:in `write'
  2019-09-20 15:12:03 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 15:12:03 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2019-09-20 15:12:03 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2019-09-20 15:12:03 +0000 [warn]: #0 /app/vendor/bundle/ruby/2.6.0/bundler/gems/fluentd-dc20204dc9fc/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-09-20 15:12:19 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-09-20 15:12:51 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:12:19 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:12:51 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-09-20 15:13:19 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:12:51 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:13:19 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=8 next_retry_seconds=2019-09-20 15:13:52 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:13:19 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:13:52 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=9 next_retry_seconds=2019-09-20 15:14:21 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"
  2019-09-20 15:13:52 +0000 [warn]: #0 suppressed same stacktrace
2019-09-20 15:14:21 +0000 [warn]: #0 [forward] failed to flush the buffer. retry_time=10 next_retry_seconds=2019-09-20 15:14:51 +0000 chunk="592fd7ceebe490e9e802ef0b16909e38" error_class=Fluent::Plugin::ForwardOutput::NoNodesAvailable error="no nodes are available"

Additional context

1.7.1 will continue to fail and retry with error="no nodes are available".

1.6.3 pod logs - throws similar error but recovers and forwards logs fluentd-system-forwarder-wk46w.log
1.7.1 pod logs - never recovers
fluentd-system-forwarder-xh5tp.log

@stonith stonith added the bug Something isn't working label Sep 20, 2019
@repeatedly
Copy link
Member

We recently refactor the code of out_forward and it seems the cause.
We will check it.

@ganmacs
Copy link
Member

ganmacs commented Sep 30, 2019

Thank you for the report! #2635 should fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants