Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fluentd error_class=NoMethodError error="undefined method `get' for nil:NilClass" on startup #3718

Closed
LeonD9 opened this issue Apr 18, 2022 · 4 comments · Fixed by #3755
Closed
Assignees
Labels
bug Something isn't working

Comments

@LeonD9
Copy link

LeonD9 commented Apr 18, 2022

Describe the bug

After fluentd rollout in k8s sometimes buffer is left with only .meta files which causes nil:nilclass errors

2022-04-16 19:25:26 +0000 [error]: #0 unexpected error on reading data host="10.1.7.177" port=42756 error_class=NoMethodError error="undefined method `get' for nil:NilClass"

from this point fluentd is running (doesn't crash) but doesn't receive any logs or sends any logs, and only shows errors.

To Reproduce

It happens after rollout on start of the pods (not all pods are affected)

Expected behavior

All pods starting normally

Your Environment

- Fluentd version: fluent/fluentd-kubernetes-daemonset:v1.14.6-debian-opensearch-1.0
Statefulset of 3 pods

Your Configuration

<system>
      rpc_endpoint 127.0.0.1:24444
      log_level info
      root_dir /var/log/fluentd
    </system>

    <source>
      @type forward
      port 24224
      bind 0.0.0.0
    </source>

    <source>
      @type http_healthcheck
      port 8888
      bind 0.0.0.0
    </source>

    <source>
      @type prometheus
      bind 0.0.0.0
      port 24231
      metrics_path /metrics
    </source>
    <source>
      @type prometheus_output_monitor
      interval 10
    <labels>
      hostname ${hostname}
    </labels>
    </source>

    <label @opensearch-main-logs>
      <match **>
        @type opensearch
        @id main-logs
        include_tag_key true
        time_key time
        scheme "#{ENV['FLUENT_OPENSEARCH_SCHEME'] || 'https'}"
        ssl_verify "#{ENV['FLUENT_OPENSEARCH_SSL_VERIFY'] || 'true'}"
        ssl_version "#{ENV['FLUENT_OPENSEARCH_SSL_VERSION'] || 'TLSv1_2'}"
        reload_connections "#{ENV['FLUENT_OPENSEARCH_RELOAD_CONNECTIONS'] || 'false'}"
        reconnect_on_error "#{ENV['FLUENT_OPENSEARCH_RECONNECT_ON_ERROR'] || 'true'}"
        reload_on_failure "#{ENV['FLUENT_OPENSEARCH_RELOAD_ON_FAILURE'] || 'true'}"
        log_os_400_reason "#{ENV['FLUENT_OPENSEARCH_LOG_ES_400_REASON'] || 'true'}"
        with_transporter_log false
        logstash_prefix 'k8s'
        logstash_dateformat "#{ENV['FLUENT_OPENSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"
        logstash_format "#{ENV['FLUENT_OPENSEARCH_LOGSTASH_FORMAT'] || 'true'}"
        index_name 'k8s'
        target_index_key "#{ENV['FLUENT_OPENSEARCH_TARGET_INDEX_KEY'] || use_nil}"
        type_name 'fluentd'
        include_timestamp "#{ENV['FLUENT_OPENSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"
        template_name "#{ENV['FLUENT_OPENSEARCH_TEMPLATE_NAME'] || use_nil}"
        template_file "#{ENV['FLUENT_OPENSEARCH_TEMPLATE_FILE'] || use_nil}"
        template_overwrite "#{ENV['FLUENT_OPENSEARCH_TEMPLATE_OVERWRITE'] || use_default}"
        request_timeout "#{ENV['FLUENT_OPENSEARCH_REQUEST_TIMEOUT'] || '500s'}"
        suppress_type_name "#{ENV['FLUENT_OPENSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"
        <endpoint>
          url "#{ENV['FLUENT_MAIN_OPENSEARCH_HOST']}"
          region "#{ENV['OPENSEARCH_REGION']}"
          assume_role_arn "#{ENV['ASSUMED_IAM_ROLE']}"
        </endpoint>
        <buffer>
          @type file
          flush_thread_count '4'
          flush_interval '10s'
          chunk_limit_size '10MB'
          chunk_limit_records '5000'
          total_limit_size '65GB'
          retry_max_interval '30'
          flush_mode 'interval'
          overflow_action 'drop_oldest_chunk'
        </buffer>
      </match>
    </label>

    <match kubernetes.**>
      @type relabel
      @label @opensearch-main-logs
    </match>

    <match kubernetes-nodes.**>
      @type relabel
      @label @opensearch-main-logs
    </match>

Your Error Log

2022-04-16 19:25:26 +0000 [trace]: #0 [main-logs] writing events into buffer instance=458700 metadata_size=1
2022-04-16 19:25:26 +0000 [debug]: #0 [main-logs] Created new chunk chunk_id="5dcd50bbcfeaf9b7d6fc14f983017cbb" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.log.containers.test.log", variables=nil, seq=0>
2022-04-16 19:25:26 +0000 [trace]: #0 [main-logs] chunk /var/log/fluentd/worker0/main-logs/buffer/buffer.b5dcd50bbcfeaf9b7d6fc14f983017cbb.log size_added: 3499 new_size: 3499
2022-04-16 19:25:26 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `+' for nil:NilClass" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'" tag="kubernetes.var.log.containers.test.log"
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:853:in `emit_buffered'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:830:in `emit_events'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/event_router.rb:115:in `emit_stream'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/out_relabel.rb:29:in `process'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:839:in `emit_sync'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/event_router.rb:115:in `emit_stream'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:338:in `on_message'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:226:in `block in handle_connection'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:263:in `block (3 levels) in read_messages'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:262:in `feed_each'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:262:in `block (2 levels) in read_messages'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:271:in `block in read_messages'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/server.rb:613:in `on_read_without_connection'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:123:in `on_readable'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:186:in `on_readable'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-04-16 19:25:26 +0000 [error]: #0 unexpected error on reading data host="10.1.7.177" port=42756 error_class=NoMethodError error="undefined method `get' for nil:NilClass"


### Additional context

Terminating the affected pod fixes the issue.
@LeonD9 LeonD9 changed the title Fluentd error_class=NoMethodError error="undefined method `get' for nil:NilClass" caused by log meta files Fluentd error_class=NoMethodError error="undefined method `get' for nil:NilClass" on startup Apr 18, 2022
@ashie
Copy link
Member

ashie commented May 14, 2022

@output_flush_thread_current_position = (@output_flush_thread_current_position + 1) % @buffer_config.flush_thread_count

@output_flush_thread_current_position is used before initialize?

@output_flush_thread_current_position = 0

It should be initialized at initialize.

@ashie ashie added the bug Something isn't working label May 14, 2022
@ashie ashie self-assigned this May 16, 2022
@LeonD9
Copy link
Author

LeonD9 commented May 19, 2022

@ashie It seems that after k8s rollout it tries to start from the oldest meta files that don't have logs anymore and weren't deleted for example it looks like this in the buffer:

-rw-r--r-- 1 root root     210 Apr  6 12:48 buffer.q5dbfbc57c4f3db244ed64432683694be.log.meta
-rw-r--r-- 1 root root     216 Apr  6 12:48 buffer.q5dbfbc5d1e4ee2af973b17d9441f9a21.log.meta
-rw-r--r-- 1 root root     217 Apr  6 12:49 buffer.q5dbfbc644ab839ea34e784594ae11776.log.meta

after deleting them and restarting the pod the issue gets fixed.

ashie added a commit that referenced this issue May 24, 2022
Because `@output_flush_thread_current_position` isn't initialize before
starting flush threads, it might cause crash when flush threads start
to processing buffers before initializing it at main thread.

Fixes #3718

Signed-off-by: Takuro Ashie <[email protected]>
@ashie ashie moved this to Done in Fluentd Kanban May 24, 2022
@ashie
Copy link
Member

ashie commented Jun 13, 2022

It's fixed in master branch but not yet released.
We'll release it at end of this month as v1.15.0.
Docker images are released later than it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants