td-agent v3.7.1 ssl hostname does not match the server certificate #763

@toyaser


Problem

We are using td-agent v3.7.1, which bundles Fluentd v1.10.2 and fluent-plugin-elasticsearch v4.0.7.

We have a local 3-node Elasticsearch cluster. After td-agent starts, log shipping works for around 18-20 hours, after which Fluentd starts failing with the following errors:

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-05-29 17:18:06 +0000 chunk="5a6904a4700dc751015bf6f7fb2e0bc1" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.1\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=17 next_retry_seconds=2020-05-27 02:35:12 +0000 chunk="5a690483041ba29bda96202b35491072" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.2\" does not match the server certificate (OpenSSL::SSL::SSLError)"

2020-05-26 17:37:17 +0000 [warn]: #0 failed to flush the buffer. retry_time=18 next_retry_seconds=2020-05-27 11:17:37 +0000 chunk="5a69048c8e1d158c8826c73a15f903b0" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.mydomain.io\", :port=>9200, :scheme=>\"https\", :user=>\"my_user\", :password=>\"obfuscated\"}): hostname \"10.0.0.3\" does not match the server certificate (OpenSSL::SSL::SSLError)"

Steps to replicate

Leave td-agent running for long enough (around 18-20 hours in our case) with the following configuration:

<source>
  @type tail
  @id in_tail_my_log_log
  path "D:/Logs/my-logs-*.log"
  pos_file "C:/opt/td-agent/etc/td-agent/pos_files/logs.pos"
  tag "my-logs.*"
  enable_watch_timer false
  enable_stat_watcher true
  read_from_head true
  <parse>
    @type "json"
  </parse>
</source>
<filter **>
  @type elasticsearch_genid
  hash_id_key _hash
</filter>
<match my-logs.**>
  @type copy
  <store>
    id_key _hash
    remove_keys _hash
    @type "elasticsearch"
    @log_level debug
    host "elasticsearch.mydomain.io"
    port 9200
    scheme https
    ssl_version TLSv1_2
    logstash_format true
    logstash_prefix "my-logs"
    logstash_dateformat "%Y.%m"
    include_tag_key true
    user "fluentd_user"
    password "XXXXX"
    type_name "_doc"
    tag_key "@log_name"
    <buffer>
      flush_thread_count 8
      flush_interval 5s
    </buffer>
  </store>
</match>

Expected Behavior or What you need to ask

Fluentd should keep shipping logs to Elasticsearch without needing a restart.

Using Fluentd and ES plugin versions

  • Fluentd or td-agent version: td-agent 3.7.1
  • Operating system: Windows Server 2016
  • Elasticsearch version: 6.2.0

Additional context

Interestingly, logs are shipped consistently and then shipping suddenly stops. Note also that we have 3 separate servers, each shipping logs to the same Elasticsearch cluster, and all 3 eventually fail (at around the same time) with exactly the same error.

Restarting the Fluentd service clears the error, but any logs in the buffer are lost and have to be recovered manually.
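
The buffer loss on restart is probably related to the `<buffer>` section in the configuration setting no `@type`, in which case the Elasticsearch output should fall back to an in-memory buffer. A minimal sketch of a file-backed buffer follows. The `path` value is a hypothetical directory chosen for illustration, not something from our setup, and this has not been tested against the failure described here:

```
<buffer>
  # File-backed chunks stay on disk across a service restart.
  @type file
  # Hypothetical directory; it must be writable by the td-agent service.
  path "C:/opt/td-agent/var/buffer/my-logs"
  flush_thread_count 8
  flush_interval 5s
</buffer>
```

With a file buffer, chunks that could not be flushed should be picked up again after the restart instead of needing manual recovery.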

Also, the much older td-agent v3.1.1, which uses Fluentd v1.0.2 and fluent-plugin-elasticsearch v2.4.0, works with no issues.

With that old version, we have been running for over a week without problems.
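
One thing that stands out in the errors: the configured host is elasticsearch.mydomain.io, but the hostnames being checked against the certificate are IP addresses. This is an assumption rather than a confirmed diagnosis, but it would be consistent with the plugin reloading its connection list from the cluster and then connecting to the nodes by the addresses they report. If that is the cause, a possible workaround sketch is to disable connection reloading in the Elasticsearch `<store>`. `reload_connections` and `reconnect_on_error` are fluent-plugin-elasticsearch options; whether they resolve this particular case is untested:

```
<store>
  @type elasticsearch
  host "elasticsearch.mydomain.io"
  port 9200
  scheme https
  ssl_version TLSv1_2
  # Assumption: the IPs in the errors come from reloading the node list.
  # Keep using the configured host instead of addresses reported by the cluster.
  reload_connections false
  # Re-create the connection after an error rather than reusing a broken one.
  reconnect_on_error true
  # Remaining options (user, password, logstash_*, buffer, ...) unchanged.
</store>
```

If the mismatch still appears with reloading disabled, the assumption above is wrong and the cause lies elsewhere.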
