Kubernetes container logs - duplicate logs found when using read_bytes_limit_per_second parameter #3464

Closed
doge95 opened this issue Jul 21, 2021 · 6 comments · Fixed by #3466
Assignees: ashie
Labels: bug (Something isn't working)

Comments

doge95 commented Jul 21, 2021

Describe the bug

Continuing from #3434: I followed @ashie's suggestion and reran the stress test on our EFK stack with the read_bytes_limit_per_second parameter and Fluentd v1.13.2. However, I found that logs are duplicated in Elasticsearch.

To Reproduce

Container Runtime Version: containerd://1.4
Kubelet logging configuration: --container-log-max-files=50 --container-log-max-size=100Mi

# echo "GET http://172.16.5.154" | vegeta attack -rate=10000 -duration=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         1200000, 10000.01, 9999.99
Duration      [total, attack, wait]             2m0s, 2m0s, 289.975µs
Latencies     [min, mean, 50, 90, 95, 99, max]  119.123µs, 488.862µs, 280.767µs, 913.659µs, 1.335ms, 4.028ms, 61.799ms
Bytes In      [total, mean]                     0, 0.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      204:1200000
Error Set:

Expected behavior

Kibana is expected to receive 1,200,000 log entries as well. However, it receives 1,622,486 entries.
(Screenshot: Kibana hit count showing 1,622,486 entries)

Your Environment

- Fluentd version: 1.13.2
- Operating system: Debian GNU/Linux 10 (buster)
- Kernel version: 5.4.0-72-generic

Your Configuration

Fluentd Chart values.yaml:

dashboards:
  enabled: "false"
image:
  tag: v1.13.2-debian-elasticsearch7-1.0

resources:
  limits:
    cpu: 400m
    memory: 400Mi
  requests:
    cpu: 200m
    memory: 200Mi
plugins:
- fluent-plugin-elasticsearch

nodeSelector:
  node-role.kubernetes.io/efktest: "true"

service:
  type: "ClusterIP"
  annotations: {}
  ports:
    - name: "aggregator"
      protocol: TCP
      containerPort: 24224


volumes:
- name: varlogcontainers
  hostPath:
    path: /var/log/containers
- name: varlogpods
  hostPath:
    path: /var/log/pods
- name: fluentpos
  hostPath:
    path: /var/log/fluent/
- name: etcfluentd-main
  configMap:
    name: fluentd-main
    defaultMode: 0777
- name: etcfluentd-config
  configMap:
    name: fluentd-config
    defaultMode: 0777
volumeMounts:
- name: varlogcontainers
  mountPath: /var/log/containers
  readOnly: true
- name: varlogpods
  mountPath: /var/log/pods
  readOnly: true
- name: fluentpos
  mountPath: /var/log/fluent/
- name: etcfluentd-main
  mountPath: /etc/fluent
- name: etcfluentd-config
  mountPath: /etc/fluent/config.d/

## Fluentd configurations:
fileConfigs:
  00_global.conf: |-
    <system>
      log_level debug
    </system>
  01_sources.conf: |-
    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluent/fluentd-containers.log.pos
      tag kubernetes.*
      exclude_path ["/var/log/containers/*_kube-system_*.log", "/var/log/containers/*_default_*.log", "/var/log/containers/*fluent*.log"]
      read_from_head true
      follow_inodes true
      refresh_interval 1
      read_bytes_limit_per_second 100000
      <parse>
        @type none
        message_key message
      </parse>
    </source>
    
    # Send the logs to Elasticsearch; one config per namespace
    <match kubernetes.** >
      @type elasticsearch_data_stream
      data_stream_name logs-efk-container
      hosts https://elasticsearch-client.default.svc.cluster.local:9200
      user elastic
      password xxxxxx
      ssl_verify false
      reload_connections false
      reconnect_on_error true
      reload_on_failure true
      log_es_400_reason false
      request_timeout 1000s
      include_timestamp true
      <buffer>
        @type memory
        flush_interval 3s
        retry_max_times 10
      </buffer>
    </match>
  02_filters.conf: |-
  03_dispatch.conf: |-
  04_outputs.conf: |-

Your Error Log

No error log

Additional context

When the log is rotated, 2 entries are added to the pos file for the same path.

# ls -lrth
total 167M
-rw-r----- 1 root root 103M Jul 21 16:47 0.log.20210721-164752
-rw-r----- 1 root root  47M Jul 21 16:48 0.log
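
For completeness, the inode of each of these files can be read on the node with stat; this is the value recorded in hex in the third column of the pos file below (a sketch, run in the same log directory):

# stat -c 'inode=%i size=%s name=%n' 0.log*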

Pos file:

# cat /var/log/fluent/fluentd-containers.log.pos
/var/log/containers/openresty-efk-stress-test-558464d485-wg7l5_efktest_openresty-7d2236f476b49cf828caec125558eb0724532481067814e85cf2c50b85754b3f.log	00000000009c0246	000000006a4316ab
/var/log/containers/openresty-efk-stress-test-558464d485-wg7l5_efktest_openresty-7d2236f476b49cf828caec125558eb0724532481067814e85cf2c50b85754b3f.log	00000000005fdfd8	000000006a4316ac
/var/log/containers/openresty-efk-stress-test-558464d485-wg7l5_efktest_openresty-7d2236f476b49cf828caec125558eb0724532481067814e85cf2c50b85754b3f.log	00000000005e3ffe	000000006a4316ac
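
For reference, each line of the pos file has the form <path><TAB><read offset><TAB><inode>, with both numbers written as 16-digit hex. Decoding the three entries above (a shell conversion of the hex values already shown; no new data) makes the duplication visible: the last two entries share the same inode but have different offsets, which suggests the same file is being tailed from two positions at once.

# printf 'offset=%d inode=%d\n' 0x00000000009c0246 0x000000006a4316ab
offset=10224198 inode=1782781611
# printf 'offset=%d inode=%d\n' 0x00000000005fdfd8 0x000000006a4316ac
offset=6283224 inode=1782781612
# printf 'offset=%d inode=%d\n' 0x00000000005e3ffe 0x000000006a4316ac
offset=6176766 inode=1782781612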

One example of the duplicated log:

# grep 2021-07-21T16:47:52.958955667+08:00 *
0.log:2021-07-21T16:47:52.958955667+08:00 stdout F 172.24.0.69 - - [21/Jul/2021:16:47:52 +0800] "GET / HTTP/1.1" 204 0 "-" "Go-http-client/1.1"

The above log is duplicated in Kibana, even though it appears only once in the 0.log file.
(Screenshot: the same log line shown twice in Kibana)

doge95 commented Jul 21, 2021

# wc -l *
   422486 0.log
   777520 0.log.20210721-164752
  1200006 total

It looks like all the logs in the 0.log file are duplicated:
777,520 + 422,486*2 - 6 (nginx startup logs) = 1,622,486, which matches the number of log entries observed in Kibana.
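
A quick shell check of that arithmetic (the 6 subtracted lines are the nginx startup messages):

# echo $(( 777520 + 422486 * 2 - 6 ))
1622486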

doge95 commented Jul 21, 2021

I tested both read_bytes_limit_per_second=100000 and 500000, and hit the same duplicated-log issue with both values.

ashie added a commit that referenced this issue Jul 24, 2021
There are 2 causes of this issue:

* Wrong inode is set to TailWatcher when follow_inode is true
* A key (TargetInfo) in @Tails isn't updated for the same path even if the new one has a different inode

Fix #3464

Signed-off-by: Takuro Ashie <[email protected]>
ashie added the bug label on Jul 24, 2021

ashie commented Jul 24, 2021

Thanks for your report.
I've confirmed the bug.
#3466 will fix it.

ashie self-assigned this on Jul 26, 2021

doge95 commented Jul 26, 2021

> Thanks for your report.
> I've confirmed the bug.
> #3466 will fix it.

Hey @ashie, thanks for the quick confirmation. May I ask when it will be released?

kenhys commented Jul 29, 2021

@doge95 released as v1.13.3.

https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#bug-fix

doge95 commented Aug 2, 2021

> @doge95 released as v1.13.3.
>
> https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#bug-fix

Hey @kenhys, thanks a lot for the quick fix! I have tested it out and am no longer seeing this issue.
