[Log input]Forever growing registry file with kubernetes autodiscovery #13140

marqc · 2019-08-01T10:02:04Z

When using the kubernetes autodiscover provider, the registry file tends to grow over time, leaving a lot of entries with TTL=-2. These entries are never removed from the registry. For example:

sample config:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 5m
      hints.enabled: true
      templates.config:
        - type: container
          paths:
            - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
          scan_frequency: 3s
          max_bytes: 1000000
          clean_removed: true

cat data.json | jq -r .[].ttl | sort | uniq -c
    660 -1
   2957 -2

When pods are stopped, their inputs are stopped/disabled and marked with TTL=-2. The log files are often removed from disk only after that (for example, for jobs from a cronjob, stopped docker containers can be kept for a long time), so with no active input the files are no longer tracked and their states are never removed from the registry.

For a state to be removed from the registry, the "states.Update" method must be called on it, but with an autodiscover path pattern containing the container ID, no input will ever track these files again and get them removed from the registry.

I think that kubernetes autodiscovery should always remove the state from the registry when the final cleanup_timeout "stop" event is sent, because kubernetes will never re-run the same already-stopped container (it always creates a new one).

jsoriano · 2019-08-13T14:08:21Z

Hi @marqc, thanks for the report, we are investigating the issue.

In the meantime, I think that your configuration is not doing what you expect. There are two ways of configuring filebeat autodiscover, one with hints and another with templates. Mixing them is possible, but can lead to unexpected behaviour. In your case you are enabling hints-based configuration with hints.enabled: true, and you are also trying to define a template.
In this case I think that the template is ignored; it should be defined as:

      templates:
        - config:
            type: container
            paths:
              - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
            scan_frequency: 3s
            max_bytes: 1000000
            clean_removed: true

BUT, this configuration will apply to all containers, as well as hints-based autodiscover, so you would have the default configuration of hints-based autodiscover, and this template working at the same time for any container.

If you want to override some options (like scan_frequency, max_bytes...) while using hints-based autodiscover, you can do so by overriding the default settings with hints.default_config. Something like this:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 5m
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
        scan_frequency: 3s
        max_bytes: 1000000
        clean_removed: true

marqc · 2019-08-13T15:11:06Z

@jsoriano thanks, I have already done that, and overriding attributes works as expected. The original issue is not affected by this change. It still leaves entries in the registry if the log file is not deleted from disk within 5 minutes after the container is stopped (crashed, pod evicted, job finished).

jsoriano · 2019-08-13T17:21:13Z

@marqc we can confirm that there is some issue cleaning the state of files that are not owned by any input. There is an ongoing effort to refactor filebeat registry that will probably help here.

In the meantime, the only solution would be to stop filebeat and clean up the registry file with some script.
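
A minimal sketch of such a cleanup script, in Python. It assumes the legacy data.json layout from the report above: a JSON array of state objects, each with a `ttl` field and (an assumption, not shown in this thread) a `source` field holding the log file path. The function names are illustrative, and Filebeat must be stopped before the file is rewritten.

```python
import json
import os


def prune_registry(states, exists=os.path.exists):
    """Return the states worth keeping.

    A state is dropped only when its ttl is -2 (the value reported above
    for stopped inputs) and its source file is no longer on disk.
    Assumes `ttl` and `source` keys; states without `source` are kept.
    """
    kept = []
    for state in states:
        source = state.get("source")
        orphaned = (
            state.get("ttl") == -2
            and source is not None
            and not exists(source)
        )
        if not orphaned:
            kept.append(state)
    return kept


def prune_registry_file(path):
    """Rewrite the registry at `path`; run only while Filebeat is stopped.

    Returns the number of entries removed.
    """
    with open(path) as f:
        states = json.load(f)
    kept = prune_registry(states)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(kept, f)
    # Swap the new file in with a single rename instead of truncating in place.
    os.replace(tmp, path)
    return len(states) - len(kept)
```

Calling `prune_registry_file("data.json")` (adjust the path to wherever your registry lives) returns how many entries were removed. Back up the file first, since the field names above are assumptions about the registry layout.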

silenceper · 2020-05-14T08:49:20Z

Is there a current solution for cleaning up the state in the registry?

boernd · 2020-11-05T13:20:13Z

FYI, leaking registry entries (in our case with > 15k registry entries) also caused filebeat to stall and rarely send any events. After cleanup the performance was ok again (version 7.9.3).

trnl · 2020-11-06T01:24:42Z

We suffer from the same issue as well.

Log delivery based on filebeat is not really stable.

jsoriano · 2020-11-06T09:57:22Z

@trnl is this also happening to you with 7.9?

@trnl @boernd approximately, how many files are you collecting at a given moment?

trnl · 2020-11-09T12:04:16Z

@jsoriano 7.9.2

We have entries in the registry from May 2020, although the containers and related folders were removed from the system quite a long time ago.

boernd · 2020-11-09T14:23:18Z

> @trnl is this also happening to you with 7.9?
>
> @trnl @boernd approximately, how many files are you collecting at a given moment?

@jsoriano Hard to tell, Kibana tells me ~2k unique log.file.path for the last couple of minutes. We have 212 pods running atm, so roughly 10 * 5 (docker logs including the rotated ones) logs per node.

The following screen shows the registry growing averaged per filebeat:

The drop in the graph is where I did a manual cleanup of some pods.

hukaixuan · 2021-04-26T03:42:29Z

Any updates on this issue? We have hit the same issue here:

We have about 200 containers on a k8s node and use filebeat to collect their logs, but the registry file is really big (up to 20M, ~50k lines), which makes filebeat's performance unstable.

filebeat performance:

filebeat version: 7.11.2
configuration of input part:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      host: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log

filebeat.registry.flush: 10s

By the way, could anyone explain why the registry file size affects the performance of filebeat so much?

elasticmachine · 2021-04-26T14:07:02Z

Pinging @elastic/agent (Team:Agent)

hukaixuan · 2021-04-27T12:06:01Z

After reading the registry code, I found the reason why the registry file size affects filebeat's performance so much.
The update of the in-memory states and the write to the registry file are in the same select block, so they cannot execute in parallel.
But it looks like the registry-writing method commitStateUpdates is safe to run in parallel with r.onEvents(states) (since gcStates locks the states, and the subsequent operations work on a copy of the states).
So I moved commitStateUpdates to an independent goroutine.
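
The pattern described above can be sketched in isolation. The following Python toy is not the Filebeat code (which is Go); the `Registrar` class and its methods are hypothetical stand-ins. It only illustrates the idea: the update path takes a copy of the states under a lock and hands it to a separate writer thread, so the slow write no longer blocks state updates.

```python
import json
import queue
import threading


class Registrar:
    """Toy registrar: in-memory updates and persistence run on separate threads."""

    def __init__(self, write):
        self._states = {}
        self._lock = threading.Lock()
        self._snapshots = queue.Queue()
        self._write = write  # callable that persists one serialized snapshot
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def on_events(self, updates):
        """Apply updates in memory, then queue a copy for the writer thread."""
        with self._lock:
            self._states.update(updates)
            # Shallow copy: the writer serializes this, not the live dict.
            snapshot = dict(self._states)
        self._snapshots.put(snapshot)

    def _write_loop(self):
        """Persist queued snapshots in order until the None sentinel arrives."""
        while True:
            snapshot = self._snapshots.get()
            if snapshot is None:
                return
            self._write(json.dumps(snapshot, sort_keys=True))

    def close(self):
        """Flush pending snapshots and stop the writer thread."""
        self._snapshots.put(None)
        self._writer.join()
```

For example, `Registrar(print)` followed by a few `on_events({...})` calls and a final `close()` prints one serialized snapshot per update. A real implementation would also need to coalesce snapshots and handle write errors.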

With this change, filebeat's performance looks better and more stable:

I would like to know whether this change is all right, or whether it could cause problems.

alexandervasylev · 2021-08-04T09:51:32Z

Has anyone found a solution that doesn't require changing the source code? We've faced the same problem with Filebeat in a Kubernetes cluster.

exekias · 2021-08-13T09:05:42Z

We have been working on a new input that may help solve this issue, as it is able to clean up registry entries that are no longer used. I've created an issue to test and validate the approach: elastic/integrations#1526

srhb · 2022-01-19T07:44:06Z

Any news on mitigations here?

elasticmachine · 2022-01-19T08:10:25Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

stephan-erb-by · 2022-04-01T08:26:02Z

I think the question is whether we can get the hints-based autodiscover working together with the new filestream input. Has anyone attempted this yet?

faec · 2022-04-04T17:57:29Z

It looks to me like @MichaelKatsoulis checked in the switch to the filestream input in elastic/integrations#2139, so this might already be done?

stephan-erb-by · 2022-04-05T08:08:38Z

I think the integrations are not using or supporting hints.enabled, but I might be mistaken. So the new integrations would fix it for autodiscover, but not for all use cases supported by the old mechanism.

fdartayre · 2022-07-01T07:23:03Z

As the issue comes from the log input, suggested workaround is to use a filestream input instead (GA since 7.14):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 5m
      hints.enabled: true
      hints.default_config:
        type: filestream
        id: "my-id-${data.kubernetes.container.id}"
        paths:
          - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
        scan_frequency: 3s
        message_max_bytes: 1000000
        clean_removed: true
        parsers:
        - container: ~

Note: without a dynamic id ( id: "my-id-${data.kubernetes.container.id}"), the provider would auto generate an id, which could lead to duplicated data (#31239).

stephan-erb-by · 2022-07-01T07:57:35Z

thanks @fdartayre!

We are heavy users of Kubernetes pod annotations to configure the logs input, such as

  "co.elastic.logs.mycontainer/json.add_error_key": "true"
  "co.elastic.logs.mycontainer/json.keys_under_root": "true"
  "co.elastic.logs.mycontainer/json.message_key": "message"
  "co.elastic.logs.mycontainer/json.ignore_decoding_error": "true"
  "co.elastic.logs.mycontainer/json.expand_keys": "true"

or

  "co.elastic.logs.myothercontainer/multiline.type": "pattern"
  "co.elastic.logs.myothercontainer/multiline.pattern": "^[[:space:]]"
  "co.elastic.logs.myothercontainer/multiline.negate": "false"
  "co.elastic.logs.myothercontainer/multiline.match": "after"

To my knowledge this will not work correctly with the new filestream input. Or should this still work?

Iatbzh · 2022-07-26T00:47:35Z

Has this problem been solved? I still have this problem in filebeat7.9.2

jsoriano · 2022-07-26T10:17:12Z

> Has this problem been solved? I still have this problem in filebeat7.9.2

Have you tried to upgrade to a more recent version? As mentioned in #13140 (comment) you may try to use the filestream input to mitigate this issue.

Iatbzh · 2022-07-26T13:07:03Z

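> Has this problem been solved? I still have this problem in filebeat7.9.2
>
> Have you tried to upgrade to a more recent version? As mentioned in #13140 (comment) you may try to use the filestream input to mitigate this issue.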

Thank you for resolving

asazallesmilner · 2022-11-28T16:39:34Z

Is this a validated, working configuration for running filestream with autodiscover and hints? #13140 (comment)

filebeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 5m
      hints.enabled: true
      hints.default_config:
        type: filestream
        id: "my-id-${data.kubernetes.container.id}"
        paths:
          - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
        scan_frequency: 3s
        message_max_bytes: 1000000
        clean_removed: true
        parsers:
        - container: ~

eedugon · 2023-01-02T13:19:13Z

@jsoriano, @fdartayre: why do we use and suggest the scan_frequency option, which comes from the legacy log input, instead of the prospector.scanner.check_interval option, which is the one documented for the filestream input? Are both valid?

asazallesmilner · 2023-02-14T16:43:08Z

Want to put a note here for what we found.
Filestream is currently INCOMPATIBLE with hints based annotations. This means all of the hints based annotations our users were using broke when we went to Filestream and we are having to roll back.

bigpigeon · 2023-03-24T02:19:15Z

I fixed this issue with PR #34904.
With it, just set filebeat.yaml similar to the following:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 5m
      templates.config:
        - type: container
          paths:
            - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
          close_removed: true
          clean_removed: true

botelastic · 2024-03-27T09:02:55Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

rsafonseca · 2024-11-21T22:36:07Z

Can this be re-opened? It is still an issue

jsoriano added the Team:Integrations, bug, containers, and libbeat labels Aug 8, 2019

andresrc added the [zube]: Investigate label Aug 12, 2019

andresrc assigned jsoriano Aug 12, 2019

jsoriano removed the Team:Integrations and [zube]: Investigate labels Aug 13, 2019

jsoriano assigned urso and unassigned jsoriano Aug 13, 2019

jsoriano added the Filebeat label and removed the libbeat label Aug 13, 2019

jsoriano added the Team:Elastic-Agent label Apr 26, 2021

exekias mentioned this issue Aug 13, 2021

Investigate switching to filestream input for container logs elastic/integrations#1526

Closed

ruflin added the Team:Elastic-Agent-Data-Plane label Jan 19, 2022

jlind23 added the 8.4-candidate label Apr 1, 2022

jlind23 added v8.4.0 and removed 8.4-candidate labels May 24, 2022

jlind23 unassigned urso May 24, 2022

jlind23 added 8.5-candidate and removed v8.4.0 labels May 24, 2022

jlind23 removed the 8.5-candidate label Jul 8, 2022

jlind23 changed the title ~~Forever growing registry file with kubernetes autodiscovery~~ [Log input]Forever growing registry file with kubernetes autodiscovery Jul 8, 2022

gsantoro mentioned this issue Jan 24, 2023

Replace container input with filestream input in hints' default config #34354

Closed

bigpigeon mentioned this issue Mar 23, 2023

fix Forever growing registry file with kubernetes autodiscovery when setting clean_removed #34904

Open

botelastic bot added the Stalled label Mar 27, 2024

botelastic bot closed this as completed Sep 23, 2024

rsafonseca mentioned this issue Nov 22, 2024

Cleanup states from registrar when the files are removed #41747

Open
