Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup states from registrar when the files are removed #41747

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rsafonseca
Copy link

@rsafonseca rsafonseca commented Nov 22, 2024

Proposed commit message

Cleanup states from registrar when the files are removed and no longer related to any input
Fixes unbounded registry file growth and increased memory usage on long running machines with a lot of input churn

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Use cases

Ensures registry file doesn't keep growing forever which causes additional disk space usage, as well increased mem/cpu usage. The fact that the registry never gets cleaned up in a number of scenarios like running on a Kubernetes cluster, constitues a slow memory leak (memory usage increases over time with long running nodes, especially when there is a lot of pod/input churn)

Screenshots

image

@rsafonseca rsafonseca requested a review from a team as a code owner November 22, 2024 11:11
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 22, 2024
Copy link
Contributor

mergify bot commented Nov 22, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @rsafonseca? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Nov 22, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 22, 2024
@rsafonseca
Copy link
Author

Lint failed on a previously existing issue, unrelated to the change

@rsafonseca
Copy link
Author

pushed a commit to fix previously existing linting failure

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 22, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 22, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert
Copy link
Collaborator

@belimawr could you please have a look here?
If I remember correctly you worked on a similar issue in the past

@rsafonseca
Copy link
Author

This PR doesn't clean the backlog of existing registry entries, but it does prevent them from piling up as they are now removed when the input is closed

@belimawr
Copy link
Contributor

@belimawr could you please have a look here? If I remember correctly you worked on a similar issue in the past

I'll take a look at it today.

Copy link
Contributor

mergify bot commented Nov 25, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix_clean_removed upstream/fix_clean_removed
git merge upstream/main
git push upstream fix_clean_removed

@belimawr
Copy link
Contributor

@rsafonseca, thanks for the PR. The log input has been deprecated for a while, hence we haven't been adding features to it.

It's replacement is the Filestream input that has got may advantages over the log input, including cleaning up the registry (look for the clean_* options). This prevents the registry from growing indefinitely.

Have you tried it? Is there any reason why it wouldn't work for you?

@rsafonseca
Copy link
Author

rsafonseca commented Nov 25, 2024

@rsafonseca, thanks for the PR. The log input has been deprecated for a while, hence we haven't been adding features to it.

It's replacement is the Filestream input that has got may advantages over the log input, including cleaning up the registry (look for the clean_* options). This prevents the registry from growing indefinitely.

Have you tried it? Is there any reason why it wouldn't work for you?

Hi @belimawr,

I'm actually using the container input which is not deprecated, since this is the default and recommended for containers, which is a wrapper around the log input.

return log.NewInput(cfg, outletFactory, context)

The container input does have all the clean_ options listed in the docs, but they don't work because they are not all implemented in the log input https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-container.html

Oddly, it seems that it does have a config property for stream, but since it's using the log input it's probably doing nothing, so maybe someone just forgot to finish this last bit? 😹

So, either we could merge this to fix the problem with the log input, regardless of it being deprecated, or we should get the container input actually migrated to Filestream input

@belimawr
Copy link
Contributor

@rsafonseca, thanks for the PR. The log input has been deprecated for a while, hence we haven't been adding features to it.
It's replacement is the Filestream input that has got may advantages over the log input, including cleaning up the registry (look for the clean_* options). This prevents the registry from growing indefinitely.
Have you tried it? Is there any reason why it wouldn't work for you?

Hi @belimawr,

I'm actually using the container input which is not deprecated, since this is the default and recommended for containers, which is a wrapper around the log input.

return log.NewInput(cfg, outletFactory, context)

The container input does have all the clean_ options listed in the docs, but they don't work because they are not all implemented in the log input https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-container.html

Oddly, it seems that it does have a config property for stream, but since it's using the log input it's probably doing nothing, so maybe someone just forgot to finish this last bit? 😹

So, either we could merge this to fix the problem with the log input, regardless of it being deprecated, or we should get the container input actually migrated to Filestream input

Tanks for the quick reply @rsafonseca !

Indeed the container input using the log input has been a challenge for us, we're trying to migrate it to Filestream, ensuring safe and correct state migration on a highly dynamic environment like Kubernets does pose a challenge.

What we've been recommending is for users to migrate to the Filestream input with the container parser + fingerprint for the file identity.

If you're using the input directly, then you can configure it like this:

filebeat.inputs:
  - type: filestream
    id: kubernetes-container-logs
    paths:
      - /var/log/containers/*.log
    parsers:
      - container: ~
    prospector:
      scanner:
        fingerprint.enabled: true
        symlinks: true
    file_identity.fingerprint: ~
    processors:
      - add_kubernetes_metadata:
          host: '${NODE_NAME}'
          matchers:
            - logs_path:
                logs_path: /var/log/containers/

Or if you want to use autodiscover:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: '${NODE_NAME}'
      hints.enabled: true
      hints.default_config:
        type: filestream
        id: >-
          kubernetes-container-logs-${data.kubernetes.pod.name}-${data.kubernetes.container.id}
        paths:
          - '/var/log/containers/*-${data.kubernetes.container.id}.log'
        parsers:
          - container: ~
        prospector:
          scanner:
            fingerprint.enabled: true
            symlinks: true
        file_identity.fingerprint: ~

Those two examples are from our default Kubernetes manifest

The only downside from this migration is that at first there will be a little of duplication as the Filestream input will ingest any already-existing file from the beginning, however as Kubernetes is a highly dynamic environment and the container log files usually rotate quickly, usually the data duplication is small.

You can also take a look at ignore_inactive to reduce the amount of duplicated data, with the caveat that there might be a little bit of data not ingested while Filebeat is restarting.

Would that work for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Log input]Forever growing registry file with kubernetes autodiscovery
4 participants