[Log input]Forever growing registry file with kubernetes autodiscovery #13140
Hi @marqc, thanks for the report, we are investigating the issue. In the meantime, I think that your configuration is not doing what you expect. There are two ways of configuring filebeat autodiscover, one with hints and another with templates. Mixing them is possible, but can lead to some unexpected behaviours. In your case you are enabling hints-based configuration with
However, this configuration will apply to all containers, as will hints-based autodiscover, so you would have the default configuration of hints-based autodiscover and this template working at the same time for any container. If you want to override some options (like these
|
@jsoriano thanks, I have already done that and overriding attributes works as expected. The original issue is not affected by this change. It still leaves entries in the registry if the log file is not deleted from disk within 5 minutes after the container is stopped (crashed, pod evicted, job finished). |
@marqc we can confirm that there is an issue cleaning the state of files that are not owned by any input. There is an ongoing effort to refactor the filebeat registry that will probably help here. In the meantime, the only solution would be to stop filebeat and clean up the registry file with a script. |
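A minimal sketch of such a cleanup script, for illustration only. It assumes the registry is a single JSON array of state objects with `source` and `ttl` fields (the layout of `data.json` in the Filebeat versions discussed here; newer versions store the registry differently). The function names `prune_states` and `prune_registry_file` are hypothetical. Filebeat must be stopped before running it, and the registry should be backed up first.

```python
import json
import os
from typing import Callable, Iterable


def prune_states(states: Iterable[dict],
                 exists: Callable[[str], bool] = os.path.exists) -> list:
    """Keep every state except those that are both orphaned (ttl == -2)
    and point at a file that no longer exists on disk."""
    kept = []
    for state in states:
        orphaned = state.get("ttl") == -2
        missing = not exists(state.get("source", ""))
        if orphaned and missing:
            continue  # nothing will ever read this file again
        kept.append(state)
    return kept


def prune_registry_file(path: str) -> int:
    """Rewrite the registry at `path`; return how many entries were dropped.
    Assumption: the file holds one JSON array of state objects."""
    with open(path, encoding="utf-8") as fh:
        states = json.load(fh)
    kept = prune_states(states)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(kept, fh)
    os.replace(tmp_path, path)  # swap in the pruned file in one step
    return len(states) - len(kept)
```

Entries whose log file still exists are kept even when their TTL is -2, so a file that is later picked up by a new input does not get re-read from the beginning.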
Is there a current solution for cleaning up the state in the registry? |
FYI, leaking registry entries (in our case with > 15k registry entries) also caused filebeat to stall and rarely send any events. After cleanup the performance was ok again (version 7.9.3). |
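To check whether a registry is affected, one option is to count its entries by TTL value. The sketch below assumes the same JSON-array registry layout with a `ttl` field per entry; `count_by_ttl` and `load_states` are hypothetical helper names, not part of Filebeat.

```python
import json
from collections import Counter


def count_by_ttl(states):
    """Return a Counter mapping each ttl value to the number of states carrying it."""
    return Counter(state.get("ttl") for state in states)


def load_states(path):
    """Load registry states, assuming the file is one JSON array of objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A large and growing count under the key `-2` would match the leak described in this issue.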
We suffer from the same issue as well. Log delivery based on filebeat is not really stable. |
@jsoriano 7.9.2. We have entries in the registry from May 2020, although the containers and related folders have been gone from the system for quite a long time. |
@jsoriano Hard to tell, Kibana tells me ~2k unique
The following screen shows the registry growth, averaged per filebeat:
The drop in the graph is where I did a manual cleanup of some pods. |
Pinging @elastic/agent (Team:Agent) |
Did someone find a solution without changing source code? We've faced the same problem with Filebeat in a Kubernetes cluster. |
We have been working on a new input that may help solve this issue, as it is able to clean up registry entries that are no longer used. I've created an issue to test and validate the approach: elastic/integrations#1526 |
Any news on mitigations here? |
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
I think the question is if we can get the hints-based autodiscover working together with the new |
It looks to me like @MichaelKatsoulis checked in the switch to the filestream input in elastic/integrations#2139, so this might already be done? |
I think the integrations are not using or supporting |
As the issue comes from the
filebeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 5m
      hints.enabled: true
      hints.default_config:
        type: filestream
        id: "my-id-${data.kubernetes.container.id}"
        paths:
          - "/var/lib/docker/containers/${data.kubernetes.container.id}/*-json.log"
        scan_frequency: 3s
        message_max_bytes: 1000000
        clean_removed: true
        parsers:
          - container: ~
Note: without a dynamic id ( |
Thanks @fdartayre! We are heavy users of Kubernetes pod annotations to configure the logs input, such as
or
To my knowledge this will not work correctly with the new |
Has this problem been solved? I still have this problem in Filebeat 7.9.2 |
Have you tried to upgrade to a more recent version? As mentioned in #13140 (comment) you may try to use the |
Thank you for resolving |
Is this a validated, functional configuration for running filestream with autodiscover and hints? #13140 (comment) filebeat.autodiscover: |
@jsoriano , @fdartayre : why do we use and suggest |
Want to put a note here for what we found. |
I fixed this issue with PR #34904 |
Hi! We're labeling this issue as |
Can this be re-opened? It is still an issue |
When using the kubernetes autodiscover provider, the registry file tends to grow over time, leaving a lot of entries with TTL=-2. These entries are never removed from the registry. e.g.
sample config:
When pods are stopped, their inputs are stopped/disabled and marked with TTL=-2. The log files are often removed from disk only after that (for example, for jobs from a cronjob, stopped docker containers can be kept for a long time), so with no active input the files won't be tracked and their states won't be removed from the registry.
For a state to be removed from the registry, the "states.Update" method must be called on it, but with an autodiscovery path pattern containing the container ID, no input will ever keep track of these states and cause them to be removed from the registry.
I think that kubernetes autodiscovery should always remove the state from the registry when the final cleanup_timeout "stop" event is sent, because kubernetes will never re-run the same already stopped container (it always creates a new one).
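The leak and the proposed fix can be illustrated with a toy model. This is a sketch only, not Filebeat's code: `ToyRegistry` and all of its methods are hypothetical, and the TTL value -1 is used here merely as a marker for "owned by an active input".

```python
class ToyRegistry:
    """Toy stand-in for the Filebeat registry; not the real implementation."""

    def __init__(self):
        self.states = {}  # source path -> ttl

    def start_input(self, path):
        # Assumption for this toy: -1 marks a state owned by an active input.
        self.states[path] = -1

    def stop_input(self, path):
        # As in the report: once the input stops, the state is marked TTL=-2.
        self.states[path] = -2

    def cleanup_owned(self, active_paths, exists):
        """Mimic cleanup that only visits states an active input still tracks."""
        for path in list(self.states):
            if path in active_paths and not exists(path):
                del self.states[path]

    def on_final_stop(self, container_id):
        """Proposed behaviour: drop every state belonging to a container
        once its final stop event has been sent."""
        for path in list(self.states):
            if container_id in path:
                del self.states[path]


registry = ToyRegistry()
log_path = "/var/lib/docker/containers/abc123/abc123-json.log"
registry.start_input(log_path)
registry.stop_input(log_path)

# The file is deleted after the input stopped; no input owns the state any more,
# so the regular cleanup never reaches it and the entry stays in the registry.
registry.cleanup_owned(active_paths=set(), exists=lambda p: False)
leaked = log_path in registry.states

# With the proposed removal on the final stop event, the entry is dropped.
registry.on_final_stop("abc123")
```

In this model `leaked` is true after the regular cleanup, and the registry is empty only after `on_final_stop` runs, which is the behaviour the proposal asks for.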