Proposal: Cache metadata to local disk/db #19511

@jsoriano

To enrich events collected from Cloud Foundry, Beats need to query the apps API. This is currently done by add_cloudfoundry_metadata, which uses an in-memory cache to avoid querying again for the metadata of apps it has already seen. Metadata is cached for 2 minutes by default, but this time can be increased with the cache_duration setting.
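To make the caching behaviour concrete, here is a minimal sketch of an in-memory cache with a time-to-live. It is not the actual add_cloudfoundry_metadata code: `AppMetadata`, `TTLCache`, `NewTTLCache`, and the `fetch` callback (standing in for a request to the apps API) are hypothetical names chosen for illustration, and `DefaultCacheDuration` only mirrors the 2-minute default mentioned above.

```go
package main

import (
	"sync"
	"time"
)

// DefaultCacheDuration mirrors the 2-minute default described in the issue.
const DefaultCacheDuration = 2 * time.Minute

// AppMetadata is a hypothetical, simplified view of an app's metadata.
type AppMetadata struct {
	GUID string
	Name string
}

// cacheEntry stores the metadata together with its expiration time.
type cacheEntry struct {
	meta    AppMetadata
	expires time.Time
}

// TTLCache is an illustrative in-memory cache keyed by app GUID.
type TTLCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

// NewTTLCache creates an empty cache whose entries live for ttl.
func NewTTLCache(ttl time.Duration) *TTLCache {
	return &TTLCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Get returns the cached metadata for guid. On a miss, or once the entry
// has expired, it calls fetch and stores the result. The lock is held
// during fetch for simplicity, which serializes concurrent lookups.
func (c *TTLCache) Get(guid string, fetch func(string) (AppMetadata, error)) (AppMetadata, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[guid]; ok && time.Now().Before(e.expires) {
		return e.meta, nil
	}
	meta, err := fetch(guid)
	if err != nil {
		return AppMetadata{}, err
	}
	c.entries[guid] = cacheEntry{meta: meta, expires: time.Now().Add(c.ttl)}
	return meta, nil
}
```

With a cache like this, two consecutive `Get` calls for the same GUID within the TTL result in a single call to `fetch`. Everything is lost when the process exits, which is the limitation this proposal is about.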

Big Cloud Foundry deployments can have several thousand applications running at the same time. In these cases add_cloudfoundry_metadata needs to cache a lot of data. Increasing cache_duration can help here by reducing the number of requests made. But on restart this data is lost and has to be requested again on startup, causing thousands of requests that could be avoided if the data were persisted somewhere else. Other implementations of Cloud Foundry event consumers (nozzles) persist the app metadata on disk to avoid this problem.
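One possible shape for such persistence, sketched under several assumptions: entries are serialized as JSON to a single file in the data directory, written atomically through a temporary file, and expired entries are dropped when loading on startup. The names `PersistedEntry`, `SaveCache`, `LoadCache`, and `exampleRoundTrip`, as well as the file name `cf_metadata.json`, are hypothetical. A real implementation might well prefer an embedded database instead.

```go
package main

import (
	"encoding/json"
	"errors"
	"os"
	"path/filepath"
	"time"
)

// AppMetadata is a hypothetical, simplified view of an app's metadata.
type AppMetadata struct {
	GUID string `json:"guid"`
	Name string `json:"name"`
}

// PersistedEntry is the on-disk form of a cache entry (assumed format).
type PersistedEntry struct {
	Meta    AppMetadata `json:"meta"`
	Expires time.Time   `json:"expires"`
}

// SaveCache writes the entries to a temporary file in the same directory
// and renames it over path, so readers never see a partial file.
func SaveCache(path string, entries map[string]PersistedEntry) error {
	data, err := json.Marshal(entries)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), "cache-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// LoadCache reads the entries from path and discards those that have
// expired as of now. A missing file yields an empty cache, not an error.
func LoadCache(path string, now time.Time) (map[string]PersistedEntry, error) {
	entries := make(map[string]PersistedEntry)
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return entries, nil
	}
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	for guid, e := range entries {
		if !now.Before(e.Expires) {
			delete(entries, guid)
		}
	}
	return entries, nil
}

// exampleRoundTrip saves one live and one expired entry to a temporary
// directory, reloads them, and returns how many entries survived.
func exampleRoundTrip() (int, error) {
	dir, err := os.MkdirTemp("", "beat-data-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	now := time.Now()
	path := filepath.Join(dir, "cf_metadata.json")
	err = SaveCache(path, map[string]PersistedEntry{
		"live":    {Meta: AppMetadata{GUID: "live", Name: "app-live"}, Expires: now.Add(time.Hour)},
		"expired": {Meta: AppMetadata{GUID: "expired", Name: "app-old"}, Expires: now.Add(-time.Hour)},
	})
	if err != nil {
		return 0, err
	}
	loaded, err := LoadCache(path, now)
	return len(loaded), err
}
```

On startup, a processor could call something like `LoadCache` before handling events and only query the apps API for GUIDs that are missing or expired. How often to call `SaveCache` (on every update, periodically, or on shutdown) is a separate design choice that this sketch leaves open.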

Beats have several features that use internal in-memory caches. In general they don't hold that much data, but the same problem could appear, for example, with Kubernetes, since we support monitoring at the cluster scope (see #14738). So other features might also benefit from some kind of persistence between restarts.

If the cache is persisted to disk, we would need to make sure that the data directory also persists between restarts. This is relevant when running Beats in containers or on Kubernetes.

Depending on how this is implemented, the persistent cache could be shared between Beats. For example, Filebeat and Metricbeat monitoring the same Cloud Foundry deployment from the same host could share their local caches if they are stored in a local database, or if both have access to a common data directory.

The enrich processor of Elasticsearch could help here, but at the moment only Filebeat supports pipelines, and even with pipeline support it isn't clear how this step would be added to existing pipelines.

@exekias @urso I would like to have your thoughts on this.

Labels

Team:Platforms (Label for the Integrations - Platforms team), discuss (Issue needs further discussion), int-goal (Internal goal of an iteration)