Field caps against many remote clusters consumes substantial heap #78665

@DaveCTurner

Elasticsearch version (bin/elasticsearch --version): Seen in 7.14.0, but master still looks affected

Plugins installed: []

JVM version (java -version): Bundled

OS version (uname -a if on a Unix-like system): Cloud

Description of the problem including expected versus actual behavior:

When running field caps for cross-cluster search (CCS), we broadcast a request to every remote cluster and only start merging the results once all responses have been received. We have users with 150+ large remote clusters, and one such user reported a single field caps request that ended up consuming 10+GB of heap just holding the un-merged responses. It doesn't take many such requests to reliably cause their CCS cluster to experience OOMs.

Can we merge the results from remote clusters incrementally, and limit the number of remote cluster requests in flight? Can we shift some of the merging work onto the remote clusters even?
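To make the first two questions concrete, here is a minimal sketch of incremental merging with a cap on in-flight requests. It is written in Python with `asyncio` purely for illustration and is not Elasticsearch code. The names `field_caps_incremental`, `merge_into`, `fetch`, and `max_in_flight` are hypothetical, and a response is simplified to a map from field name to a set of type names (real field caps responses carry more per-field detail).

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Set

# Assumption: one cluster's response is simplified to field name -> type names.
FieldCaps = Dict[str, Set[str]]


def merge_into(merged: FieldCaps, response: FieldCaps) -> None:
    """Fold one cluster's response into the running result, in place."""
    for field, types in response.items():
        merged.setdefault(field, set()).update(types)


async def field_caps_incremental(
    clusters: Iterable[str],
    fetch: Callable[[str], Awaitable[FieldCaps]],  # hypothetical per-cluster call
    max_in_flight: int = 5,
) -> FieldCaps:
    """Query every cluster, merging each response as soon as it arrives."""
    merged: FieldCaps = {}
    limiter = asyncio.Semaphore(max_in_flight)

    async def fetch_and_merge(cluster: str) -> None:
        # At most max_in_flight requests hold the semaphore at any time.
        async with limiter:
            response = await fetch(cluster)
            # Merge right away so the raw response can be released instead of
            # being retained until every cluster has answered.
            merge_into(merged, response)

    await asyncio.gather(*(fetch_and_merge(c) for c in clusters))
    return merged
```

With this shape, the coordinating node holds the merged result plus at most `max_in_flight` raw responses, rather than one raw response per remote cluster. Shifting merging work onto the remote clusters would shrink each raw response further, but that is not shown here.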

Steps to reproduce:

  1. Add a few hundred remote clusters each with several hundred indices containing ~1000 fields.
  2. Request field caps against all the remote clusters at once.
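For step 2, the request could look roughly like the following. This sketch only builds the URL; the helper name `field_caps_url` and the `http://localhost:9200` address are assumptions for illustration. It relies on the `_field_caps` endpoint with its `fields` parameter and on the `*:*` pattern to address every index on every configured remote cluster.

```python
from urllib.parse import urlencode


def field_caps_url(base_url: str, target: str = "*:*", fields: str = "*") -> str:
    """Build a field caps URL; by default it targets all indices on all remotes."""
    # Keep '*' unescaped so the wildcard stays readable in the query string.
    query = urlencode({"fields": fields}, safe="*")
    return f"{base_url.rstrip('/')}/{target}/_field_caps?{query}"


# Assumption: the CCS cluster is reachable at this address.
url = field_caps_url("http://localhost:9200")
```

Issuing a GET against the resulting URL from the CCS cluster fans the request out to every remote at once, which is the situation described above.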

Provide logs (if relevant):

Link from the internal case will appear below.

Labels: :Search/Search, >bug, Team:Search
