Description
Elasticsearch version (bin/elasticsearch --version): Seen in 7.14.0, but master still looks affected
Plugins installed: []
JVM version (java -version): Bundled
OS version (uname -a if on a Unix-like system): Cloud
Description of the problem including expected versus actual behavior:
When running field caps for CCS, we broadcast a request to every remote cluster and only start merging the results once all responses have been received. We have users with 150+ large remote clusters, and one such user reported a single field caps request that consumed 10+GB of heap holding the un-merged responses. It doesn't take many such requests to reliably cause OOMs on their CCS cluster.
Can we merge the results from remote clusters incrementally, and limit the number of remote cluster requests in flight? Could we even shift some of the merging work onto the remote clusters?
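To make the first two questions concrete, here is a rough, language-agnostic sketch in Python. It is not Elasticsearch code. The names `fetch`, `merge_into`, `field_caps_incremental` and `max_in_flight` are hypothetical, and the response shape (index name to a map of field name to type) is a simplification assumed for illustration. The idea is that each response is folded into the running result as soon as it arrives and then dropped, so at most `max_in_flight` un-merged responses are held at once.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def merge_into(merged, cluster, response):
    """Fold one cluster's per-index response into the running result."""
    for index, fields in response.items():
        for field, field_type in fields.items():
            merged[field][field_type].add(f"{cluster}:{index}")


def field_caps_incremental(clusters, fetch, max_in_flight=5):
    """Query every cluster, merging each response as soon as it arrives.

    `fetch(cluster)` is a hypothetical callable that returns
    {index_name: {field_name: field_type}} for one remote cluster.
    """
    merged = defaultdict(lambda: defaultdict(set))
    lock = threading.Lock()

    def fetch_and_merge(cluster):
        response = fetch(cluster)
        with lock:  # only one merge touches the shared result at a time
            merge_into(merged, cluster, response)
        # `response` goes out of scope here, so it can be collected

    # The pool size caps how many remote requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for _ in pool.map(fetch_and_merge, clusters):
            pass  # consuming the iterator re-raises any worker exception
    return merged
```

A thread pool is used here only to keep the sketch short; an asynchronous listener that releases a permit after each merge would express the same bound.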
Steps to reproduce:
- Add a few hundred remote clusters each with several hundred indices containing ~1000 fields.
- Request field caps against all the remote clusters at once.
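For a sense of scale, the arithmetic behind these steps can be sketched as follows. The concrete numbers are assumptions picked from the ranges above, not measurements, and the function name is hypothetical. It counts the per-index field entries the coordinating node would hold if every response is kept un-merged until the last one arrives.

```python
def unmerged_entries(clusters, indices_per_cluster, fields_per_index):
    """Per-index field entries held when nothing is merged early."""
    return clusters * indices_per_cluster * fields_per_index


# Assumed example: 200 remote clusters, 300 indices each, 1000 fields per index.
print(unmerged_entries(200, 300, 1000))  # 60000000
```

If most indices share the same mappings, the merged result would instead be on the order of the number of distinct fields, which is why merging early should help.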
Provide logs (if relevant):
Link from the internal case will appear below.