Additional stats fields for Elasticsearch #41652
3kt wants to merge 18 commits into elastic:main from 3kt:additional_fields_index_stats
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
Ran a quick smoke test by running the new Elasticsearch module (with
The module collected the additional fields as expected, but we will need an additional modification in the Elasticsearch repo for the index template, as I believe it is hardcoded there.
The diagrams below present various metrics for the targeted cluster. The first 3 hours (before the annotation) don't use the new metricbeat module; the 3 hours after do. The cluster is located in us-east-1, and it was between 12:00 and 18:00 local in the target time frame (I'm in EMEA so the screenshots below show different times). All these charts are filtered on Also note that I used the
Management thread pool count:
Barely perceptible increase following the additional deployment of a metricbeat. I don't think we can put this on the new code (and therefore
CPU usage:
There doesn't seem to be any major difference. The last spike seems to be caused by cluster activity rather than monitoring collection.
Garbage collections:
Similar opinion here: the GC spike correlates with the CPU increase.
Heap usage:
No notable difference here either. The decrease could be related to US east coast activity winding down, as this is where the target cluster is located. I will keep collection active for longer, but for now I don't see any concrete evidence that this addition could negatively impact the health or stability of the cluster. |
|
Thanks for running these tests, they look promising!
Do you mean this index template? That's easily modifiable |
|
Did you also collect the response sizes and response times? Also, it would be interesting to know the frequency at which this is called. |
|
@henningandersen about collection rate, I didn't change the default of 1 collection point every 10 seconds:
Response size is an interesting one - since I use "external" collection, the addition of my Metricbeat translates into additional outbound traffic, which we can quantify with the billing API. Looking at the last 10 days, we have the following:
This seems to represent around 300MB or 16% of added traffic per hour for this deployment. In a "real" scenario, the collection would be internal though. Also note that this increase could be caused by the cluster upgrade from 8.15 to 8.16, which happened over the displayed time range. Beyond that, the internode traffic didn't drastically change after the addition of this collection:
Paying attention to the scale of internode and outbound traffic, the new collection data volume would probably get "drowned" in the TBs of internode traffic. |
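To put the traffic figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes that the whole ~300MB/hour comes from the added Metricbeat, that "16%" is measured relative to the previous hourly outbound volume, and that the default 10-second collection period applies to everything that Metricbeat ships. None of these assumptions is confirmed in the thread, and the 8.15 to 8.16 upgrade may account for part of the increase.

```python
# Back-of-the-envelope figures derived from the numbers quoted in this thread.
ADDED_MB_PER_HOUR = 300.0   # "~300MB ... per hour" (approximate, from the billing data)
ADDED_FRACTION = 0.16       # "16%"; ASSUMED to be relative to the previous hourly volume
COLLECTION_PERIOD_S = 10    # default: one collection point every 10 seconds

# Number of collection cycles in one hour at the default period.
collections_per_hour = 3600 // COLLECTION_PERIOD_S

# Average added outbound traffic per collection cycle, if all of it is monitoring data.
mb_per_collection = ADDED_MB_PER_HOUR / collections_per_hour

# Hourly outbound volume before the change implied by the 16% figure.
baseline_mb_per_hour = ADDED_MB_PER_HOUR / ADDED_FRACTION

print(f"collections per hour: {collections_per_hour}")
print(f"added traffic per collection: ~{mb_per_collection:.2f} MB")
print(f"implied previous outbound volume: ~{baseline_mb_per_hour:.0f} MB/hour")
```

Under these assumptions the added collection averages a bit under 1 MB per cycle, which is an upper bound on what the new fields alone could contribute.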
|
Added a debug call in the For the "old" code base: For the "new" code base, hitting the same large SRE cluster, I get: In short: |
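For anyone wanting to reproduce this kind of old-versus-new comparison outside the module, a minimal sketch of timing a call and recording the response size could look like the following. The `measure` helper and the `fake_fetch` stand-in are hypothetical names for illustration only; this is not the debug call that was added to the module, and the real request would replace `fake_fetch`.

```python
import time
from typing import Callable, Tuple


def measure(fetch: Callable[[], bytes]) -> Tuple[float, int]:
    """Call fetch() once and return (elapsed seconds, response size in bytes)."""
    start = time.perf_counter()
    body = fetch()
    elapsed = time.perf_counter() - start
    return elapsed, len(body)


def fake_fetch() -> bytes:
    # Hypothetical stand-in for the real HTTP request to the cluster.
    return b'{"indices": {}}'


elapsed_s, size_bytes = measure(fake_fetch)
print(f"response time: {elapsed_s * 1000:.3f} ms, response size: {size_bytes} bytes")
```

Running it several times against both code bases and comparing medians would smooth out one-off latency spikes.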
|
Discarding, in favor of #41944 |








Proposed commit message
Adds creation_date and tier_preference fields for the elasticsearch.index dataset.
This will be necessary for further development through elastic/integrations#11656
Checklist
I have made corresponding change to the default configuration files: N/A
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Regarding the documentation, the example document is copied from the data.json file, accurately modified in this PR.
Another modification in the integrations repo will be required (for this file)
Disruptive User Impact
This "shouldn't" have an impact on end users: it doesn't alter existing behavior and only adds two new fields to the gathered Elasticsearch monitoring stats.
Author's Checklist
How to test this PR locally
You can run the integration against any cluster (with xpack or otherwise) and check that the generated index stats documents have the two new fields:
creation_date
tier_preference
Screenshots
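As a complement to the local test described above, here is a minimal sketch of checking a collected index stats document for the two new fields. The dotted paths under `elasticsearch.index` are an assumption based on the dataset name (the xpack variant may place them elsewhere), and the sample document and its values are purely illustrative, not output from a real cluster.

```python
from typing import Any, Dict, Iterable, List

# ASSUMED locations of the new fields inside a collected document.
NEW_FIELDS = (
    "elasticsearch.index.creation_date",
    "elasticsearch.index.tier_preference",
)

_MISSING = object()


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Walk a nested dict along a dotted path; return _MISSING if any key is absent."""
    current: Any = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def missing_fields(doc: Dict[str, Any], paths: Iterable[str] = NEW_FIELDS) -> List[str]:
    """Return the paths from `paths` that are not present in `doc`."""
    return [p for p in paths if get_path(doc, p) is _MISSING]


# Illustrative document only; the values are made up.
sample_doc = {
    "elasticsearch": {
        "index": {
            "name": "my-index",
            "creation_date": 1700000000000,
            "tier_preference": "data_content",
        }
    }
}

print(missing_fields(sample_doc))  # an empty list means both fields are present
```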