
Additional stats fields for Elasticsearch#41652

Closed
3kt wants to merge 18 commits into elastic:main from 3kt:additional_fields_index_stats

Conversation

@3kt (Contributor) commented Nov 15, 2024

Proposed commit message

Adds creation_date and tier_preference fields to the elasticsearch.index dataset.
This will be necessary for further development through elastic/integrations#11656.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files — N/A
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc

Regarding the documentation, the example document is copied from the data.json file, which is updated accordingly in this PR.

Another modification will be required in the integrations repo (for this file).

Disruptive User Impact

This "shouldn't" have an impact on end users: it doesn't alter existing behavior, it only adds 2 new fields that will be exposed in the gathered Elasticsearch monitoring stats.

Author's Checklist

How to test this PR locally

You can run the integration against any cluster (with xpack enabled or otherwise) and check that the generated index stats documents contain the two new fields:

  • creation_date
  • tier_preference

Screenshots

[screenshot]

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 15, 2024
@mergify mergify bot assigned 3kt Nov 15, 2024
@mergify (bot) commented Nov 15, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @3kt? 🙏
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d — the label to automatically backport to the 8./d branch, where /d is the digit

@mergify (bot) commented Nov 15, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it, please add the backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 15, 2024
@3kt 3kt added the Team:Monitoring Stack Monitoring team label Nov 17, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 17, 2024
@3kt 3kt requested a review from consulthys November 21, 2024 20:26
@3kt (Contributor, Author) commented Nov 28, 2024

Ran a quick smoke test by running the new Elasticsearch module (with xpack.enabled: true) against a large internal cloud cluster:

  • 8.7 TB RAM
  • 151 nodes
  • 47k shards

The module collected the additional fields as expected, but we will need an additional modification in the Elasticsearch repo for the index template, as I believe this is hardcoded there.

[screenshot]

The diagrams below present various metrics for the targeted cluster. The first 3 hours (before the annotation) don't use the new Metricbeat module; the 3 hours after do. The cluster is located in us-east-1, and the target time frame was between 12:00 and 18:00 local time (I'm in EMEA, so the screenshots below show different times).

All these charts are filtered on attributes.data: "hot", since collection takes place over the API (and therefore gets routed solely to hot nodes, as no coordinating nodes are used).

Also note that I used the scope: cluster collection mechanism in the Metricbeat module, but since my code change only impacts the index dataset, I don't expect this to influence cluster load.

Management thread pool count:

[screenshot]

Barely perceptible increase following the deployment of the additional Metricbeat. I don't think we can attribute this to the new code (and therefore metadata collection), but rather to the fact that I added a Metricbeat rather than replacing the existing one.

CPU usage:

[screenshot]

There doesn't seem to be any major difference; the last spike appears to be caused by cluster activity rather than monitoring collection.

Garbage collections:

[screenshot]

Similar story here: the GC spike correlates with the CPU increase.

Heap usage:

[screenshot]

No notable difference here either. The decrease could be related to the US east coast winding down, as this is where the target cluster is located.


I will keep collection active for longer, but so far I don't see any concrete evidence that this addition could negatively impact the health or stability of the cluster.

@consulthys (Contributor) commented:
Thanks for running these tests, they look promising!

The module collected the additional fields as expected, but we will need an additional modification in Elasticsearch repo for the index template, as I believe this is hardcoded there.

Do you mean this index template? That's easily modifiable.

@henningandersen commented:
Did you also collect the response sizes and response times? Also, it would be interesting to know the frequency at which this is called.

@3kt (Contributor, Author) commented Dec 3, 2024

@henningandersen About the collection rate: I didn't change the default of one collection point every 10 seconds:

[screenshot]
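For reference, the settings discussed here (the default 10-second period and the scope: cluster mechanism) would look roughly like this in a Metricbeat module configuration; the hosts value is illustrative:

```yaml
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  scope: cluster
  hosts: ["http://localhost:9200"]
```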

Response size is an interesting one: since I use "external" collection, adding my Metricbeat translates into additional outbound traffic, which we can quantify with the billing API. Looking at the last 10 days, we have the following:

[screenshot]

This seems to represent around 300 MB, or 16%, of added traffic per hour for this deployment. In a "real" scenario, collection would be internal, though.

Also note that this increase could be caused by the cluster upgrade from 8.15 to 8.16, which happened over the displayed time range.

Beyond that, internode traffic didn't drastically change after the addition of this collection:

[screenshot]

Given the relative scales of internode and outbound traffic, the new collection's data volume would probably get "drowned" in the TBs of internode traffic.

@3kt (Contributor, Author) commented Dec 3, 2024

Added a debug call in GetClusterState (requires the "fmt", "strings", and "time" imports):

	// Measure how long the API call takes
	start := time.Now()

	queryString := strings.Join(queryParams, "&")

	content, err := fetchPath(http, resetURI, clusterStateURI, queryString)
	if err != nil {
		return nil, err
	}

	elapsed := time.Since(start)

	// Display response size and elapsed time in ms
	fmt.Printf("Cluster state response size: %d - took %d ms\n", len(content), elapsed.Milliseconds())

For the "old" code base:

data points: 54
average response time: 673.9 ms
std dev response time: 370.0 ms
response size: 14540971 b

For the "new" code base, hitting the same large SRE cluster, I get:

data points: 100
average response time: 7727.0 ms
std dev response time: 1497.0 ms
response size: 15679947 b

In short:

| Metric                | Old        | New        | Difference |
|-----------------------|------------|------------|------------|
| Data points           | 54         | 100        |            |
| Response size (bytes) | 14,540,971 | 15,679,947 | +7.8%      |
| Response time (ms)    | 673.9      | 7727.0     | +1046%     |

cc @consulthys @henningandersen

@3kt (Contributor, Author) commented Dec 8, 2024

Discarding in favor of #41944.


Labels

  • backport-8.x — Automated backport to the 8.x branch with mergify
  • enhancement
  • Team:Monitoring — Stack Monitoring team
