[ResponseOps] Integrate rule and action monitoring data to the monitoring collection plugin by chrisronline · Pull Request #123416 · elastic/kibana

chrisronline · 2022-01-19T22:05:22Z

Relates to #123637

Summary

This PR uses the new monitoring collection system to collect metrics for both the alerting and actions plugins. The specific metrics answer these questions; some were readily available, while others were added in this PR.

We segment the metrics in each plugin by scope: a metric pertains either to this specific Kibana instance or to all Kibana instances in a cluster. We call the former a "node" level metric and the latter a "cluster" level metric, and the terminology in this PR should reflect this.

The "node" level metrics account for most of the new code in this PR. They are in-memory counters that are incremented each time a rule or action finishes execution, and each time a rule or action's execution results in a failure. These metrics are represented by the types node_rules and node_actions.
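The counter idea above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `InMemoryMetrics` class, the `NodeLevelMetrics` shape, and the method names are all hypothetical.

```typescript
// Hypothetical shape for node-level counters; field names are assumptions.
interface NodeLevelMetrics {
  executions: number;
  failures: number;
}

// Hypothetical in-memory counter holder, one instance per metric type
// (for example one for rules and one for actions).
class InMemoryMetrics {
  private metrics: NodeLevelMetrics = { executions: 0, failures: 0 };

  // Call once each time a rule or action finishes executing.
  public onExecutionFinished(failed: boolean): void {
    this.metrics.executions += 1;
    if (failed) {
      this.metrics.failures += 1;
    }
  }

  // Return a copy so callers cannot mutate the live counters.
  public snapshot(): NodeLevelMetrics {
    return { ...this.metrics };
  }
}

const nodeRules = new InMemoryMetrics();
nodeRules.onExecutionFinished(false); // a successful run
nodeRules.onExecutionFinished(true); // a failed run
console.log(nodeRules.snapshot()); // { executions: 2, failures: 1 }
```

Because the counters live in process memory, they reset when the Kibana instance restarts, which is why they only describe a single node.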

The "cluster" level metrics added in this PR are the results of a query to the task manager index that returns the number of delayed tasks. A task is delayed if either runAt() < now and status = Idle, or retryAt() < now and (status = Running || status = Claimed). We include a count of delayed tasks, as well as p50 and p99 data points for how long the delay is (in ms).
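To make the delayed-task definition concrete, here is a small in-memory sketch of the same predicate and summary. It is only an illustration: the PR computes these values with a query against the task manager index, whereas this code works on a plain array. The `TaskLike` type, the function names, the lowercase status strings, and the choice to measure delay as `now` minus `runAt` (or `retryAt`) are assumptions, and the nearest-rank percentile used here may differ from what an Elasticsearch aggregation returns.

```typescript
// Hypothetical, simplified task shape; timestamps are epoch milliseconds.
interface TaskLike {
  status: 'idle' | 'claimed' | 'running';
  runAt: number;
  retryAt?: number;
}

// Returns the delay in ms if the task is delayed, otherwise undefined.
function delayMs(task: TaskLike, now: number): number | undefined {
  if (task.status === 'idle' && task.runAt < now) {
    return now - task.runAt;
  }
  const inFlight = task.status === 'running' || task.status === 'claimed';
  if (inFlight && task.retryAt !== undefined && task.retryAt < now) {
    return now - task.retryAt;
  }
  return undefined;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) {
    return 0;
  }
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Count delayed tasks and summarize how long they have been delayed.
function summarizeDelays(tasks: TaskLike[], now: number) {
  const delays = tasks
    .map((task) => delayMs(task, now))
    .filter((d): d is number => d !== undefined)
    .sort((a, b) => a - b);
  return {
    count: delays.length,
    p50: percentile(delays, 50),
    p99: percentile(delays, 99),
  };
}
```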

As part of the integration with the monitoring collection plugin, the registration of these new collectors means that the following routes will now return the above data:

/api/monitoring_collection/node_rules
/api/monitoring_collection/cluster_rules
/api/monitoring_collection/node_actions
/api/monitoring_collection/cluster_actions
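For anyone consuming these routes from a script, a rough sketch might look like the following. The helper names, the base URL handling, and the caller-supplied `headers` argument for authentication are assumptions, and the response is typed as `unknown` because the payload shape is not described here. It also assumes a runtime with a global `fetch` (for example Node 18 or later).

```typescript
// The four collector types registered by this PR, taken from the routes above.
const COLLECTOR_TYPES = ['node_rules', 'cluster_rules', 'node_actions', 'cluster_actions'] as const;
type CollectorType = (typeof COLLECTOR_TYPES)[number];

// Build the route for one collector type; trailing slashes on the base URL are trimmed.
function monitoringCollectionUrl(kibanaBaseUrl: string, type: CollectorType): string {
  const base = kibanaBaseUrl.replace(/\/+$/, '');
  return `${base}/api/monitoring_collection/${type}`;
}

// Hypothetical helper: fetch one collector's data and return the parsed JSON.
async function fetchCollectorData(
  kibanaBaseUrl: string,
  type: CollectorType,
  headers: Record<string, string> = {}
): Promise<unknown> {
  const response = await fetch(monitoringCollectionUrl(kibanaBaseUrl, type), { headers });
  if (!response.ok) {
    throw new Error(`Request for ${type} failed with status ${response.status}`);
  }
  return response.json();
}
```

Looping over `COLLECTOR_TYPES` and calling `fetchCollectorData` for each entry is one way to compare all four responses while testing.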

Testing

To test, create some rules with some actions and verify that the data from the above endpoints matches what you expect. To simulate delayed tasks, you can configure task manager to run 1 rule every 1 minute or so, and make sure the rules are scheduled to run more frequently than that.

ymao1

Looks good overall! Left some minor comments about code consolidation. It would also be nice to add functional tests for this if possible.

Also wondering if the api/monitoring_collection endpoints should be internal?

chrisronline · 2022-03-16T19:42:08Z

> It would also be nice to add functional tests for this if possible.

Great idea, I'll add some.

> Also wondering if the api/monitoring_collection endpoints should be internal?

These APIs are designed to be publicly accessible: users might want to consume this data in other monitoring solutions, and we want to make that possible.

chrisronline · 2022-03-17T15:30:23Z

@ymao1 Thanks for all the great suggestions. I've implemented nearly all of them and this PR is ready for another round!

ymao1

LGTM! Just one comment about types

chrisronline · 2022-03-24T13:05:22Z

@elasticmachine merge upstream

kibana-ci · 2022-03-24T16:16:45Z

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

id	before	after	diff
`monitoringCollection`	5	9	+4
`taskManager`	33	39	+6
total			+10

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats exports` for more detailed information.

id	before	after	diff
`monitoringCollection`	1	0	-1

Unknown metric groups

API count

id	before	after	diff
`monitoringCollection`	5	9	+4
`taskManager`	71	77	+6
total			+10

History

💚 Build #33468 succeeded 5a112dc
💚 Build #33086 succeeded b24411f
💚 Build #32251 succeeded 4ad5943
💚 Build #31817 succeeded 4d2d0c7
💔 Build #31799 failed f153012
💚 Build #31485 succeeded dd54b92

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @chrisronline

chrisronline and others added 18 commits January 18, 2022 10:29

Add new plugint to collect additional kibana monitoring metrics

464035c

Readme

8b85956

Update generated document

c888390

WIP

83102fb

Remove task manager and add support for max number

30e4364

Use MAX_SAFE_INTEGER

6d1dd61

Merge remote-tracking branch 'elastic/main' into rops/monitoring_collection

01b47ea

We won't use this route

75a246c

Merge remote-tracking branch 'origin/rops/monitoring_collection' into rops/rule_monitoring

f299203

Tests and lint

f356bb6

Merge branch 'main' into rops/monitoring_collection

f23b977

Track actions

ae29f8f

Use dynamic route style

1af47c3

Merge remote-tracking branch 'origin/rops/monitoring_collection' into rops/rule_monitoring

ff420a6

Merge remote-tracking branch 'elastic/main' into rops/monitoring_collection

3ccaef6

Merge remote-tracking branch 'origin/rops/monitoring_collection' into rops/rule_monitoring

a1c3d07

Fix test

4f55170

Merge branch 'main' into rops/monitoring_collection

4e087a2

chrisronline mentioned this pull request Jan 28, 2022

[Response Ops] Collect telemetry on metrics used with Stack Monitoring integration #124047

Open

chrisronline added 11 commits January 28, 2022 15:52

Add in mapping verification

965d7fb

Merge remote-tracking branch 'elastic/main' into rops/monitoring_collection

6ab55ea

Merge remote-tracking branch 'origin/rops/monitoring_collection' into rops/rule_monitoring

c237843

Adapt to new changes in base PR

d054eef

Merge remote-tracking branch 'elastic/main' into rops/monitoring_collection

c61c1f2

Fix types

ff2ad0a

Merge remote-tracking branch 'elastic/main' into rops/monitoring_collection

feab3f4

Feedback from PR

f7a2870

PR feedback

b7effa9

We do not need this

a40ad0d

Merge remote-tracking branch 'elastic/main' into rops/monitoring_collection

072c835

chrisronline added 6 commits March 10, 2022 09:44

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

81fdbee

Add logging and use a class

2094ed8

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

8d8642d

fix types

7c8912a

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

28b2064

Fix tests

29c9bd2

chrisronline mentioned this pull request Mar 14, 2022

[ResponseOps] New metricsets for Kibana stack monitoring elastic/beats#29899

Merged

chrisronline added 2 commits March 15, 2022 12:25

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

f1562de

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

fd72f34

ymao1 approved these changes Mar 16, 2022

PR feedback

98d2443

chrisronline requested a review from ymao1 March 17, 2022 15:18

ymao1 approved these changes Mar 17, 2022

Comment thread x-pack/plugins/actions/server/monitoring/types.ts

Comment thread x-pack/plugins/alerting/server/monitoring/types.ts

chrisronline added 7 commits March 17, 2022 12:00

Add types

463dbe9

Fix types

dd54b92

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

f153012

Linting fixes

8ed6458

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

4d2d0c7

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

4ad5943

Merge remote-tracking branch 'elastic/main' into rops/rule_monitoring

b24411f

kibanamachine and others added 2 commits March 24, 2022 09:05

Merge branch 'main' into rops/rule_monitoring

5a112dc

Remove unnecessary changes

e4aa850

chrisronline removed the request for review from a team March 24, 2022 14:49

chrisronline merged commit f981d53 into elastic:main Mar 24, 2022

chrisronline deleted the rops/rule_monitoring branch March 24, 2022 16:23

chrisronline mentioned this pull request May 20, 2022

[Alerting] Phase 3 of AoA: Integration with Stack Monitoring #122366

Closed

5 tasks

chrisronline merged 62 commits into elastic:main from chrisronline:rops/rule_monitoring

6 participants
