[Security Solution] Enables telemetry for endpoint rules #120225
madirey merged 10 commits into elastic:main
Pinging @elastic/security-solution (Team: SecuritySolution) |
|
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
|
@stevewritescode @pjhampton @marshallmain This re-enables telemetry for query and threat match rules. As discussed, some code on the receiving end may need to be updated to handle the new Did we want to enable telemetry for any other rule types? It looked like these were the only ones we had enabled in the past. |
|
jenkins test this |
|
@elasticmachine merge upstream |
marshallmain
left a comment
Detection engine code changes LGTM
@stevewritescode @pjhampton what's the process for testing telemetry end-to-end?
pjhampton
left a comment
If I'm understanding the scope of this PR correctly, you have enabled detection rule alert telemetry on query + threat match rules. I think that is great! It doesn't resolve the telemetry issue with the usage collector described here: #119047. What you have done is on the roadmap, fwiw.
One concern I have with this approach is that it is going to go into the data stream where endpoint alerts go in the security analytic cluster. The buffer we have only holds 100 items or 10MB a minute. Could these detection rule alerts create so many alerts that it drowns out the endpoint telemetry?
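To make the concern above concrete, here is a minimal sketch of a bounded buffer that drops events once it is full. It is an illustration only: `BoundedTelemetryQueue`, `TelemetryEvent`, and the method names are hypothetical, the 10MB limit is not modelled, and the real telemetry sender may behave differently.

```typescript
// Hypothetical sketch: NOT the actual Kibana telemetry queue.
type TelemetryEvent = { source: 'endpoint' | 'detection_rule'; id: string };

class BoundedTelemetryQueue {
  private events: TelemetryEvent[] = [];

  // 100 mirrors the item limit mentioned in the comment above.
  constructor(private readonly maxItems: number = 100) {}

  // Accept events until the buffer is full; return how many were dropped.
  enqueue(incoming: TelemetryEvent[]): number {
    const free = Math.max(this.maxItems - this.events.length, 0);
    this.events.push(...incoming.slice(0, free));
    return Math.max(incoming.length - free, 0);
  }

  // Empty the buffer, as a periodic (for example once-a-minute) sender would.
  drain(): TelemetryEvent[] {
    const batch = this.events;
    this.events = [];
    return batch;
  }
}

const queue = new BoundedTelemetryQueue(100);

// A burst of detection rule alerts arrives first and fills the buffer.
const detectionAlerts: TelemetryEvent[] = Array.from(
  { length: 100 },
  (_, i): TelemetryEvent => ({ source: 'detection_rule', id: `rule-alert-${i}` })
);
const droppedDetection = queue.enqueue(detectionAlerts);

// An endpoint alert arriving in the same window no longer fits.
const droppedEndpoint = queue.enqueue([{ source: 'endpoint', id: 'endpoint-alert-0' }]);
```

In this sketch the endpoint alert is the one that gets dropped, which is the "drowning out" scenario: whichever producer fills the shared buffer first wins the window.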
@pjhampton there's no detail as to what's broken in the linked issue, and I mistakenly conflated that issue with this one. Would you be able to write up a ticket describing what's broken with the existing usage collectors? Or I'd be happy to do so if you can provide details. Thanks! edit: It looks like @FrankHassanabad is working on fixing the outdated queries used by the usage collectors 👍
This sounds like a valid concern for a probable situation; do you have a recommendation for how to proceed? We could write to a new data stream, or disable this functionality until your cluster is ready, or both? |
|
It is possible this is broken on second look, based on https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/detection_engine/rules/prepackaged_rules/elastic_endpoint_security.json being of type 'query'. It's possible the 8.0 telemetry documents are from old clusters and test data. Please hold fire while I investigate. |
💚 Build Succeeded
Friendly reminder: Looks like this PR hasn’t been backported yet. |
💚 Backport successful
This backport PR will be merged automatically after passing CI. |
…122254) * Add eventsTelemetry to query and threatMatch executors * Temp commit * Add query rule telemetry test * How'd that get in there? * Remove useless tests Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Madison Caldwell <madison.rey.caldwell@gmail.com>
Summary
This PR passes the `eventsTelemetry` object through to the `siem.queryRule` and `siem.indicatorRule` rule type executors. This enables those rules to queue telemetry data when alerts are created. The data includes the `_source` of each alert that is written.

Additionally, the unit tests for `createQueryAlertType` have been fixed and a test was added to ensure that alerts are queued for telemetry appropriately. This is adequate to test telemetry for all rule types for which it is enabled, since the telemetry is queued from a common codepath in `searchAfterAndBulkCreate` (just after calling `bulkCreate` to create the alerts).

Finally, unit tests for all other rule types have been removed. For those tests, the executors were exiting early due to thrown exceptions when missing mocks were encountered. The tests are fragile, since they rely on internals of the executors to function, and they were producing passing results even though the executors were not running to completion. The tests were therefore misleading, as the test conditions were met without actually testing those conditions.
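The flow described in this summary can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in this PR: `AlertDoc`, `TelemetrySender`, `createAlertsAndQueueTelemetry`, `RecordingSender`, and the `queueTelemetryEvents` method name are assumptions made for the example, and the real executors and `searchAfterAndBulkCreate` are asynchronous and take many more parameters.

```typescript
// Hypothetical, simplified stand-ins; the real Kibana types and signatures differ.
interface AlertDoc {
  _id: string;
  _source: Record<string, unknown>;
}

interface TelemetrySender {
  queueTelemetryEvents(events: Array<Record<string, unknown>>): void;
}

type BulkCreate = (docs: AlertDoc[]) => AlertDoc[];

// Common codepath: create the alerts, then queue their _source for telemetry.
// The sender is optional, so rule types that are not given one queue nothing.
function createAlertsAndQueueTelemetry(
  candidates: AlertDoc[],
  bulkCreate: BulkCreate,
  eventsTelemetry?: TelemetrySender
): AlertDoc[] {
  const created = bulkCreate(candidates);
  if (eventsTelemetry !== undefined && created.length > 0) {
    eventsTelemetry.queueTelemetryEvents(created.map((doc) => doc._source));
  }
  return created;
}

// A recording fake lets a test assert on what was queued
// without depending on executor internals.
class RecordingSender implements TelemetrySender {
  public queued: Array<Record<string, unknown>> = [];
  queueTelemetryEvents(events: Array<Record<string, unknown>>): void {
    this.queued.push(...events);
  }
}

const sender = new RecordingSender();
const passthroughBulkCreate: BulkCreate = (docs) => docs;

// With a sender (as for query and threat match rules in this sketch).
const created = createAlertsAndQueueTelemetry(
  [
    { _id: 'a', _source: { rule_type: 'query' } },
    { _id: 'b', _source: { rule_type: 'threat_match' } },
  ],
  passthroughBulkCreate,
  sender
);

// Without a sender: alerts are still created, nothing is queued.
const createdWithoutTelemetry = createAlertsAndQueueTelemetry(
  [{ _id: 'c', _source: { rule_type: 'eql' } }],
  passthroughBulkCreate
);
```

Because the queueing lives in one shared function here, a single test against that function covers every caller that passes a sender, which is the same reasoning the summary gives for testing only the query rule type.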