[Security Solution][Alerts] improves IM rule memory usage and performance by vitaliidm · Pull Request #149208 · elastic/kibana

vitaliidm · 2023-01-19T12:10:56Z

Summary

- Addresses [Security Solution][Alerts] Indicator Match rule performance and memory usage #148821.
- Instead of loading all threat results into memory and then building an object map from them (which doubles memory consumption), the rule now builds the signals map directly from the threat results (in getSignalsMatchesFromThreatIndex) and no longer keeps all threats in memory.
- Because threats are no longer kept in memory, an additional request is introduced that fetches threats before enrichment, based on the signals map created in the previous stage.
- Performance measurements
- Additional issues opened while working on this PR:
  - [Security Solution][Alerts] for IM rule type, in events search first path of execution, rule fails due to large content in one of the responses from ES #150041
  - [Security Solution][Alerts] introduce limit of parallel executions for rules /preview route #150038
- Further reduces the number of matched threats per alert to 200, to prevent the Kibana browser tab from becoming unresponsive.
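
The memory saving described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `ThreatHit`, `SignalsMap`, `buildSignalsMap`, and the `matchedSignalIds` field are hypothetical names, and the page callback stands in for the paginated threat search (which is asynchronous in the real rule).

```typescript
// Hypothetical shapes for illustration; the real rule derives matches from
// named queries on each threat hit rather than from a ready-made id list.
interface ThreatHit {
  id: string;
  matchedSignalIds: string[];
}

// signal id -> ids of the threats that matched it
type SignalsMap = Map<string, string[]>;

// Folds threat pages into the map one page at a time. A page is no longer
// referenced once it has been processed, so only the compact map is retained.
// fetchNextPage is a hypothetical callback that returns [] when exhausted.
function buildSignalsMap(fetchNextPage: () => ThreatHit[]): SignalsMap {
  const signalsMap: SignalsMap = new Map();
  let hits = fetchNextPage();

  while (hits.length > 0) {
    for (const hit of hits) {
      for (const signalId of hit.matchedSignalIds) {
        const threatIds = signalsMap.get(signalId) ?? [];
        threatIds.push(hit.id);
        signalsMap.set(signalId, threatIds);
      }
    }
    hits = fetchNextPage();
  }

  return signalsMap;
}
```

Because the map stores only ids, the full threat documents have to be fetched again before enrichment, which is the additional request mentioned above.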

Checklist

Delete any items that are not applicable to this PR.

Unit or functional tests were updated or added to match the most common scenarios

For maintainers

This was checked for breaking API changes and was labeled appropriately

elasticmachine · 2023-02-01T15:30:20Z

Pinging @elastic/security-solution (Team: SecuritySolution)

nkhristinin · 2023-02-06T11:45:33Z

...ity_solution/server/lib/detection_engine/signals/threat_mapping/threat_enrichment_factory.ts

+        },
+      };
+
+      const threatResponse = await getThreatList({


Should we specify perPage here (9k, for example)?
It looks like we query only 1k threats, but there can be more.

This part basically replicates the current behaviour in 8.6, where a maximum of 1,000 threats is fetched before building the enrichment.

I don't know whether this was a conscious decision to avoid loading a large number of documents into memory (each with a potentially large number of fields, depending on _source: [${threatIndicatorPath}.*, 'threat.feed.*'] in the threat params) or just an oversight.

There is a hard limit of 10,000 items per page, though.
Do you think it would be safe to load 9,000 items in one request with no control over the size of each document?

We do fetch 9,000 threats in one request in the initial phase of the execution, but that request returns only the indicator mapping fields, so the response is significantly smaller.

I think before it was like:
const calculatedPerPage = perPage ?? INDICATOR_PER_PAGE;
And perPage was 9000 I think

So, it worked before for this amount, at least I don't remember any client SDH about that.

Right now we can potentially miss some threat indicators for matches, and some alerts can end up without any matches at all.
In this trade-off, I lean more toward keeping 9k as a limit.

I think before it was like:
const calculatedPerPage = perPage ?? INDICATOR_PER_PAGE;
And perPage was 9000 I think

Yes, that's where this part is coming from:

We do fetch 9,000 threats in one request in the initial phase of the execution, but that request returns only the indicator mapping fields, so the response is significantly smaller.

It fetches a batch of 9,000 threat documents, but only the indicator mapping fields, so each document is small.

But when it fetches threats to build enrichments, it uses the default page size of 1,000, because perPage is undefined in that call. It also returns the whole section of each document defined in the query parameters

_source: [${threatIndicatorPath}.*, 'threat.feed.*'],

for that particular call, which fetches threats for enrichments.
So this implementation is essentially kept the same as in 8.6.
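
The two page sizes discussed here can be sketched as below. This is only an illustration: the constant values are the ones quoted in this thread (1,000 default, 10,000 hard limit), while `MAX_ITEMS_PER_PAGE` and `resolvePerPage` are hypothetical names, and the clamp is an added assumption rather than something the existing code is known to do.

```typescript
// Values as quoted in the discussion; treat them as illustrative.
const INDICATOR_PER_PAGE = 1000; // default page size when perPage is undefined
const MAX_ITEMS_PER_PAGE = 10000; // hard per-page limit mentioned in the thread

// Hypothetical helper: falls back to the default page size and, as an
// assumption of this sketch, clamps the result to the hard limit.
function resolvePerPage(perPage?: number): number {
  const calculatedPerPage = perPage ?? INDICATOR_PER_PAGE;
  return Math.min(calculatedPerPage, MAX_ITEMS_PER_PAGE);
}

// Matching phase: large page, but only indicator mapping fields are returned.
const matchingPhasePerPage = resolvePerPage(9000);

// Enrichment phase: perPage is undefined, so the default applies, and each
// document carries the full set of requested threat fields.
const enrichmentPhasePerPage = resolvePerPage();
```

The trade-off under discussion is whether the enrichment call should pass an explicit page size such as 9,000 instead of relying on the default.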

While I agree that it can miss a lot of potential threats, my concern is about fetching a large number of documents, which could result in:

- large memory consumption
- response content that is too large, which causes a rule execution error. Corresponding issue

So, it worked before for this amount, at least I don't remember any client SDH about that.
It worked with 1,000 for the threat-first search.

Although it seems that 9,000 threats should not cause the issue described in #150041.

@marshallmain, what are your thoughts on this?

nkhristinin · 2023-02-06T11:56:39Z

@elasticmachine merge upstream

kibana-ci · 2023-02-06T13:55:21Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💔 Build #105603 failed e4ff97b
💚 Build #105559 succeeded b273ae0
💔 Build #105261 failed f783ac5
💔 Build #105237 failed fc1ad1f
💚 Build #105191 succeeded 4eecf86

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @vitaliidm

marshallmain

I left a couple comments for follow up and longer term views on how we should handle threat match rules. The PR looks good to merge pending the comments in get_signals_map_from_threat_index.ts.

I still find the threat match code particularly tough to review due to the overall structure. The control flow is difficult to follow, names are not clear, and the overall design originally attempted to share the core searchAfterBulkCreate function with the query rule type, but there's now a ton of additional logic that was bolted on. I think splitting threat match rules away from searchAfterBulkCreate and making the logic independent of KQL rules would free us up to refactor more of the threat match rule type and make this a lot easier to follow. For event-first search, it seems to me that we won't need to make the extra queries in searchAfterBulkCreate at all since we already have the source events that searchAfterBulkCreate will fetch again.

@nkhristinin @vitaliidm I'd like to find some time after FF to discuss the code architecture and see if we can agree on a path forward.

marshallmain · 2023-02-06T19:45:31Z

...tion/server/lib/detection_engine/signals/threat_mapping/get_signals_map_from_threat_index.ts

+
+  while (
+    maxThreatsReachedMap.size < eventsCount &&
+    (threatList ? threatList?.hits.hits.length > 0 : true)


This condition defaulting to true seems like an unnecessary risk where a bug in getThreatList could cause an infinite loop. Can we fetch the first threat list outside the loop and have the loop break if threatList is undefined?

Suggested change

(threatList ? threatList?.hits.hits.length > 0 : true)

threatList?.hits.hits.length > 0

addressed in #150677
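
A minimal sketch of the loop shape suggested in this comment, for readers following along. It is not the code from #150677: `ThreatListPage`, `GetThreatList`, and `processThreatPages` are hypothetical names, and the fetcher is synchronous here only to keep the example short (the real call is asynchronous).

```typescript
// Hypothetical minimal shape of a threat list response.
interface ThreatListPage {
  hits: { hits: unknown[] };
}

// Hypothetical fetcher; may return undefined if something goes wrong.
type GetThreatList = (pageIndex: number) => ThreatListPage | undefined;

// Fetches the first page before the loop, so the loop condition never has to
// default to true. An undefined or empty page ends the loop immediately.
function processThreatPages(
  getThreatList: GetThreatList,
  eventsCount: number,
  maxThreatsReachedMap: Map<string, boolean>,
  processPage: (page: ThreatListPage) => void
): number {
  let pageIndex = 0;
  let processedPages = 0;
  let threatList = getThreatList(pageIndex);

  while (
    maxThreatsReachedMap.size < eventsCount &&
    threatList !== undefined &&
    threatList.hits.hits.length > 0
  ) {
    processPage(threatList);
    processedPages += 1;
    pageIndex += 1;
    threatList = getThreatList(pageIndex);
  }

  return processedPages;
}
```

With this shape, a fetcher that always returns undefined results in zero iterations instead of an endless loop.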

marshallmain · 2023-02-06T19:59:58Z

...tion/server/lib/detection_engine/signals/threat_mapping/get_signals_map_from_threat_index.ts

+    decodedQuery: ThreatMatchNamedQuery | ThreatTermNamedQuery;
+  }) => {
+    const signalMatch = signalsQueryMap.get(signalId);
+    if (!signalMatch) {


This if block looks redundant with the one on line 65 below?

addressed in #150677

marshallmain · 2023-02-06T20:39:31Z

...tion/server/lib/detection_engine/signals/threat_mapping/get_signals_map_from_threat_index.ts

+
+export type SignalsQueryMap = Map<string, ThreatMatchNamedQuery[]>;
+
+interface GetSignalsMatchesFromThreatIndexOptions {


this interface name doesn't match the function name

addressed in #150677

marshallmain · 2023-02-06T21:01:41Z

...tion/server/lib/detection_engine/signals/threat_mapping/get_signals_map_from_threat_index.ts

+          const values = Array.isArray(threatValue) ? threatValue : [threatValue];
+
+          values.forEach((value) => {
+            if (value && signalValueMap) {


It appears that signalValueMap is required for this function to work correctly when the query against the threatList uses a terms query. However, the type system doesn't enforce that constraint so it would be easy to miss and accidentally use a terms query without providing the signalValueMap.

We should continue to work on ways to simplify the threat match rule logic so it's easier to maintain moving forward, and use the TS compiler to enforce requirements like this whenever possible.

Good catch.
This part was adopted from the initial IM terms PR and integrated into this one.
The issue I see here is that it's difficult to determine whether a query has any terms clauses within it and whether they are related to the IM execution path. It can also be determined only at runtime, once all checks on index fields have been performed and it's clear that a terms query can be generated.

So instead, I introduced a termsQueryAllowed parameter, which the developer must set to true when a terms query can be used. Once it's set, signalValueMap must be passed as an argument.
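
One way the compiler can enforce the constraint described above is a discriminated union on `termsQueryAllowed`. The sketch below is an assumption about how such typing could look, not the typings from the follow-up PR; `SignalValueMap`'s shape, `BaseOptions`, `SignalsQueryMapOptions`, and `describeOptions` are hypothetical.

```typescript
// Hypothetical shape: field name -> value -> ids of signals with that value.
type SignalValueMap = Record<string, Record<string, string[]>>;

interface BaseOptions {
  eventsCount: number;
}

// When termsQueryAllowed is true, signalValueMap is mandatory.
// When it is false, signalValueMap must not be passed.
type SignalsQueryMapOptions =
  | (BaseOptions & { termsQueryAllowed: true; signalValueMap: SignalValueMap })
  | (BaseOptions & { termsQueryAllowed: false; signalValueMap?: undefined });

function describeOptions(options: SignalsQueryMapOptions): string {
  if (options.termsQueryAllowed) {
    // Narrowed branch: signalValueMap is guaranteed to be defined here.
    const fieldCount = Object.keys(options.signalValueMap).length;
    return `terms query allowed, ${fieldCount} mapped field(s)`;
  }
  return 'terms query not allowed';
}
```

With this typing, calling the function with `termsQueryAllowed: true` and no `signalValueMap` is a compile-time error rather than a silent runtime miss.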

marshallmain · 2023-02-06T21:30:03Z

...server/lib/detection_engine/signals/threat_mapping/get_signals_map_from_threat_index.test.ts

+    });
+
+    // the third request return empty results
+    getThreatListMock.mockReturnValueOnce({


I've found tests that mock ES responses are often brittle, either failing when they shouldn't or not failing when they should. It's fine to leave these in, but if we make changes to the implementation and find that these tests require a lot of maintenance it may be easier to remove them or refactor them so they don't rely on mock responses as much, e.g. by extracting the ES calls from the encoding logic and testing the encoding logic with unit tests and ES calls would be covered by integration tests.

Thanks for highlighting these concerns.

I share the opinion that these tests might break too easily if the implementation of getSignalsQueryMapFromThreatIndex changes.

On the other hand, I think not all cases can be covered easily by functional tests. For example, to cover the processing of multiple result pages, we would need to populate a test index with thousands of documents. So it might be easier to mock the expected ES response, especially given that it is unlikely to change in the future.

I haven't yet encountered the issue with ES mocks that you mentioned

I've found tests that mock ES responses are often brittle, either failing when they shouldn't or not failing when they should.

But I will pay more attention to those kinds of tests in the future.

…0677) ## Summary - addresses feedback from #149208 - typings for `getSignalsQueryMapFromThreatIndex` - fixes interface name for `getSignalsQueryMapFromThreatIndex` - small code refactorings More details in comments of the initial PR --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

…stic#150677) ## Summary - addresses feedback from elastic#149208 - typings for `getSignalsQueryMapFromThreatIndex` - fixes interface name for `getSignalsQueryMapFromThreatIndex` - small code refactorings More details in comments of the initial PR --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit c08cdc8)

#150677) (#152426) # Backport This will backport the following commits from `main` to `8.7`: - [[Security Solution][Alerts] addresses IM performance PR feedback (#150677)](#150677)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)  Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>

vitaliidm and others added 2 commits January 19, 2023 12:09

[Security Solution][Alerts] improves IM rule memory usage and perform…

04f902f

…ance

Merge branch 'main' into alerts/im-memory-and-performance

8e580b5

marshallmain requested review from marshallmain and nkhristinin January 19, 2023 18:20

vitaliidm and others added 9 commits January 30, 2023 10:51

Merge branch 'main' into alerts/im-memory-and-performance

e302199

_source update for threats

34cbdf8

Merge branch 'alerts/im-memory-and-performance' of https://github.com…

a0e362f

…/vitaliidm/kibana into alerts/im-memory-and-performance

Merge branch 'main' into alerts/im-memory-and-performance

7e97870

raise limit to 200

0e0ba14

Merge branch 'alerts/im-memory-and-performance' of https://github.com…

e54f644

…/vitaliidm/kibana into alerts/im-memory-and-performance

refactoring of changes

0a85d42

[CI] Auto-commit changed files from 'node scripts/precommit_hook.js -…

e1d5803

…-ref HEAD~1..HEAD --fix'

furtherrefactoring

406a509

elastic deleted a comment from kibana-ci Jan 31, 2023

vitaliidm and others added 6 commits January 31, 2023 13:30

Merge branch 'alerts/im-memory-and-performance' of https://github.com…

bfd1694

…/vitaliidm/kibana into alerts/im-memory-and-performance

refactoring

2486cdb

add tests

0605938

add more tests

ca31c5f

add tests

b0d2709

Merge branch 'main' into alerts/im-memory-and-performance

4124186

vitaliidm marked this pull request as ready for review February 1, 2023 15:28

vitaliidm requested a review from a team as a code owner February 1, 2023 15:28

vitaliidm self-assigned this Feb 1, 2023

vitaliidm added the v8.7.0 label Feb 1, 2023

vitaliidm and others added 7 commits February 2, 2023 18:41

fix types

a010dfb

refactoring

564fc00

code style

f783ac5

fix failed tests

9703272

Merge branch 'main' into alerts/im-memory-and-performance

0d5fa8f

add missing tests

8dcef9b

Merge branch 'alerts/im-memory-and-performance' of https://github.com…

b273ae0

…/vitaliidm/kibana into alerts/im-memory-and-performance

nkhristinin reviewed Feb 6, 2023

nkhristinin added the ci:cloud-deploy Create or update a Cloud deployment label Feb 6, 2023

Merge branch 'main' into alerts/im-memory-and-performance

e4ff97b

nkhristinin self-requested a review February 6, 2023 11:57

nkhristinin approved these changes Feb 6, 2023

vitaliidm added ci:cloud-redeploy Always create a new Cloud deployment and removed ci:cloud-deploy Create or update a Cloud deployment labels Feb 6, 2023

Merge branch 'main' into alerts/im-memory-and-performance

e86edf3

vitaliidm enabled auto-merge (squash) February 6, 2023 13:14

marshallmain approved these changes Feb 6, 2023

vitaliidm merged commit 2ff017e into elastic:main Feb 6, 2023

vitaliidm mentioned this pull request Feb 9, 2023

[Security Solution][Alerts] addresses IM performance PR feedback #150677

Merged

vitaliidm deleted the alerts/im-memory-and-performance branch March 4, 2024 17:31

6 participants
