[Security Solution][Alerts] improves IM rule memory usage and performance #149208
vitaliidm merged 33 commits into elastic:main
…/vitaliidm/kibana into alerts/im-memory-and-performance
…-ref HEAD~1..HEAD --fix'
Pinging @elastic/security-solution (Team: SecuritySolution)
  },
};

const threatResponse = await getThreatList({
Should we specify perPage here (9,000, for example)?
It looks like we only query 1,000 threats, but there could be more.
This part basically replicates the current behaviour in 8.6, where a maximum of 1,000 threats is fetched before building the enrichment.
I don't know whether this was a conscious decision to avoid loading a large number of documents into memory (each with a potentially large number of fields, depending on _source: [`${threatIndicatorPath}.*`, 'threat.feed.*'] in the threat params) or just an oversight.
There is a hard limit of 10,000 items per page, though.
Do you think it would be safe to load 9,000 items in one request with no control over the size of each document?
We do fetch 9,000 threats in one request in the initial phase of execution, but that request returns only the indicator mapping fields, so its response is significantly smaller.
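For reference, a minimal sketch of the two request bodies being compared, modelling only `size` and `_source`. The helper names `buildIndicatorMappingBody` and `buildEnrichmentBody` and the example field names are hypothetical; the enrichment `_source` pattern is the one quoted in this comment.

```typescript
// Hypothetical sketch: only `size` and `_source` of the search body are modelled.
interface ThreatSearchBody {
  size: number;
  _source: string[];
}

// Initial phase: large pages, but each hit carries only the indicator-mapping fields.
function buildIndicatorMappingBody(mappingFields: string[], perPage: number): ThreatSearchBody {
  return { size: perPage, _source: [...mappingFields] };
}

// Enrichment phase: every field under the indicator path plus threat.feed.*,
// so each hit can be far larger than in the initial phase.
function buildEnrichmentBody(threatIndicatorPath: string, perPage: number): ThreatSearchBody {
  return { size: perPage, _source: [`${threatIndicatorPath}.*`, 'threat.feed.*'] };
}

const mappingBody = buildIndicatorMappingBody(['source.ip', 'destination.ip'], 9000);
const enrichmentBody = buildEnrichmentBody('threat.indicator', 1000);
```

At the same page size, the enrichment body asks for far more fields per hit, which is why raising its page size is riskier than raising the first one.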
I think before it was like:
const calculatedPerPage = perPage ?? INDICATOR_PER_PAGE;
And perPage was 9000 I think
So, it worked before for this amount, at least I don't remember any client SDH about that.
Right now we could miss some threat indicators for matches, and some alerts could end up with no matches at all.
Given this trade-off, I lean toward keeping 9k as the limit.
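The page-size resolution being discussed can be sketched as follows. `INDICATOR_PER_PAGE` is assumed to be 1,000 (the default the thread says applies when perPage is undefined), and the clamp to 10,000 reflects the per-page hard limit mentioned earlier; `resolvePerPage` is a hypothetical helper, not Kibana code.

```typescript
// Assumed values: 1,000 default when perPage is undefined, 10,000 hard per-page limit.
const INDICATOR_PER_PAGE = 1000;
const MAX_PER_PAGE = 10000;

// Hypothetical helper: falls back to the default, then keeps the size within [1, MAX_PER_PAGE].
function resolvePerPage(perPage?: number): number {
  const calculatedPerPage = perPage ?? INDICATOR_PER_PAGE;
  return Math.min(Math.max(calculatedPerPage, 1), MAX_PER_PAGE);
}

console.log(resolvePerPage()); // 1000
console.log(resolvePerPage(9000)); // 9000
console.log(resolvePerPage(20000)); // 10000
```

Under these assumptions, passing 9000 explicitly in the enrichment call would restore the larger page, while leaving it undefined keeps the 1,000 default.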
> I think before it was like:
> const calculatedPerPage = perPage ?? INDICATOR_PER_PAGE;
> And perPage was 9000 I think
Yes, that's where this part comes from:
> We do fetch 9,000 threats in one request in the initial phase of execution, but that request returns only the indicator mapping fields, so its response is significantly smaller.
It fetches threat documents in batches of 9,000, but only the indicator mapping fields, so each document is small.
However, when it fetches threats to build enrichments, it uses the default page size of 1,000, because perPage is undefined in that call. That call, the one that fetches threats for enrichments, also returns the whole section of each document defined in the query parameters:
_source: [`${threatIndicatorPath}.*`, 'threat.feed.*'],
So this implementation is essentially kept the same as in 8.6.
While I agree it can miss a lot of potential threats, my concern is fetching a large number of documents, which could result in:
- large memory consumption
- a response that is too large, causing a rule execution error. Corresponding issue
> So, it worked before for this amount, at least I don't remember any client SDH about that.
It worked with 1,000 for threat-first search.
That said, it seems 9,000 threats should not cause an issue like the one in 150041.
@marshallmain, what are your thoughts on this?
@elasticmachine merge upstream
💚 Build Succeeded
To update your PR or re-run it, just comment with:
cc @vitaliidm
marshallmain left a comment
I left a couple of comments for follow-up and longer-term views on how we should handle threat match rules. The PR looks good to merge pending the comments in get_signals_map_from_threat_index.ts.
I still find the threat match code particularly tough to review due to the overall structure. The control flow is difficult to follow, names are not clear, and the overall design originally attempted to share the core searchAfterBulkCreate function with the query rule type, but there's now a ton of additional logic that was bolted on. I think splitting threat match rules away from searchAfterBulkCreate and making the logic independent of KQL rules would free us up to refactor more of the threat match rule type and make this a lot easier to follow. For event-first search, it seems to me that we won't need to make the extra queries in searchAfterBulkCreate at all since we already have the source events that searchAfterBulkCreate will fetch again.
@nkhristinin @vitaliidm I'd like to find some time after FF to discuss the code architecture and see if we can agree on a path forward.
while (
  maxThreatsReachedMap.size < eventsCount &&
  (threatList ? threatList?.hits.hits.length > 0 : true)
This condition defaulting to true seems like an unnecessary risk where a bug in getThreatList could cause an infinite loop. Can we fetch the first threat list outside the loop and have the loop break if threatList is undefined?
Suggested change
- (threatList ? threatList?.hits.hits.length > 0 : true)
+ threatList?.hits.hits.length > 0
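The restructuring suggested here (fetch the first page before the loop, and end the loop when a page is missing or empty) might look like this minimal sketch. `GetThreatList`, `handlePage`, and the sort-based continuation are simplified stand-ins assumed for illustration, not the real getThreatList signature.

```typescript
interface ThreatHit {
  _id: string;
  sort?: Array<string | number>;
}

interface ThreatList {
  hits: { hits: ThreatHit[] };
}

// Simplified stand-in for getThreatList: returns the page after `searchAfter`.
type GetThreatList = (searchAfter?: Array<string | number>) => Promise<ThreatList | undefined>;

async function processThreatPages(
  getThreatList: GetThreatList,
  eventsCount: number,
  handlePage: (hits: ThreatHit[], maxThreatsReachedMap: Map<string, boolean>) => void,
): Promise<number> {
  const maxThreatsReachedMap = new Map<string, boolean>();
  let pagesProcessed = 0;

  // The first page is fetched before the loop, so the condition needs no `: true` fallback.
  let threatList = await getThreatList();

  // An undefined or empty page ends the loop instead of risking an infinite loop.
  while (
    threatList !== undefined &&
    threatList.hits.hits.length > 0 &&
    maxThreatsReachedMap.size < eventsCount
  ) {
    const hits = threatList.hits.hits;
    handlePage(hits, maxThreatsReachedMap);
    pagesProcessed += 1;
    // Continue after the sort values of the last hit on this page.
    threatList = await getThreatList(hits[hits.length - 1].sort);
  }

  return pagesProcessed;
}
```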
  decodedQuery: ThreatMatchNamedQuery | ThreatTermNamedQuery;
}) => {
  const signalMatch = signalsQueryMap.get(signalId);
  if (!signalMatch) {
This if block looks redundant with the one on line 65 below?
export type SignalsQueryMap = Map<string, ThreatMatchNamedQuery[]>;
interface GetSignalsMatchesFromThreatIndexOptions {
This interface name doesn't match the function name.
const values = Array.isArray(threatValue) ? threatValue : [threatValue];
values.forEach((value) => {
  if (value && signalValueMap) {
It appears that signalValueMap is required for this function to work correctly when the query against the threatList uses a terms query. However, the type system doesn't enforce that constraint, so it would be easy to miss and accidentally use a terms query without providing the signalValueMap.
We should continue to work on ways to simplify the threat match rule logic so it's easier to maintain moving forward, and use the TS compiler to enforce requirements like this whenever possible.
Good catch.
This part was adopted from the initial IM terms PR and integrated into this one.
The issue I see here is that it's difficult to determine whether a query contains any terms clauses and whether they relate to the IM execution path. It can also be determined only at runtime, once all index field checks have been performed and it's clear that a terms query can be generated.
So instead, I introduced a termsQueryAllowed parameter, which the developer must set to true when a terms query can be used. Once it is set, signalValueMap must be passed as an argument.
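One way to let the compiler enforce the termsQueryAllowed/signalValueMap pairing described here is a discriminated union. This is a minimal sketch: the option names follow the thread, but the `SignalValueMap` shape (field, then value, then signal ids) and the `signalIdsForTerm` helper are assumptions for illustration.

```typescript
// Assumed shape: field name -> threat value -> ids of signals holding that value.
type SignalValueMap = Record<string, Record<string, string[]>>;

interface BaseOptions {
  threatIndex: string[];
}

// termsQueryAllowed: true makes signalValueMap mandatory.
interface TermsAllowedOptions extends BaseOptions {
  termsQueryAllowed: true;
  signalValueMap: SignalValueMap;
}

// Otherwise signalValueMap must be absent.
interface TermsNotAllowedOptions extends BaseOptions {
  termsQueryAllowed?: false;
  signalValueMap?: undefined;
}

type GetSignalsQueryMapOptions = TermsAllowedOptions | TermsNotAllowedOptions;

// Compile-time error if uncommented, because signalValueMap is missing:
// const invalid: GetSignalsQueryMapOptions = { threatIndex: ['threats'], termsQueryAllowed: true };

// Hypothetical helper: resolves signal ids for a terms-query match.
function signalIdsForTerm(options: GetSignalsQueryMapOptions, field: string, value: string): string[] {
  if (options.termsQueryAllowed === true) {
    // Narrowed to TermsAllowedOptions: signalValueMap is guaranteed to be defined.
    return options.signalValueMap[field]?.[value] ?? [];
  }
  // Without terms queries, matches would come from named queries instead (not modelled here).
  return [];
}

const withTerms: GetSignalsQueryMapOptions = {
  threatIndex: ['threats'],
  termsQueryAllowed: true,
  signalValueMap: { 'source.ip': { '10.0.0.1': ['signal-1', 'signal-2'] } },
};
const withoutTerms: GetSignalsQueryMapOptions = { threatIndex: ['threats'] };
```

The runtime decision stays where it is; the union only guarantees that any call site setting termsQueryAllowed to true also supplies the map.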
});
// the third request return empty results
getThreatListMock.mockReturnValueOnce({
I've found tests that mock ES responses are often brittle, either failing when they shouldn't or not failing when they should. It's fine to leave these in, but if we change the implementation and find that these tests require a lot of maintenance, it may be easier to remove them or refactor them so they rely less on mock responses, e.g. by extracting the ES calls from the encoding logic, testing the encoding logic with unit tests, and covering the ES calls with integration tests.
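The split suggested above can be sketched as a pure function over search hits plus a thin wrapper that owns the Elasticsearch call. `matched_queries` is the field Elasticsearch returns on hits for named queries; the "<signalId>|<field>" name encoding and all function names here are hypothetical simplifications.

```typescript
interface NamedQueryHit {
  _id: string;
  // Names of the named queries this threat document matched.
  matched_queries?: string[];
}

// Hypothetical encoding for illustration only: "<signalId>|<field>".
function decodeSignalId(namedQuery: string): string {
  return namedQuery.split('|')[0];
}

// Pure encoding logic: no I/O, so unit tests can pass literal hits.
function signalsMapFromHits(hits: NamedQueryHit[]): Map<string, string[]> {
  const signalsMap = new Map<string, string[]>();
  for (const hit of hits) {
    for (const namedQuery of hit.matched_queries ?? []) {
      const signalId = decodeSignalId(namedQuery);
      const threatIds = signalsMap.get(signalId) ?? [];
      if (!threatIds.includes(hit._id)) {
        threatIds.push(hit._id);
      }
      signalsMap.set(signalId, threatIds);
    }
  }
  return signalsMap;
}

// Thin I/O wrapper: the only part that needs a real cluster (integration tests).
async function fetchSignalsMap(search: () => Promise<NamedQueryHit[]>): Promise<Map<string, string[]>> {
  return signalsMapFromHits(await search());
}

const exampleMap = signalsMapFromHits([
  { _id: 'threat-1', matched_queries: ['signal-1|source.ip', 'signal-2|source.ip'] },
  { _id: 'threat-2', matched_queries: ['signal-1|host.name'] },
  { _id: 'threat-3' },
]);
```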
Thanks for highlighting these concerns.
I share that opinion; it might break too easily if the implementation of getSignalsQueryMapFromThreatIndex changes.
On the other hand, I don't think all cases can be covered easily with functional tests. For example, to cover processing of multiple result pages, we would need to populate the test index with thousands of documents. So it might be easier to mock the expected ES response, especially given that it is unlikely to change in the future.
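To exercise multi-page processing without indexing thousands of documents, a mock can generate pages on demand. This is a minimal sketch assuming a search_after-style contract in which each hit's sort value is its position; `makeFakeThreatPages`, `countAllThreats`, and their shapes are hypothetical, not Kibana's getThreatList.

```typescript
interface FakeHit {
  _id: string;
  sort: number[];
}

// Hypothetical fake: serves `total` synthetic threat hits in pages of `perPage`,
// continuing after the last sort value it was given (search_after-style).
function makeFakeThreatPages(total: number, perPage: number) {
  let requests = 0;
  const getPage = async (searchAfter?: number[]): Promise<FakeHit[]> => {
    requests += 1;
    const start = searchAfter === undefined ? 0 : searchAfter[0] + 1;
    const end = Math.min(start + perPage, total);
    const hits: FakeHit[] = [];
    for (let i = start; i < end; i++) {
      hits.push({ _id: `threat-${i}`, sort: [i] });
    }
    return hits;
  };
  return { getPage, requestCount: () => requests };
}

// Code under test: drains every page, keeping only a running count.
async function countAllThreats(getPage: (searchAfter?: number[]) => Promise<FakeHit[]>): Promise<number> {
  let count = 0;
  let page = await getPage();
  while (page.length > 0) {
    count += page.length;
    page = await getPage(page[page.length - 1].sort);
  }
  return count;
}
```

With 25,000 fake threats and pages of 9,000, the code under test should see three full or partial pages plus one empty page, all without touching a cluster.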
I haven't yet encountered the issue with ES mocks you mentioned:
> I've found tests that mock ES responses are often brittle, either failing when they shouldn't or not failing when they should.
But I will pay more attention to those kinds of tests in the future.
…0677) ## Summary - addresses feedback from #149208 - typings for `getSignalsQueryMapFromThreatIndex` - fixes interface name for `getSignalsQueryMapFromThreatIndex` - small code refactorings More details in comments of the initial PR --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
…stic#150677) ## Summary - addresses feedback from elastic#149208 - typings for `getSignalsQueryMapFromThreatIndex` - fixes interface name for `getSignalsQueryMapFromThreatIndex` - small code refactorings More details in comments of the initial PR --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit c08cdc8)
#150677) (#152426) # Backport This will backport the following commits from `main` to `8.7`: - [[Security Solution][Alerts] addresses IM performance PR feedback (#150677)](#150677) ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>
Summary
It creates the signals map directly from threat results (in getSignalsMatchesFromThreatIndex) and no longer keeps all threats in memory.
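The idea in the summary, building the signals map straight from each page of threat results instead of first collecting every threat, can be illustrated with this minimal sketch. The page source, the pre-decoded `signalIds` on each hit, and the function names are assumptions; the point is that only the map outlives each loop iteration.

```typescript
interface ThreatHit {
  _id: string;
  // Signal ids this threat matched, already decoded for brevity (assumption).
  signalIds: string[];
}

// Returns the page at `pageIndex`, or an empty array when there are no more pages.
type GetThreatPage = (pageIndex: number) => Promise<ThreatHit[]>;

// Builds signalId -> threat ids page by page. Only the map is kept between
// iterations; each page of hits can be garbage-collected once processed.
async function buildSignalsMap(getThreatPage: GetThreatPage): Promise<Map<string, string[]>> {
  const signalsMap = new Map<string, string[]>();
  for (let pageIndex = 0; ; pageIndex++) {
    const page = await getThreatPage(pageIndex);
    if (page.length === 0) {
      break;
    }
    for (const hit of page) {
      for (const signalId of hit.signalIds) {
        const threatIds = signalsMap.get(signalId);
        if (threatIds) {
          threatIds.push(hit._id);
        } else {
          signalsMap.set(signalId, [hit._id]);
        }
      }
    }
  }
  return signalsMap;
}

const examplePages: ThreatHit[][] = [
  [{ _id: 'threat-1', signalIds: ['signal-1'] }],
  [{ _id: 'threat-2', signalIds: ['signal-1', 'signal-2'] }],
];
```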