Terms query for Indicator Match rule by nkhristinin · Pull Request #144511 · elastic/kibana

nkhristinin · 2022-11-03T10:46:15Z

Terms query for Indicator Match rule

TODO: [ ] Needs more unit/integration tests, but ready for review

The indicator match (IM) rule will use a terms query, where possible, to search for matches in both threat-first-search and events-first-search.

How the match query worked:

Example for threat-first-search.
If we have matching conditions like:
host.ip === indicator.host.ip or (source.name === indicator.source.name AND host.name === indicator.host.name)

It will generate queries like:

match: {host.ip: "1"},  
or
match: {host.ip: "2"}
or
match: {host.ip: "3"}
or
(match: {source.name: "1"} and match: {host.name: "1"})
or
(match: {source.name: "2"} and match: {host.name: "2"})
or
(match: {source.name: "3"} and match: {host.name: "3"})

Each match also has a _name field like: ${threatId}_${threatIndex}_${threatFields}_${sourceField}
Because there is a 1:1 relation between a match clause and a response, it is clear later, at the enrichment stage, which threat matched which event.
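
A minimal sketch of how such named match clauses could be assembled. The `ThreatDoc` and `MappingEntry` shapes and all function names are hypothetical and simplified for illustration; only the `_name` pattern comes from the description above.

```typescript
// Hypothetical, simplified shapes for illustration; Kibana's real types are richer.
interface ThreatDoc {
  id: string;
  index: string;
  fields: Record<string, string>;
}

interface MappingEntry {
  sourceField: string; // field in the events index
  threatField: string; // field in the threat indicator index
}

// Follows the `${threatId}_${threatIndex}_${threatField}_${sourceField}` pattern.
const encodeName = (threat: ThreatDoc, entry: MappingEntry): string =>
  [threat.id, threat.index, entry.threatField, entry.sourceField].join('_');

// One named match clause per (threat, mapping entry) pair.
const buildNamedMatch = (threat: ThreatDoc, entry: MappingEntry) => ({
  match: {
    [entry.sourceField]: {
      query: threat.fields[entry.threatField],
      _name: encodeName(threat, entry),
    },
  },
});

// `orGroups`: the outer array is OR-ed, each inner array is AND-ed.
const buildShouldClauses = (threats: ThreatDoc[], orGroups: MappingEntry[][]) => {
  const should: Array<{ bool: { filter: Array<ReturnType<typeof buildNamedMatch>> } }> = [];
  for (const andGroup of orGroups) {
    for (const threat of threats) {
      should.push({ bool: { filter: andGroup.map((entry) => buildNamedMatch(threat, entry)) } });
    }
  }
  return should;
};
```

The number of clauses grows as threats × OR groups, which is the cost the terms query is meant to reduce.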

Terms query.

We fetch the mappings for the fields used in the IM rule's match conditions.
The terms query doesn't support all field types, so there is an allow-list of field types it can be used with.
The terms query is not applied to AND conditions.

For example, with these field types:

host.ip - ip
user.name - keyword
user.description - text
indicator.host.ip_range - ip_range

host.ip === indicator.host.ip or host.ip_range === indicator.host.ip or (source.name === indicator.source.name AND host.name === indicator.host.name)

It will generate queries like:

terms: {host.ip: ["1","2","3"]},  
or
match: {host.ip_range: "1"} // a terms query supports range fields, but it would be hard to tell later which threat matched which event, because this condition can produce more than one response
or
match: {host.ip_range: "2"}
or
(match: {source.name: "1"} and match: {host.name: "1"})
or
(match: {source.name: "2"} and match: {host.name: "2"})
or
(match: {source.name: "3"} and match: {host.name: "3"})

With a terms query, we don't know which threat value matched which event, so we match them back in the code.
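
A rough sketch of what "matching back in the code" could look like for a single field. The `Threat` and `EventHit` shapes and both function names are assumptions made for this example, not the PR's actual implementation.

```typescript
// Hypothetical shapes: one threat value per threat, one matched field per event.
interface Threat {
  id: string;
  value: string;
}

interface EventHit {
  _id: string;
  fieldValue: string | string[];
}

// A single terms clause replaces one match clause per threat value.
const buildTermsClause = (field: string, threats: Threat[]) => ({
  terms: { [field]: Array.from(new Set(threats.map((threat) => threat.value))) },
});

// The terms query returns no per-clause `_name`, so join events to threats by value.
const matchBack = (events: EventHit[], threats: Threat[]): Map<string, string[]> => {
  const threatIdsByValue = new Map<string, string[]>();
  for (const threat of threats) {
    const ids = threatIdsByValue.get(threat.value) ?? [];
    ids.push(threat.id);
    threatIdsByValue.set(threat.value, ids);
  }

  const matches = new Map<string, string[]>();
  for (const event of events) {
    // An event field may hold a single value or an array of values.
    const values = Array.isArray(event.fieldValue) ? event.fieldValue : [event.fieldValue];
    const threatIds: string[] = [];
    for (const value of values) {
      threatIds.push(...(threatIdsByValue.get(value) ?? []));
    }
    if (threatIds.length > 0) matches.set(event._id, threatIds);
  }
  return matches;
};
```

Indexing threats by value first keeps the join linear in the number of events and threats.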

Other changes

Threat-first-search will do one extra request to get all matched threats.
For example:
The threat index has 1,000,000 documents.
The IM rule gets the first batch of 9,000 threats and builds a query against the events index.
It returns 100 events (max_signals = 100).
Then it tries to enrich those 100 events with threat info.
The problem is that the original implementation enriches them only with threats from this batch of 9,000.
It ignores other matches among the 1,000,000 threats.

This is why we do one extra request at the end, from the potential alerts to the threat index.
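
One way to picture that extra request is sketched below: collect the values of the mapped source fields from the potential alerts and look them up in the threat index. The function name, the flattened event shape, and the restriction to single-field (OR) conditions are all assumptions of this sketch, not a description of the actual code.

```typescript
interface MappingEntry {
  sourceField: string; // field in the events index
  threatField: string; // field in the threat indicator index
}

// Assumes events are given with flattened keys such as 'host.ip'.
type EventSource = Record<string, string | string[] | undefined>;

// Build a query for the threat index from the values found in potential alerts.
// Only single-field (OR) conditions are covered; AND groups would need extra handling.
const buildThreatLookupQuery = (events: EventSource[], mapping: MappingEntry[]) => {
  const should: Array<{ terms: Record<string, string[]> }> = [];
  for (const { sourceField, threatField } of mapping) {
    const values = new Set<string>();
    for (const event of events) {
      const raw = event[sourceField];
      if (raw === undefined) continue;
      for (const value of Array.isArray(raw) ? raw : [raw]) values.add(value);
    }
    if (values.size > 0) should.push({ terms: { [threatField]: Array.from(values) } });
  }
  return { bool: { should, minimum_should_match: 1 } };
};
```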

Tests performance

In the best case, it can improve performance by around 3x.

Base
Threat Indicators - 1,500,000 documents
Source - 1,000,000 documents
1 field for match condition

This PR:

nkhristinin · 2022-12-13T07:44:49Z

@elasticmachine merge upstream

nkhristinin · 2023-01-25T16:48:15Z

@elasticmachine merge upstream

nkhristinin · 2023-01-25T16:48:42Z

@elasticmachine merge upstream

nkhristinin · 2023-01-26T14:01:32Z

@elasticmachine merge upstream

nkhristinin · 2023-01-30T09:55:52Z

@elasticmachine merge upstream

vitaliidm · 2023-01-31T15:23:01Z

..._solution/server/lib/detection_engine/signals/threat_mapping/enrich_signal_threat_matches.ts

    : uniqueHits.map((signalHit) => ({
        signalId: signalHit._id,
-        queries: extractNamedQueries(signalHit),
+        queries: extractNamedQueries(signalHit) as ThreatMatchNamedQuery[],


Why do we now need a cast here? Can it be avoided?

Thanks for noticing!

Removed the whole uniqueHits.map((signalHit) => ({ signalId: signalHit._id, queries: extractNamedQueries(signalHit) as ThreatMatchNamedQuery[], ... section, because it's not used anymore.

vitaliidm · 2023-01-31T15:29:52Z

...ion/server/lib/detection_engine/signals/threat_mapping/get_allowed_fields_for_terms_query.ts

+  return result;
+};
+
+// Return map of fields allowed for term query for source and threat indices


nit:
If the comment is wrapped in JSDoc style, it will be shown wherever the function is imported.
That makes the code slightly easier to read.

/**
 * Return map of fields allowed for term query for source and threat indices
 */

@nkhristinin, I've seen you've made this change in some places, but not others. It would be great to do it for the other methods too, for consistency.

vitaliidm · 2023-01-31T15:33:35Z

...k/plugins/security_solution/server/lib/detection_engine/signals/threat_mapping/utils.test.ts

+      expect(valueMap).toEqual({});
+    });
+
+    it('return empy object if there some events but no fields', () => {


Suggested change

- it('return empy object if there some events but no fields', () => {
+ it('return empty object if there some events but no fields', () => {

vitaliidm · 2023-01-31T15:37:50Z

x-pack/plugins/security_solution/server/lib/detection_engine/signals/threat_mapping/utils.ts

-
-  if (queryValues.length !== 4 || !queryValues.every(Boolean)) {
-    const queryString = JSON.stringify(query);
-    throw new Error(`Decoded query is invalid. Decoded value: ${queryString}`);


Curious, why was this check removed?

vitaliidm · 2023-01-31T16:36:43Z

..._solution/server/lib/detection_engine/signals/threat_mapping/enrich_signal_threat_matches.ts

+
+      if (query.queryType === ThreatMatchQueryType.term) {
+        const threatValue = get(threatHit?._source, query.value);
+        // TODO: check types


Does this comment refer to the case where the source's field value could be an array?
In that case, would we need to find the matched value in that array?

vitaliidm · 2023-01-31T17:37:08Z

...ion/server/lib/detection_engine/signals/threat_mapping/get_allowed_fields_for_terms_query.ts

+  const indicies = Object.values(indexMapping);
+  indicies.forEach((index) => {


Suggested change

- const indicies = Object.values(indexMapping);
- indicies.forEach((index) => {
+ const indices = Object.values(indexMapping);
+ indices.forEach((index) => {

vitaliidm · 2023-01-31T17:42:14Z

...ion/server/lib/detection_engine/signals/threat_mapping/get_allowed_fields_for_terms_query.ts

+  indexMapping: IndicesGetFieldMappingResponse
+): { [key: string]: boolean } => {
+  const result: { [key: string]: boolean } = {};
+  const notAllowedTypes: string[] = [];


Why is notAllowedTypes needed? A check against allowedFieldTypes alone looks sufficient, because a type is only ever added to notAllowedTypes when it is not in allowedFieldTypes.

As per the code below:

if (allowedFieldTypes.includes(fieldType) && !notAllowedTypes.includes(fieldType)) {
  result[field] = true;
} else {
  notAllowedTypes.push(fieldType);
  ...
}

Great catch.

Actually, the idea was that different indices can contain the same field, and if one of its mappings is not supported by the terms query, we should remove this field and not add it back later.

I fixed the code and changed notAllowedTypes to notAllowedFields.
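
The intent described in this reply can be sketched as follows. The mapping shape is simplified to index → field → type, the allow-list contents are an assumption made for illustration, and `getAllowedFields` is a hypothetical name; the PR's actual function works on IndicesGetFieldMappingResponse.

```typescript
// Assumed allow-list for illustration only; the PR's actual list may differ.
const allowedFieldTypes = ['keyword', 'ip'];

// Simplified stand-in for the field mapping response: index -> field -> type.
type SimplifiedMapping = Record<string, Record<string, string>>;

const getAllowedFields = (indexMapping: SimplifiedMapping): Record<string, boolean> => {
  const result: Record<string, boolean> = {};
  const notAllowedFields = new Set<string>();

  for (const index of Object.values(indexMapping)) {
    for (const [field, fieldType] of Object.entries(index)) {
      if (allowedFieldTypes.includes(fieldType) && !notAllowedFields.has(field)) {
        result[field] = true;
      } else {
        // One unsupported mapping in any index disqualifies the field everywhere,
        // even if another index already marked it as allowed.
        notAllowedFields.add(field);
        delete result[field];
      }
    }
  }
  return result;
};
```

Tracking rejected fields rather than rejected types is what makes the outcome independent of the order in which indices are visited.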

vitaliidm · 2023-01-31T17:53:40Z

...ion/server/lib/detection_engine/signals/threat_mapping/get_allowed_fields_for_terms_query.ts

+      await services.scopedClusterClient.asCurrentUser.indices.getFieldMapping({
+        index: threatIndex,
+        fields: threatMatchedFields.threat,
+      });


Putting both requests in Promise.all would let the second request start without waiting for the first one to finish.
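
A small sketch of this suggestion. `FieldMappingFetcher` and `fetchBothMappings` are hypothetical names; the fetcher stands in for a call such as services.scopedClusterClient.asCurrentUser.indices.getFieldMapping so the example stays self-contained.

```typescript
// Hypothetical stand-in for the Elasticsearch client's getFieldMapping call.
type FieldMappingFetcher = (params: {
  index: string[];
  fields: string[];
}) => Promise<Record<string, unknown>>;

interface MappingRequest {
  index: string[];
  fields: string[];
}

const fetchBothMappings = async (
  getFieldMapping: FieldMappingFetcher,
  source: MappingRequest,
  threat: MappingRequest
) => {
  // Both requests are started before either is awaited, so they run concurrently.
  const [sourceMapping, threatMapping] = await Promise.all([
    getFieldMapping(source),
    getFieldMapping(threat),
  ]);
  return { sourceMapping, threatMapping };
};
```

With two sequential awaits the total latency is the sum of both requests; with Promise.all it is roughly that of the slower one.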

vitaliidm · 2023-01-31T18:04:03Z

...y_solution/server/lib/detection_engine/signals/threat_mapping/build_threat_mapping_filter.ts

  threatListItem,
  entryKey,
-}: CreateAndOrClausesOptions): BooleanFilter => {
+}: CreateAndOrClausesOptions): unknown[] => {


Why does it become unknown? Is there a matching type for the should clause? Probably QueryDslQueryContainer?

vitaliidm · 2023-01-31T18:07:49Z

...y_solution/server/lib/detection_engine/signals/threat_mapping/build_threat_mapping_filter.ts

+        allowedFieldsForTermsQuery?.threat?.[entry.value]
+    );
+  const combinedShould = threatMapping.reduce<{
+    match: unknown[];


Probably it should be QueryDslQueryContainer?

nkhristinin · 2023-02-01T11:35:19Z

@elasticmachine merge upstream

vitaliidm · 2023-02-02T11:04:44Z

...tion/server/lib/detection_engine/signals/threat_mapping/enrich_signal_threat_matches.test.ts

+      },
+    };
+
+    const singnalValueMap = {


Suggested change

- const singnalValueMap = {
+ const signalValueMap = {

nkhristinin · 2023-02-02T13:11:54Z

@elasticmachine merge upst

kibana-ci · 2023-02-02T15:06:38Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💔 Build #105085 failed 05754fe
💔 Build #104999 failed add3153
💚 Build #104982 succeeded 6c44b70
💚 Build #104821 succeeded a4cca24
💔 Build #104591 failed 35b10b7

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

nkhristinin force-pushed the im-term-query branch from 467f32d to 8da4787 Compare December 19, 2022 15:41

nkhristinin added the ci:cloud-deploy Create or update a Cloud deployment label Jan 23, 2023

nkhristinin force-pushed the im-term-query branch from a86e093 to 35667bf Compare January 23, 2023 13:20

nkhristinin added 6 commits January 27, 2023 06:59

POC

e6057dc

IM rule query draft

3b37e11

remove console.log

8f54480

provide config for remove source and fields

88d0360

Fix rebae

a4f6aeb

Fix some tests

0d0b15c

nkhristinin force-pushed the im-term-query branch from f13b255 to 0d0b15c Compare January 27, 2023 06:07

Refactor and fix type/lint problems

f220503

Merge branch 'main' into im-term-query

291699b

nkhristinin changed the title ~~POC~~ Indicator match rule using terms query Jan 30, 2023

nkhristinin changed the title ~~Indicator match rule using terms query~~ Terms query for Indicator Match rule Jan 30, 2023

nkhristinin added 4 commits January 30, 2023 13:40

remove only

c2e9a19

Add unit tests for utils

246cc6b

Add more unit tests

d9a8c1a

Add more tests

139c534

nkhristinin marked this pull request as ready for review January 30, 2023 15:53

nkhristinin requested review from a team as code owners January 30, 2023 15:53

nkhristinin added the release_note:enhancement label Jan 30, 2023

Fix

55acfbd

marshallmain requested a review from vitaliidm January 31, 2023 04:54

Add tests for terms query

cc3a499

vitaliidm reviewed Jan 31, 2023

nkhristinin added 6 commits February 1, 2023 10:58

Make signalMatchesArg required param

a8a5ce5

Add check back for decoded query

614f521

Fix getAllowedFieldForTermQueryFromMapping

7409a7b

typi

af56a29

Promise.all

5db30e3

Fix type declarations

84f8c8b

kibanamachine and others added 3 commits February 1, 2023 06:35

Merge branch 'main' into im-term-query

35b10b7

Fix tests

028ddf1

Remove unused mock

a4cca24

nkhristinin requested a review from pmuellr February 1, 2023 19:25

nkhristinin added 2 commits February 2, 2023 08:50

Change types to common

6c44b70

Fix threat array values

add3153

vitaliidm reviewed Feb 2, 2023

Fix type and typo

05754fe

vitaliidm approved these changes Feb 2, 2023

nkhristinin added 2 commits February 2, 2023 15:34

Migrate to jsdoc and fix test

2f09d69

migrate types to records

c5ff4eb

nkhristinin merged commit b9488a0 into elastic:main Feb 2, 2023

kibanamachine added v8.7.0 backport:skip This PR does not require backporting labels Feb 2, 2023

vitaliidm mentioned this pull request Feb 9, 2023

[Security Solution][Alerts] improves IM rule memory usage and performance #149208

Merged

2 tasks
