Skip to content

Terms query for Indicator Match rule#144511

Merged
nkhristinin merged 30 commits intoelastic:mainfrom
nkhristinin:im-term-query
Feb 2, 2023
Merged

Terms query for Indicator Match rule#144511
nkhristinin merged 30 commits intoelastic:mainfrom
nkhristinin:im-term-query

Conversation

@nkhristinin
Copy link
Contributor

@nkhristinin nkhristinin commented Nov 3, 2022

Terms query for Indicator Match rule

TODO: [] need more unit/integrations tests, but ready for review

The indicator match rule will use terms query when it is possible to search for matches for threat-first-search and for events-first-search.

How the match query worked:

Example for threat-first-search.
If we have matching conditions like:
host.ip ==== indicator.host.ip or (source.name === indicator.source.name AND host.name === indicator.host.name)

It will generate queries like:

match: {host.ip: "1"},  
or
match: {host.ip: "2"}
or
match: {host.ip: "3"}
or
(match: {source.name: "1"} and match: {host.name: "1"})
or
(match: {source.name: "2"} and match: {host.name: "2"})
or
(match: {source.name: "3"} and match: {host.name: "3"})

Each match will also have _name fields like: ${threatId}_${threatIndex}_${threatFields}_${sourceField}
So and because it's 1:1 relation between match and response, later at enrichment stage will be clear which threat matches which event.

Terms query.

We do fetch info about mapping for fields which use for match conditions of the IM rule.
Terms query doesn't support all field types, this is why there is some allowed list which field types.
Terms query not applied for AND conditions.

For example:
Fields types

host.ip - ip
user.name - keyword
user.description - text
indicator.host.ip_range - ip_range

host.ip === indicator.host.ip or host.ip_range === indicator.host.ip or (source.name === indicator.source.name AND host.name === indicator.host.name)

It will generate queries like:

terms: {host.ip: ["1","2","3"]},  
or
match: {host.ip_range: "1"} // terms query support range fields, but it will be difficult later to understand which threat match which event, because we can have more than 1 response for this condition
or
match: {host.ip_range: "2"}
or
(match: {source.name: "1"} and match: {host.name: "1"})
or
(match: {source.name: "2"} and match: {host.name: "2"})
or
(match: {source.name: "3"} and match: {host.name: "3"})

For terms query, we don't know which response matches with events, this is why we do match it back in the code.

Other changes

Threat-first-search - will do one extra request to have all matched threats.
For example:
The threat index has 1.000.000 documents.
IM rule gets the first batch of 9.000 threats and builds a query to the events index.
It returns 100 events (max_signal = 100).
Then it tries to enrich those 100 events with threat info.
The problem is that the original implementation will enrich with the only threats from this 9.000 batch.
And it will ignore other matches in 1.000.000 threats.

This way we do one extra request in the end from potential alerts to threat index.

Tests performance

In the best case, it can improve performance by around 3x times.

Base
Threat Indicators - 1.500.000 documents
Source - 1.000.000 documents.
1 field for match condition
213484531-3ab68c61-c3f5-4e28-b2c4-c1e90a5b1775

This PR:
Screenshot 2023-01-30 at 20 20 32

@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@nkhristinin nkhristinin added the ci:cloud-deploy Create or update a Cloud deployment label Jan 23, 2023
@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

2 similar comments
@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@nkhristinin nkhristinin changed the title POC Indicator match rule using terms query Jan 30, 2023
@nkhristinin nkhristinin changed the title Indicator match rule using terms query Terms query for Indicator Match rule Jan 30, 2023
@nkhristinin nkhristinin marked this pull request as ready for review January 30, 2023 15:53
@nkhristinin nkhristinin requested review from a team as code owners January 30, 2023 15:53
: uniqueHits.map((signalHit) => ({
signalId: signalHit._id,
queries: extractNamedQueries(signalHit),
queries: extractNamedQueries(signalHit) as ThreatMatchNamedQuery[],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we started to need casting here? Can it be avoided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for notice!

Removed whole uniqueHits.map((signalHit) => ({ signalId: signalHit._id, queries: extractNamedQueries(signalHit), queries: extractNamedQueries(signalHit) as ThreatMatchNamedQuery[], sections, because it's not used anymore

return result;
};

// Return map of fields allowed for term query for source and threat indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
if wrap comment in JSDoc style, it will be highlighted in place where function is imported.
Will make slightly easier to read code

/**
 * Return map of fields allowed for term query for source and threat indices
 */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkhristinin, I've seen you've made this change in some of places, but not others. Would be great if it can be done for other methods for consistency.

expect(valueMap).toEqual({});
});

it('return empy object if there some events but no fields', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it('return empy object if there some events but no fields', () => {
it('return empty object if there some events but no fields', () => {


if (queryValues.length !== 4 || !queryValues.every(Boolean)) {
const queryString = JSON.stringify(query);
throw new Error(`Decoded query is invalid. Decoded value: ${queryString}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, why this check was removed?


if (query.queryType === ThreatMatchQueryType.term) {
const threatValue = get(threatHit?._source, query.value);
// TODO: check types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this comment refer to a case, when source's field value could be array?
So, it will be needed to find in that array matched value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

Comment on lines +20 to +21
const indicies = Object.values(indexMapping);
indicies.forEach((index) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const indicies = Object.values(indexMapping);
indicies.forEach((index) => {
const indices = Object.values(indexMapping);
indices.forEach((index) => {

indexMapping: IndicesGetFieldMappingResponse
): { [key: string]: boolean } => {
const result: { [key: string]: boolean } = {};
const notAllowedTypes: string[] = [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why notAllowedTypes is needed? It looks like only check in allowedFieldTypes should be sufficient, because the only way type will be added into notAllowedTypes when it's not in allowedFieldTypes.

as per code below

        if (allowedFieldTypes.includes(fieldType) && !notAllowedTypes.includes(fieldType)) {
          result[field] = true;
        } else {
          notAllowedTypes.push(fieldType);
         ...
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch.

Actually, the idea was different indices contain fields, and one of the mappings is not supported by term query we should remove this field, and not add it later.

I fixed the code and change notAllowedTypes to notAllowedFields

await services.scopedClusterClient.asCurrentUser.indices.getFieldMapping({
index: threatIndex,
fields: threatMatchedFields.threat,
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putting both requests in Promise.all would allow not wait until the first request is finished, to execute second one

threatListItem,
entryKey,
}: CreateAndOrClausesOptions): BooleanFilter => {
}: CreateAndOrClausesOptions): unknown[] => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it becomes unknown? is there according type for should clause? Probably QueryDslQueryContainer?

allowedFieldsForTermsQuery?.threat?.[entry.value]
);
const combinedShould = threatMapping.reduce<{
match: unknown[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it should be QueryDslQueryContainer?

@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@nkhristinin nkhristinin requested a review from pmuellr February 1, 2023 19:25
},
};

const singnalValueMap = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const singnalValueMap = {
const signalValueMap = {

@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upst

@kibana-ci
Copy link

kibana-ci commented Feb 2, 2023

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting ci:cloud-deploy Create or update a Cloud deployment release_note:enhancement v8.7.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants