Terms query for Indicator Match rule #144511
| : uniqueHits.map((signalHit) => ({ | ||
| signalId: signalHit._id, | ||
| queries: extractNamedQueries(signalHit), | ||
| queries: extractNamedQueries(signalHit) as ThreatMatchNamedQuery[], |
Why did we start needing a cast here? Can it be avoided?
Thanks for noticing!
Removed the whole uniqueHits.map((signalHit) => ({ ... })) section, because it's not used anymore.
| return result; | ||
| }; | ||
|
|
||
| // Return map of fields allowed for term query for source and threat indices |
nit:
If you wrap the comment in JSDoc style, it will be shown wherever the function is imported, which makes the code slightly easier to read.
/**
* Return map of fields allowed for term query for source and threat indices
*/
@nkhristinin, I see you've made this change in some places, but not others. It would be great to do the same for the other methods, for consistency.
| expect(valueMap).toEqual({}); | ||
| }); | ||
|
|
||
| it('return empy object if there some events but no fields', () => { |
| it('return empy object if there some events but no fields', () => { | |
| it('return empty object if there some events but no fields', () => { |
|
|
||
| if (queryValues.length !== 4 || !queryValues.every(Boolean)) { | ||
| const queryString = JSON.stringify(query); | ||
| throw new Error(`Decoded query is invalid. Decoded value: ${queryString}`); |
Curious: why was this check removed?
|
|
||
| if (query.queryType === ThreatMatchQueryType.term) { | ||
| const threatValue = get(threatHit?._source, query.value); | ||
| // TODO: check types |
Does this comment refer to the case where the source's field value could be an array?
If so, would the matched value need to be found in that array?
| const indicies = Object.values(indexMapping); | ||
| indicies.forEach((index) => { |
| const indicies = Object.values(indexMapping); | |
| indicies.forEach((index) => { | |
| const indices = Object.values(indexMapping); | |
| indices.forEach((index) => { |
| indexMapping: IndicesGetFieldMappingResponse | ||
| ): { [key: string]: boolean } => { | ||
| const result: { [key: string]: boolean } = {}; | ||
| const notAllowedTypes: string[] = []; |
Why is notAllowedTypes needed? It looks like the check against allowedFieldTypes alone should be sufficient, because the only way a type gets added to notAllowedTypes is when it's not in allowedFieldTypes,
as per the code below:
if (allowedFieldTypes.includes(fieldType) && !notAllowedTypes.includes(fieldType)) {
result[field] = true;
} else {
notAllowedTypes.push(fieldType);
...
}
Great catch.
Actually, the idea was: when different indices contain the same field and one of the mappings is not supported by the terms query, we should remove this field and not add it back later.
I fixed the code and changed notAllowedTypes to notAllowedFields.
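For reference, the fixed behaviour described in this reply could look roughly like the sketch below. It is not the code from the PR: the FieldTypesByIndex shape is a simplified, hypothetical stand-in for IndicesGetFieldMappingResponse, the function name is made up, and the allow-list contents are assumed for illustration.

```typescript
// Simplified, hypothetical stand-in for the field-mapping response:
// index name -> field name -> field type.
type FieldTypesByIndex = Record<string, Record<string, string>>;

// Assumed allow-list, for illustration only.
const allowedFieldTypes: string[] = ['ip', 'keyword'];

// Returns a map of fields that every index maps to a supported type.
function getAllowedFields(mappingByIndex: FieldTypesByIndex): Record<string, boolean> {
  const result: Record<string, boolean> = {};
  // Fields seen with an unsupported type in at least one index.
  const notAllowedFields = new Set<string>();

  for (const fields of Object.values(mappingByIndex)) {
    for (const [field, fieldType] of Object.entries(fields)) {
      if (allowedFieldTypes.includes(fieldType) && !notAllowedFields.has(field)) {
        result[field] = true;
      } else {
        // Once a field is rejected, drop it and never add it back,
        // even if a later index maps it to a supported type.
        notAllowedFields.add(field);
        delete result[field];
      }
    }
  }
  return result;
}
```

Tracking rejected fields rather than rejected types is what makes the second collection necessary: the same field name can have a supported type in one index and an unsupported one in another.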
| await services.scopedClusterClient.asCurrentUser.indices.getFieldMapping({ | ||
| index: threatIndex, | ||
| fields: threatMatchedFields.threat, | ||
| }); |
Putting both requests in Promise.all would let the second request start without waiting for the first one to finish.
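A minimal sketch of that suggestion, assuming a hypothetical getFieldMapping function that stands in for the indices.getFieldMapping client call (the parameter and result shapes here are simplified and not taken from the PR):

```typescript
// Hypothetical, simplified types standing in for the Elasticsearch client call.
type FieldMapping = Record<string, unknown>;
type MappingRequest = { index: string[]; fields: string[] };
type GetFieldMapping = (params: MappingRequest) => Promise<FieldMapping>;

// Starts both mapping requests before awaiting either of them.
async function fetchMappings(
  getFieldMapping: GetFieldMapping,
  source: MappingRequest,
  threat: MappingRequest
): Promise<{ sourceMapping: FieldMapping; threatMapping: FieldMapping }> {
  const [sourceMapping, threatMapping] = await Promise.all([
    getFieldMapping(source),
    getFieldMapping(threat),
  ]);
  return { sourceMapping, threatMapping };
}
```

Note that Promise.all rejects as soon as either request fails, so error handling stays the same as with two sequential awaits.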
| threatListItem, | ||
| entryKey, | ||
| }: CreateAndOrClausesOptions): BooleanFilter => { | ||
| }: CreateAndOrClausesOptions): unknown[] => { |
Why does it become unknown? Is there an appropriate type for the should clause? Probably QueryDslQueryContainer?
| allowedFieldsForTermsQuery?.threat?.[entry.value] | ||
| ); | ||
| const combinedShould = threatMapping.reduce<{ | ||
| match: unknown[]; |
Probably it should be QueryDslQueryContainer?
| }, | ||
| }; | ||
|
|
||
| const singnalValueMap = { |
| const singnalValueMap = { | |
| const signalValueMap = { |
Terms query for Indicator Match rule
TODO: [ ] need more unit/integration tests, but ready for review
The indicator match rule will use a terms query where possible to search for matches, in both threat-first search and events-first search.
How the match query worked:
Example for threat-first-search.
If we have matching conditions like:
host.ip === indicator.host.ip OR (source.name === indicator.source.name AND host.name === indicator.host.name)
It will generate queries like:
Each match will also have a _name field like:
${threatId}_${threatIndex}_${threatFields}_${sourceField}
Because there is a 1:1 relation between a match clause and the response, it is clear later, at the enrichment stage, which threat matches which event.
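The _name round trip can be sketched as below. This is an illustration, not the helper from the PR: the function names and the NamedQueryParts shape are hypothetical, and the sketch swaps the underscore for a distinct separator on the assumption that ids and index names may themselves contain underscores. The validity check mirrors the one quoted in the review above.

```typescript
// Hypothetical shape of the four parts carried in a match clause's _name.
interface NamedQueryParts {
  id: string;
  index: string;
  field: string;
  value: string;
}

// Assumed separator, chosen so it is unlikely to appear inside any part.
const SEPARATOR = '__SEP__';

function encodeNamedQuery(parts: NamedQueryParts): string {
  return [parts.id, parts.index, parts.field, parts.value].join(SEPARATOR);
}

function decodeNamedQuery(encoded: string): NamedQueryParts {
  const queryValues = encoded.split(SEPARATOR);
  // Exactly four non-empty parts are expected.
  if (queryValues.length !== 4 || !queryValues.every(Boolean)) {
    const queryString = JSON.stringify(queryValues);
    throw new Error(`Decoded query is invalid. Decoded value: ${queryString}`);
  }
  const [id = '', index = '', field = '', value = ''] = queryValues;
  return { id, index, field, value };
}
```

At enrichment time, the matched _name values returned with each event hit can be decoded this way to recover which threat document and field pair produced the match.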
Terms query.
We fetch the mapping info for the fields used in the match conditions of the IM rule.
A terms query doesn't support all field types, which is why there is an allow-list of field types.
A terms query is not applied to AND conditions.
For example:
Field types:
host.ip - ip
user.name - keyword
user.description - text
indicator.host.ip_range - ip_range
host.ip === indicator.host.ip OR host.ip_range === indicator.host.ip OR (source.name === indicator.source.name AND host.name === indicator.host.name)
It will generate queries like:
For a terms query, we don't know which threat each returned event matched, which is why we match them back in the code.
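A rough sketch of what matching back in code could look like, under stated assumptions: the Hit shape, getByPath, buildSignalValueMap and matchThreatToEvents are hypothetical names for illustration, values are compared as strings, and array field values are treated as "any element matches".

```typescript
// Hypothetical minimal shape of a search hit.
interface Hit {
  _id: string;
  _source: Record<string, unknown>;
}

// Reads a dotted path such as "host.ip" from a flat or nested source object.
function getByPath(source: Record<string, unknown>, path: string): unknown {
  if (path in source) return source[path];
  let current: unknown = source;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Normalizes a missing, single or array field value to a list of strings.
function toValues(raw: unknown): string[] {
  if (raw === undefined || raw === null) return [];
  return (Array.isArray(raw) ? raw : [raw]).map(String);
}

// Maps each value of the event field to the ids of the events that contain it.
function buildSignalValueMap(events: Hit[], eventField: string): Map<string, string[]> {
  const valueMap = new Map<string, string[]>();
  for (const event of events) {
    for (const value of toValues(getByPath(event._source, eventField))) {
      const ids = valueMap.get(value) ?? [];
      ids.push(event._id);
      valueMap.set(value, ids);
    }
  }
  return valueMap;
}

// Finds the events whose field value equals any value of the threat field.
function matchThreatToEvents(
  threat: Hit,
  threatField: string,
  signalValueMap: Map<string, string[]>
): string[] {
  const matched = new Set<string>();
  for (const value of toValues(getByPath(threat._source, threatField))) {
    for (const id of signalValueMap.get(value) ?? []) matched.add(id);
  }
  return Array.from(matched);
}
```

Building the value map once per field pair keeps the lookup for each threat cheap, instead of scanning every event for every threat.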
Other changes
Threat-first search will do one extra request to fetch all matched threats.
For example:
The threat index has 1,000,000 documents.
The IM rule gets the first batch of 9,000 threats and builds a query to the events index.
It returns 100 events (max_signal = 100).
Then it tries to enrich those 100 events with threat info.
The problem is that the original implementation enriches them only with threats from this batch of 9,000
and ignores other matches among the 1,000,000 threats.
To fix this, we make one extra request at the end, from the potential alerts to the threat index.
Tests performance
In the best case, it can improve performance by around 3x.
Base

Threat Indicators - 1.500.000 documents
Source - 1.000.000 documents.
1 field for match condition
This PR:
