[Security Solution][Detection Engine] adds async ES|QL query #216667
vitaliidm merged 30 commits into elastic:main
  description: i18n.ESQL_SEARCH_REQUEST_DESCRIPTION,
});
const asyncSearchStarted = performance.now();
const asyncEsqlResponse = await esClient.transport.request<AsyncEsqlResponse>({

why use transport.request instead of .esql?

I noticed that the esql method returns incorrect typings, so I decided to stick with transport.request, as in the previous implementation.
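The typing point in this thread can be illustrated with a minimal sketch. It assumes a transport whose `request<T>` lets the caller name the response type, as `esClient.transport.request<AsyncEsqlResponse>` does in the snippet under review. `TransportLike`, `startAsyncEsql`, and the field list on `AsyncEsqlResponse` are illustrative assumptions, not the PR's actual definitions.

```typescript
// Assumed response shape for POST /_query/async; the PR's real
// AsyncEsqlResponse type may carry more or different fields.
interface AsyncEsqlResponse {
  id?: string; // returned when the query is still running
  is_running: boolean;
  columns?: Array<{ name: string; type: string }>;
  values?: unknown[][];
}

// Hypothetical stand-in for the part of esClient.transport used here.
interface TransportLike {
  request<T>(params: { method: string; path: string; body?: unknown }): Promise<T>;
}

// The generic parameter tells the compiler what the raw request returns,
// so the caller controls the response typing instead of a generated method.
const startAsyncEsql = (
  transport: TransportLike,
  body: { query: string; filter?: unknown }
): Promise<AsyncEsqlResponse> =>
  transport.request<AsyncEsqlResponse>({ method: 'POST', path: '/_query/async', body });
```

The trade-off is that the response type is only as accurate as the hand-written interface, since nothing checks it against the server's actual payload.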
requestBody: Record<string, unknown>;
requestBody: {
  query: string;
  filter: QueryDslQueryContainer;

wait_for_completion_timeout and keep_alive should probably be params here too

I haven't added them, since they were not used.
Added them now

It turned out that these values are applicable to the body only, so I removed them from this type
if (isCancelled) {
  throw new Error('Rule execution cancelled due to timeout');
}
await new Promise((resolve) => setTimeout(resolve, pollInterval));

probably want to wait at the beginning of the loop so we wait in between the initial response and the first poll
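The suggested ordering can be sketched as follows. This is a minimal illustration, not the PR's loop: `pollUntilComplete`, `sleep`, and the `shouldStop` callback are hypothetical names, and the response is reduced to the single `is_running` flag.

```typescript
interface PollResult {
  is_running: boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the query stops running. The delay comes first in each
// iteration, so there is a pause between the initial response and the
// first poll as well as between later polls.
async function pollUntilComplete<T extends PollResult>(
  initial: T,
  poll: () => Promise<T>,
  shouldStop: () => boolean, // hypothetical cancellation check
  pollIntervalMs: number
): Promise<T> {
  let latest = initial;
  while (latest.is_running) {
    await sleep(pollIntervalMs);
    if (shouldStop()) {
      throw new Error('Rule execution cancelled due to timeout');
    }
    latest = await poll();
  }
  return latest;
}
```

If the initial response is already complete, the loop body never runs and no poll request is sent.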
    filter: requestFilter,
  },
},
wait_for_completion_timeout: '4m', // hard limit request timeout is 5m set by ES proxy and alerting framework. So, we should be fine to wait 4m for async query completion. If rule execution is shorter than 4m and query was not completed, it will be aborted.

setting wait_for_completion_timeout this high makes this effectively a synchronous query. Should this be keep_alive instead? Some keep_alive value longer than the rule timeout would be sufficient. I think we want wait_for_completion_timeout to be some number of seconds, like 5 or 10 seconds.
Setting keep_alive will help ensure that the results are deleted quickly even if the cleanup DELETE request fails.

Yes, the idea was not to start polling until the rule times out. If the rule timeout is greater than 5m, which is when we can hit the ES request timeout limit, we would stop waiting for the query to complete (after 4m) and start polling.
> Setting keep_alive will help ensure that the results are deleted quickly even if the cleanup DELETE request fails.
We don't have access to the rule timeout within the executor, only to shouldStopExecution, which can't be used to set keep_alive beforehand. I am not sure we want to expose this to the executor and, in the future, rely on that value while running the rule.

I see, that makes sense. Is 4m the right value for both ECH and serverless? The default rule timeout is different for serverless, right? Is the default connection timeout different?
I think it would be worth following up on this PR to see if we can get access to the rule timeout or maybe have the framework wrapper inject a keep_alive value depending on the rule timeout. There could be some use case eventually for async requests initiated by one rule execution and retrieved by a later one, so maybe not every async request needs to have keep_alive == rule timeout, but I think it's the typical scenario for us. It'll just help with system resilience to set it since there could be hundreds of ESQL rules running every few minutes and we'll never need those results for 5 days.
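The follow-up idea of deriving keep_alive from the rule timeout could look roughly like this. It is a sketch of the reviewer's suggestion, not of what the PR merged: the executor does not receive the rule timeout today, so `ruleTimeoutMs`, `buildAsyncEsqlBody`, the 10s wait, and the one-minute buffer are all assumptions.

```typescript
// Assumed body shape for the async ES|QL request; wait_for_completion_timeout
// and keep_alive are sent in the body, as noted earlier in the review.
interface AsyncEsqlBody {
  query: string;
  wait_for_completion_timeout: string;
  keep_alive: string;
}

// Hypothetical helper: ruleTimeoutMs stands in for a value the alerting
// framework would have to inject into the executor.
function buildAsyncEsqlBody(query: string, ruleTimeoutMs: number): AsyncEsqlBody {
  const bufferMs = 60000; // assumed margin so results outlive the rule run
  const keepAliveSeconds = Math.ceil((ruleTimeoutMs + bufferMs) / 1000);
  return {
    query,
    wait_for_completion_timeout: '10s', // short initial wait, then poll
    keep_alive: keepAliveSeconds + 's', // results expire soon after the rule would have timed out
  };
}

// Example: a rule with the 5m ECH default timeout.
const exampleBody = buildAsyncEsqlBody('FROM logs*', 5 * 60 * 1000);
```

With a keep_alive tied to the rule timeout, stored results expire on their own shortly after the run ends, even when the cleanup DELETE request fails.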
> I see, that makes sense. Is 4m the right value for both ECH and serverless? The default rule timeout is different for serverless, right? Is the default connection timeout different?
The default rule timeout for serverless is 1m. If the async query request runs longer than this, it would be aborted and the query cancelled/deleted in ES.
It's 5m for ECH. If the query takes longer than this to finish, it would be cancelled and deleted as well.

@marshallmain, issue to expose rule timeout to rule executor: #218072

Sounds good, we'll just have to be aware of any timeout discrepancies in the future. If the rule timeout increases in serverless (which is a desirable change for us) but the connection timeout values are different (e.g. lower in serverless), we could start seeing environment specific failures.

I think the errors would be in the same category: timeout related. They could just occur for different requests (query or poll) and at different times. But we already have different timeouts for different envs.
rylnd left a comment

Reviewed previously, but gave it another pass since I was still listed as blocking merge.
LGTM.
...curity/plugins/security_solution/server/lib/detection_engine/rule_types/esql/esql_request.ts
import { logEsqlRequest } from '../utils/logged_requests';
import * as i18n from '../translations';

const logDuration = (startTime: number, loggedRequests: RulePreviewLoggedRequest[] | undefined) => {

Nit: would this more accurately be something like
- const logDuration = (startTime: number, loggedRequests: RulePreviewLoggedRequest[] | undefined) => {
+ const setLatestRequestDuration = (startTime: number, loggedRequests: RulePreviewLoggedRequest[] | undefined) => {
Since this function seems to only mutate loggedRequests, and doesn't actually "log" anything itself?
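To make the naming point concrete, here is a guess at what such a helper might do. The function body is not shown in the thread, so everything below is an assumption: `LoggedRequest` is a reduced stand-in for `RulePreviewLoggedRequest`, and the `now` parameter is added only to keep the sketch deterministic (the code under review calls `performance.now()`).

```typescript
// Assumed subset of RulePreviewLoggedRequest; the real Kibana type may differ.
interface LoggedRequest {
  request?: string;
  description?: string;
  duration?: number;
}

// Hypothetical body: records the elapsed time on the most recent entry.
// Nothing is written to a logger; the array element is mutated in place,
// which is why a "set..." name describes it better than "log...".
const setLatestRequestDuration = (
  startTime: number,
  loggedRequests: LoggedRequest[] | undefined,
  now: number = Date.now() // injectable so the example is deterministic
): void => {
  const latest = loggedRequests ? loggedRequests[loggedRequests.length - 1] : undefined;
  if (latest) {
    latest.duration = Math.round(now - startTime);
  }
};
```

Callers that pass `undefined`, or an empty array, see no effect.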
…/detection_engine/rule_types/esql/esql_request.ts Co-authored-by: Ryland Herrick <ryalnd@gmail.com>
Starting backport for target branches: 8.18, 8.19, 9.0 https://github.com/elastic/kibana/actions/runs/14517920458
…#216667)

## Summary

- addresses elastic/security-team#11116 (list item 2)

Introducing an async query makes it possible to overcome the ES request timeout for long-running rules and queries.

The timeout for an ES request is [defined in the alerting framework](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_es_request_timeout.ts#L21) and is the smaller of the rule execution timeout and the default ES request timeout (which is 5m and hardcoded [here](https://github.com/elastic/kibana/blob/8.18/x-pack/platform/plugins/shared/alerting/server/lib/get_rule_task_timeout.ts)).

If an ES|QL rule performs a single long-running ES query, it can time out after 5m due to this ES request timeout. This value can't be changed, unlike the rule execution timeout, which can be overridden in the Kibana config:

```
xpack.alerting.rules.run:
  timeout: '10m'
  ruleTypeOverrides:
    - id: 'siem.esqlRule'
      timeout: '15m'
```

So we can encounter situations where a rule fails after 5m due to the ES request timeout, even though it is configured with a longer timeout of 15m.

By using an async query, we can overcome this limitation and poll for the async query results until the query completes or the rule times out.

More details in internal [issue](elastic/sdh-security-team#1224)

---------

Co-authored-by: Ryland Herrick <ryalnd@gmail.com>

(cherry picked from commit 3d7aac1)
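The timeout rule described in the summary above can be written out as a small worked example. The helper and constant names are hypothetical and do not come from the alerting framework; only the "smaller of the two, with a 5m default" rule is taken from the text.

```typescript
// The 5m default ES request timeout described in the summary.
const DEFAULT_ES_REQUEST_TIMEOUT_MS = 5 * 60 * 1000;

// Hypothetical helper: the request timeout is the smaller of the rule
// execution timeout and the 5m default.
const effectiveEsRequestTimeoutMs = (ruleTimeoutMs: number): number =>
  Math.min(ruleTimeoutMs, DEFAULT_ES_REQUEST_TIMEOUT_MS);

const minutes = (n: number): number => n * 60 * 1000;

// A rule configured with a 15m timeout still gets a 5m request timeout,
// so a single synchronous query cannot use the extra 10 minutes.
const forEsqlOverride = effectiveEsRequestTimeoutMs(minutes(15));

// A rule with a 1m timeout is capped by its own timeout instead.
const forShortRule = effectiveEsRequestTimeoutMs(minutes(1));
```

This is why raising the rule timeout alone does not help a single long-running query, while an async query that is polled with short requests can run for the full rule timeout.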
💔 Some backports could not be created
💚 All backports created successfully
…216667) (#218567)

# Backport

This will backport the following commits from `main` to `8.19`:
- [[Security Solution][Detection Engine] adds async ES|QL query (#216667)](#216667)

Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>
…216667) (#218583)

# Backport

This will backport the following commits from `main` to `9.0`:
- [[Security Solution][Detection Engine] adds async ES|QL query (#216667)](#216667)
…216667) (#218585)

# Backport

This will backport the following commits from `main` to `8.18`:
- [[Security Solution][Detection Engine] adds async ES|QL query (#216667)](#216667)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>