[Security Solution] Add retryIfConflict util for 409 conflicts in Integration tests #174185
jpdjere merged 9 commits into elastic:main
Pinging @elastic/security-solution (Team: SecuritySolution)
Pinging @elastic/security-detections-response (Team:Detections and Resp)
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)
banderror
left a comment
I don't think retryIfConflicts does what it should 🙂
...ecurity_solution_api_integration/test_suites/detections_response/utils/retry_if_conflicts.ts
...ecurity_solution_api_integration/test_suites/detections_response/utils/retry_if_conflicts.ts
...est_suites/detections_response/utils/rules/prebuilt_rules/delete_all_prebuilt_rule_assets.ts
Ok, so I had to rethink this a bit. Responses from the Elasticsearch client vary from one type of operation to another, so there's no …

For example, the `deleteByQuery` response has this shape:

```typescript
// Excerpt of the Elasticsearch JS client's response type.
import type { BulkIndexByScrollFailure, float, long } from '@elastic/elasticsearch/lib/api/types';

export interface DeleteByQueryResponse {
  batches?: long;
  deleted?: long;
  failures?: BulkIndexByScrollFailure[];
  noops?: long;
  requests_per_second?: float;
  // ...other fields omitted
  total?: long;
  version_conflicts?: long;
}
```

and, as you can see from the error in the ticket, the …

(Also, I cannot access anything other than the body when sending requests directly to ES in integration tests.)

So I refactored the code so that this utility specifically retries `deleteByQuery` operations when they hit 409 conflict errors, which seems to be, at least so far, the ES operation where this conflict is happening. I think we can use this as a first step and later try to generalize the utility, or create other specific retry versions if we need them. Let me know what you think. Also, I'm rerunning the Flaky Test Runner.
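A minimal sketch of what a delete-by-query-specific retry helper could look like, assuming (as described above) that a `409` shows up as an entry in the response body's `failures` array rather than as a thrown error. This is not the code merged in the PR: the implementation of `retryIfDeleteByQueryConflicts` below is illustrative, and `DeleteByQueryResult`, `ConflictFailure` and `Logger` are hypothetical trimmed-down stand-ins, not the Elasticsearch client's types.

```typescript
// Hypothetical stand-ins for the parts of a delete-by-query response we inspect
// (assumption: only `failures[].status` matters for conflict detection).
interface ConflictFailure {
  status: number;
}

interface DeleteByQueryResult {
  deleted?: number;
  version_conflicts?: number;
  failures?: ConflictFailure[];
}

// Minimal logger shape assumed for this sketch.
interface Logger {
  debug(message: string): void;
  error(message: string): void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// True when any failure in the response body reports a 409 conflict.
const hasConflict = (result: DeleteByQueryResult): boolean =>
  (result.failures ?? []).some((failure) => failure.status === 409);

// Runs `operation` once, then re-runs it up to `retries` more times
// while its response still reports a 409 conflict.
async function retryIfDeleteByQueryConflicts<T extends DeleteByQueryResult>(
  logger: Logger,
  name: string,
  operation: () => Promise<T>,
  retries = 3,
  retryDelayMs = 200
): Promise<T> {
  let result = await operation();
  let retriesLeft = retries;

  while (hasConflict(result)) {
    if (retriesLeft <= 0) {
      logger.error(`${name} conflict, exceeded retries`);
      throw new Error(`${name} conflict, exceeded retries`);
    }
    retriesLeft -= 1;
    logger.debug(`${name} conflict, retrying (${retriesLeft} retries left)`);
    await sleep(retryDelayMs);
    result = await operation();
  }

  return result;
}
```

Re-running the whole delete-by-query on a conflict is assumed to be acceptable here because a second run only matches the documents that are still left, so already-deleted ones are not touched again.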
..._license/prebuilt_rules/large_prebuilt_rules_package/install_large_prebuilt_rules_package.ts
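```typescript
// `?? []` guards against a response without `failures`, which would otherwise
// make `for...of` throw a TypeError.
for (const failure of operationResult?.failures ?? []) {
  if (failure.status === 409) {
    // if no retries left, throw it
    if (retries <= 0) {
      logger.error(`${name} conflict, exceeded retries`);
      throw new Error(`${name} conflict, exceeded retries`);
    }
    // ...rest of the excerpt not shown in the review
  }
}
```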
If we had this check in the `deleteAllPrebuiltRuleAssets` function, where we call the delete-by-query method, we could use the generic `retry` function and get rid of `retryIfDeleteByQueryConflicts`.
If you think `retryIfDeleteByQueryConflicts` has enough use cases to exist, maybe we could use the `retry` function inside it to avoid conceptual code duplication between `retry` and `retryIfDeleteByQueryConflicts`.
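A sketch of the reviewer's alternative, under stated assumptions: the conflict check moves into the caller (the hypothetical `deleteAllAssetsOnce` stands in for the body of `deleteAllPrebuiltRuleAssets`), which converts a `409` failure into a thrown error, so a generic retry helper only needs to know how to retry on errors. `retryAsync`, its options, `conflictError` and `isConflictError` are all hypothetical; the signature of the existing `retry` util in the test suite is not shown in this thread, so this should not be read as its API.

```typescript
// Hypothetical trimmed-down response body.
interface DeleteByQueryBody {
  deleted?: number;
  failures?: Array<{ status: number }>;
}

// An Error tagged with the HTTP status it represents.
type ConflictError = Error & { statusCode: 409 };

const conflictError = (message: string): ConflictError =>
  Object.assign(new Error(message), { statusCode: 409 as const });

const isConflictError = (error: unknown): boolean =>
  typeof error === 'object' &&
  error !== null &&
  (error as { statusCode?: unknown }).statusCode === 409;

// Hypothetical generic helper: re-runs `fn` when it throws an error that
// `shouldRetry` accepts, for at most `retries` extra attempts.
async function retryAsync<T>(
  fn: () => Promise<T>,
  options: { retries: number; shouldRetry: (error: unknown) => boolean; delayMs?: number }
): Promise<T> {
  const { retries, shouldRetry, delayMs = 0 } = options;
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries || !shouldRetry(error)) {
        throw error;
      }
      attempt += 1;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Stand-in for the body of deleteAllPrebuiltRuleAssets: the conflict check
// lives here and turns a 409 reported in `failures` into a thrown error.
async function deleteAllAssetsOnce(
  deleteByQuery: () => Promise<DeleteByQueryBody>
): Promise<DeleteByQueryBody> {
  const result = await deleteByQuery();
  if ((result.failures ?? []).some((failure) => failure.status === 409)) {
    throw conflictError('deleteByQuery reported a 409 conflict');
  }
  return result;
}
```

The `shouldRetry` predicate keeps non-conflict errors from being retried, so genuine failures still surface on the first attempt.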
...ion_api_integration/test_suites/detections_response/utils/retry_delete_by_query_conflicts.ts
banderror
left a comment
The Flaky Test Runner is happy. Approving the PR so I don't block it; I only posted a few minor comments. Thanks for addressing the feedback, LGTM 👍
💛 Build succeeded, but was flaky
To update your PR or re-run it, just comment with:

cc @jpdjere
💔 All backports failed
Manual backport
To create the backport manually run:
Questions? Please refer to the Backport tool documentation
… Integration tests (elastic#174185)
(cherry picked from commit b8c7306)
# Conflicts:
# x-pack/test/security_solution_api_integration/package.json
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions? Please refer to the Backport tool documentation
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.
…icts in Integration tests (#174185) (#174762)
# Backport
This will backport the following commits from `main` to `8.12`:
- [[Security Solution] Add `retryIfConflict` util for `409` conflicts in Integration tests (#174185)](#174185)
### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
This PR didn't make it into the latest BC for 8.12.0. Updating the labels.
…sts (#189813) Fixes #176445
## Summary
Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6679
This PR fixes the advanced settings API integration tests that seem to be flaky. The reason for the occasional failures is most likely a `version_conflict_engine_exception` which is thrown when another node indexes the same documents. This can happen when we save an advanced setting since the settings API uses saved objects under the hood, and in CI, multiple nodes can try to save an advanced setting simultaneously.
The solution in this PR is to retry the request if we encounter a 409 error. This is adapted from the solution in #174185 which resolves a similar failure.
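In the advanced settings case the conflict arrives differently: the request itself answers with HTTP `409`, rather than listing it in a `failures` array. A minimal sketch of retrying in that situation, assuming a client whose promise resolves with a response object exposing `status` (as supertest responses do); `retryOnHttpConflict` and its parameters are hypothetical and not the code from #189813.

```typescript
// Assumed response shape: anything exposing the HTTP status code.
interface HttpResponse {
  status: number;
}

// Sends the request and, while the server answers 409 Conflict, sends it
// again, making at most `maxAttempts` requests in total.
async function retryOnHttpConflict<R extends HttpResponse>(
  send: () => Promise<R>,
  maxAttempts = 3,
  delayMs = 100
): Promise<R> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts && response.status === 409; attempt++) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    response = await send();
  }
  // May still be a 409 if every attempt conflicted; the test's own
  // status assertion then reports the failure.
  return response;
}
```

Returning the last response instead of throwing keeps the test's existing status assertion as the single place where a persistent conflict is reported.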
…sts (elastic#189813) Fixes elastic#176445 Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6679 This PR fixes the advanced settings API integration tests that seem to be flaky. The reason for the occasional failures is most likely a `version_conflict_engine_exception` which is thrown when another node indexes the same documents. This can happen when we save an advanced setting since the settings API uses saved objects under the hood, and in CI, multiple nodes can try to save an advanced setting simultaneously. The solution in this PR is to retry the request if we encounter a 409 error. This is adapted from the solution in elastic#174185 which resolves a similar failure. (cherry picked from commit cc29eea)
Summary
Fixes: #171428
NOTE: the test where this was reported wasn't skipped, so this PR does not unskip any tests. However, the Flaky Test Runs help us determine that the issue is no longer reproducible.
The `deleteAllPrebuiltRuleAssets` utility reported a `409 Conflict`, presumably from `security-rule` assets that were attempted to be deleted while they were being updated by a parallel process.
This PR wraps the `es.deleteByQuery` calls in the utils `deleteAllPrebuiltRuleAssets` and `deleteAllTimelines` with a new `retryIfConflict` helper, which retries the operation if the ES request fails with a `409`.
Flaky test run
`bundled_prebuilt_rules_package` - ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4790
`large_prebuilt_rules_package` - ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4791
`update_prebuilt_rules_package` - ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4792
`management` - ESS and Serverless: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/4793
For maintainers