feat(slo): bulk delete APIs#217405
Conversation
64181a3 to
a23c287
Compare
21cf7d4 to
5ced909
Compare
5ced909 to
d8cdf01
Compare
|
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management) |
There was a problem hiding this comment.
Pull Request Overview
This PR implements bulk delete APIs for SLOs and enhances the existing DeleteSLO service by adding options to skip specific deletion stages.
- Updated DeleteSLO service to include options (skipDataDeletion and skipRuleDeletion) and refactored the deletion steps.
- Introduced new bulk delete endpoints (POST and GET) along with corresponding OpenAPI documentation updates.
- Made minor adjustments in the SLO creation service and test files to align with the revised deletion behavior.
Reviewed Changes
Copilot reviewed 21 out of 27 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| x-pack/solutions/observability/plugins/slo/server/services/delete_slo.ts | Adds options to control data and rule deletion, and adjusts deletion ordering and client usage. |
| x-pack/solutions/observability/plugins/slo/server/services/delete_slo.test.ts | Removes the unused ElasticsearchClient mock and updates expectations. |
| x-pack/solutions/observability/plugins/slo/server/services/create_slo.ts | Updates repository deletion rollback to use the options object. |
| x-pack/solutions/observability/plugins/slo/server/routes/slo/route.ts | Incorporates the new bulk delete routes. |
| x-pack/solutions/observability/plugins/slo/server/routes/slo/delete_slo.ts | Refactors imports and client usage to use the current user’s scoped client. |
| x-pack/solutions/observability/plugins/slo/server/routes/slo/bulk_delete.ts | Introduces the bulk delete API endpoints and status checking. |
| x-pack/solutions/observability/plugins/slo/server/plugin.ts | Registers the new BulkDeleteTask and maps plugins accordingly. |
| YAML files under docs/openapi/slo | Adds OpenAPI definitions for the new bulk delete endpoints. |
| x-pack/platform/packages/shared/kbn-slo-schema | Adds bulk delete schemas and route definitions. |
Files not reviewed (6)
- oas_docs/output/kibana.serverless.yaml: Language not supported
- oas_docs/output/kibana.yaml: Language not supported
- x-pack/solutions/observability/plugins/slo/docs/openapi/slo/bundled.json: Language not supported
- x-pack/solutions/observability/plugins/slo/docs/openapi/slo/bundled.yaml: Language not supported
- x-pack/solutions/observability/plugins/slo/docs/openapi/slo/entrypoint.yaml: Language not supported
- x-pack/solutions/observability/plugins/slo/server/services/snapshots/delete_slo.test.ts.snap: Language not supported
Comments suppressed due to low confidence (2)
x-pack/solutions/observability/plugins/slo/server/services/delete_slo.ts:96
- Verify that setting 'refresh' to false in 'deleteSummaryData' is intentional, as this differs from the previous behavior where it was set to true.
refresh: false,
x-pack/solutions/observability/plugins/slo/server/services/delete_slo.ts:57
- Confirm that moving the repository deletion step to the first Promise.all does not interfere with the subsequent data deletion operations.
this.repository.deleteById(slo.id, { ignoreNotFound: true }),
dmlemeshko
left a comment
There was a problem hiding this comment.
x-pack/test/api_integration/deployment_agnostic/services/slo_api.ts changes LGTM
pmuellr
left a comment
There was a problem hiding this comment.
I left two comments to be considered. Not blockers, but will likely improve the overall aspect of the new task.
| }, | ||
|
|
||
| cancel: async () => { | ||
| this.abortController.abort('Timed out'); |
There was a problem hiding this comment.
This is the only reference to the abort controller, so basically it's a no-op. The controller will signal the abort, but with no code depending on it, nothing will happen. Suggest you pass it in to runBulkDelete and check it there, especially since it seems to be dealing with multiple operations ...
There was a problem hiding this comment.
You're right. I wanted to use it but then noticed all the services used within runBulkDelete (especially deleteSLO) need to be updated to use an abortController, and stopped there. I'll refactor these services to use it.
| .filter((result) => result.success === true) | ||
| .map((result) => result.id); | ||
|
|
||
| await Promise.all([ |
There was a problem hiding this comment.
So there are 3 independent things going on here - two DBQ's run "in parallel", followed by bulk rule deletion.
We've had issues in the past when ES runs into problems "in the middle" of things that you would like to wrap in a "transaction", so it would be good to think about this for a few minutes. Relatedly, you may not want to do the DBQ's "in parallel", since either one of them could fail - is the order important?
The basic question to ask is - if one of these operations fails, what is the state of the system after that? Is it possible for a user to "try again" or "continue"? Would it take manual intervention (customer using DevTools to make some HTTP requests) to get things "working" again?
There was a problem hiding this comment.
Relatedly, you may not want to do the DBQ's "in parallel", since either one of them could fail - is the order important?
The order is not important since I'm deleting the data only for the successfully deleted SLO (therefore their transforms are not running anymore, and no new data is being produced). if the DBQ fails during their execution (they run async, and only preflight checks is done synchronously), I think that's acceptable, but you're right that I should handle the potential failure to continue to the next step regardless.
There was a problem hiding this comment.
The basic question to ask is - if one of these operations fails, what is the state of the system after that? Is it possible for a user to "try again" or "continue"? Would it take manual intervention (customer using DevTools to make some HTTP requests) to get things "working" again?
So yes, I think catching the DBQ errors and continuing is enough.
Worst case, we did not delete the data: the user can still do it manually or not, the system won't care.
So I'll add a catch on these promises
|
@pmuellr I've changed the flow to catch potential errors while scheduling DBQs, and i'm using the abortController in most services |
💔 Build Failed
Failed CI StepsHistory
|
|
Starting backport for target branches: 8.19 https://github.com/elastic/kibana/actions/runs/14654066416 |
💔 All backports failed
Manual backportTo create the backport manually run: Questions ?Please refer to the Backport tool documentation |
(cherry picked from commit 1882697) # Conflicts: # x-pack/solutions/observability/plugins/slo/tsconfig.json
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
# Backport This will backport the following commits from `main` to `8.19`: - [feat(slo): bulk delete APIs (#217405)](#217405) <!--- Backport version: 9.6.6 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Kevin Delemme","email":"kevin.delemme@elastic.co"},"sourceCommit":{"committedDate":"2025-04-25T00:11:26Z","message":"feat(slo): bulk delete APIs (#217405)","sha":"18826975c7321cfa1a11d392d852e43e179d4f2f","branchLabelMapping":{"^v9.1.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:obs-ux-management","backport:version","v9.1.0","v8.19.0"],"title":"feat(slo): bulk delete APIs","number":217405,"url":"https://github.com/elastic/kibana/pull/217405","mergeCommit":{"message":"feat(slo): bulk delete APIs (#217405)","sha":"18826975c7321cfa1a11d392d852e43e179d4f2f"}},"sourceBranch":"main","suggestedTargetBranches":["8.19"],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/217405","number":217405,"mergeCommit":{"message":"feat(slo): bulk delete APIs (#217405)","sha":"18826975c7321cfa1a11d392d852e43e179d4f2f"}},{"branch":"8.19","label":"v8.19.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT--> --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
🍒 Summary
Related to #209925
This PR introduces a
POST /_bulk_deleteandGET /_bulk_delete/{taskId}APIs that leverage the task manager to run a bulk deletion. We reuse the Delete SLO application service as much as possible, while grouping the delete_by_query and bulkDeleteRule for all successfully deleted SLOs in order to reduce the number of sub-task (delete_by_query) created.We keep the result of the task for 1 hour, so we can display the result on the frontend.
Manual testing