feat(slo): bulk delete APIs #217405

Merged
kdelemme merged 28 commits into elastic:main from kdelemme:poc/bulk_delete
Apr 25, 2025

Conversation

@kdelemme
Contributor

@kdelemme kdelemme commented Apr 7, 2025

🍒 Summary

Related to #209925

This PR introduces the POST /_bulk_delete and GET /_bulk_delete/{taskId} APIs, which leverage the task manager to run a bulk deletion. We reuse the Delete SLO application service as much as possible, while grouping the delete_by_query and bulkDeleteRule calls for all successfully deleted SLOs in order to reduce the number of sub-tasks (delete_by_query) created.

We keep the result of the task for 1 hour, so we can display the result on the frontend.
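
A minimal sketch of the grouping idea described above, not the code from this PR. The names `bulkDeleteSketch`, `deleteOne`, `deleteDataFor`, and `deleteRulesFor` are hypothetical stand-ins for the real services: each SLO is deleted individually, and the data and rule cleanup is then issued once for every ID that succeeded.

```typescript
// Hypothetical shapes for illustration only; not the plugin's real types.
interface DeleteResult {
  id: string;
  success: boolean;
  error?: string;
}

interface BulkDeleteDeps {
  // Deletes a single SLO, leaving data and rule cleanup for later.
  deleteOne: (id: string) => Promise<void>;
  // Grouped cleanup calls, issued once for all successfully deleted SLOs.
  deleteDataFor: (ids: string[]) => Promise<void>;
  deleteRulesFor: (ids: string[]) => Promise<void>;
}

async function bulkDeleteSketch(ids: string[], deps: BulkDeleteDeps): Promise<DeleteResult[]> {
  const results: DeleteResult[] = [];

  // Record a per-ID outcome so one failure does not stop the others.
  for (const id of ids) {
    try {
      await deps.deleteOne(id);
      results.push({ id, success: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ id, success: false, error });
    }
  }

  // Only clean up after SLOs that were actually deleted.
  const deleted = results.filter((r) => r.success).map((r) => r.id);
  if (deleted.length > 0) {
    await deps.deleteDataFor(deleted);
    await deps.deleteRulesFor(deleted);
  }
  return results;
}
```

With this shape, the number of cleanup calls stays constant regardless of how many SLOs are in the request.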

Manual testing

  • Create some SLOs
  • Bulk delete some of the SLOs along with a nonexistent ID:
curl --request POST \
  --url http://localhost:5601/api/observability/slos/_bulk_delete \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'Content-Type: application/json' \
  --header 'kbn-xsrf: oui' \
  --data '{
  "list": [
    "8030bf8e-b047-4af2-a696-afb6f6bf7813",
    "8465d14b-3054-4eed-94f5-6e904d59838a",
    "9e82240f-2f0c-4fa4-a37c-1f6f4627951d",
    "inexistant"
  ]
}'
  • Get the bulk delete status by taskId:
curl --request GET \
  --url http://localhost:5601/api/observability/slos/_bulk_delete/{taskId} \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'Content-Type: application/json' \
  --header 'kbn-xsrf: oui'
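
Since the deletion runs as a background task, a client would typically poll the status endpoint until the task finishes. The following is a rough sketch of such a loop. The `isDone` field and the `pollUntilDone` helper are assumptions made for illustration, as the response schema is not shown in this description; the status request itself is injected as `fetchStatus` so that no particular HTTP client is implied.

```typescript
// Hypothetical status shape; the real response schema is not shown here.
interface BulkDeleteStatus {
  isDone: boolean;
}

async function pollUntilDone<T extends BulkDeleteStatus>(
  fetchStatus: () => Promise<T>, // e.g. wraps GET /_bulk_delete/{taskId}
  intervalMs: number,
  maxAttempts: number
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status.isDone) {
      return status;
    }
    // Wait before the next attempt, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Bulk delete did not finish after ${maxAttempts} attempts`);
}
```

Because the task result is only kept for 1 hour, a caller would want the total polling window to stay well below that.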

@kdelemme kdelemme changed the title from "poc: bulk delete task" to "feat(slo): bulk delete APIs" Apr 17, 2025
@kdelemme kdelemme marked this pull request as ready for review April 17, 2025 14:11
@kdelemme kdelemme requested review from a team as code owners April 17, 2025 14:11
@botelastic botelastic bot added the Team:actionable-obs Formerly "obs-ux-management", responsible for SLO, o11y alerting, significant events, & synthetics. label Apr 17, 2025
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@kdelemme kdelemme added the release_note:skip (Skip the PR/issue when compiling release notes), backport:version (Backport to applied version labels), v9.1.0, and v8.19.0 labels Apr 17, 2025
@kdelemme kdelemme requested a review from Copilot April 21, 2025 14:37
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements bulk delete APIs for SLOs and enhances the existing DeleteSLO service by adding options to skip specific deletion stages.

  • Updated DeleteSLO service to include options (skipDataDeletion and skipRuleDeletion) and refactored the deletion steps.
  • Introduced new bulk delete endpoints (POST and GET) along with corresponding OpenAPI documentation updates.
  • Made minor adjustments in the SLO creation service and test files to align with the revised deletion behavior.
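
To illustrate how such skip options can gate deletion stages, here is a small sketch. Only the option names `skipDataDeletion` and `skipRuleDeletion` come from the summary above; the `DeleteSteps` interface, the `deleteSloSketch` function, and the default values are assumptions, not the service's actual signature.

```typescript
interface DeleteOptions {
  skipDataDeletion: boolean;
  skipRuleDeletion: boolean;
}

// Hypothetical step functions standing in for the real clients.
interface DeleteSteps {
  deleteDefinition: (id: string) => Promise<void>;
  deleteData: (id: string) => Promise<void>;
  deleteRules: (id: string) => Promise<void>;
}

async function deleteSloSketch(
  id: string,
  steps: DeleteSteps,
  // Assumed defaults: run every stage unless told otherwise.
  options: DeleteOptions = { skipDataDeletion: false, skipRuleDeletion: false }
): Promise<string[]> {
  const executed: string[] = [];

  await steps.deleteDefinition(id);
  executed.push('definition');

  if (!options.skipDataDeletion) {
    await steps.deleteData(id);
    executed.push('data');
  }
  if (!options.skipRuleDeletion) {
    await steps.deleteRules(id);
    executed.push('rules');
  }
  // Returning the executed stages makes the effect of the options easy to observe.
  return executed;
}
```

A bulk caller could pass both flags as `true` and perform the skipped stages once for the whole batch.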

Reviewed Changes

Copilot reviewed 21 out of 27 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| x-pack/solutions/observability/plugins/slo/server/services/delete_slo.ts | Adds options to control data and rule deletion, and adjusts deletion ordering and client usage. |
| x-pack/solutions/observability/plugins/slo/server/services/delete_slo.test.ts | Removes the unused ElasticsearchClient mock and updates expectations. |
| x-pack/solutions/observability/plugins/slo/server/services/create_slo.ts | Updates repository deletion rollback to use the options object. |
| x-pack/solutions/observability/plugins/slo/server/routes/slo/route.ts | Incorporates the new bulk delete routes. |
| x-pack/solutions/observability/plugins/slo/server/routes/slo/delete_slo.ts | Refactors imports and client usage to use the current user’s scoped client. |
| x-pack/solutions/observability/plugins/slo/server/routes/slo/bulk_delete.ts | Introduces the bulk delete API endpoints and status checking. |
| x-pack/solutions/observability/plugins/slo/server/plugin.ts | Registers the new BulkDeleteTask and maps plugins accordingly. |
| YAML files under docs/openapi/slo | Adds OpenAPI definitions for the new bulk delete endpoints. |
| x-pack/platform/packages/shared/kbn-slo-schema | Adds bulk delete schemas and route definitions. |
Files not reviewed (6)
  • oas_docs/output/kibana.serverless.yaml: Language not supported
  • oas_docs/output/kibana.yaml: Language not supported
  • x-pack/solutions/observability/plugins/slo/docs/openapi/slo/bundled.json: Language not supported
  • x-pack/solutions/observability/plugins/slo/docs/openapi/slo/bundled.yaml: Language not supported
  • x-pack/solutions/observability/plugins/slo/docs/openapi/slo/entrypoint.yaml: Language not supported
  • x-pack/solutions/observability/plugins/slo/server/services/snapshots/delete_slo.test.ts.snap: Language not supported
Comments suppressed due to low confidence (2)

x-pack/solutions/observability/plugins/slo/server/services/delete_slo.ts:96

  • Verify that setting 'refresh' to false in 'deleteSummaryData' is intentional, as this differs from the previous behavior where it was set to true.
refresh: false,

x-pack/solutions/observability/plugins/slo/server/services/delete_slo.ts:57

  • Confirm that moving the repository deletion step to the first Promise.all does not interfere with the subsequent data deletion operations.
this.repository.deleteById(slo.id, { ignoreNotFound: true }),

@kdelemme kdelemme requested a review from a team as a code owner April 21, 2025 14:53
Contributor

@baileycash-elastic baileycash-elastic left a comment

lgtm

Contributor

@dmlemeshko dmlemeshko left a comment

x-pack/test/api_integration/deployment_agnostic/services/slo_api.ts changes LGTM

Contributor

@pmuellr pmuellr left a comment

I left two comments to be considered. Not blockers, but will likely improve the overall aspect of the new task.

},

cancel: async () => {
  this.abortController.abort('Timed out');
Contributor

This is the only reference to the abort controller, so basically it's a no-op. The controller will signal the abort, but with no code depending on it, nothing will happen. Suggest you pass it in to runBulkDelete and check it there, especially since it seems to be dealing with multiple operations ...

Contributor Author

You're right. I wanted to use it but then noticed all the services used within runBulkDelete (especially deleteSLO) need to be updated to use an abortController, and stopped there. I'll refactor these services to use it.
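
For reference, a minimal sketch of what threading the signal through could look like, assuming a hypothetical `deleteUntilAborted` loop and a `deleteOne` callback that accepts an `AbortSignal`. This is not the refactor that landed in the PR; it only shows the pattern of checking `signal.aborted` between operations and forwarding the signal to each one.

```typescript
async function deleteUntilAborted(
  ids: string[],
  // Hypothetical per-SLO delete that can also observe the signal itself.
  deleteOne: (id: string, signal: AbortSignal) => Promise<void>,
  signal: AbortSignal
): Promise<{ processed: string[]; aborted: boolean }> {
  const processed: string[] = [];
  for (const id of ids) {
    // Stop before starting new work once cancellation has been requested.
    if (signal.aborted) {
      return { processed, aborted: true };
    }
    await deleteOne(id, signal);
    processed.push(id);
  }
  return { processed, aborted: false };
}
```

The task's cancel handler would then call `abort()` on the controller whose `signal` was passed in, so that the loop stops at the next check instead of running to completion.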

  .filter((result) => result.success === true)
  .map((result) => result.id);

await Promise.all([
Contributor

So there are 3 independent things going on here - two DBQ's run "in parallel", followed by bulk rule deletion.

We've had issues in the past when ES runs into problems "in the middle" of things that you would like to wrap in a "transaction", so it would be good to think about this for a few minutes. Relatedly, you may not want to do the DBQ's "in parallel", since either one of them could fail - is the order important?

The basic question to ask is - if one of these operations fails, what is the state of the system after that? Is it possible for a user to "try again" or "continue"? Would it take manual intervention (customer using DevTools to make some HTTP requests) to get things "working" again?

Contributor Author

> Relatedly, you may not want to do the DBQ's "in parallel", since either one of them could fail - is the order important?

The order is not important since I'm deleting the data only for the successfully deleted SLOs (therefore their transforms are not running anymore, and no new data is being produced). If the DBQs fail during their execution (they run async, and only the preflight checks are done synchronously), I think that's acceptable, but you're right that I should handle the potential failure so that we continue to the next step regardless.

Contributor Author

> The basic question to ask is - if one of these operations fails, what is the state of the system after that? Is it possible for a user to "try again" or "continue"? Would it take manual intervention (customer using DevTools to make some HTTP requests) to get things "working" again?

So yes, I think catching the DBQ errors and continuing is enough.
Worst case, we did not delete the data: the user can still do it manually or not, the system won't care.

So I'll add a catch on these promises
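
A sketch of the "catch and continue" approach, under the assumption of three hypothetical helpers (`scheduleRollupCleanup`, `scheduleSummaryCleanup`, `deleteRules`) and an injected `log` function. It is not the final code from this PR: each DBQ promise gets its own `catch`, so a scheduling failure is logged and the rule deletion still runs.

```typescript
// Hypothetical dependencies standing in for the two DBQs and the rule deletion.
interface CleanupDeps {
  scheduleRollupCleanup: (ids: string[]) => Promise<void>;
  scheduleSummaryCleanup: (ids: string[]) => Promise<void>;
  deleteRules: (ids: string[]) => Promise<void>;
}

async function cleanupSketch(
  deletedIds: string[],
  deps: CleanupDeps,
  log: (message: string) => void
): Promise<void> {
  // Each promise swallows its own error, so Promise.all never rejects here.
  await Promise.all([
    deps
      .scheduleRollupCleanup(deletedIds)
      .catch((err: unknown) => log(`rollup cleanup failed: ${String(err)}`)),
    deps
      .scheduleSummaryCleanup(deletedIds)
      .catch((err: unknown) => log(`summary cleanup failed: ${String(err)}`)),
  ]);

  // Runs regardless of whether the data cleanup could be scheduled.
  await deps.deleteRules(deletedIds);
}
```

In the worst case described above, leftover data stays in the indices, which the user can remove manually.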

@kdelemme
Contributor Author

@pmuellr I've changed the flow to catch potential errors while scheduling DBQs, and I'm using the abortController in most services.

@elasticmachine
Contributor

elasticmachine commented Apr 24, 2025

💔 Build Failed

Failed CI Steps

History

@kdelemme kdelemme merged commit 1882697 into elastic:main Apr 25, 2025
9 checks passed
@kibanamachine
Contributor

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/14654066416

@kibanamachine
Contributor

💔 All backports failed

Status Branch Result
8.19 Backport failed because of merge conflicts

You might need to backport the following PRs to 8.19:
- [SLO] Bulk Purge SLI data (#218287)
- [Cloud Connector] Add cloud_connectors config in Agentless API (#215421)

Manual backport

To create the backport manually run:

node scripts/backport --pr 217405

Questions ?

Please refer to the Backport tool documentation

kdelemme added a commit to kdelemme/kibana that referenced this pull request Apr 25, 2025
(cherry picked from commit 1882697)

# Conflicts:
#	x-pack/solutions/observability/plugins/slo/tsconfig.json
@kdelemme
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kdelemme added a commit that referenced this pull request Apr 28, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
- [feat(slo): bulk delete APIs (#217405)](#217405)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

<!--BACKPORT [{"author":{"name":"Kevin
Delemme","email":"kevin.delemme@elastic.co"},"sourceCommit":{"committedDate":"2025-04-25T00:11:26Z","message":"feat(slo):
bulk delete APIs
(#217405)","sha":"18826975c7321cfa1a11d392d852e43e179d4f2f","branchLabelMapping":{"^v9.1.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:obs-ux-management","backport:version","v9.1.0","v8.19.0"],"title":"feat(slo):
bulk delete
APIs","number":217405,"url":"https://github.com/elastic/kibana/pull/217405","mergeCommit":{"message":"feat(slo):
bulk delete APIs
(#217405)","sha":"18826975c7321cfa1a11d392d852e43e179d4f2f"}},"sourceBranch":"main","suggestedTargetBranches":["8.19"],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/217405","number":217405,"mergeCommit":{"message":"feat(slo):
bulk delete APIs
(#217405)","sha":"18826975c7321cfa1a11d392d852e43e179d4f2f"}},{"branch":"8.19","label":"v8.19.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
akowalska622 pushed a commit to akowalska622/kibana that referenced this pull request May 29, 2025

Labels

  • backport:version (Backport to applied version labels)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:actionable-obs (Formerly "obs-ux-management", responsible for SLO, o11y alerting, significant events, & synthetics.)
  • v8.19.0
  • v9.1.0

7 participants