Add support in SM Plugin to delete snapshots created manually #1452

Merged
bowenlan-amzn merged 12 commits into opensearch-project:main from Tarun-kishore:feature/snapshot-deletion-support on Sep 29, 2025

@Tarun-kishore
Contributor

Description

This PR introduces optional creation workflows and snapshot pattern support for Snapshot Management policies, enabling users to create deletion-only policies and manage external snapshots alongside policy-created ones.

Key Changes

1. Optional Creation Field

  • Made the creation field optional in SMPolicy, allowing deletion-only policies
  • Added version compatibility checks (Version.V_3_2_0) for backward compatibility
  • Updated JSON parsing and serialization to handle nullable creation workflows

2. Snapshot Pattern Support

  • Added snapshotPattern field to SMPolicy.Deletion to include external snapshots in deletion workflows
  • Enhanced deletion states to combine policy-created and pattern-matched snapshots
  • Applied deletion conditions (max_age, min_count) across all matching snapshots
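
A minimal Kotlin sketch of how the deletion conditions might apply once policy-created and pattern-matched snapshots are merged into one list. `SnapshotRef` and `selectForDeletion` are hypothetical names, not the plugin's actual types, and the semantics shown (delete anything older than max_age while always keeping the newest min_count snapshots) are an assumption about how those two fields interact, not a copy of the plugin's logic.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical stand-in for a snapshot; not the plugin's real model class.
data class SnapshotRef(val name: String, val startTime: Instant)

// Assumed semantics: a snapshot becomes eligible once it is older than maxAge,
// but the newest minCount snapshots are always kept.
fun selectForDeletion(
    snapshots: List<SnapshotRef>,
    maxAge: Duration,
    minCount: Int,
    now: Instant
): List<SnapshotRef> {
    val newestFirst = snapshots.sortedByDescending { it.startTime }
    val cutoff = now.minus(maxAge)
    return newestFirst
        .drop(minCount)                              // protect the newest minCount
        .filter { it.startTime.isBefore(cutoff) }    // then apply the age check
}
```

Because the newest snapshots are set aside before the age check runs, min_count acts as a floor in this sketch: even if every snapshot is older than max_age, that many are retained.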

3. Enhanced State Machine Logic

  • Updated DeletingState to retrieve and combine snapshots from both policy and pattern sources
  • Modified DeletionFinishedState to handle completion logic for pattern-based deletions
  • Ensured proper state transitions for deletion-only workflows

4. Index Mapping Updates

  • Added snapshot_pattern field to deletion properties in index mappings
  • Updated both main and test mapping files to support strict mode

Use Cases Enabled

  • Cleanup-only workflows: Delete old snapshots without automatic creation
  • External snapshot management: Include snapshots created by external tools in retention policies
  • Flexible policy configurations: Support creation-only, deletion-only, or combined workflows

Backward Compatibility

All changes are fully backward compatible:

  • Existing policies with required creation fields continue to work unchanged
  • Version checks ensure proper serialization/deserialization across different versions
  • No breaking changes to existing APIs or data structures
  • No data migration required

Testing

Comprehensive test coverage including:

  • Unit tests for optional creation, snapshot patterns, and serialization
  • State machine tests for deletion-only and combined workflows
  • Integration tests for REST API policy creation

Related Issues

Resolves #867

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Tarun-kishore
Contributor Author

Hey @bowenlan-amzn — since you were involved in the original issue, would you be open to reviewing the PR when you get a chance? Appreciate your insights!

@bowenlan-amzn
Member

Will find time to do a review

Member

@bowenlan-amzn left a comment

Overall looks pretty good!

I think as long as users only use the deletion-only policy after the cluster is fully upgraded, there shouldn't be any issue. We can call this out in the documentation.

It's unfortunate we don't have code coverage; for some reason it hasn't been working for a long time. But I can see tests being added.

@bowenlan-amzn self-assigned this on Aug 4, 2025
@Tarun-kishore
Contributor Author

> I think as long as users only use the deletion-only policy after the cluster is fully upgraded, there shouldn't be any issue. We can call this out in the documentation.

Yes, after this PR is merged I'll update the OpenSearch documentation to say that deletion-only policies should be used only after the cluster is fully upgraded.

> It's unfortunate we don't have code coverage; for some reason it hasn't been working for a long time. But I can see tests being added.

Adding a CI check for code coverage (or preferably for newly changed lines) would be nice; it would ensure that tests cover all cases.

@Tarun-kishore force-pushed the feature/snapshot-deletion-support branch 3 times, most recently from 461d4d8 to 9f464fb, on August 4, 2025 09:04
@Tarun-kishore
Contributor Author

These CI failures look unrelated to this change. Other integration tests are failing, and I cannot reproduce the failures in my local environment.

@codecov
codecov bot commented Aug 8, 2025

Codecov Report

❌ Patch coverage is 76.64234% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.37%. Comparing base (f67a308) to head (b0ff462).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...xmanagement/snapshotmanagement/model/SMMetadata.kt 68.57% 0 Missing and 11 partials ⚠️
...nt/engine/states/creation/CreationFinishedState.kt 60.00% 0 Missing and 6 partials ⚠️
...rch/indexmanagement/snapshotmanagement/SMRunner.kt 76.92% 0 Missing and 3 partials ⚠️
...management/engine/states/creation/CreatingState.kt 57.14% 0 Missing and 3 partials ⚠️
...gement/snapshotmanagement/engine/SMStateMachine.kt 50.00% 0 Missing and 2 partials ⚠️
...ngine/states/creation/CreationConditionMetState.kt 89.47% 0 Missing and 2 partials ⚠️
...management/engine/states/deletion/DeletingState.kt 75.00% 0 Missing and 2 partials ⚠️
...dexmanagement/snapshotmanagement/model/SMPolicy.kt 90.47% 0 Missing and 2 partials ⚠️
...nt/engine/states/deletion/DeletionFinishedState.kt 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1452      +/-   ##
==========================================
+ Coverage   76.22%   76.37%   +0.14%     
==========================================
  Files         375      375              
  Lines       18773    18835      +62     
  Branches     2366     2401      +35     
==========================================
+ Hits        14310    14385      +75     
+ Misses       3225     3202      -23     
- Partials     1238     1248      +10     

… manually

Signed-off-by: Tarun-kishore <tarun2kishore@gmail.com>
Signed-off-by: Tarun-kishore <tarun2kishore@gmail.com>
Signed-off-by: Tarun-kishore <tarun2kishore@gmail.com>
@Tarun-kishore force-pushed the feature/snapshot-deletion-support branch 6 times, most recently from 28d1e94 to efbdcc8, on August 8, 2025 11:24
Signed-off-by: Tarun-kishore <tarun2kishore@gmail.com>
@Tarun-kishore force-pushed the feature/snapshot-deletion-support branch from efbdcc8 to a1a3097 on August 8, 2025 11:28
@Tarun-kishore
Contributor Author

Hi @bowenlan-amzn, I've updated the version check to V_3_3_0. Please review when you get some time.

@Tarun-kishore
Contributor Author

Hi @bowenlan-amzn, gentle ping for review.

Member

@bowenlan-amzn left a comment

Thanks, I left some comments. Let's target getting this in for 3.3, which I recall is due this month.

val policySnapshots = getSnapshotsRes.snapshots

// Get pattern-based snapshots if pattern is specified
val patternSnapshotsResult = DeletionStateUtils.getPatternSnapshots(context, metadataBuilder)

Member

Can we reuse the logic from the existing getSnapshots, please?

And can we combine these two calls into one?

Contributor Author

Updated to use getSnapshots.
If we want to use a single call, we'll have to query for the pattern sm-policy-name*,snapshot-pattern*, which returns snapshots matching either of them. We would then need to decide which snapshots to filter by policy, which adds computation to the SM logic.

Which do you prefer: a single call with slightly more computation for prefix matching, or two calls with no prefix matching needed in the SM code?

Member

I think we can only define one condition for deletion; I am referring to this doc.
Since the same filtering logic applies to all the snapshots, I don't think there would be much overhead on the SM side.

Please correct me if I misunderstood something.

Contributor Author

@Tarun-kishore Sep 5, 2025

Currently, deletion only removes snapshots created by the SM policy. It does this by fetching snapshots that match the pattern policy-name* and then checking whether each one was created by the policy.

This feature adds a new input, snapshotPattern, to deletion, which deletes snapshots matching the specified pattern regardless of which policy (if any) created them. It is an addition to the existing deletion behavior, where snapshots created by the current policy are deleted.

The two deletion paths differ in filtering: the first (snapshots created by the policy) filters on the policy that created the snapshot, while the second (snapshots matching the pattern input) applies no such filter.

I hope this clears up any confusion.

Member

Thanks, I remember now. We are referring to filterBySMPolicyInSnapshotMetadata.
I think using one call is better: failure handling and retries are easier, and the network call is the bigger overhead.
I guess we can modify this filtering so that it applies only to snapshots whose names start with the policy name.

Contributor Author

Updated to use a single call for getting snapshots instead of two.
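
The single-call approach agreed on in this thread could be sketched roughly as follows. `combinedSnapshotQuery` is a hypothetical helper, not a function from the plugin; it only illustrates joining the policy-name prefix and the optional snapshot pattern into one comma-separated name expression, as in the sm-policy-name*,snapshot-pattern* example above. Whether the plugin appends a wildcard to the user-supplied pattern is not shown in this thread, so the sketch passes it through verbatim.

```kotlin
// Hypothetical helper: builds the comma-separated name expression for a single
// get-snapshots request that covers both policy-created and pattern-matched snapshots.
fun combinedSnapshotQuery(policyName: String, snapshotPattern: String?): String {
    val policyPrefix = "$policyName*"
    // Assumption: the user-supplied pattern is used as-is (no wildcard appended).
    return if (snapshotPattern.isNullOrBlank()) policyPrefix else "$policyPrefix,$snapshotPattern"
}
```

With one request there is a single failure and retry path to handle; deciding which returned snapshots are actually eligible is then left to in-memory filtering.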

Signed-off-by: Tarun-kishore <tarun2kishore@gmail.com>
@Tarun-kishore
Contributor Author

Addressed the comments. I do see CI failures, but they seem to be unrelated to this change.

Signed-off-by: Tarun-kishore <tarun2kishore@gmail.com>
@Tarun-kishore force-pushed the feature/snapshot-deletion-support branch from 96fe81c to 251cb4d on September 6, 2025 07:15
@Tarun-kishore
Contributor Author

Hi @bowenlan-amzn, gentle ping for review.

bowenlan-amzn previously approved these changes Sep 25, 2025
@bowenlan-amzn
Member

@vikasvb90 I could use a second pair of eyes here; please review if you or anyone else gets time. I plan to merge by next Monday to catch 3.3.

Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: Tarun Kishore <75606327+Tarun-kishore@users.noreply.github.com>
@bowenlan-amzn
Member

@Tarun-kishore will fix the build by this weekend and update this PR #1491

@bowenlan-amzn merged commit 1a4416b into opensearch-project:main on Sep 29, 2025
25 of 26 checks passed
@Tarun-kishore deleted the feature/snapshot-deletion-support branch on September 29, 2025 09:43
Tarun-kishore added a commit to Tarun-kishore/documentation-website that referenced this pull request Sep 29, 2025
This commit updated documentation for snapshot management API to include feature introduced in opensearch-project/index-management#1452 

Signed-off-by: Tarun Kishore <75606327+Tarun-kishore@users.noreply.github.com>
@4orty

4orty commented Dec 10, 2025

@Tarun-kishore

First of all, thank you for providing the feature I was looking for!

Can you give me an HTTP request example for a delete-only policy?

Here is my test request, but it doesn't seem to work.

opensearch version: 3.3.2
opensearch-dashboard version: 3.3.0

POST _plugins/_sm/policies/deletion_only_policy

{
  "snapshot_config": {
    "repository": "my_test_repo"
  },
  "deletion": {
    "schedule": {
      "cron": {
        "expression": "* * * * *",
        "timezone": "UTC"
      }
    },
    "condition": {
      "max_age": "1h",
      "min_count": 1
    },
    "time_limit": "1h",
    "snapshot_pattern": "example-*"
  }
}

I have three snapshots, all created by an index management policy (not snapshot management, so the policy field is empty):
example-snapshot-2025.12.10-05:02:34.802
example-snapshot-2025.12.09-07:45:07.838
example-snapshot-2025.12.09-07:09:50.170

Opensearch Log: No snapshots found under policy while getting snapshots to decide which snapshots to delete.

I expected snapshots created more than 1 hour ago to be deleted as soon as I created the delete-only policy, but none of the snapshots were deleted.

Also, the request to create the delete policy succeeds in Dev Tools, but an error occurs when I open Snapshot Management in OpenSearch Dashboards after the policy is created.

  • One more question:
    Does snapshot_config affect the delete-only policy in any way other than the repository field (for example, the indices field)? If a delete-only policy only uses the repository field in snapshot_config, I think it would be appropriate for these configuration values to be separated.

@Tarun-kishore
Contributor Author

Your policy looks correct. There was a bug which was causing the snapshot not found issue. It has been fixed in this PR: #1503.

> Does snapshot_config affect the delete-only policy in any way other than the repository field (for example, the indices field)? If a delete-only policy only uses the repository field in snapshot_config, I think it would be appropriate for these configuration values to be separated.

For a delete-only policy, the only relevant field in snapshot_config is the repository. However, I don’t think it should be separated. In most cases, creation and deletion are expected to use the same repository, and introducing a separate deletion-specific config would create ambiguity around priority. Keeping a single snapshot_config keeps the model simple and avoids those conflicts.

@4orty

4orty commented Dec 11, 2025

@Tarun-kishore

Thanks for the very quick reply! It seems the bug fix has not been released yet, so I have to wait for release 3.4 to use a delete-only policy, right? Also, the title of #1503 says 'comma-separated'. Does that mean nothing gets deleted regardless of whether the pattern contains commas? I'm asking because my snapshot_pattern regex doesn't include any commas.

Regarding the repository field, I also think creation and deletion are expected to use the same repository. So the structure I had in mind was:

{
  "repository": {},
  "creation": {
    "snapshot_config": {}
  },
  "deletion": {}
}

But it's not a big issue, so you can just ignore what I said. I just wanted to mention it.

@Tarun-kishore
Contributor Author

@4orty During deletion, the policy's own snapshots are always included by default. So if your policy name is sm-policy and your snapshot pattern is snap-pattern, the plugin looks for snapshots matching snap-pattern*,sm-policy*. That doesn't mean every snapshot matching sm-policy* is deleted, though: a filter checks that each snapshot was either created by the policy or matches snapshot_pattern, so snapshots that merely share the policy's name prefix are left alone.
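
The filter described above can be illustrated with a small Kotlin sketch. All names here (`SnapshotInfoLite`, `globToRegex`, `filterDeletable`) are hypothetical and not taken from the plugin; `policyName` on the snapshot stands in for whatever the plugin records in snapshot metadata, and only a single `*`-wildcard pattern is handled (comma-separated patterns are out of scope for this sketch).

```kotlin
// Hypothetical snapshot view: policyName is the SM policy recorded for the
// snapshot, or null when the snapshot was created outside of SM.
data class SnapshotInfoLite(val name: String, val policyName: String?)

// Converts a simple wildcard pattern (only '*' is special) into a Regex.
fun globToRegex(glob: String): Regex =
    glob.split("*").joinToString(".*") { Regex.escape(it) }.toRegex()

// Keeps a snapshot only if it was created by the policy OR its name matches
// the snapshot pattern; sharing the policy's name prefix alone is not enough.
fun filterDeletable(
    snapshots: List<SnapshotInfoLite>,
    policyName: String,
    snapshotPattern: String?
): List<SnapshotInfoLite> {
    val patternRegex = snapshotPattern?.let(::globToRegex)
    return snapshots.filter { s ->
        s.policyName == policyName || (patternRegex?.matches(s.name) ?: false)
    }
}
```

Under these assumptions, a manually created snapshot named sm-policy-manual would be returned by the sm-policy* query but dropped by the filter, while example-snapshot-... names would pass through the example-* pattern.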


Development

Successfully merging this pull request may close these issues.

[FEATURE] SM to support deletion of snapshots created manually
