add unrecoverable error handling and clock for DLM #145132

Merged
seanzatzdev merged 25 commits into elastic:main from seanzatzdev:dlm-improve-error-handling
Apr 2, 2026
Contributor

@seanzatzdev seanzatzdev commented Mar 27, 2026

This PR does the following:

  • adds checks before each maybe...() call in the DLM frozen transition task, so the task can handle the given repo or index being deleted unexpectedly
  • adds a clock to the constructor of the converter so the steps can use the current time
  • renames the DLM classes to use DLM as the prefix

closes https://github.com/elastic/elasticsearch-team/issues/2574
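
The clock change described above can be illustrated with a minimal sketch. It assumes the converter receives a `java.time.Clock` through its constructor; the class name `FrozenConverterSketch` and the `hasTimedOut` helper are hypothetical and are not taken from the PR. The point is only that steps read the time from the injected clock rather than from the system directly.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-in for the converter; only the clock handling is shown.
final class FrozenConverterSketch {
    private final String indexName;
    private final Clock clock; // injected, so tests can control "now"

    FrozenConverterSketch(String indexName, Clock clock) {
        this.indexName = indexName;
        this.clock = clock;
    }

    /** Current time as seen by the steps, always read through the injected clock. */
    Instant now() {
        return clock.instant();
    }

    /** Hypothetical helper: true once more than {@code timeout} has elapsed since {@code startedAt}. */
    boolean hasTimedOut(Instant startedAt, Duration timeout) {
        return Duration.between(startedAt, now()).compareTo(timeout) > 0;
    }

    String indexName() {
        return indexName;
    }
}
```

In production code the caller would likely pass `Clock.systemUTC()`, while a test can pass `Clock.fixed(...)` to make time-dependent behaviour deterministic.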

@seanzatzdev seanzatzdev changed the title add error handling add unrecoverable error handling and clock for DLM Mar 27, 2026
@seanzatzdev seanzatzdev marked this pull request as ready for review March 27, 2026 22:19
@seanzatzdev seanzatzdev requested review from dakrone and lukewhiting and removed request for dakrone March 27, 2026 22:19
@elasticsearchmachine elasticsearchmachine added the needs:triage label (Requires assignment of a team area label) Mar 27, 2026
@elasticsearchmachine elasticsearchmachine removed the needs:triage label (Requires assignment of a team area label) Mar 27, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@dakrone dakrone left a comment

I left a couple of comments. Are we also going to pass in the ClusterService here to get a "fresh" state?

Contributor

lukewhiting commented Mar 30, 2026

Not sure if this is an improvement or just different, but I think you can consolidate the pre-checks if you 1) store state between steps in fields and 2) invoke the steps via method references in a loop.

Something like this:

Functional_steps_with_consolidated_checks.patch

Subject: [PATCH] Functional steps with consolidated checks
---
Index: x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java b/x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java
--- a/x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java	(revision de661f409d41bcce4dd491aa0ef2c5ef8b1e3d44)
+++ b/x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java	(date 1774884364394)
@@ -76,6 +76,7 @@
     private final ProjectState projectState;
     private final XPackLicenseState licenseState;
     private final Clock clock;
+    private String indexForForceMerge;
 
     public DataStreamLifecycleConvertToFrozen(
         String indexName,
@@ -100,11 +101,19 @@
             return;
         }
         // Todo: WIP - steps will be implemented in follow-up PRs
-        maybeMarkIndexReadOnly();
-        String indexForForceMerge = maybeCloneIndex();
-        maybeForceMergeIndex(indexForForceMerge);
-        maybeTakeSnapshot();
-        maybeMountSearchableSnapshot();
+        List<Runnable> steps = List.of(
+            this::maybeMarkIndexReadOnly,
+            this::maybeCloneIndex,
+            this::maybeForceMergeIndex,
+            this::maybeTakeSnapshot,
+            this::maybeMountSearchableSnapshot
+        );
+
+        for (Runnable step : steps) {
+            checkIfThreadInterrupted();
+            isEligibleForConvertToFrozen();
+            step.run();
+        }
     }
 
     /**
@@ -117,8 +126,6 @@
      * exception or an unacknowledged response from the cluster.
      */
     public void maybeMarkIndexReadOnly() {
-        checkIfThreadInterrupted();
-        isEligibleForConvertToFrozen();
         if (isIndexReadOnly()) {
             logger.debug("Index [{}] is already marked as read-only, skipping to clone step", indexName);
             return;
@@ -148,11 +155,9 @@
      * Returns the name of the index to be used for force merge in the next step, which will be either the existing clone,
      * the original index (if it has 0 replicas), or a newly created clone.
      */
-    String maybeCloneIndex() {
-        checkIfThreadInterrupted();
-        isEligibleForConvertToFrozen();
+    void maybeCloneIndex() {
         if (isCloneNeeded() == false) {
-            return getIndexForForceMerge();
+            indexForForceMerge = getIndexForForceMerge();
         }
 
         String cloneIndexName = getDLMCloneIndexName();
@@ -167,7 +172,7 @@
                 );
             }
             logger.info("DLM successfully cloned index [{}] to index [{}]", indexName, cloneIndexName);
-            return cloneIndexName;
+            indexForForceMerge = cloneIndexName;
         } catch (Exception e) {
             if (e instanceof InterruptedException || ExceptionsHelper.unwrapCause(e) instanceof IndexNotFoundException) {
                 Thread.currentThread().interrupt();
@@ -189,9 +194,7 @@
         }
     }
 
-    public void maybeForceMergeIndex(String indexForForceMerge) {
-        checkIfThreadInterrupted();
-        isEligibleForConvertToFrozen();
+    public void maybeForceMergeIndex() {
         boolean indexMissing = Optional.ofNullable(projectState)
             .map(ProjectState::metadata)
             .map(metadata -> metadata.index(indexForForceMerge))
@@ -244,13 +247,9 @@
     }
 
     public void maybeTakeSnapshot() {
-        checkIfThreadInterrupted();
-        isEligibleForConvertToFrozen();
     }
 
     public void maybeMountSearchableSnapshot() {
-        checkIfThreadInterrupted();
-        isEligibleForConvertToFrozen();
     }
 
     private boolean isIndexReadOnly() {
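
The idea in the patch above can be reduced to a small, runnable sketch: the steps are held as `Runnable` method references, the shared pre-checks run once per loop iteration, and the value that used to be returned by the clone step is kept in a field. The class `StepLoopSketch`, its `log` list, and the index names are hypothetical and exist only for illustration; the real steps talk to the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduction of the "steps in a loop" idea; not the PR's actual class.
final class StepLoopSketch {
    final List<String> log = new ArrayList<>();
    private final boolean cloneNeeded;
    private String indexForForceMerge; // state handed from the clone step to the force-merge step

    StepLoopSketch(boolean cloneNeeded) {
        this.cloneNeeded = cloneNeeded;
    }

    void run() {
        List<Runnable> steps = List.of(this::maybeCloneIndex, this::maybeForceMergeIndex);
        for (Runnable step : steps) {
            preChecks(); // shared checks run exactly once before every step
            step.run();
        }
    }

    private void preChecks() {
        if (Thread.currentThread().isInterrupted()) {
            throw new IllegalStateException("interrupted before step");
        }
        log.add("check");
    }

    private void maybeCloneIndex() {
        if (cloneNeeded == false) {
            indexForForceMerge = "original-index";
            return; // a void step must still exit early here, or it falls through and clones
        }
        indexForForceMerge = "clone-of-original-index";
        log.add("clone");
    }

    private void maybeForceMergeIndex() {
        log.add("force-merge " + indexForForceMerge);
    }
}
```

One detail to watch when a step changes from returning a value to storing it in a field: every former `return value;` becomes an assignment followed by a bare `return;`, otherwise control continues into the rest of the method.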

@seanzatzdev seanzatzdev requested a review from dakrone March 30, 2026 20:58
Member

@dakrone dakrone left a comment

I think we gain little from having the error handling code de-duplicated. I prefer the older approach where we don't have class state, as I find it more readable. What do you think?
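
For comparison, the stateless style preferred in this comment might look like the sketch below: each step repeats the pre-checks itself and the clone step returns the index name to its caller, so no field is needed. Everything here is hypothetical (`StatelessStepsSketch`, `UnrecoverableSketchException`, and the `Set<String>` standing in for cluster metadata); it only illustrates the shape of the code, not the merged implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical exception type signalling that retrying cannot help. */
final class UnrecoverableSketchException extends RuntimeException {
    UnrecoverableSketchException(String message) {
        super(message);
    }
}

// Hypothetical stateless variant: values flow through return values and parameters.
final class StatelessStepsSketch {
    final List<String> log = new ArrayList<>();
    private final String indexName;
    private final Set<String> existingIndices; // stand-in for cluster metadata

    StatelessStepsSketch(String indexName, Set<String> existingIndices) {
        this.indexName = indexName;
        this.existingIndices = existingIndices;
    }

    void run() {
        maybeMarkIndexReadOnly();
        String indexForForceMerge = maybeCloneIndex();
        maybeForceMergeIndex(indexForForceMerge);
    }

    // Repeated at the top of every step; fails fast if the index has disappeared.
    private void preChecks() {
        if (Thread.currentThread().isInterrupted()) {
            throw new UnrecoverableSketchException("interrupted");
        }
        if (existingIndices.contains(indexName) == false) {
            throw new UnrecoverableSketchException("index [" + indexName + "] no longer exists");
        }
    }

    private void maybeMarkIndexReadOnly() {
        preChecks();
        log.add("read-only " + indexName);
    }

    private String maybeCloneIndex() {
        preChecks();
        String clone = "clone-" + indexName;
        log.add("clone " + clone);
        return clone;
    }

    private void maybeForceMergeIndex(String target) {
        preChecks();
        log.add("force-merge " + target);
    }
}
```

The trade-off is a few duplicated lines per step in exchange for steps whose inputs and outputs are visible in their signatures.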

@seanzatzdev seanzatzdev requested a review from dakrone March 30, 2026 22:51
@seanzatzdev
Contributor Author

Working on updating the integration test.

@seanzatzdev seanzatzdev requested review from a team as code owners March 31, 2026 17:53
@seanzatzdev seanzatzdev force-pushed the dlm-improve-error-handling branch from 30b4fbb to 28edc17 Compare March 31, 2026 17:54
@seanzatzdev seanzatzdev removed request for a team March 31, 2026 17:56
@seanzatzdev seanzatzdev marked this pull request as draft March 31, 2026 17:57
@seanzatzdev seanzatzdev force-pushed the dlm-improve-error-handling branch from 28edc17 to 27a5b45 Compare March 31, 2026 19:13
# Conflicts:
#	x-pack/plugin/dlm-frozen-transition/src/main/java/org/elasticsearch/xpack/dlm/frozen/DataStreamLifecycleConvertToFrozen.java
@seanzatzdev seanzatzdev force-pushed the dlm-improve-error-handling branch from 27a5b45 to b49790f Compare March 31, 2026 19:29
@seanzatzdev seanzatzdev marked this pull request as ready for review March 31, 2026 19:33
Member

@dakrone dakrone left a comment

LGTM, I left a few comments, but nothing blocking.

@seanzatzdev seanzatzdev enabled auto-merge (squash) March 31, 2026 23:40
Contributor

@lukewhiting lukewhiting left a comment

Also happy with the changes. LGTM 👍🏻

@seanzatzdev seanzatzdev merged commit 423dd78 into elastic:main Apr 2, 2026
35 checks passed
4 participants