Mark indices ready for frozen conversion in DLM service #144248

Merged
dakrone merged 19 commits into elastic:main from dakrone:dlm-mark-indices-for-frozen
Mar 17, 2026

Conversation

@dakrone
Member

@dakrone dakrone commented Mar 13, 2026

This commit enhances the DLM service (DataStreamLifecycleService) to collect indices that are ready to be converted to a frozen index, and then mark those indices with the repository to use for the conversion.

This behavior is behind the DLM frozen feature flag, and the marking only happens if the cluster already has a configured repositories.default_repository setting.
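The gating described above can be pictured with a minimal sketch. This is not the Elasticsearch implementation: the class name `FrozenMarkingSketch`, the method `markCandidates`, and its parameters are hypothetical, and skipping indices that are already marked is an assumption based on the review walkthrough's mention of checking freeze-marking status.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified model of the marking step; not the real DLM service code. */
final class FrozenMarkingSketch {

    /**
     * Returns index name -> repository for every candidate that should be marked.
     * Nothing is marked when the feature flag is off or no default repository is configured.
     */
    static Map<String, String> markCandidates(
        boolean frozenFeatureEnabled,
        Optional<String> defaultRepository,
        List<String> candidates,
        Map<String, String> alreadyMarked
    ) {
        Map<String, String> marks = new LinkedHashMap<>();
        if (frozenFeatureEnabled == false || defaultRepository.isEmpty()) {
            return marks; // feature disabled or no repository: do nothing
        }
        for (String index : candidates) {
            // Assumption: an index that already carries a repository mark is left untouched.
            if (alreadyMarked.containsKey(index) == false) {
                marks.put(index, defaultRepository.get());
            }
        }
        return marks;
    }

    public static void main(String[] args) {
        Map<String, String> marks = markCandidates(
            true,
            Optional.of("my-default-repo"),
            List.of("index-a", "index-b"),
            Map.of("index-b", "older-repo")
        );
        System.out.println(marks);
    }
}
```

In this sketch an absent repository and a disabled flag are treated the same way, so callers do not need to distinguish the two cases.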
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

  public void testLifecycleAppliedToFailureStore() throws Exception {
      DataStreamLifecycle.Template lifecycle = DataStreamLifecycle.failuresLifecycleBuilder()
-         .dataRetention(TimeValue.timeValueSeconds(20))
+         .dataRetention(TimeValue.timeValueMinutes(20))
Member Author

@dakrone dakrone Mar 13, 2026

In case you are wondering why this seemingly unrelated change was made: I ran these tests many many times while I was developing the test for this PR. This one in particular was flaky, because on a slower machine the index ended up deleted before we could do the check. This change makes the test no longer flaky on my machine.

It does not actually change the test behavior, or what we're testing for this particular test.

@lukewhiting
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

This change implements a feature for marking Elasticsearch data stream indices as candidates for freezing. The implementation introduces a new cluster state task (MarkIndicesForFrozenTask) that annotates indices with repository metadata when a default repository is configured, alongside service-level methods to identify candidate indices, check freeze-marking status, and trigger marking operations. The lifecycle service is extended with default repository tracking, a new executor for processing freeze-marking tasks, and comprehensive unit and integration tests validating the logic.
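As a rough illustration of the annotation step, the sketch below copies an index's existing custom metadata and adds the repository under the `dlm_freeze_with` key named in the PR's code comment. The class `CustomMetadataSketch` and the method `withFrozenRepository` are hypothetical; the real `MarkIndicesForFrozenTask` operates on cluster state and `IndexMetadata`, which are not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the metadata update; not the real cluster state task. */
final class CustomMetadataSketch {
    // Key name taken from the code comment in this PR.
    static final String FROZEN_CANDIDATE_REPOSITORY_METADATA_KEY = "dlm_freeze_with";

    /** Copies the existing custom metadata (if any) and records the repository to freeze with. */
    static Map<String, String> withFrozenRepository(Map<String, String> existing, String defaultRepository) {
        Map<String, String> newMetadata = existing == null ? new HashMap<>() : new HashMap<>(existing);
        newMetadata.put(FROZEN_CANDIDATE_REPOSITORY_METADATA_KEY, defaultRepository);
        return Map.copyOf(newMetadata); // immutable result; the input map is not modified
    }

    public static void main(String[] args) {
        System.out.println(withFrozenRepository(Map.of("other_key", "value"), "my-default-repo"));
    }
}
```

Copying before writing keeps any other lifecycle entries in the custom metadata intact, which matters if several features share the same metadata map.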

Possibly related PRs

  • Ensure DLM only runs one general loop at a time #143883: Modifies DataStreamLifecycleService's run workflow logic — the related PR adds a dlmCurrentlyRunning guard, while this PR extends the run method to collect and mark frozen candidates alongside repository handling integration.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@modules/data-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleServiceIT.java`:
- Around line 999-1047: The test mutates every DataStreamLifecycleService by
calling setNowSupplier(...) and never restores it; wrap the mutation and
assertions in a try/finally: before you change suppliers capture each instance's
original supplier (iterate
internalCluster().getInstances(DataStreamLifecycleService.class) and store
current supplier references), set the test supplier (now::get /
twoDaysLater::get) as you do now, then in a finally block restore each
DataStreamLifecycleService by calling setNowSupplier(originalSupplier) so the
injected clock is reset; follow the same try/finally/reset pattern used in
testSystemDataStreamRetention and reference
DataStreamLifecycleService.setNowSupplier, internalCluster().getInstances,
now/twoDaysLater suppliers in your change.

In
`@modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java`:
- Around line 1975-1977: Fix the Javadoc typo in DataStreamLifecycleService by
updating the comment above the executor declaration that currently reads
"Executor for marking indices for conversation to frozen" to "Executor for
marking indices for conversion to frozen" (locate the Javadoc attached to the
executor field/method in DataStreamLifecycleService and correct the single
word).
- Around line 546-548: The catch in DataStreamLifecycleService currently logs
"Data stream lifecycle failed to mark candidates for converting to frozen index
for data stream [%s]" but no data stream name is available so the [%s]
placeholder remains literal; update the catch log to remove the unused
placeholder and log the exception as the throwable (e.g., change the message to
"Data stream lifecycle failed to mark candidates for converting to frozen index"
and pass the Exception e as the throwable) so the stacktrace is preserved and no
unsubstituted placeholder is emitted.

📥 Commits

Reviewing files that changed from the base of the PR and between 13b6b55 and 9be0828.

📒 Files selected for processing (6)
  • modules/data-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleServiceIT.java
  • modules/data-streams/src/main/java/module-info.java
  • modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java
  • modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/MarkIndicesForFrozenTask.java
  • modules/data-streams/src/test/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleServiceTests.java
  • modules/data-streams/src/test/java/org/elasticsearch/datastreams/lifecycle/MarkIndicesForFrozenTaskTests.java

Comment on lines +999 to +1047
Iterable<DataStreamLifecycleService> dataStreamLifecycleServices = internalCluster().getInstances(DataStreamLifecycleService.class);
Clock clock = Clock.systemUTC();
AtomicLong now = new AtomicLong(clock.millis());
dataStreamLifecycleServices.forEach(dataStreamLifecycleService -> dataStreamLifecycleService.setNowSupplier(now::get));

putComposableIndexTemplate(
    "mytemplate",
    null,
    List.of("foo*"),
    Settings.builder().put(IndexMetadata.SETTING_AUTO_EXPAND_REPLICAS, "0-1").build(),
    null,
    lifecycle,
    null,
    false
);

String dataStream = "foo-ds";
CreateDataStreamAction.Request createDataStreamRequest = new CreateDataStreamAction.Request(
    TEST_REQUEST_TIMEOUT,
    TEST_REQUEST_TIMEOUT,
    dataStream
);
client().execute(CreateDataStreamAction.INSTANCE, createDataStreamRequest).get();

indexDocs(dataStream, randomIntBetween(10, 50));

// Let's verify the rollover
List<String> backingIndices = waitForDataStreamIndices(dataStream, 2, false);
String candidateIndex = backingIndices.get(0);
String writeIndex = backingIndices.get(1);

AtomicLong twoDaysLater = new AtomicLong(clock.millis() + TimeValue.timeValueDays(2).millis());
dataStreamLifecycleServices.forEach(dataStreamLifecycleService -> dataStreamLifecycleService.setNowSupplier(twoDaysLater::get));

assertBusy(() -> {
    logger.info("--> checking to see if index has been marked for frozen");
    ClusterStateResponse resp = client().execute(ClusterStateAction.INSTANCE, new ClusterStateRequest(TEST_REQUEST_TIMEOUT)).get();
    ClusterState state = resp.getState();
    String setRepo = Optional.ofNullable(state.metadata().getProject(Metadata.DEFAULT_PROJECT_ID))
        .map(pm -> pm.index(candidateIndex))
        .map(peek(im -> logger.info("--> found index {}", candidateIndex)))
        .map(im -> im.getCustomData(DataStreamsPlugin.LIFECYCLE_CUSTOM_INDEX_METADATA_KEY))
        .map(peek(custom -> logger.info("--> index {} has custom metadata: {}", candidateIndex, custom)))
        .map(meta -> meta.get(DataStreamLifecycleService.FROZEN_CANDIDATE_REPOSITORY_METADATA_KEY))
        .map(peek(repo -> logger.info("--> index {} has repo {} configured", candidateIndex, repo)))
        .orElse("_unset_");
    logger.info("--> repository set to: {}", setRepo);
    assertThat(setRepo, equalTo(DEFAULT_REPO));
}, 30, TimeUnit.SECONDS);
Contributor

⚠️ Potential issue | 🟠 Major

Reset the injected clock in a finally.

This test mutates every DataStreamLifecycleService via setNowSupplier(...) and never restores it. With randomized internal-cluster test execution, later methods can inherit the frozen clock and fail nondeterministically. Please mirror the try/finally reset pattern already used in testSystemDataStreamRetention.
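A minimal sketch of the suggested try/finally pattern follows. `FakeLifecycleService` is a hypothetical stand-in for a service with an injectable clock, and restoring to `System::currentTimeMillis` is an assumption; the actual test may restore a different supplier.

```java
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical illustration of resetting an injected clock; not the real test code. */
final class ClockResetSketch {

    /** Stand-in for a service whose notion of "now" can be replaced in tests. */
    static final class FakeLifecycleService {
        private volatile LongSupplier nowSupplier = System::currentTimeMillis;

        void setNowSupplier(LongSupplier supplier) {
            this.nowSupplier = supplier;
        }

        long now() {
            return nowSupplier.getAsLong();
        }
    }

    /** Runs the body with a fixed clock and always restores the real clock afterwards. */
    static void withFixedClock(List<FakeLifecycleService> services, long fixedNow, Runnable body) {
        try {
            services.forEach(s -> s.setNowSupplier(() -> fixedNow));
            body.run();
        } finally {
            // Runs even when the body throws, so later tests never see the fixed clock.
            services.forEach(s -> s.setNowSupplier(System::currentTimeMillis));
        }
    }

    public static void main(String[] args) {
        FakeLifecycleService service = new FakeLifecycleService();
        withFixedClock(List.of(service), 42L, () -> System.out.println("inside: " + service.now()));
        System.out.println("restored: " + (service.now() != 42L));
    }
}
```

Placing the reset in `finally` rather than at the end of the test body is what protects later tests when an assertion fails midway.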

Contributor

@lukewhiting lukewhiting left a comment

Looking good :-) Just a few bits to tidy up

@dakrone dakrone requested a review from lukewhiting March 16, 2026 15:02
Contributor

@lukewhiting lukewhiting left a comment

LGTM 👍🏻

}

// Update the custom metadata with the key (dlm_freeze_with) and value of the currently configured default repository
newMetadata.put(DataStreamLifecycleService.FROZEN_CANDIDATE_REPOSITORY_METADATA_KEY, defaultRepository);
Contributor

leaving a note to myself, i should probably update the converter method to use this metadata for the repository checks

Member Author

Yep, once this is in.

Contributor

@seanzatzdev seanzatzdev left a comment

LGTM!

@dakrone dakrone enabled auto-merge (squash) March 16, 2026 18:04
@dakrone dakrone merged commit 107688d into elastic:main Mar 17, 2026
36 checks passed
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Mar 17, 2026
This framework was behind a feature flag (and thus not used in production). We are moving to a
different execution model for the frozen conversion.

Relates to elastic#144248 (comment)
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Mar 17, 2026
Previously this method returned a list of indices, but it now returns a set. These indices don't
actually need to be iterated in order for the DLM actions.

Relates to elastic#144248 (comment)
dakrone added a commit that referenced this pull request Mar 17, 2026
This framework was behind a feature flag (and thus not used in production). We are moving to a
different execution model for the frozen conversion.

Relates to #144248 (comment)
dakrone added a commit that referenced this pull request Mar 18, 2026
* Refactor DataStream.getIndicesOlderThan to return a Set

Previously this method returned a list of indices, but it now returns a set. These indices don't
actually need to be iterated in order for the DLM actions.

Relates to #144248 (comment)

* Return unmodifiable set to be consistent

* Now that it's immutable, we need a mutable version in DLM service

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
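The follow-up commit's sequence (return a set, make it unmodifiable, then copy it where mutation is needed) can be sketched as below. The signature is hypothetical and much simpler than the real `DataStream.getIndicesOlderThan`; index ages are modeled as a plain map of index name to creation time in milliseconds.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical illustration of returning an unmodifiable set and copying it for mutation. */
final class OlderThanSetSketch {

    /** Returns the names of indices at least ageMillis old, as an unmodifiable set. */
    static Set<String> indicesOlderThan(Map<String, Long> creationMillisByIndex, long ageMillis, long nowMillis) {
        return creationMillisByIndex.entrySet()
            .stream()
            .filter(e -> nowMillis - e.getValue() >= ageMillis)
            .map(Map.Entry::getKey)
            .collect(Collectors.toUnmodifiableSet());
    }

    /** Callers that need to remove entries work on their own mutable copy. */
    static Set<String> mutableCopy(Set<String> indices) {
        return new HashSet<>(indices);
    }

    public static void main(String[] args) {
        Set<String> older = indicesOlderThan(Map.of("old-index", 0L, "new-index", 900L), 500L, 1000L);
        Set<String> remaining = mutableCopy(older);
        remaining.remove("old-index"); // allowed on the copy, not on the returned set
        System.out.println(older + " " + remaining);
    }
}
```

A set also makes it explicit that callers must not rely on iteration order, which matches the commit's note that the indices do not need to be processed in order.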
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Mar 18, 2026
As a result of elastic#144248 we mark the repository name to use for frozen conversion in the custom metadata of the backing index, so that a consistent repository is used for the whole process. This commit enhances the conversion faculties to use this repository name.
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
* Mark indices ready for frozen conversion in DLM service

This commit enhances the DLM service (`DatastreamLifecycleService`) to collect indices that are
ready to be converted to a frozen index, and then mark those indices with the repository in which
they should be converted.

This behavior is behind the DLM frozen feature flag, and the marking only happens if the cluster
already has a configured `repositories.default_repository` setting.

* Remove extra %s

* Let's not just talk about converting, let's actually convert

* Index -> Indices for executor

* Remove unused variable

* Add missing custom metadata in test

* Unset timer after test
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
This framework was behind a feature flag (and thus not used in production). We are moving to a
different execution model for the frozen conversion.

Relates to elastic#144248 (comment)
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
* Refactor DataStream.getIndicesOlderThan to return a Set

Previously this method returned a list of indices, but it now returns a set. These indices don't
actually need to be iterated in order for the DLM actions.

Relates to elastic#144248 (comment)

* Return unmodifiable set to be consistent

* Now that it's immutable, we need a mutable version in DLM service

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
dakrone added a commit that referenced this pull request Mar 26, 2026
…144511)

* Use marked repository name in `DataStreamLifecycleConvertToFrozen`

As a result of #144248 we mark the repository name to use for frozen conversion in the custom metadata of the backing index, so that a consistent repository is used for the whole process. This commit enhances the conversion faculties to use this repository name.
mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request Mar 30, 2026
…lastic#144511)

* Use marked repository name in `DataStreamLifecycleConvertToFrozen`

As a result of elastic#144248 we mark the repository name to use for frozen conversion in the custom metadata of the backing index, so that a consistent repository is used for the whole process. This commit enhances the conversion faculties to use this repository name.
4 participants