Batch index creation by inespot · Pull Request #144074 · elastic/elasticsearch

inespot · 2026-03-12T02:07:09Z

Processes all the waiting create-index tasks in a single cluster-state update.

Relates to ES-13198
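As a rough illustration of the idea (not the code in this PR), batching can be pictured as taking every waiting task, applying each one to a single working copy of the state, and publishing once. The class `BatchedCreateIndexSketch`, its `Task` type, and the set-of-names "cluster state" are all hypothetical stand-ins rather than Elasticsearch APIs, and the two validation checks are deliberately simplified.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a batched create-index executor; not the Elasticsearch API.
final class BatchedCreateIndexSketch {

    /** One queued create-index task and the outcome recorded for it. */
    static final class Task {
        final String indexName;
        String outcome; // "created" or a failure description; null until the batch runs

        Task(String indexName) {
            this.indexName = indexName;
        }
    }

    private final Set<String> indices = new HashSet<>(); // simplified "cluster state"
    private int publications = 0;

    /** Applies every waiting task to one working copy of the state, then publishes once. */
    void executeBatch(List<Task> waitingTasks) {
        Set<String> updated = new HashSet<>(indices);
        for (Task task : waitingTasks) {
            // Assumption: each task succeeds or fails on its own, without failing the batch.
            if (!task.indexName.equals(task.indexName.toLowerCase(Locale.ROOT))) {
                task.outcome = "invalid name";
            } else if (!updated.add(task.indexName)) {
                task.outcome = "already exists";
            } else {
                task.outcome = "created";
            }
        }
        indices.clear();
        indices.addAll(updated);
        publications++; // a single publication for the whole batch
    }

    int publications() {
        return publications;
    }

    Set<String> indices() {
        return indices;
    }
}
```

Submitting three tasks in one call (a valid name, an upper-case name, and a repeat of the valid name) would record one outcome per task while the publication counter goes up by only one.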

The reroute behavior was embedded in CreateIndexClusterStateUpdateRequest and CreateDataStreamClusterStateUpdateRequest, while the reroute listener was passed separately. This change clarifies the relationship between the two by passing RerouteBehavior alongside the listener as a method parameter. It also enables the batched reroute logic in elastic#144074. Relates to ES-13198.


inespot · 2026-03-14T03:30:04Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java

+                        RerouteBehavior.SKIP_REROUTE,
+                        rerouteCompletionIsNotRequired()
+                    );
+                    taskContext.success(task.getAckListener(allocationActionMultiListener.delay(task.listener)));


The N ack listeners feel slightly wasteful since there's ultimately a single cluster state publication, but the batch executor API doesn't expose a batch-level ack listener. Combined with the need to delay all task listener responses until reroute() completes and the already existing AllocationActionMultiListener, your original proposal seems like the cleanest way to wire this up.

Agreed, I'd like to move this whole acking thing elsewhere tbh but not today. Struggling to imagine a way we could be creating enough indices at once that the number of listeners here is a problem without having already hit some other way more serious problems first tho.
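For readers unfamiliar with the pattern being discussed, here is a minimal sketch of delaying per-task responses until one shared reroute has finished. `DelayedResponsesSketch` is a hypothetical, single-threaded stand-in written for this note; it does not reproduce `AllocationActionMultiListener` or its signatures, and it uses plain `Runnable` callbacks in place of real listeners.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a delaying multi-listener; not the real Elasticsearch class.
// Assumption: everything runs on one thread, so no synchronization is shown.
final class DelayedResponsesSketch {
    private final List<Runnable> delayed = new ArrayList<>();
    private boolean rerouteDone = false;

    /** Wraps a per-task callback so that it only runs once the shared reroute has completed. */
    Runnable delay(Runnable taskListener) {
        return () -> {
            if (rerouteDone) {
                taskListener.run();
            } else {
                delayed.add(taskListener); // hold the response back for now
            }
        };
    }

    /** Called once when the single reroute for the whole batch finishes. */
    void onRerouteComplete() {
        rerouteDone = true;
        for (Runnable listener : delayed) {
            listener.run();
        }
        delayed.clear();
    }
}
```

Each task still gets its own wrapped callback, which mirrors the "N listeners, one publication" shape described above: the wrappers are cheap, and nothing is released until the one reroute completes.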

inespot · 2026-03-14T03:35:25Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java

 /**
 * Service responsible for submitting create index requests
 */
 public class MetadataCreateIndexService {


Also noticed MetadataCreateIndexService is getting quite large and difficult to read. Outside the scope of this PR but I was wondering if it would make sense to extract some of the validation logic (which constitutes a large part of the code) into a dedicated class. Maybe a utils-style helper (like SnapshotUtils) or a validation-focused class (like AliasValidator). Any thoughts?

Mmmaybe, I mean I agree it's not very nice to work with, but I'm not sure simply moving the code elsewhere will help. There's some underlying structure that I feel we're missing which would simplify this. Possibly some amount of Strategy pattern would help? Not sure really.

Interesting, thanks for the link! Maybe that could work, although I'm not certain strategies would separate so cleanly here. I'll try to look into it a little more as a follow-up.

elasticsearchmachine · 2026-03-14T04:09:50Z

Pinging @elastic/es-distributed (Team:Distributed)


inespot · 2026-03-16T13:45:13Z

ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request [2 hours ago]

I think this noise is caused by the fact that I added a reference to this PR in the actual commit messages of PR 144140 (oops). I'll know to avoid adding those kinds of references next time.

DaveCTurner

Overall looks good - a couple of top-level requests:

Could you pull the change to testValidateIndexName out to a separate PR?
Could you fix the position of the RerouteBehavior argument (either revert its changes here, or leave them as-is here but bring main in line with a separate PR)?

Other than that, a few inline comments mostly around testing.

DaveCTurner · 2026-03-19T19:59:20Z

...nternalClusterTest/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexServiceIT.java

+            assertThat(successCount, equalTo(validIndicesNames.size()));
+            assertThat(invalidNameExceptionCount, equalTo(invalidNameCount));
+            assertThat(alreadyExistsExceptionCount, equalTo(duplicateCount));
+            assertThat(indexCreationExceptionCount, equalTo(invalidSettingsCount));


Rather than just checking the counts, could we check that each request gets the outcome we expect? You can use ActionTestUtils#assertNoFailureListener for the expected successes, and ActionTestUtils#assertNoSuccessListener for the expected failures. Then we can wait on all the responses with a single CountDownLatch rather than multiple Future.get() calls each of which might take 30s to time out.

Thanks a lot for the tip! Refactored in 3a5295
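The suggestion above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the `Listener` interface and the `expectSuccess`, `expectFailure`, and `awaitAll` helpers are hypothetical stand-ins, not the actual `ActionListener` or `ActionTestUtils` signatures, and the 30-second timeout simply echoes the figure mentioned in the comment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class PerRequestOutcomeSketch {

    /** Minimal callback interface standing in for an action listener (hypothetical). */
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    /** Listener for a request expected to succeed; records an error if it fails instead. */
    static <T> Listener<T> expectSuccess(CountDownLatch latch, AtomicReference<Throwable> firstError) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                latch.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                firstError.compareAndSet(null, new AssertionError("unexpected failure", e));
                latch.countDown();
            }
        };
    }

    /** Listener for a request expected to fail with the given exception type. */
    static <T> Listener<T> expectFailure(
        Class<? extends Exception> expectedType,
        CountDownLatch latch,
        AtomicReference<Throwable> firstError
    ) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                firstError.compareAndSet(null, new AssertionError("unexpected success: " + response));
                latch.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                if (!expectedType.isInstance(e)) {
                    firstError.compareAndSet(null, new AssertionError("wrong failure type", e));
                }
                latch.countDown();
            }
        };
    }

    /** Waits once for every response instead of blocking on each future in turn. */
    static void awaitAll(CountDownLatch latch, AtomicReference<Throwable> firstError) {
        try {
            if (!latch.await(30, TimeUnit.SECONDS)) {
                throw new AssertionError("timed out waiting for responses");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting", e);
        }
        Throwable error = firstError.get();
        if (error != null) {
            throw new AssertionError("a request got an unexpected outcome", error);
        }
    }
}
```

With this shape each request carries its own expectation, and the test pays the timeout at most once because there is a single wait on the latch.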

DaveCTurner · 2026-03-19T20:01:21Z

...nternalClusterTest/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexServiceIT.java

+            allRequestNames.add(indexName);
+        }
+        // No collisions
+        assertThat(validIndicesNames.size(), equalTo(validRequestCount));


Technically this could fail, with very low probability, but it's a little unclear to the reader what's happening here. We want N distinct strings, so I'd suggest adding a random string to a set repeatedly until the set's size is as desired.

Addressed the possible collision in 3a5295
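A minimal sketch of the "add to a set until it reaches the desired size" approach, using only the JDK. The class and method names are made up for illustration, and `java.util.Random` stands in for whatever randomness helpers the test framework provides.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class DistinctNamesSketch {

    /** Returns exactly {@code count} distinct lowercase names by adding to a set until it is full. */
    static Set<String> distinctNames(int count, int length, Random random) {
        if (count > Math.pow(26, length)) {
            // Guard against looping forever when there are not enough possible names.
            throw new IllegalArgumentException("cannot build " + count + " distinct names of length " + length);
        }
        Set<String> names = new HashSet<>();
        while (names.size() < count) {
            // A colliding name leaves the size unchanged, so the loop just tries again.
            names.add(randomLowercase(length, random));
        }
        return names;
    }

    /** Builds a random string of lowercase ASCII letters. */
    static String randomLowercase(int length, Random random) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

Because the loop condition is the set's size, no separate "no collisions" assertion is needed afterwards: the result has the requested number of distinct names by construction.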

DaveCTurner · 2026-03-19T20:02:38Z

...nternalClusterTest/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexServiceIT.java

+                    invalidNameCount++;
+                }
+                case 1 -> {
+                    final var indexName = randomIndexName();


Likewise here, we need this not to be one of the valid index names. But it can match an invalid one since we won't be creating those indices.

Done in 3a5295

DaveCTurner · 2026-03-19T20:02:52Z

...nternalClusterTest/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexServiceIT.java

+            int failureType = validIndicesNames.isEmpty() ? randomIntBetween(0, 1) : randomIntBetween(0, 2);
+            switch (failureType) {
+                case 0 -> {
+                    allRequestNames.add("INVALID_" + randomAlphaOfLength(6).toLowerCase(Locale.ROOT));


Maybe randomIdentifier("INVALID_")? Or even randomIdentifier("INVALID_BECAUSE_UPPER_CASE_")?

Done in 3a5295

DaveCTurner · 2026-03-19T20:14:49Z

...nternalClusterTest/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexServiceIT.java

+        Collections.shuffle(allRequestNames, random());
+
+        final ClusterStateListener listener = event -> {
+            final var projectMetadata = event.state().metadata().getProject(ProjectId.DEFAULT);


Oh yes also any chance we can test batching across several projects? Not sure quite what that might entail for this test.

So it's possible, see 23eabb.
That said, it does require overriding the project resolver to use TestOnlyMultiProjectResolver (instead of the DefaultProjectResolver), which isn't used in production. So we're exercising a somewhat artificial code path to test a future feature. It also requires a new Gradle import (multi-project IT testing does not appear to be leveraged by other tests in this package). We could consider splitting into two test classes (one for single-project and one for multi-project) to keep coverage for both, or we could drop the multi-project testing for now and revisit when multi-project is closer to production-ready. Let me know your thoughts on this.
Side note: I have not yet managed to make the multi-project setup work without disabling routing via routing.allocation.enable: none and using waitForActiveShards(NONE). Still looking into it though.

I see ok, thanks for investigating so deeply! The need for the new Gradle import tells us that nothing else is testing in this way today and there's nothing particularly special about these tests in terms of multi-project support so let's back this out and leave it for the multi-project team to strengthen all these tests when the time is right.

Sounds good to me, thanks!

DaveCTurner · 2026-03-20T07:33:18Z

...nternalClusterTest/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexServiceIT.java

+            final var indexName = addRandomIndexNameNoCollision(allIndexNames);
+            validIndicesByProject.putIfAbsent(projectId, new HashSet<>());
+            validIndicesByProject.get(projectId).add(indexName);
+            allRequests.add(new CreateIndexRequestSpec(indexName, false, CreateIndexResult.SUCCESS, projectId));


Do we need to track all the request-specs in a separate data structure like this? I would expect we can do this with one loop, after the master service is blocked, which constructs each request together with its expected-result listener, and immediately sends it.

You are right, and that's much nicer, thanks! Refactored as suggested in d1e63e

This reverts commit 23eabb5.

inespot · 2026-03-20T14:40:34Z

part-1 test failure in TsidExtractingIdFieldMapperTests, which should be unrelated to this PR. Previous runs were passing. I'll merge in main later today, CI tends to be more stable during US evenings 🙂

DaveCTurner

LGTM, great stuff. I think this should be an >enhancement rather than a >non-issue - it very much deserves a mention in the release notes.

elasticsearchmachine · 2026-03-20T17:09:42Z

Hi @inespot, I've created a changelog YAML for you.


* Batch create-index tasks: Processes all the waiting create-index tasks in a single cluster-state update.
* Small nits and fixes
* Add small comment
* Revert param position move
* Add failure testing + fix unused variables
* More testing + style fixes
* Rename + small javadoc
* More nits
* Further strengthen tests
* Tests improvement: no collision, latches & nits
* testCreateIndexBatching supports multi projects
* Revert "testCreateIndexBatching supports multi projects" (this reverts commit 23eabb5)
* Test consolidate request build and send
* Missed this one
* Update docs/changelog/144074.yaml

Co-authored-by: David Turner <david.turner@elastic.co>

elasticsearchmachine added the v9.4.0 label Mar 12, 2026

inespot force-pushed the feature/batch-index-creation branch 2 times, most recently from 9ef9e00 to a1395e1, March 12, 2026 17:38

inespot mentioned this pull request Mar 12, 2026

Extract reroute behavior from create-index request classes #144140

Merged

inespot force-pushed the feature/batch-index-creation branch from a1395e1 to 115beed, March 13, 2026 02:20

DaveCTurner and others added 3 commits March 13, 2026 12:35

Batch create-index tasks

0010211

Processes all the waiting create-index tasks in a single cluster-state update.

Small nits and fixes

3cc912d

Add small comment

afad4a3

inespot force-pushed the feature/batch-index-creation branch from 115beed to afad4a3, March 13, 2026 16:35

inespot and others added 6 commits March 13, 2026 16:25

Revert param position move

bcdfc8c

Add failure testing + fix unused variables

4fddbce

More testing + style fixes

0e8d371

Merge branch 'main' into feature/batch-index-creation

04e9043

Rename + small javadoc

da1a6a2

More nits

0a237dc

inespot commented Mar 14, 2026


inespot marked this pull request as ready for review March 14, 2026 04:08

elasticsearchmachine added the needs:triage label Mar 14, 2026

inespot added the >non-issue and :Distributed/Distributed labels and removed the needs:triage label Mar 14, 2026

elasticsearchmachine added the Team:Distributed label Mar 14, 2026

inespot requested a review from DaveCTurner March 16, 2026 02:25

Further strengthen tests

e23f2f1

Merge branch 'main' into feature/batch-index-creation

cf274f7

DaveCTurner reviewed Mar 19, 2026


This was referenced Mar 19, 2026

Clean up unused variables in MetadataCreateIndexServiceTests #144609

Merged

Consistent rerouteBehavior parameter position #144610

Merged

inespot and others added 3 commits March 19, 2026 18:59

Tests improvement: no collision, latches & nits

3a52952

testCreateIndexBatching supports multi projects

23eabb5

Merge branch 'main' into feature/batch-index-creation

8a60cc3

DaveCTurner reviewed Mar 20, 2026


inespot and others added 4 commits March 20, 2026 08:38

Revert "testCreateIndexBatching supports multi projects"

5a74bc6

This reverts commit 23eabb5.

Test consolidate request build and send

d1e63eb

Merge branch 'main' into feature/batch-index-creation

79a14af

Missed this one

216bebe

inespot requested a review from DaveCTurner March 20, 2026 14:41

DaveCTurner approved these changes Mar 20, 2026


inespot added >enhancement and removed >non-issue labels Mar 20, 2026

Update docs/changelog/144074.yaml

4128e56

Merge branch 'main' into feature/batch-index-creation

1eef9fe

inespot merged commit 248bbe1 into elastic:main Mar 20, 2026
36 checks passed
