Conversation
9ef9e00 to
a1395e1
Compare
The reroute behavior was embedded in CreateIndexClusterStateUpdateRequest and CreateDataStreamClusterStateUpdateRequest, while the reroute listener was passed separately. This change clarifies the relationship between the two by passing RerouteBehavior alongside the listener as a method parameter. It also enables the batched reroute logic in elastic#144074. Relates to ES-13198.
a1395e1 to
115beed
Compare
* Extract reroute behavior from create-index request classes The reroute behavior was embedded in CreateIndexClusterStateUpdateRequest and CreateDataStreamClusterStateUpdateRequest, while the reroute listener was passed separately. This change clarifies the relationship between the two by passing RerouteBehavior alongside the listener as a method parameter. It also enables the batched reroute logic in #144074. Relates to ES-13198. * Re-add deleted comment * Small fixes * Missing param in javadoc
Processes all the waiting create-index tasks in a single cluster-state update.
115beed to
afad4a3
Compare
| RerouteBehavior.SKIP_REROUTE, | ||
| rerouteCompletionIsNotRequired() | ||
| ); | ||
| taskContext.success(task.getAckListener(allocationActionMultiListener.delay(task.listener))); |
There was a problem hiding this comment.
The N ack listeners feel slightly wasteful since there's ultimately a single cluster state publication, but the batch executor API doesn't expose a batch-level ack listener. Combined with the need to delay all task listener responses until reroute() completes and the already existing AllocationActionMultiListener, your original proposal seems like the cleanest way to wire this up.
There was a problem hiding this comment.
Agreed, I'd like to move this whole acking thing elsewhere tbh but not today. Struggling to imagine a way we could be creating enough indices at once that the number of listeners here is a problem without having already hit some other way more serious problems first tho.
| /** | ||
| * Service responsible for submitting create index requests | ||
| */ | ||
| public class MetadataCreateIndexService { |
There was a problem hiding this comment.
Also noticed MetadataCreateIndexService is getting quite large and difficult to read. Outside the scope of this PR but I was wondering if it would make sense to extract some of the validation logic (which constitutes a large part of the code) into a dedicated class. Maybe a utils-style helper (like SnapshotUtils) or a validation-focused class (like AliasValidator). Any thoughts?
There was a problem hiding this comment.
Mmmaybe, I mean I agree it's not very nice to work with, but I'm not sure simply moving the code elsewhere will help. There's some underlying structure that I feel we're missing which would simplify this. Possibly some amount of Strategy pattern would help? Not sure really.
There was a problem hiding this comment.
Interesting, thanks for the link! Maybe that could work, although I'm not certain strategies would separate so cleanly here. I'll try to look into it a little more as a FLUP.
|
Pinging @elastic/es-distributed (Team:Distributed) |
…44140) * Extract reroute behavior from create-index request classes The reroute behavior was embedded in CreateIndexClusterStateUpdateRequest and CreateDataStreamClusterStateUpdateRequest, while the reroute listener was passed separately. This change clarifies the relationship between the two by passing RerouteBehavior alongside the listener as a method parameter. It also enables the batched reroute logic in elastic#144074. Relates to ES-13198. * Re-add deleted comment * Small fixes * Missing param in javadoc
I think this noise is caused by the fact that I added a ref to this PR in the actual commit messages of PR 144140 (oops). Will know to avoid adding those types of refs next time |
DaveCTurner
left a comment
There was a problem hiding this comment.
Overall looks good - a couple of top-level requests:
- Could you pull the change to
testValidateIndexNameout to a separate PR - Could you fix the position of the
RerouteBehaviorargument (either revert its changes here, or leave them as-is here but bringmainin line with a separate PR)
Other than that, a few inline comments mostly around testing.
| assertThat(successCount, equalTo(validIndicesNames.size())); | ||
| assertThat(invalidNameExceptionCount, equalTo(invalidNameCount)); | ||
| assertThat(alreadyExistsExceptionCount, equalTo(duplicateCount)); | ||
| assertThat(indexCreationExceptionCount, equalTo(invalidSettingsCount)); |
There was a problem hiding this comment.
Rather than just checking the counts, could we check that each request gets the outcome we expect? You can use ActionTestUtils#assertNoFailureListener for the expected successes, and ActionTestUtils#assertNoSuccessListener for the expected failures. Then we can wait on all the responses with a single CountDownLatch rather than multiple Future.get() calls each of which might take 30s to time out.
There was a problem hiding this comment.
Thanks a lot for the tip! Refactored in 3a5295
| allRequestNames.add(indexName); | ||
| } | ||
| // No collisions | ||
| assertThat(validIndicesNames.size(), equalTo(validRequestCount)); |
There was a problem hiding this comment.
Technically this could fail, with very low probability, but it's a little unclear to the reader what's happening here. We want N distinct strings, so I'd suggest adding a random string to a set repeatedly until the set's size is as desired.
There was a problem hiding this comment.
Addressed the possible collision in 3a5295
| invalidNameCount++; | ||
| } | ||
| case 1 -> { | ||
| final var indexName = randomIndexName(); |
There was a problem hiding this comment.
Likewise here, we need this not to be one of the valid index names. But it can match an invalid one since we won't be creating those indices.
| int failureType = validIndicesNames.isEmpty() ? randomIntBetween(0, 1) : randomIntBetween(0, 2); | ||
| switch (failureType) { | ||
| case 0 -> { | ||
| allRequestNames.add("INVALID_" + randomAlphaOfLength(6).toLowerCase(Locale.ROOT)); |
There was a problem hiding this comment.
Maybe randomIdentifier("INVALID_")? Or even randomIdentifier("INVALID_BECAUSE_UPPER_CASE_")?
| /** | ||
| * Service responsible for submitting create index requests | ||
| */ | ||
| public class MetadataCreateIndexService { |
There was a problem hiding this comment.
Mmmaybe, I mean I agree it's not very nice to work with, but I'm not sure simply moving the code elsewhere will help. There's some underlying structure that I feel we're missing which would simplify this. Possibly some amount of Strategy pattern would help? Not sure really.
| RerouteBehavior.SKIP_REROUTE, | ||
| rerouteCompletionIsNotRequired() | ||
| ); | ||
| taskContext.success(task.getAckListener(allocationActionMultiListener.delay(task.listener))); |
There was a problem hiding this comment.
Agreed, I'd like to move this whole acking thing elsewhere tbh but not today. Struggling to imagine a way we could be creating enough indices at once that the number of listeners here is a problem without having already hit some other way more serious problems first tho.
| Collections.shuffle(allRequestNames, random()); | ||
|
|
||
| final ClusterStateListener listener = event -> { | ||
| final var projectMetadata = event.state().metadata().getProject(ProjectId.DEFAULT); |
There was a problem hiding this comment.
Oh yes also any chance we can test batching across several projects? Not sure quite what that might entail for this test.
There was a problem hiding this comment.
So it's possible, see 23eabb.
That said, it does require overriding the project resolver to use TestOnlyMultiProjectResolver (instead of the DefaultProjectResolver), which isn't used in production. So we're exercising a somewhat artificial code path to test a future feature. It also requires a new gradle import (MP IT testing does not appear to be leveraged other tests in this package). We could consider splitting into two test classes (one for single-project and one for multi-project) to keep coverage for both or we could drop the multi-project testing for now and revisit when multi-project is closer to production-ready. Let me know your thoughts on this.
Side note, I have not yet managed to make the multi project setup work without disabling routing via routing.allocation.enable: none and using waitForActiveShards(NONE). Still looking into it though
There was a problem hiding this comment.
I see ok, thanks for investigating so deeply! The need for the new Gradle import tells us that nothing else is testing in this way today and there's nothing particularly special about these tests in terms of multi-project support so let's back this out and leave it for the multi-project team to strengthen all these tests when the time is right.
There was a problem hiding this comment.
Sounds good to me, thanks!
| final var indexName = addRandomIndexNameNoCollision(allIndexNames); | ||
| validIndicesByProject.putIfAbsent(projectId, new HashSet<>()); | ||
| validIndicesByProject.get(projectId).add(indexName); | ||
| allRequests.add(new CreateIndexRequestSpec(indexName, false, CreateIndexResult.SUCCESS, projectId)); |
There was a problem hiding this comment.
Do we need to track all the request-specs in a separate data structure like this? I would expect we can do this with one loop, after the master service is blocked, which constructs each request together with its expected-result listener, and immediately sends it.
There was a problem hiding this comment.
You are right, and that's much nicer, thanks! Refactored as suggested in d1e63e
|
part-1 test failure in |
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM great stuff. I think this should be an >enhancement rather than >non-issue- it very much deserves a mention in the release notes.
|
Hi @inespot, I've created a changelog YAML for you. |
…44140) * Extract reroute behavior from create-index request classes The reroute behavior was embedded in CreateIndexClusterStateUpdateRequest and CreateDataStreamClusterStateUpdateRequest, while the reroute listener was passed separately. This change clarifies the relationship between the two by passing RerouteBehavior alongside the listener as a method parameter. It also enables the batched reroute logic in elastic#144074. Relates to ES-13198. * Re-add deleted comment * Small fixes * Missing param in javadoc
* Batch create-index tasks Processes all the waiting create-index tasks in a single cluster-state update. * Small nits and fixes * Add small comment * Revert param position move * Add failure testing + fix unused variables * More testing + style fixes * Rename + small javadoc * More nits * Further strengthen tests * Tests improvement: no collision, latches & nits * testCreateIndexBatching supports multi projects * Revert "testCreateIndexBatching supports multi projects" This reverts commit 23eabb5. * Test consolidate request build and send * Missed this one * Update docs/changelog/144074.yaml --------- Co-authored-by: David Turner <david.turner@elastic.co>
Processes all the waiting create-index tasks in a single cluster-state update.
Relates to ES-13198