Add INDEX_REFRESH_BLOCK #117543

Merged

tlrx merged 16 commits into elastic:main from tlrx:2024/11/26/ES-10131 on Nov 29, 2024

Conversation

@tlrx (Member) commented Nov 26, 2024

This pull request adds a new ClusterBlockLevel called REFRESH. This level is used in a new ClusterBlock.INDEX_REFRESH_BLOCK which is automatically added to new indices that are created from an empty store, with replicas, and only on serverless deployments that have a feature flag enabled. The block is also only added when all nodes of the cluster are on a recent enough transport version.

If for some reason the new ClusterBlock is sent over the wire to a node with an old transport version, the REFRESH cluster block level will be removed from the set of blocked levels. I expect this not to be an issue, as nodes with old transport versions should not make any use of ClusterBlock.INDEX_REFRESH_BLOCK. Still, I think it is worth backporting ClusterBlockLevel.REFRESH, ClusterBlock.INDEX_REFRESH_BLOCK, and the serialization changes to v8.18.
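
To make the compatibility handling concrete, here is a minimal sketch of what stripping the level on the wire could look like. It assumes StreamOutput#writeEnumSet and the TransportVersions.NEW_REFRESH_CLUSTER_BLOCK constant referenced later in this PR; it is an illustration, not the PR's exact code.

    // Illustrative sketch, not the PR's exact code: drop the REFRESH level
    // when serializing to a node that predates the new transport version.
    private static void writeLevels(StreamOutput out, EnumSet<ClusterBlockLevel> levels) throws IOException {
        if (out.getTransportVersion().onOrAfter(TransportVersions.NEW_REFRESH_CLUSTER_BLOCK)) {
            out.writeEnumSet(levels);
        } else {
            // Older nodes do not know ClusterBlockLevel.REFRESH, so strip it
            // before writing; they never use INDEX_REFRESH_BLOCK anyway.
            EnumSet<ClusterBlockLevel> compatible = EnumSet.copyOf(levels);
            compatible.remove(ClusterBlockLevel.REFRESH);
            out.writeEnumSet(compatible);
        }
    }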

In the future, the REFRESH cluster block will be used:

  • to block refreshes on shards until an unpromotable shard is started
  • to allow skipping shards when searching

Relates ES-10131
@tlrx tlrx added the >non-issue, :Distributed/Recovery, and v9.0.0 labels Nov 26, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (obsolete) label Nov 26, 2024
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added the serverless-linked label Nov 26, 2024
@tlrx tlrx marked this pull request as draft November 26, 2024 12:24
@tlrx tlrx marked this pull request as ready for review November 27, 2024 13:39
@tlrx tlrx requested a review from a team as a code owner November 27, 2024 13:39

    private static <E extends Enum<E>> boolean assertEnumToWrite(E enumValue, TransportVersion version) {
        assert enumValue instanceof XContentType == false : "XContentHelper#writeTo should be used for XContentType serialisation";
        assert enumValue != ClusterBlockLevel.REFRESH || version.onOrAfter(TransportVersions.NEW_REFRESH_CLUSTER_BLOCK)
Member:

This class should not have a reference to a specific enum value and transport version - this should be handled by the class doing the serialization

Contributor:

I think there is no other way to do this check. Notice that the class using this enum does do the right thing. I think we can either have the assertion or remove it. I would prefer to have it, similar to how we have the XContentType assertion. We could have a separate task to introduce the infrastructure to declare a minimum transport version on an enum, so that this is verified by the infrastructure itself.

Member Author:

I added this assertion to ensure that we'll be informed in case I missed a place where the ClusterBlockLevel is serialized. I agree it's not great, but it helps me be more confident in the change, as some parts of this PR will be backported to 8.x (at least the new cluster block and the serialization changes).

I would prefer to keep this assertion until the backport is done and CI has run for a couple of weeks before removing it, if that's possible.

Contributor:

I think I'm with @thecoop here - writeEnum is only appropriate for the (very common) case that enum values map exactly to their ordinals in the wire protocol. In all other cases the caller should define its own mapping between enum values and wire representations, calling writeVInt() and readVInt() itself to deal with older versions that relied on writeEnum() and readEnum().
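
As a hedged sketch of that pattern (illustrative only; the wire ids below are chosen to match the historical writeEnum() ordinals, and only StreamOutput#writeVInt / StreamInput#readVInt are taken from the comment above):

    // Illustrative: the caller owns the enum <-> wire-id mapping instead of
    // relying on writeEnum()'s ordinal-based encoding.
    static void writeLevel(StreamOutput out, ClusterBlockLevel level) throws IOException {
        out.writeVInt(switch (level) {
            case READ -> 0;
            case WRITE -> 1;
            case METADATA_READ -> 2;
            case METADATA_WRITE -> 3;
            case REFRESH -> 4; // new value appended; ids 0-3 match the old ordinals
        });
    }

    static ClusterBlockLevel readLevel(StreamInput in) throws IOException {
        return switch (in.readVInt()) {
            case 0 -> ClusterBlockLevel.READ;
            case 1 -> ClusterBlockLevel.WRITE;
            case 2 -> ClusterBlockLevel.METADATA_READ;
            case 3 -> ClusterBlockLevel.METADATA_WRITE;
            case 4 -> ClusterBlockLevel.REFRESH;
            default -> throw new IllegalArgumentException("unknown ClusterBlockLevel wire id");
        };
    }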

Member Author:

I pushed 00d3dce to remove that change in StreamOutput.

@thecoop (Member) left a comment:

StreamOutput should not be modified for a specific enum value

@henningandersen (Contributor) left a comment:

LGTM

      METADATA_READ,
    - METADATA_WRITE;
    + METADATA_WRITE,
    + REFRESH;
Contributor:

I did not check, but I wonder if there are any greater-than or less-than comparisons against the "level", since the wording signals some order. Perhaps you can take a look (if you have not already)?

Member Author:

I checked and saw no comparisons based on the ordinal (and did not expect to see one either).

        EnumSet.of(ClusterBlockLevel.WRITE)
    );
    public static final ClusterBlock INDEX_REFRESH_BLOCK = new ClusterBlock(
        14,
Contributor:

What is with the hole here? Just avoiding the unfortunate 13, or leaving room for one to go in between?

Member Author:

13 is assigned to CLUSTER_READ_ONLY_ALLOW_DELETE_BLOCK in the Metadata class.

I was thinking of declaring all ids in the same constants class (all blocks are defined in server) as a possible follow-up.
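
A hypothetical sketch of that follow-up (class and constant names invented; 13 and 14 are the ids discussed above):

    // Hypothetical follow-up: declare every block id in one place so gaps
    // like 13 (CLUSTER_READ_ONLY_ALLOW_DELETE_BLOCK) vs 14 are visible.
    public final class ClusterBlockIds {
        public static final int CLUSTER_READ_ONLY_ALLOW_DELETE_BLOCK_ID = 13;
        public static final int INDEX_REFRESH_BLOCK_ID = 14;

        private ClusterBlockIds() {}
    }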

    if (useRefreshBlock(settings) == false) {
        return (clusterBlocks, indexMetadata, minClusterTransportVersion) -> {};
    }
    logger.info("applying refresh block on index creation");
Contributor:

Should we move this to debug?

Member Author:

Sure, I pushed d838465.

    private static boolean applyRefreshBlock(IndexMetadata indexMetadata, TransportVersion minClusterTransportVersion) {
        return 0 < indexMetadata.getNumberOfReplicas() // index has replicas
            && indexMetadata.getResizeSourceIndex() == null // index is not a split/shrink index
            && indexMetadata.getInSyncAllocationIds().values().stream().allMatch(Set::isEmpty) // index is a new index
Contributor:

Can this ever not be true? Fine to keep, ofc.

Member Author:

I expect this to always be true, but decided to copy the conditions from org.elasticsearch.cluster.routing.IndexRoutingTable.Builder#initializeEmpty for extra safety.

    private final boolean forbidPrivateIndexSettings;
    private final Set<IndexSettingProvider> indexSettingProviders;
    private final ThreadPool threadPool;
    private final @Nullable ClusterBlocksTransformer blocksTransformerUponIndexCreation;
Contributor:

This no longer looks nullable?

Member Author:

Removed in ccee9d1


@kingherc (Contributor) left a comment:

Nice idea for serverless!

> added to new indices that are created from an empty store

Why not all indices?

> I think that it is worth backporting

Should you add the v8.x and auto-backport labels?

> to block refreshes on shards until an unpromotable shard is started

I am a bit unfamiliar with cluster blocks, but just to confirm, the block will be at the index level, not the whole cluster, right? I think the clusterBlocks.addIndexBlock() shows that, but would like you to confirm.

@kingherc (Contributor) left a comment:

Will wait for the rest of my comments' answers before approving, but in general this looks good to me.

@tlrx (Member Author) commented Nov 28, 2024

> Why not all indices?

The change targets searches during the rollover of data streams, when the new write index can accept writes but unpromotable shards are not ready yet. Do you think it would be valuable for other indices, like ones restored from a snapshot?

> Should you add the v8.x and auto-backport labels?

I'll add the label, but I expect to only backport the serialization changes with the new cluster block (i.e., not the changes to apply the block during index creation).

> I am a bit unfamiliar with cluster blocks, but just to confirm, the block will be at the index level, not the whole cluster, right? I think the clusterBlocks.addIndexBlock() shows that, but would like you to confirm.

Yes. A ClusterBlockLevel can be used at both the global level and the index level, but this change applies the new refresh block at the index level.
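
A minimal sketch of that index-level application, assuming the ClusterBlocks.Builder#addIndexBlock API mentioned above and that the new block is declared on IndexMetadata (illustrative, not the PR's exact code):

    // Illustrative: the refresh block is added for a single index, not globally.
    ClusterBlocks.Builder blocks = ClusterBlocks.builder().blocks(currentState.blocks());
    blocks.addIndexBlock(indexMetadata.getIndex().getName(), IndexMetadata.INDEX_REFRESH_BLOCK);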

@kingherc (Contributor) left a comment:

LGTM. Please consider my approval "light" in the sense that I'm not too familiar with cluster blocks.

> The change targets searches during the rollover of data streams, when the new write index can accept writes but unpromotable shards are not ready yet. Do you think it would be valuable for other indices, like ones restored from a snapshot?

Because you mentioned you may use it to allow skipping shards when searching, it might not be applicable to all indices. However, I was thinking that the PR's aspect of blocking until search shards are ready may be useful for all indices, in lieu of a "wait for index green health" step.

Specifically, PR #117486 came to mind, where we (with @carlosdelest) were struggling to find a way to have a "wait for at least one search shard to be ready" step so that searches do not fail. We do not have such an API yet. The idea that refreshes or searches simply wait for the search shard(s) to be ready makes sense to me. I'm not sure a cluster block is the perfect way for this, but I was intrigued by the idea and thought I'd mention it here for your thoughts on whether this might be reused.

@tlrx tlrx requested a review from thecoop November 29, 2024 08:36
@carlosdelest (Member) commented:

> I'm not sure a cluster block is the perfect way for this, but I was intrigued by the idea and thought I'd mention it here for your thoughts on whether this might be reused.

@kingherc , thanks for the ping!

Would searches fail during the INDEX_REFRESH_BLOCK? In that case, I'm not sure how we would be able to reuse this for the "wait for a search shard" use case 🤔. How do you envision using this block for it?

@tlrx (Member Author) commented Nov 29, 2024

> Would searches fail during the INDEX_REFRESH_BLOCK? In that case, I'm not sure how we would be able to reuse this for the "wait for a search shard" use case 🤔. How do you envision using this block for it?

We plan to modify the search logic so that it skips search shards if the index has a refresh block and no search shard copy is started. The goal here is to avoid the search returning a shard failure (and therefore 503s).

We also plan to modify the refresh logic so that it blocks until a search shard is started (that's why it is called a refresh block). This way, if a bulk request with ?refresh=immediate|wait_for is executed, it will block until a search shard is started and a subsequent search is guaranteed to see the indexed document.

Note that ES-10131 has more context too.
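
A hedged sketch of that planned search-side check (illustrative; searchShardCopyStarted is a made-up helper, and IndexMetadata.INDEX_REFRESH_BLOCK assumes where the block is declared):

    // Illustrative: skip the shard (rather than fail the search with a 503)
    // while its index carries the refresh block and no search copy is started.
    static boolean skipShard(ClusterState state, ShardRouting shard) {
        boolean refreshBlocked = state.blocks()
            .hasIndexBlock(shard.getIndexName(), IndexMetadata.INDEX_REFRESH_BLOCK);
        // searchShardCopyStarted is a made-up helper standing in for "is any
        // search (unpromotable) copy of this shard started yet?"
        return refreshBlocked && searchShardCopyStarted(state, shard.shardId()) == false;
    }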

@kingherc (Contributor) commented:

Specifically, @carlosdelest, I was thinking that if INDEX_REFRESH_BLOCK could apply to synonyms, we could try having the tests simply refresh the synonyms index, which would block until at least one search shard is ready. But what Tanguy also described about skipping empty, not-ready search shards could work as well if the synonyms are empty.

Note that for this PR to help with the synonyms test situation, the synonyms index in the tests would need to be newly created and empty, so that the INDEX_REFRESH_BLOCK is applied.

@carlosdelest (Member) commented:

Got it @kingherc , I will give it a try when this PR is merged. Thanks!

@tlrx (Member Author) commented Nov 29, 2024

> Got it @kingherc, I will give it a try when this PR is merged. Thanks!

There are more changes needed after this PR before you can use it. Also, it will only work in serverless.

@tlrx tlrx merged commit 045f6a3 into elastic:main Nov 29, 2024
@tlrx (Member Author) commented Nov 29, 2024

Thanks all for your feedback.

I merged the PR with the assertion removed as requested by @thecoop, who is on PTO today.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Nov 29, 2024
This is the backport of elastic#117543 for 8.18. It contains the cluster block and cluster block level, the transport version, and the serialization changes.

It does NOT contain the MetadataCreateIndexService logic to apply the block.
tlrx added a commit that referenced this pull request Nov 29, 2024
This is the backport of #117543 for 8.18. It contains the cluster block and cluster block level, the transport version, and the serialization changes.

It does NOT contain the MetadataCreateIndexService logic to apply the block.
craigtaverner pushed a commit to craigtaverner/elasticsearch that referenced this pull request Dec 2, 2024
This change adds a new ClusterBlockLevel called REFRESH. This level is used in a new ClusterBlock.INDEX_REFRESH_BLOCK which is automatically added to new indices that are created from an empty store, with replicas, and only on serverless deployments that have a feature flag enabled. The block is also only added when all nodes of the cluster are on a recent enough transport version.

If for some reason the new ClusterBlock is sent over the wire to a node with an old transport version, the REFRESH cluster block level will be removed from the set of blocked levels.

In the future, the REFRESH cluster block will be used:
  • to block refreshes on shards until an unpromotable shard is started
  • to allow skipping shards when searching

Relates ES-10131
benchaplin added a commit that referenced this pull request Jul 21, 2025
#117543 introduced a cluster block that is added to new indices in
stateless and removed when at least one replica is ready. A search
against those indices should be skipped during that time.
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
…c#3208)

This pull request introduces a feature flag setting that enables the addition of the INDEX_REFRESH_BLOCK upon new index creation (see elastic#117543).

It also adds integration tests related to the automatic addition of the refresh block.

Relates
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
elastic#117543 introduced a ClusterBlock which is applied to new indices in Serverless which do not yet have search shards up. We should skip searches for indices with this block in order to avoid meaningless 503s.

Labels

  • :Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source.)
  • >non-issue
  • serverless-linked (Added by automation, don't add manually)
  • Team:Distributed Indexing (obsolete) (Meta label for Distributed Indexing team. Obsolete. Please do not use.)
  • v8.18.0
  • v9.0.0


7 participants