Conversation
Relates ES-10131
|
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
|
|
||
| private static <E extends Enum<E>> boolean assertEnumToWrite(E enumValue, TransportVersion version) { | ||
| assert enumValue instanceof XContentType == false : "XContentHelper#writeTo should be used for XContentType serialisation"; | ||
| assert enumValue != ClusterBlockLevel.REFRESH || version.onOrAfter(TransportVersions.NEW_REFRESH_CLUSTER_BLOCK) |
There was a problem hiding this comment.
This class should not have a reference to a specific enum value and transport version - this should be handled by the class doing the serialization
There was a problem hiding this comment.
I think there is no other way to do this check. Notice that the class
using this enum does do the right thing. I think we can either have the assertion or remove it. I would prefer to have it, similar to how we have the xcontenttype assertion.
We could have a separate task to introduce the infra to declare a min transport version on an enum to ensure this is verified in the infra.
There was a problem hiding this comment.
I added this assertion to ensure that we'll be informed in case I miss a place where the ClusterBlockLevel is serialized. I agree it's not great, but it helps me be more confident in the change as some part of this PR will be backported to 8.x (at least the new cluster block and the serialization changes).
I would prefer to keep this assertion until the backport is done and CI run for a couple of weeks before removing it, if that's possible.
There was a problem hiding this comment.
I think I'm with @thecoop here - writeEnum is only appropriate for the (very common) case that enum values map exactly to their ordinals in the wire protocol. In all other cases the caller should define its own mapping between enum values and wire representations, calling writeVInt() and readVInt() itself to deal with older versions that relied on writeEnum() and readEnum().
There was a problem hiding this comment.
I pushed 00d3dce to remove that change in StreamOutput.
thecoop
left a comment
There was a problem hiding this comment.
StreamOutput should not be modified for a specific enum value
| METADATA_READ, | ||
| METADATA_WRITE; | ||
| METADATA_WRITE, | ||
| REFRESH; |
There was a problem hiding this comment.
I did not check, but wonder if there are any greater than, less than comparisons against the "level", since the wording signals some order. Perhaps you can take a look (if you have not already)?
There was a problem hiding this comment.
I checked and saw no comparisons based on the ordinal (and did not expect to saw one either)
| EnumSet.of(ClusterBlockLevel.WRITE) | ||
| ); | ||
| public static final ClusterBlock INDEX_REFRESH_BLOCK = new ClusterBlock( | ||
| 14, |
There was a problem hiding this comment.
What is with the hole here, just avoiding the unfortunate 13 or leaving room for one to go in between?
There was a problem hiding this comment.
13 is assigned to CLUSTER_READ_ONLY_ALLOW_DELETE_BLOCK in the Metadata class.
I was thinking of declaring all ids in the same constant class (all blocks are defined in server) as a possible follow up.
| if (useRefreshBlock(settings) == false) { | ||
| return (clusterBlocks, indexMetadata, minClusterTransportVersion) -> {}; | ||
| } | ||
| logger.info("applying refresh block on index creation"); |
There was a problem hiding this comment.
Should we move this to debug?
| private static boolean applyRefreshBlock(IndexMetadata indexMetadata, TransportVersion minClusterTransportVersion) { | ||
| return 0 < indexMetadata.getNumberOfReplicas() // index has replicas | ||
| && indexMetadata.getResizeSourceIndex() == null // index is not a split/shrink index | ||
| && indexMetadata.getInSyncAllocationIds().values().stream().allMatch(Set::isEmpty) // index is a new index |
There was a problem hiding this comment.
Can this ever be not true? Fine to keep ofc.
There was a problem hiding this comment.
I expect this to be always true, but decided to copy the conditions from org.elasticsearch.cluster.routing.IndexRoutingTable.Builder#initializeEmpty for extra safety.
| private final boolean forbidPrivateIndexSettings; | ||
| private final Set<IndexSettingProvider> indexSettingProviders; | ||
| private final ThreadPool threadPool; | ||
| private final @Nullable ClusterBlocksTransformer blocksTransformerUponIndexCreation; |
There was a problem hiding this comment.
This no longer looks nullable?
|
|
||
| private static <E extends Enum<E>> boolean assertEnumToWrite(E enumValue, TransportVersion version) { | ||
| assert enumValue instanceof XContentType == false : "XContentHelper#writeTo should be used for XContentType serialisation"; | ||
| assert enumValue != ClusterBlockLevel.REFRESH || version.onOrAfter(TransportVersions.NEW_REFRESH_CLUSTER_BLOCK) |
There was a problem hiding this comment.
I think there is no other way to do this check. Notice that the class
using this enum does do the right thing. I think we can either have the assertion or remove it. I would prefer to have it, similar to how we have the xcontenttype assertion.
We could have a separate task to introduce the infra to declare a min transport version on an enum to ensure this is verified in the infra.
kingherc
left a comment
There was a problem hiding this comment.
Nice idea for serverless!
added to new indices that are created from empty store
Why not all indices?
I think that it is worth backporting
Should you add the v8.x and auto-backport labels?
to block refreshes on shards until an unpromotable shard is started
I am a bit unfamiliar with cluster blocks, but just to confirm, the block will be at the index level, not the whole cluster, right? I think the clusterBlocks.addIndexBlock() shows that, but would like you to confirm.
server/src/test/java/org/elasticsearch/cluster/block/ClusterBlockTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java
Show resolved
Hide resolved
kingherc
left a comment
There was a problem hiding this comment.
Will wait for rest of my comments' answers before approving, but in general looks good to me.
The change is targeting searches during the rollover of datastreams, when the new write index can accept writes but unpromotable shards are not ready yet. Do you think it would be valuable for other indices like restored from snapshot?
I'll add the label but I expect to only backport the serialization changes with the new cluster block (ie, not the changes to apply the block during index creation).
Yes. |
kingherc
left a comment
There was a problem hiding this comment.
LGTM. Please consider my approval "light" in the sense that I'm not too familiar with cluster blocks.
The change is targeting searches during the rollover of datastreams, when the new write index can accept writes but unpromotable shards are not ready yet. Do you think it would be valuable for other indices like restored from snapshot?
Because you mentioned you may use it to allow skipping shards when searching, it might not be applicable to all indices. However, I was thinking that the PR's aspect that it blocks until search shards is ready may be useful for all indices, in lieu of a "wait for index green health" step.
Specifically, PR #117486 came to mind, where we (with @carlosdelest ) were struggling to find a way to have a step "wait for at least one search shard to be ready" so that searches do not fail. We do not have such an API yet. The idea that refreshes or searches simply wait for the search shard(s) to be ready makes sense to me. I'm not sure a cluster block is the perfect way for this, but I was intrigued by the idea and thought I'd mention it for your thoughts here -- whether this might be re-used.
@kingherc , thanks for the ping! Would searches performed fail during the |
We plan to modify the search logic so that it skips search shards if the index has a refresh block and no search shard copy is started. The goal here is to avoid the search returning a shard failure (and therefore 503s). We also plan to modify the refresh logic so that it blocks until a search shard is started (that's why it is called a refresh block). This way, if a bulk request with Note that ES-10131 has more context too. |
|
Specifically @carlosdelest I was thinking that if INDEX_REFRESH_BLOCK could apply to synonyms, we try having the tests simply refresh the synonyms index, which would block until at least a search shard is ready. But what Tanguy also described on skipping empty not-ready search shards could work as well if the synonyms are empty. Note that for this PR to help with the synonyms test situation, the synonyms index in the tests would need to be newly created and empty so that the INDEX_REFRESH_BLOCK is put. |
|
Got it @kingherc , I will give it a try when this PR is merged. Thanks! |
There is more changes to do after this PR before you can use it. Also, it will only work in serverless. |
|
Thanks all for your feedback. I merged the PR with the assertion removed as requested by @thecoop, who is on PTO today. |
This is the backport of elastic#117543 for 8.18. It contains the cluster block and cluster level block, the transport version and serialization changes. It does NOT contain the MetadataCreateIndexService logic to apply the block.
This is the backport of #117543 for 8.18. It contains the cluster block and cluster level block, the transport version and serialization changes. It does NOT contain the MetadataCreateIndexService logic to apply the block.
This change adds a new ClusterBlockLevel called REFRESH.
This level is used in a new ClusterBlock.INDEX_REFRESH_BLOCK
which is automatically added to new indices that are created
from empty store, with replicas, and only on serverless
deployments that have a feature flag enabled. This block is
also only added when all nodes of a cluster are in a recent
enough transport version.
If for some reason the new ClusterBlock is sent over the
wire to a node with an old transport version, the REFRESH
cluster block level will be removed from the set of level
blocked.
In the future, the REFRESH cluster block will be used:
to block refreshes on shards until an unpromotable shard is started
to allow skipping shards when searching
Relates ES-10131
#117543 introduced a cluster block that is added to new indices in stateless and removed when at least one replica is ready. A search against those indices should be skipped during that time.
…c#3208) This pull request introduces a feature flag setting that enables the addition of the INDEX_REFRESH_BLOCK upon new index creation (see elastic#117543). It also adds integration tests relative to the automatic addition of the refresh block. Relates
elastic#117543 introduced a ClusterBlock which is applied to new indices in Serverless which do not yet have search shards up. We should skip searches for indices with this block in order to avoid meaningless 503s.
…c#3208) This pull request introduces a feature flag setting that enables the addition of the INDEX_REFRESH_BLOCK upon new index creation (see elastic#117543). It also adds integration tests relative to the automatic addition of the refresh block. Relates
elastic#117543 introduced a ClusterBlock which is applied to new indices in Serverless which do not yet have search shards up. We should skip searches for indices with this block in order to avoid meaningless 503s.
This pull request adds a new
ClusterBlockLevelcalledREFRESH. This level is used in a newClusterBlock.INDEX_REFRESH_BLOCKwhich is automatically added to new indices that are created from empty store, with replicas, and only on serverless deployments that have a feature flag enabled. This block is also only added when all nodes of a cluster are in a recent enough transport version.If for some reason the new
ClusterBlockis sent over the wire to a node with an old transport version, theREFRESHcluster block level will be removed from the set of level blocked. I expect it to not be an issue, as nodes with old transport versions should not make any usage ofClusterBlock.INDEX_REFRESH_BLOCK. Still, I think that it is worth backportingClusterBlockLevel.REFRESH,ClusterBlock.INDEX_REFRESH_BLOCKand the serialization changes to v8.18.In the future, the REFRESH cluster block will be used:
Relates ES-10131