[Feature Request] Scale to Zero with Reader/Writer Separation. #16720
Comments
[Triage]
Moving forward, I will use this issue to discuss the scale to zero topic further. Thank you |
From the POC, with the scale to zero setting (
And once the index level scale to zero setting (
The logic ensures that both the final translog sync and the final remote store sync complete before the shard is closed. To validate this, I’ve added additional log information during the process:
Before closing the shard, when the scale to zero setting (
The steps also closely align with those outlined in the close index operation, although here we are dealing with just the shard close. Across multiple tests, I’ve consistently observed no uncommitted operations post-flush. Therefore, closing the shard appears safe when the scale to zero setting (
Question
Is there an existing method or logic in remote store to handle scenarios where a shard closure occurs while the translog still contains uncommitted operations? Specifically:
Before I make this change, @shwetathareja, can you please suggest whether remote store already has a mechanism that I can use to find the primary shards for an index across the nodes and run a final guaranteed sync to the remote store? If so, I can leverage it and just proceed with closing the shard. Thank you |
In the latest commit (prudhvigodithi@aa96f64 under the
In the log I can see it filters the nodes and primary shards for a given index and does one final sync before proceeding to update the routing table and close the shard.
Finally, by the time it reaches the closeShard method, the translog is empty and sync required is false, which means it is safe to close the shard without any data loss.
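To make the ordering above concrete, here is a minimal toy model of "flush, sync to remote, then close only if nothing is left unsynced". It is a sketch under assumptions: `ToyShard`, `sync_required`, and `final_sync_and_close` are hypothetical names for illustration and are not OpenSearch classes or the POC's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical in-memory stand-in for a primary shard (not an OpenSearch class)."""
    translog: list = field(default_factory=list)         # uncommitted operations
    local_segments: list = field(default_factory=list)   # committed locally
    remote_segments: list = field(default_factory=list)  # uploaded to remote store
    closed: bool = False

    def index(self, op):
        if self.closed:
            raise RuntimeError("shard is closed")
        self.translog.append(op)

    def flush(self):
        # Commit every translog operation locally, then empty the translog.
        self.local_segments.extend(self.translog)
        self.translog.clear()

    def sync_to_remote(self):
        # Upload only the segments the remote store has not seen yet.
        uploaded = len(self.remote_segments)
        self.remote_segments.extend(self.local_segments[uploaded:])

    def sync_required(self):
        # True while anything is uncommitted or not yet uploaded.
        return bool(self.translog) or len(self.remote_segments) < len(self.local_segments)

    def close(self):
        if self.sync_required():
            raise RuntimeError("refusing to close: unsynced data remains")
        self.closed = True


def final_sync_and_close(shard):
    """Final flush and remote sync, then close; close() re-checks the invariant."""
    shard.flush()
    shard.sync_to_remote()
    shard.close()
```

In this model, calling `close()` directly on a shard with pending operations raises, while `final_sync_and_close()` reaches `close()` with an empty translog and `sync_required()` returning false, which mirrors the state described in the logs.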
|
Thanks @prudhvigodithi for the detailed proposal! Couple of things:
Also, if I understand correctly, search replicas are not capable of applying translog operations. We need a deterministic way to ensure all the segments are created and uploaded before the indexing copies (primary + replica) are closed gracefully. |
Thanks @shwetathareja
The setting
Ya moving forward coming from latest comment #16720 (comment), the idea is to not touch the
When called |
I think we should probably apply this through an explicit API similar to _close instead of an index setting, given this is a destructive operation, and similarly provide an API to undo it and bring the writers back. The sequence of events when it's applied can then be largely similar to that on close. I also +1 introducing a new block type that can't be explicitly removed through the block API. So basically:
|
+1 to @mch2 for having an explicit API for the scale-in or scale-out operation, like _scale. We also need to ensure we stop writes, then refresh, verify there are no pending translog operations, and flush to ensure they are backed up on remote. When you apply the index block, it will not be applied atomically to all the shards at the same time, because it is a cluster state change and some nodes can be slow to apply it. Writes will keep coming in on some of the shards in the meantime. |
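The non-atomic block concern can be illustrated with a small toy simulation: the final flush is refused until every node hosting a shard has applied the write block. This is only a sketch under assumptions; `ToyNode`, `apply_cluster_state`, and `flush_after_block` are hypothetical names and do not correspond to OpenSearch APIs.

```python
class ToyNode:
    """Hypothetical node that applies a cluster-state write block at its own pace."""

    def __init__(self, name):
        self.name = name
        self.write_blocked = False
        self.translog = []

    def apply_cluster_state(self):
        # In a real cluster this happens asynchronously, per node.
        self.write_blocked = True

    def write(self, op):
        # Writes are accepted until this node has applied the block.
        if self.write_blocked:
            return False
        self.translog.append(op)
        return True

    def flush(self):
        self.translog.clear()


def flush_after_block(nodes):
    """Flush only once every node has applied the block; otherwise refuse."""
    lagging = [n.name for n in nodes if not n.write_blocked]
    if lagging:
        raise RuntimeError(f"block not yet applied on: {lagging}")
    for n in nodes:
        n.flush()
    # With writes stopped everywhere, no translog should have pending operations.
    return all(not n.translog for n in nodes)
```

If one node has applied the block and another has not, the slow node still accepts writes and `flush_after_block` raises. Once the slow node catches up, the flush runs and the check for pending operations passes.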
I’ve been away for a while, but I'm coming back to the implementation now. Thanks, everyone, for the suggestions! The existing setup uses index settings, but we can transition this to an explicit API (maybe this can be part of the index API https://opensearch.org/docs/latest/api-reference/index-apis/index/) for scaling the shards up or down. We can also add an index block. Since the block won’t be applied atomically to all shards at once, the current POC first checks for Thank you |
I was able to make this work with |
From my previous comment #16720 (comment) the scale-to-zero action relies on cluster-level guarantees and final flush/sync for consistency, whereas the existing
Here is the POC commit: prudhvigodithi@57d5d90 for achieving scale to zero with a scale API and using an internal cluster block ( To-Do:
@mch2 please add if I'm missing anything from our discussion. Thanks |
Is your feature request related to a problem? Please describe
Coming from the META issue #15306: achieve Scale to Zero with Reader/Writer Separation. With scale to zero, we should be able to scale down the primary and regular replicas, keep only the search replicas for search traffic, and retain the ability to bring back the primary and regular replicas for write (index) traffic.
Describe the solution you'd like
Handle the scale to zero behavior and perform the actions based on an index setting. At a high level:
- Update Cluster State (Initial)
  - remove_indexing_shards (or scale_down_indexing or index.read_only_mode) flag in index settings.
  - IndexMetadata to reflect scaled state.
- Store Original Configuration
- Prepare for Scale Down
- Update Routing Table
- Close Shards
- Handle Cluster Health
- Scale Up Process (when flag is removed)
  - remove_indexing_shards flag removal.
Related component
Search:Performance
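The scale-down and scale-up steps listed under "Describe the solution you'd like" can be sketched as a pair of pure functions over a toy index description. This is a minimal sketch under assumptions: the dictionary layout, the `original_routing` key, and the role names are hypothetical, only the `remove_indexing_shards` flag name comes from the proposal, and cluster health handling is left out.

```python
import copy

# Hypothetical role names for the shard copies that serve indexing traffic.
INDEXING_ROLES = {"primary", "replica"}


def scale_down(index):
    """Set the flag, remember the original routing, keep only search replicas."""
    if index["settings"].get("remove_indexing_shards"):
        return index  # already scaled down
    scaled = copy.deepcopy(index)
    scaled["settings"]["remove_indexing_shards"] = True
    # Store the original configuration so that scale up can restore it.
    scaled["original_routing"] = copy.deepcopy(index["routing"])
    scaled["routing"] = [
        s for s in scaled["routing"] if s["role"] not in INDEXING_ROLES
    ]
    return scaled


def scale_up(index):
    """Remove the flag and restore the routing stored during scale down."""
    if not index["settings"].get("remove_indexing_shards"):
        return index  # nothing to undo
    restored = copy.deepcopy(index)
    del restored["settings"]["remove_indexing_shards"]
    restored["routing"] = restored.pop("original_routing")
    return restored
```

Keeping the original routing alongside the flag is one way to make scale up a simple inverse of scale down; an actual implementation would derive the restored state from IndexMetadata rather than from a copied list.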