Add doc on ClusterState in DistributedArchitectureGuide#142776
inespot merged 20 commits into elastic:main
Details the cluster state components and the update/publication flow. ES-7869
Pinging @elastic/es-distributed (Team:Distributed)
Pinging @elastic/core-docs (Team:Docs)
After https://github.com/elastic/elasticsearch-infra/pull/523, I thought the Docs team would not be pinged if
#### Cluster State Publication

(Majority consensus to apply, what happens if a master-eligible node falls behind / is incommunicado.)
![Cluster State Publication Flow](./images/cluster-state-publication.png)
Created a quick diagram via draw.io to illustrate this flow. I figured it might make the doc a bit more digestible, but let me know if you don't think it adds much additional value, open to removing it!
This diagram seems to suggest that the non-master node acks the ApplyCommitRequest while it is still processing onNewClusterState. I think we may want the Ack arrow to start from the non-master node's onNewClusterState instead?
Yes that's right. But more generally, diagrams like this are really hard to fix (and therefore to keep up to date as other things change) so I'd rather we found a way to represent this information in text form.
Note that we can embed Mermaid diagrams in these docs directly:
I'd recommend doing that instead.
> This diagram seems to suggest that the non-master node acks the ApplyCommitRequest while it is still processing onNewClusterState

Ah nice catch! That makes sense, I'll look into clarifying the text and/or replacing the current diagram with a Mermaid one.
Done in 40a7bb. I can modify or remove the diagram entirely if preferred. I think the text should now have all the info contained in the diagram? But let me know if I am missing something
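For readers following the ordering point in this thread, here is a toy Python model of the flow being discussed. It is only a sketch, not Elasticsearch code: the `Node` class, its method names, the `reachable` flag, and the rule that the quorum is a simple majority of master-eligible targets are all assumptions made for illustration. It shows two things raised in review: a node acks the apply-commit step only after its `on_new_cluster_state` has finished, and the master sends both requests to itself as well as to its followers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node (not an Elasticsearch class)."""
    name: str
    master_eligible: bool = True
    reachable: bool = True       # False models a node that is incommunicado
    accepted_version: int = 0    # last state accepted in the publish phase
    applied_version: int = 0     # last state fully applied
    log: list = field(default_factory=list)

    def handle_publish(self, version: int) -> bool:
        # Phase 1: accept (but do not yet apply) the proposed state.
        if not self.reachable or version <= self.accepted_version:
            return False
        self.accepted_version = version
        self.log.append(f"accepted v{version}")
        return True

    def handle_apply_commit(self, version: int) -> bool:
        # Phase 2: apply first, and ack only once application has finished.
        self.on_new_cluster_state(version)
        self.log.append(f"acked v{version}")
        return True

    def on_new_cluster_state(self, version: int) -> None:
        self.applied_version = version
        self.log.append(f"applied v{version}")


def publish(master: Node, followers: list, version: int) -> bool:
    # The master sends both requests to itself as well as to its followers.
    targets = [master] + followers
    accepted = [n for n in targets if n.handle_publish(version)]
    votes = sum(1 for n in accepted if n.master_eligible)
    # Assumed quorum rule: a simple majority of master-eligible targets.
    quorum = sum(1 for n in targets if n.master_eligible) // 2 + 1
    if votes < quorum:
        return False  # no majority, so nothing is committed or applied
    for n in accepted:
        n.handle_apply_commit(version)
    return True
```

In this model, a three-node cluster with one unreachable follower still commits version 1, and each node that took part logs `accepted`, `applied`, `acked` in that order.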
DaveCTurner left a comment
Looks good apart from suggesting converting the diagram to text and one other request to tighten up the description of ClusterState's purpose a bit.
(Explain joining, and how it happens every time a new master is elected)

#### Discovery
The [ClusterState] is the in-memory data structure that represents the current state of the cluster. It is
A bit of a nit but "current state" might be interpreted to include things like on-disk index data which aren't tracked in the ClusterState object.
Can we say something a bit more precise, e.g. that ClusterState is the portion of the current state of the cluster which is (a) required to be held in-memory on every node and (b) required for correctness to be updated in a strongly-consistent (i.e. linearizable) fashion.
I'd also like us to mention something here about how updating the cluster state is extraordinarily expensive, taking 100s of milliseconds at least, and thus must be avoided unless absolutely necessary.
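To illustrate why the cost matters, here is a minimal Python sketch of an update loop that skips publication when an update changes nothing. It is a toy under stated assumptions, not the real implementation: `State`, `ToyMasterService`, `put_setting`, and the convention that returning the same instance means "no change" are all hypothetical names and rules chosen for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class State:
    """Hypothetical immutable snapshot (not the real ClusterState)."""
    version: int
    settings: tuple = ()


class ToyMasterService:
    """Hypothetical single-threaded updater that counts publications."""

    def __init__(self, initial: State):
        self.state = initial
        self.publications = 0  # counts the expensive step

    def submit(self, update: Callable[[State], State]) -> State:
        new_state = update(self.state)
        if new_state is self.state:
            # Nothing changed: skip the expensive publication entirely.
            return self.state
        self.state = replace(new_state, version=self.state.version + 1)
        self.publications += 1  # stand-in for a full publish/commit/apply round
        return self.state


def put_setting(key: str, value: str) -> Callable[[State], State]:
    def update(state: State) -> State:
        if (key, value) in state.settings:
            return state  # already set: return the same instance
        kept = tuple(kv for kv in state.settings if kv[0] != key)
        return replace(state, settings=kept + ((key, value),))
    return update
```

Submitting `put_setting("a", "1")` twice against a fresh `ToyMasterService(State(version=0))` leaves `publications` at 1, because the second call returns the existing instance and never reaches the publication step.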
DaveCTurner left a comment
Looks great, I think the diagram augments the text nicely. Suggested highlighting that the broadcast messages go to the master itself as well as its followers, but otherwise LGTM.
* Add doc on ClusterState in DistributedArchitectureGuide Details the cluster state components and the update/publication flow. ES-7869
* MasterService details
* Cluster State Publication
* Add a diagram
* Clarification
* Cluster State Application
* Typos and nits
* Persistence
* Readability and nits
* Typos
* Last nits
* Review comments
* Some format nits
* Typo
* Diagram: the master sends requests to itself
Follows: #142435