Add doc on ClusterState in DistributedArchitectureGuide #142776

Merged
inespot merged 20 commits into elastic:main from inespot:distributed-doc/cluster-state on Feb 25, 2026

inespot (Contributor) commented Feb 20, 2026

Details the cluster state components and the update/publication flow.

ES-7869

Follows: #142435

github-actions bot (Contributor) commented Feb 20, 2026

🔍 Preview links for changed docs

github-actions bot (Contributor) commented

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

inespot marked this pull request as ready for review February 22, 2026 02:01
elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Feb 22, 2026
inespot added the :Distributed/Distributed (A catch all label for anything in the Distributed Area. Please avoid if you can.) and >docs (General docs changes) labels Feb 22, 2026
elasticsearchmachine added the Team:Distributed (Meta label for distributed team.) and Team:Docs (Meta label for docs team) labels and removed the needs:triage (Requires assignment of a team area label) label Feb 22, 2026

elasticsearchmachine (Collaborator) commented

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine (Collaborator) commented

Pinging @elastic/core-docs (Team:Docs)

inespot removed the Team:Docs (Meta label for docs team) label Feb 22, 2026
elasticsearchmachine added the Team:Docs (Meta label for docs team) label Feb 22, 2026
inespot added the >non-issue label and removed the >docs (General docs changes) and Team:Docs (Meta label for docs team) labels Feb 22, 2026

inespot (Contributor, Author) commented Feb 22, 2026

After https://github.com/elastic/elasticsearch-infra/pull/523, I thought the Docs team would not be pinged if >docs was added 🫢 Sorry about that, switched the label to >non-issue.

inespot requested a review from DaveCTurner February 22, 2026 23:28

#### Cluster State Publication

(Majority consensus to apply, what happens if a master-eligible node falls behind / is incommunicado.)
![Alt text](images/cluster-state-publication.png)
Contributor Author

Created a quick diagram via draw.io to illustrate this flow. I figured it might make the doc a bit more digestible, but let me know if you don't think it adds much additional value, open to removing it!

Member

This diagram seems to suggest that the non-master node acks the ApplyCommitRequest while it is still processing onNewClusterState. I think we may want to have the Ack arrow start from the non-master node's onNewClusterState instead?
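
The ordering described in this comment can be sketched with a toy model. This is not Elasticsearch code: `ToyFollower`, `handle_apply_commit`, and the event strings are hypothetical names chosen for illustration, and `on_new_cluster_state` only stands in for the real `onNewClusterState` processing. The sketch assumes, as the comment suggests, that the acknowledgement is sent only once application has finished.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFollower:
    """Toy stand-in for a non-master node (hypothetical, not Elasticsearch code)."""

    events: list = field(default_factory=list)
    applied_version: int = 0

    def on_new_cluster_state(self, version: int) -> None:
        # Stand-in for applying the committed state locally.
        self.events.append(f"apply-start:{version}")
        self.applied_version = version
        self.events.append(f"apply-done:{version}")

    def handle_apply_commit(self, version: int) -> str:
        # Apply first, then acknowledge: the ack is recorded only after
        # on_new_cluster_state has returned.
        self.on_new_cluster_state(version)
        self.events.append(f"ack:{version}")
        return "ack"


follower = ToyFollower()
reply = follower.handle_apply_commit(7)
```

In this model the ack event is always the last entry in `follower.events`, which is the ordering a corrected diagram would need to show.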

Member

Yes that's right. But more generally, diagrams like this are really hard to fix (and therefore to keep up to date as other things change) so I'd rather we found a way to represent this information in text form.

Note that we can embed Mermaid diagrams in these docs directly:

https://github.com/DaveCTurner/elasticsearch/blob/2026/02/23/mermaid-diag-internal-docs/docs/internal/DistributedArchitectureGuide.md

I'd recommend doing that instead.

inespot (Contributor, Author) commented Feb 23, 2026

> This diagram seems to suggest that the non-master node acks the ApplyCommitRequest while it is still processing onNewClusterState

Ah nice catch! That makes sense, I'll look into clarifying the text and/or replacing the current diagram with a Mermaid one.

Contributor Author

Done in 40a7bb. I can modify or remove the diagram entirely if preferred. I think the text should now have all the info contained in the diagram? But let me know if I am missing something.

DaveCTurner (Member) left a comment

Looks good apart from a suggestion to convert the diagram to text and one other request to tighten up the description of ClusterState's purpose a bit.

(Explain joining, and how it happens every time a new master is elected)

#### Discovery
The [ClusterState] is the in-memory data structure that represents the current state of the cluster. It is
Member

A bit of a nit but "current state" might be interpreted to include things like on-disk index data which aren't tracked in the ClusterState object.

Can we say something a bit more precise, e.g. that ClusterState is the portion of the current state of the cluster which is (a) required to be held in-memory on every node and (b) required for correctness to be updated in a strongly-consistent (i.e. linearizable) fashion?

I'd also like us to mention something here about how updating the cluster state is extraordinarily expensive, taking 100s of milliseconds at least, and thus must be avoided unless absolutely necessary.

Contributor Author

Done in 40a7bb

DaveCTurner (Member) left a comment

Looks great, I think the diagram augments the text nicely. Suggested highlighting that the broadcast messages go to the master itself as well as its followers, but otherwise LGTM.
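
The point about broadcast targets can be illustrated with a small sketch. It is a toy model under stated assumptions, not the Elasticsearch implementation: `publication_targets`, `is_committed`, and the node names are hypothetical, and the commit rule is simplified to a strict majority of master-eligible nodes, following the "majority consensus" wording in the doc under review.

```python
def publication_targets(master: str, followers: list[str]) -> list[str]:
    # The master is included in its own broadcast, alongside its followers.
    return [master, *followers]


def is_committed(acks: set[str], master_eligible: set[str]) -> bool:
    # Simplifying assumption: the publication commits once a strict majority
    # of master-eligible nodes has accepted it. Acks from nodes that are not
    # master-eligible do not count towards the majority.
    return len(acks & master_eligible) > len(master_eligible) // 2


# Hypothetical cluster: three master-eligible nodes and one data-only node.
targets = publication_targets("m1", ["m2", "m3", "d1"])
master_eligible = {"m1", "m2", "m3"}
```

Here the master `m1` is the first target of its own publication, and an ack from the data-only node `d1` does not help reach the majority.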

DaveCTurner (Member) left a comment

LGTM

inespot merged commit ea24601 into elastic:main Feb 25, 2026
12 checks passed
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Feb 25, 2026
* Add doc on ClusterState in DistributedArchitectureGuide

Details the cluster state components and the update/publication flow.

ES-7869

* MasterService details

* Cluster State Publication

* Add a diagram

* Clarification

* Cluster State Application

* Typos and nits

* Persistence

* Readability and nits

* Typos

* Last nits

* Review comments

* Some format nits

* Typo

* Diagram: the master sends requests to itself

Labels

:Distributed/Distributed (A catch all label for anything in the Distributed Area. Please avoid if you can.), >non-issue, Team:Distributed (Meta label for distributed team.), v9.4.0

4 participants