Engine/Store DistributedArchitectureGuide doc #143818

Merged

inespot merged 10 commits into elastic:main from inespot:ip/engine

Mar 11, 2026

Contributor

inespot commented Mar 9, 2026

Explains what the Elasticsearch Store and Engine classes are and how they are used.

Relates to ES-7878


          Engine/Store DistributedArchitectureGuide doc

8a6379b

Explains what the Elasticsearch Store and Engine classes are and how they are used.

Relates to ES-7878

elasticsearchmachine added the v9.4.0 label

Contributor

github-actions bot commented Mar 9, 2026

🔍 Preview links for changed docs

docs/internal/DistributedArchitectureGuide.md

Contributor

github-actions bot commented Mar 9, 2026

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

inespot added 4 commits

March 9, 2026 12:29


          Fill out Store sections

b6d2239


          Engine section + style stuff

7069a5b


          Style and clarity changes

a181ac3


          Nits and typos

0e744e5

inespot marked this pull request as ready for review

March 9, 2026 19:06

inespot added :Distributed/Distributed >non-issue labels

inespot requested a review from tlrx

March 9, 2026 19:06

elasticsearchmachine added the Team:Distributed label

Collaborator

elasticsearchmachine commented Mar 9, 2026

Pinging @elastic/es-distributed (Team:Distributed)


          Merge branch 'main' into ip/engine

ea4bdaf

tlrx reviewed

Member

tlrx left a comment

Looks great! I left some comments, feel free to apply/adjust at your convenience.

docs/internal/DistributedArchitectureGuide.md Outdated

+              an [IndexInput](https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/store/IndexInput.html) to read a named file
+              and create an [IndexOutput](https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/store/IndexOutput.html) to
+              write one. The `Store` builds on the Lucene `Directory` capabilities by tracking committed file metadata and enforcing
+              integrity invariants. It also adds reference counting and corruption detection.

Member

tlrx Mar 10, 2026

I think "tracking committed file metadata" is a bit misleading as it doesn't really track but instead provides access to committed file metadata.

docs/internal/DistributedArchitectureGuide.md Outdated

+              allocation (via `TransportNodesListShardStoreMetadata`). They are used to compare the on-disk state of two distinct
+              shards and calculate how much data needs to be transferred to bring them into sync.
+              During peer recovery, the recovering shard (target) calls `indexShard.snapshotStoreMetadata()` to capture its

Member

tlrx Mar 10, 2026

There is a dedicated section in the guide for Peer Recovery; maybe we should just mention that it is used as part of it and link to the section? That seems a bit too low level to me.

Contributor Author

inespot Mar 10, 2026

Sounds good, I'll shorten this piece and link the peer recovery one instead!
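
To illustrate the comparison the quoted passage describes (two shards' on-disk state compared to work out how much data must be transferred), here is a minimal sketch. It is not the Elasticsearch `MetadataSnapshot` implementation: `FileMeta`, `SnapshotDiff`, and `SnapshotDiffer` are hypothetical names, and it assumes a file counts as identical when its length and checksum both match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-file metadata: only a length and a checksum.
record FileMeta(long length, String checksum) {}

// Hypothetical result: source files grouped by what the target needs.
record SnapshotDiff(List<String> identical, List<String> different, List<String> missing) {}

class SnapshotDiffer {
    // Classifies every source file relative to the target's view of the shard.
    static SnapshotDiff diff(Map<String, FileMeta> source, Map<String, FileMeta> target) {
        List<String> identical = new ArrayList<>();
        List<String> different = new ArrayList<>();
        List<String> missing = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order for the result lists.
        for (Map.Entry<String, FileMeta> entry : new TreeMap<>(source).entrySet()) {
            FileMeta onTarget = target.get(entry.getKey());
            if (onTarget == null) {
                missing.add(entry.getKey());      // target has no such file
            } else if (onTarget.equals(entry.getValue())) {
                identical.add(entry.getKey());    // same length and checksum
            } else {
                different.add(entry.getKey());    // present but stale
            }
        }
        return new SnapshotDiff(identical, different, missing);
    }

    // Sums the sizes of the files that would have to be copied.
    static long bytesToTransfer(Map<String, FileMeta> source, SnapshotDiff diff) {
        long total = 0;
        for (String name : diff.different()) {
            total += source.get(name).length();
        }
        for (String name : diff.missing()) {
            total += source.get(name).length();
        }
        return total;
    }
}
```

In this sketch only the `different` and `missing` groups contribute to the transfer size; files on the target that the source does not know about are ignored here and would be cleaned up separately.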

docs/internal/DistributedArchitectureGuide.md Outdated

+              `MetadataSnapshot` is also included in the subsequent `RecoveryCleanFilesRequest` so the target knows which files belong
+              to the new commit and can delete anything stale.
+              During replica shard allocation, before allocating a replica, the master

Member

tlrx Mar 10, 2026

Same, that looks a bit too low level for the guide to me. Mentioning that MetadataSnapshot is used as part of shard allocation, as is already done, seems sufficient.

docs/internal/DistributedArchitectureGuide.md Outdated

+              The [ShardLock] is a node-wide, coarse-grained lock managed by [NodeEnvironment]. It is backed by a
+              `Semaphore` and guarantees that at most one owner at a time has write access to a given shard directory
+              within a JVM process. The
+              `Store` [acquires](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/store/Store.java#L169)

Member

tlrx Mar 10, 2026

It doesn't really acquire it; it is instead given to the Store.

Contributor Author

inespot Mar 10, 2026

Ah yep, you are right, it's passed to the Store at construction, will rephrase this

docs/internal/DistributedArchitectureGuide.md Outdated

+              within a JVM process. The
+              `Store` [acquires](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/store/Store.java#L169)
+              a `ShardLock` when it is created and holds it for its entire lifetime. Write operations such as creating an
+              `IndexWriter`, deleting shard files, or recovering from another shard all require holding the `ShardLock` first. Callers

Member

tlrx Mar 10, 2026

Write operations such as creating an IndexWriter, deleting shard files, or recovering from another shard all require holding the ShardLock first.

As I read it, the sentence implies that these operations actively "require holding the ShardLock first" as if they check for it.

We could maybe mention that the ShardLock is held for the entire lifetime of the Store, ensuring that write operations such as creating an IndexWriter, deleting shard files, or recovering from another shard have exclusive access to the shard directory?
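
The ownership model discussed in this thread (a `Semaphore`-backed lock that is handed to the `Store` at construction and held until the `Store` is closed) can be sketched roughly as follows. All class names here (`ShardLockRegistry`, `SketchShardLock`, `SketchStore`) are hypothetical stand-ins, not the `NodeEnvironment`, `ShardLock`, or `Store` APIs, and the string shard identifier is an assumption made for brevity.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical node-wide registry: one single-permit semaphore per shard.
class ShardLockRegistry {
    private final Map<String, Semaphore> semaphores = new ConcurrentHashMap<>();

    // Returns a lock handle, or null if another owner already holds this shard.
    SketchShardLock tryLock(String shardId) {
        Semaphore semaphore = semaphores.computeIfAbsent(shardId, id -> new Semaphore(1));
        return semaphore.tryAcquire() ? new SketchShardLock(shardId, semaphore) : null;
    }
}

// Hypothetical lock handle; closing it returns the permit exactly once.
class SketchShardLock implements AutoCloseable {
    final String shardId;
    private final Semaphore semaphore;
    private boolean released = false;

    SketchShardLock(String shardId, Semaphore semaphore) {
        this.shardId = shardId;
        this.semaphore = semaphore;
    }

    @Override
    public synchronized void close() {
        if (!released) {
            released = true;
            semaphore.release();
        }
    }
}

// Hypothetical store: it does not acquire the lock, it is given one.
class SketchStore implements AutoCloseable {
    private final SketchShardLock shardLock;

    SketchStore(SketchShardLock shardLock) {
        this.shardLock = Objects.requireNonNull(shardLock);
    }

    // The lock is held for the store's whole lifetime and released on close.
    @Override
    public void close() {
        shardLock.close();
    }
}

class ShardLockDemo {
    public static void main(String[] args) {
        ShardLockRegistry registry = new ShardLockRegistry();
        SketchStore store = new SketchStore(registry.tryLock("index[0]"));
        // While the store is open, a second owner cannot lock the same shard.
        System.out.println(registry.tryLock("index[0]") == null);
        store.close();
    }
}
```

Note that nothing in `SketchStore` checks the lock before a write. Exclusivity follows from the fact that only one lock handle per shard can exist at a time, which matches the rephrasing suggested in the comment above.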

docs/internal/DistributedArchitectureGuide.md

+              To summarize, `ShardLock` is a JVM-level lock enforced across the entire node, `metadataLock`
+              coordinates in-process readers and writers within the `Store`, and the Lucene write lock guards the raw
+              directory at the file-system level.

Member

tlrx Mar 10, 2026

👍
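
For the middle layer of that summary, a reader/writer lock is one plausible way to coordinate in-process readers and writers. The sketch below assumes `metadataLock` behaves like a standard `ReentrantReadWriteLock`; the class `SketchMetadataHolder` and its file-name-to-length map are hypothetical and only illustrate the locking pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder of committed file metadata guarded by a read/write lock.
class SketchMetadataHolder {
    private final ReentrantReadWriteLock metadataLock = new ReentrantReadWriteLock();
    private Map<String, Long> committedFiles = new HashMap<>();

    // Many readers may hold the read lock at once; they get an immutable copy.
    Map<String, Long> readSnapshot() {
        metadataLock.readLock().lock();
        try {
            return Map.copyOf(committedFiles);
        } finally {
            metadataLock.readLock().unlock();
        }
    }

    // A writer holds the write lock exclusively while swapping the file set.
    void replaceFiles(Map<String, Long> newFiles) {
        metadataLock.writeLock().lock();
        try {
            committedFiles = new HashMap<>(newFiles);
        } finally {
            metadataLock.writeLock().unlock();
        }
    }
}
```

Readers never observe a half-updated file set because the swap happens entirely under the write lock.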

docs/internal/DistributedArchitectureGuide.md Outdated


		### Engine

		The [Engine] abstract class is the Elasticsearch abstraction for the live shard index. Where the `Store`

Member

tlrx Mar 10, 2026

I'm struggling to understand what a live shard index means 😅

Maybe just:

Suggested change

      
-             The [Engine] abstract class is the Elasticsearch abstraction for the live shard index. Where the `Store`
+             The [Engine] abstract class is the Elasticsearch abstraction that manages and coordinates operations on the running shard index.

?

Contributor Author

inespot Mar 10, 2026

Yep, maybe that was not the most clear phrasing :) Happy to use yours

docs/internal/DistributedArchitectureGuide.md Outdated

+              gives read-only access to a frozen shard, and
+              the [NoOpEngine](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/NoOpEngine.java)
+              acts as a do-nothing placeholder for shards belonging to a closed index, where an engine object must exist but
+              all read and write operations throw `UnsupportedOperationException`.

Member

tlrx Mar 10, 2026

I'm not sure all read operations really throw, but it's a detail.

Could we mention that NoOpEngine exists only to allow shards of closed indices to be correctly replicated in case of a node failure?


          Review: clarify some pieces and cut others

dd5b00b

inespot requested a review from tlrx

March 10, 2026 19:00


          Merge branch 'main' into ip/engine

02c26c9

Contributor Author

inespot commented Mar 10, 2026

I'll also follow up with another PR for the IndexVersion/Lucene sections if that works for you :)

tlrx approved these changes

Member

tlrx left a comment

LGTM, left a minor comment.

docs/internal/DistributedArchitectureGuide.md Outdated

+              The `Store` implements [RefCounted]. Callers call `store.incRef()` before using it and `store.decRef()` in
+              a `finally` block when done. Once the reference count drops to zero the store is closed and the underlying Lucene
+              directory is cleaned up. This allows the `Store` to outlive the higher-level [IndexShard] instance that owns it (for

Member

tlrx Mar 11, 2026

Suggested change

      
-             directory is cleaned up. This allows the `Store` to outlive the higher-level [IndexShard] instance that owns it (for
+             directory is cleaned up.

It's only true if the shard has to be removed/deleted. Otherwise the Directory is just closed as well.

Contributor Author

inespot Mar 11, 2026

That makes sense, thanks!
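
The `incRef()` / `decRef()` pattern quoted in this thread can be sketched as below. This is a simplified illustration, not the Elasticsearch `RefCounted` or `Store` code: `RefCountedResource` is a hypothetical class, and the cleanup hook only flips a flag where a real store would close its directory.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted resource.
class RefCountedResource {
    // Starts at 1: the creator holds the initial reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Takes a reference; fails if the resource has already been closed.
    void incRef() {
        if (!tryIncRef()) {
            throw new IllegalStateException("resource is already closed");
        }
    }

    // Attempts to take a reference without throwing.
    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Releases a reference; returns true if this call released the last one.
    boolean decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("decRef called more often than incRef");
        }
        if (remaining == 0) {
            closeInternal();
            return true;
        }
        return false;
    }

    // Cleanup hook that runs once, when the count reaches zero.
    private void closeInternal() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCountedResource store = new RefCountedResource();
        store.incRef();          // caller takes its own reference
        try {
            // ... use the resource ...
        } finally {
            store.decRef();      // always released, even if the body throws
        }
        store.decRef();          // owner releases the initial reference
        System.out.println(store.isClosed());
    }
}
```

Releasing in a `finally` block is what keeps the count balanced when the guarded code throws; without it a single exception would keep the resource open indefinitely.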

inespot and others added 2 commits

March 11, 2026 10:16


          Remove outlive comment

d4956db


          Merge branch 'main' into ip/engine

73992e5

inespot merged commit 06f2cb8 into elastic:main

13 checks passed

szybia added a commit to szybia/elasticsearch that referenced this pull request


          Merge remote-tracking branch 'upstream/main' into list-reindex-with-relocations

59db848

inespot mentioned this pull request

Fill out Index Version and Lucene sections in DistributedArchitectureGuide #144143

Merged

michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request


          Engine/Store DistributedArchitectureGuide doc (elastic#143818)

2c9baa7

Labels

:Distributed/Distributed >non-issue Team:Distributed v9.4.0