Skip to content

Engine/Store DistributedArchitectureGuide doc#143818

Merged
inespot merged 10 commits intoelastic:mainfrom
inespot:ip/engine
Mar 11, 2026
Merged

Engine/Store DistributedArchitectureGuide doc#143818
inespot merged 10 commits intoelastic:mainfrom
inespot:ip/engine

Conversation

@inespot
Copy link
Copy Markdown
Contributor

@inespot inespot commented Mar 9, 2026

Explains what the elasticsearch Store and Engine classes are and how they are used.

Relates to ES-7878

Explains what the elasticsearch Store and Engine classes are  and how they are used.

Relates to ES-7878
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

🔍 Preview links for changed docs

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@inespot inespot marked this pull request as ready for review March 9, 2026 19:06
@inespot inespot added :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >non-issue labels Mar 9, 2026
@inespot inespot requested a review from tlrx March 9, 2026 19:06
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Mar 9, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I left some comments, feel free to apply/adjust at your convenience.

an [IndexInput](https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/store/IndexInput.html) to read a named file
and create an [IndexOutput](https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/store/IndexOutput.html) to
write one. The `Store` builds on the Lucene `Directory` capabilities by tracking committed file metadata and enforcing
integrity invariants. It also adds reference counting and corruption detection.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "tracking committed file metadata" is a bit misleading as it doesn't really track but instead provides access to committed file metadata.

allocation (via `TransportNodesListShardStoreMetadata`). They are used to compare the on-disk state of two distinct
shards and calculate how much data needs to be transferred to bring them into sync.

During peer recovery, the recovering shard (target) calls `indexShard.snapshotStoreMetadata()` to capture its
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a dedicated section in the guide for Peer Recovery, maybe we should just mention it is used as part of it and link to the section? That seems a bit too low level to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll shorten this piece and link the peer recovery one instead!

`MetadataSnapshot` is also included in the subsequent `RecoveryCleanFilesRequest` so the target knows which files belong
to the new commit and can delete anything stale.

During replica shard allocation, before allocating a replica, the master
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, that looks a bit too low level for the guide to me. Mentioning that MetadataSnapshot is used as part of shard allocation like it is done seems sufficient.

The [ShardLock] is a node-wide, coarse-grained lock managed by [NodeEnvironment]. It is backed by a
`Semaphore` and guarantees that at most one owner at a time has write access to a given shard directory
within a JVM process. The
`Store` [acquires](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/store/Store.java#L169)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really acquires it, but it is instead given to the Store.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep, you are right, it's passed to the Store at construction, will rephrase this

within a JVM process. The
`Store` [acquires](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/store/Store.java#L169)
a `ShardLock` when it is created and holds it for its entire lifetime. Write operations such as creating an
`IndexWriter`, deleting shard files, or recovering from another shard all require holding the `ShardLock` first. Callers
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write operations such as creating an IndexWriter, deleting shard files, or recovering from another shard all require holding the ShardLock first.

As I read it, the sentence implies that these operations actively "require holding the ShardLock first" as if they check for it.

We could maybe mention that the ShardLock is held for the entire lifetime of the Store, ensuring that write operations such as creating an IndexWriter, deleting shard files, or recovering from another shard have exclusive access to the shard directory?


To summarize, `ShardLock` is a JVM-level lock enforced across the entire node, `metadataLock`
coordinates in-process readers and writers within the `Store`, and the Lucene write lock guards the raw
directory at the file-system level.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


### Engine

The [Engine] abstract class is the Elasticsearch abstraction for the live shard index. Where the `Store`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling understanding what a live shard index means 😅

Maybe just:

Suggested change
The [Engine] abstract class is the Elasticsearch abstraction for the live shard index. Where the `Store`
The [Engine] abstract class is the Elasticsearch abstraction that manages and coordinates operations on the running shard index.

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, maybe that was not the most clear phrasing :) Happy to use yours

gives read-only access to a frozen shard, and
the [NoOpEngine](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/NoOpEngine.java)
acts as a do-nothing placeholder for shards belonging to a closed index, where an engine object must exist but
all read and write operations throw `UnsupportedOperationException`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure all read operations really throws, but it's a detail.

Could we mention that NoOpEngine exists only to allow shards of closed indices to be correctly replicated in case of a node failure?

@inespot inespot requested a review from tlrx March 10, 2026 19:00
@inespot
Copy link
Copy Markdown
Contributor Author

inespot commented Mar 10, 2026

I'll also follow up with another PR for the IndexVersion/Lucene sections if that works for you :)

Copy link
Copy Markdown
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a minor comment.


The `Store` implements [RefCounted]. Callers call `store.incRef()` before using it and `store.decRef()` in
a `finally` block when done. Once the reference count drops to zero the store is closed and the underlying Lucene
directory is cleaned up. This allows the `Store` to outlive the higher-level [IndexShard] instance that owns it (for
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
directory is cleaned up. This allows the `Store` to outlive the higher-level [IndexShard] instance that owns it (for
directory is cleaned up.

It's only true if the shard has to be removed/deleted. Otherwise the Directory is just closed as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks!

@inespot inespot merged commit 06f2cb8 into elastic:main Mar 11, 2026
13 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 11, 2026
…elocations

* upstream/main: (54 commits)
  [ES|QL|DS] Wire parallel parsing into production for text formats (elastic#143997)
  ESQL: Allow EXTERNAL commands be run part of the CsvTests suite (elastic#143970)
  [ESQL] Push stats to external source via metadata (elastic#143940)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats with stats where} elastic#144051
  Refactored SortedNumericDocValuesSyntheticFieldLoader into a Layer (elastic#143912)
  Enable extended doc_values params feature flag in RandomizedRollingUpgradeIT (elastic#143918)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {csv-spec:approximation.Approximate stats with sample} elastic#144022
  Ensure we use float values for rolling upgrade float vectors (elastic#144032)
  Remove sensitive info from reindex task description (elastic#143635)
  Fix HistogramUnionState.equals (elastic#143990)
  Use dedicated IndexRouting API in ShardSplittingQuery (elastic#143776)
  Engine/Store DistributedArchitectureGuide doc (elastic#143818)
  Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testDeletesAreBatched elastic#144034
  Avoid serializing exceptions as JSON in remote write endpoint (elastic#143987)
  allow testLoadDocSequenceReturnsCorrectResultsText to circuit break, it happens in serverless occasionally (elastic#144023)
  [ESQL] Adds memory accounting to GroupedLimitOperator (elastic#143941)
  Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers (elastic#143999)
  Support target bucket count in `TBUCKET` with explicit from/to date range (elastic#142747)
  TSDBDocValuesFormatSingleNodeTests with and without synthetic id (elastic#144002)
  Fix circuit breaker leak in BreakingTDigestHolder (elastic#143873)
  ...
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
* Engine/Store DistributedArchitectureGuide doc

Explains what the elasticsearch Store and Engine classes are  and how they are used.

Relates to ES-7878

* Fill out Store sections

* Engine section + style stuff

* Style and clarity changes

* Nits and typos

* Review: clarify some pieces and cut others

* Remove outlive comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >non-issue Team:Distributed Meta label for distributed team. v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants