Engine/Store DistributedArchitectureGuide doc#143818
Conversation
Explains what the elasticsearch Store and Engine classes are and how they are used. Relates to ES-7878
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
|
Pinging @elastic/es-distributed (Team:Distributed) |
tlrx
left a comment
There was a problem hiding this comment.
Looks great! I left some comments, feel free to apply/adjust at your convenience.
| an [IndexInput](https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/store/IndexInput.html) to read a named file | ||
| and create an [IndexOutput](https://lucene.apache.org/core/10_3_2/core/org/apache/lucene/store/IndexOutput.html) to | ||
| write one. The `Store` builds on the Lucene `Directory` capabilities by tracking committed file metadata and enforcing | ||
| integrity invariants. It also adds reference counting and corruption detection. |
There was a problem hiding this comment.
I think "tracking committed file metadata" is a bit misleading as it doesn't really track but instead provides access to committed file metadata.
| allocation (via `TransportNodesListShardStoreMetadata`). They are used to compare the on-disk state of two distinct | ||
| shards and calculate how much data needs to be transferred to bring them into sync. | ||
|
|
||
| During peer recovery, the recovering shard (target) calls `indexShard.snapshotStoreMetadata()` to capture its |
There was a problem hiding this comment.
There is a dedicated section in the guide for Peer Recovery, maybe we should just mention it is used as part of it and link to the section? That seems a bit too low level to me.
There was a problem hiding this comment.
Sounds good, I'll shorten this piece and link the peer recovery one instead!
| `MetadataSnapshot` is also included in the subsequent `RecoveryCleanFilesRequest` so the target knows which files belong | ||
| to the new commit and can delete anything stale. | ||
|
|
||
| During replica shard allocation, before allocating a replica, the master |
There was a problem hiding this comment.
Same, that looks a bit too low level for the guide to me. Mentioning that MetadataSnapshot is used as part of shard allocation like it is done seems sufficient.
| The [ShardLock] is a node-wide, coarse-grained lock managed by [NodeEnvironment]. It is backed by a | ||
| `Semaphore` and guarantees that at most one owner at a time has write access to a given shard directory | ||
| within a JVM process. The | ||
| `Store` [acquires](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/store/Store.java#L169) |
There was a problem hiding this comment.
It doesn't really acquires it, but it is instead given to the Store.
There was a problem hiding this comment.
Ah yep, you are right, it's passed to the Store at construction, will rephrase this
| within a JVM process. The | ||
| `Store` [acquires](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/store/Store.java#L169) | ||
| a `ShardLock` when it is created and holds it for its entire lifetime. Write operations such as creating an | ||
| `IndexWriter`, deleting shard files, or recovering from another shard all require holding the `ShardLock` first. Callers |
There was a problem hiding this comment.
Write operations such as creating an
IndexWriter, deleting shard files, or recovering from another shard all require holding theShardLockfirst.
As I read it, the sentence implies that these operations actively "require holding the ShardLock first" as if they check for it.
We could maybe mention that the ShardLock is held for the entire lifetime of the Store, ensuring that write operations such as creating an IndexWriter, deleting shard files, or recovering from another shard have exclusive access to the shard directory?
|
|
||
| To summarize, `ShardLock` is a JVM-level lock enforced across the entire node, `metadataLock` | ||
| coordinates in-process readers and writers within the `Store`, and the Lucene write lock guards the raw | ||
| directory at the file-system level. |
|
|
||
| ### Engine | ||
|
|
||
| The [Engine] abstract class is the Elasticsearch abstraction for the live shard index. Where the `Store` |
There was a problem hiding this comment.
I'm struggling understanding what a live shard index means 😅
Maybe just:
| The [Engine] abstract class is the Elasticsearch abstraction for the live shard index. Where the `Store` | |
| The [Engine] abstract class is the Elasticsearch abstraction that manages and coordinates operations on the running shard index. |
?
There was a problem hiding this comment.
Yep, maybe that was not the most clear phrasing :) Happy to use yours
| gives read-only access to a frozen shard, and | ||
| the [NoOpEngine](https://github.com/elastic/elasticsearch/blob/v9.3.0/server/src/main/java/org/elasticsearch/index/engine/NoOpEngine.java) | ||
| acts as a do-nothing placeholder for shards belonging to a closed index, where an engine object must exist but | ||
| all read and write operations throw `UnsupportedOperationException`. |
There was a problem hiding this comment.
I'm not sure all read operations really throws, but it's a detail.
Could we mention that NoOpEngine exists only to allow shards of closed indices to be correctly replicated in case of a node failure?
|
I'll also follow up with another PR for the IndexVersion/Lucene sections if that works for you :) |
|
|
||
| The `Store` implements [RefCounted]. Callers call `store.incRef()` before using it and `store.decRef()` in | ||
| a `finally` block when done. Once the reference count drops to zero the store is closed and the underlying Lucene | ||
| directory is cleaned up. This allows the `Store` to outlive the higher-level [IndexShard] instance that owns it (for |
There was a problem hiding this comment.
| directory is cleaned up. This allows the `Store` to outlive the higher-level [IndexShard] instance that owns it (for | |
| directory is cleaned up. |
It's only true if the shard has to be removed/deleted. Otherwise the Directory is just closed as well.
There was a problem hiding this comment.
That makes sense, thanks!
…elocations * upstream/main: (54 commits) [ES|QL|DS] Wire parallel parsing into production for text formats (elastic#143997) ESQL: Allow EXTERNAL commands be run part of the CsvTests suite (elastic#143970) [ESQL] Push stats to external source via metadata (elastic#143940) Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats with stats where} elastic#144051 Refactored SortedNumericDocValuesSyntheticFieldLoader into a Layer (elastic#143912) Enable extended doc_values params feature flag in RandomizedRollingUpgradeIT (elastic#143918) Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {csv-spec:approximation.Approximate stats with sample} elastic#144022 Ensure we use float values for rolling upgrade float vectors (elastic#144032) Remove sensitive info from reindex task description (elastic#143635) Fix HistogramUnionState.equals (elastic#143990) Use dedicated IndexRouting API in ShardSplittingQuery (elastic#143776) Engine/Store DistributedArchitectureGuide doc (elastic#143818) Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testDeletesAreBatched elastic#144034 Avoid serializing exceptions as JSON in remote write endpoint (elastic#143987) allow testLoadDocSequenceReturnsCorrectResultsText to circuit break, it happens in serverless occasionally (elastic#144023) [ESQL] Adds memory accounting to GroupedLimitOperator (elastic#143941) Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers (elastic#143999) Support target bucket count in `TBUCKET` with explicit from/to date range (elastic#142747) TSDBDocValuesFormatSingleNodeTests with and without synthetic id (elastic#144002) Fix circuit breaker leak in BreakingTDigestHolder (elastic#143873) ...
* Engine/Store DistributedArchitectureGuide doc Explains what the elasticsearch Store and Engine classes are and how they are used. Relates to ES-7878 * Fill out Store sections * Engine section + style stuff * Style and clarity changes * Nits and typos * Review: clarify some pieces and cut others * Remove outlive comment
Explains what the elasticsearch Store and Engine classes are and how they are used.
Relates to ES-7878