Skip to content

Conversation

@bleskes
Copy link
Contributor

@bleskes bleskes commented Jul 13, 2016

We currently have concurrency issue between the static methods on the Store class and store changes that are done via a valid open store. An example of this is the async shard fetch which can reach out to a node while a local shard copy is shutting down (the fetch does check if we have an open shard and tries to use that first, but if the shard is shutting down, it will not be available from IndexService).

Specifically, async shard fetching tries to read metadata from store, concurrently the shard that shuts down commits to lucene, changing the segments_N file. this causes a file not find exception on the shard fetching side. That one in turns makes the master think the shard is unusable. In tests this can cause the shard assignment to be delayed (up to 1m) which fails tests. See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/570 for details.

This is one of the things #18938 caused to bubble up.

bleskes added 3 commits July 13, 2016 14:05
We currently have concurrency issue between the static methods on the Store class and store changes that are done via a valid open store. An example of this is the async shard fetch which can reach out to a node while a local shard copy is shutting down (the fetch does check if we have an open shard and tries to use that first, but if the shard is shutting down, it will not be available from IndexService).

Specifically, async shard fetching tries to read metadata from store, concurrently the shard that shuts down commits to lucene, changing the segments_N file. this causes a file not find exception on the shard fetching side. That one in turns makes the master think the shard is unusable. In tests this can cause the shard assignment to be delayed (up to 1m) which fails tests. See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/570 for details.
@bleskes bleskes added >bug :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. v5.0.0-alpha5 labels Jul 13, 2016
}
return new StoreFilesMetaData(shardId, Store.readMetadataSnapshot(shardPath.resolveIndex(), shardId, logger));
// note that this may fail if it can't get access to the shard lock. Since we check above there is an active shard, this means:
// 1) a shard is being constructed, which means the master will not use a copy of this replcia
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to add a 2) here?

Copy link
Contributor Author

@bleskes bleskes Jul 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe, yeah - ADD for the works :)
2) A shard is shutting down and has not cleared it's content within lock timeout. In this case the master may not reuse local resources.

bleskes added a commit that referenced this pull request Jul 18, 2016
pending the merge of #19416
* A shard lock supplier that is used by the static methods on this class. Normal methods rely on
* the shard lock passed to the constructor.
*/
@FunctionalInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this into NodeEnv instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@s1monw
Copy link
Contributor

s1monw commented Jul 18, 2016

left on nit - LGTM otherwise

@ywelsch
Copy link
Contributor

ywelsch commented Jul 18, 2016

LGTM

@bleskes bleskes merged commit 9ededa4 into elastic:master Jul 18, 2016
@bleskes bleskes deleted the store_static_access_lock branch July 18, 2016 09:23
bleskes added a commit that referenced this pull request Aug 3, 2016
In several places in our code we need to get a consistent list of files + metadata of the current index. We currently have a couple of ways to do in the `Store` class, which also does the right things and tries to verify the integrity of the smaller files. Sadly, those methods can run into trouble if anyone writes into the folder while they are busy. Most notably, the index shard's engine decides to commit half way and remove a `segment_N` file before the store got to checksum (but did already list it). This race condition typically doesn't happen as almost all of the places where we list files also happen to be places where the relevant shard doesn't yet have an engine. There  is however an exception (of course :)) which is the API to list shard stores, used by the master when it is looking for shard copies to assign to.

I already took one shot at fixing this in #19416 , but it turns out not to be enough - see for example https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/822.

The first inclination to fix this was to add more locking to the different Store methods and acquire the `IndexWriter` lock, thus preventing any engine for accessing if if the a shard is offline and use the current index commit snapshotting logic already existing in `IndexShard` for when the engine is started. That turned out to be a bad idea as we create more subtleties where, for example, a store listing can prevent a shard from starting up (the writer lock doesn't wait if it can't get access, but fails immediately, which is good). Another example is running on a shared directory where some other engine may actually hold the lock.

Instead I decided to take another approach:
1) Remove all the various methods on store and keep one, which accepts an index commit (which can be null) and also clearly communicates that the *caller* is responsible for concurrent access. This also tightens up the API which is a plus.
2) Add a `snapshotStore` method to IndexShard that takes care of all the concurrency aspects with the engine, which is now possible because it's all in the same place. It's still a bit ugly but at least it's all in one place and we can evaluate how to improve on this later on. I also renamed the  `snapshotIndex` method to `acquireIndexCommit` to avoid confusion and I think it communicates better what it does.
ywelsch added a commit that referenced this pull request Nov 22, 2016
…ked during shard state fetching (#21656)

PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures.

This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
ywelsch added a commit that referenced this pull request Nov 22, 2016
…ked during shard state fetching (#21656)

PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures.

This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
ywelsch added a commit that referenced this pull request Nov 22, 2016
…ked during shard state fetching (#21656)

PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures.

This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
@clintongormley clintongormley added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v5.0.0-alpha5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants