
Conversation

@bleskes
Contributor

@bleskes bleskes commented Jun 17, 2016

In the past, we had the semantics where the very first cluster state a node processed after joining could not contain shard assignments to it. This was to make sure the node cleaned up local / stale shard copies before receiving new ones that might confuse it. Since then a lot of work has been done in this area, most notably the introduction of allocation ids and #17270. This means we no longer have to be this careful and can just reroute in the same cluster state update where we process the join, keeping things simple and following the same pattern we have in other places.
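
For illustration, a minimal sketch of the inlined flow, written against the 5.0-era cluster state APIs visible in the diff hunk further down; the class name, the simplified join loop, and the routingResult(...) builder call are assumptions made for this example, not the merged code:

import java.util.List;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.cluster.node.DiscoveryNodes;
import org.elasticsearch.cluster.routing.allocation.AllocationService;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;

// Sketch only: apply pending joins and reroute within the same cluster state update,
// instead of publishing the joined nodes first and scheduling a follow-up reroute.
class InlineJoinRerouteSketch {
    private final AllocationService allocationService;

    InlineJoinRerouteSketch(AllocationService allocationService) {
        this.allocationService = allocationService;
    }

    ClusterState processJoins(ClusterState currentState, List<DiscoveryNode> joiningNodes) {
        DiscoveryNodes.Builder nodesBuilder = DiscoveryNodes.builder(currentState.nodes());
        boolean nodesChanged = false;
        for (DiscoveryNode node : joiningNodes) {
            if (currentState.nodes().get(node.getId()) == null) {
                nodesBuilder.put(node);              // add the newly joined node
                nodesChanged = true;
            }
        }
        if (nodesChanged == false) {
            return currentState;
        }
        ClusterState tmpState = ClusterState.builder(currentState).nodes(nodesBuilder).build();
        // reroute before publishing, so the very first cluster state the joining node
        // receives may already contain shard assignments for it
        RoutingAllocation.Result result = allocationService.reroute(tmpState, "node_join");
        return result.changed() ? ClusterState.builder(tmpState).routingResult(result).build() : tmpState;
    }
}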

@bleskes bleskes added the :Cluster, :Distributed Coordination/Discovery-Plugins and v5.0.0-alpha4 labels Jun 17, 2016
@bleskes
Contributor Author

bleskes commented Jun 17, 2016

@ywelsch can you take a look?

@bleskes bleskes closed this Jun 17, 2016
@bleskes bleskes reopened this Jun 17, 2016
if (nodesChanged) {
    newState.nodes(nodesBuilder);
    final ClusterState tmpState = newState.build();
    RoutingAllocation.Result result = routingService.getAllocationService().reroute(tmpState, "node_join");
Contributor


ZenDiscovery / NodeJoinController is only using AllocationService now, no need for `RoutingService`. We can directly use `AllocationService` as a dependency for ZD / NJC.

Contributor


We should also do the same in LocalDiscovery as we do here.
We can then also remove getAllocationService from RoutingService.
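
Roughly what these two comments boil down to, as a sketch with a deliberately simplified constructor (the real NodeJoinController and discovery classes take more collaborators than shown here):

import org.elasticsearch.cluster.routing.allocation.AllocationService;

// Sketch: ZenDiscovery / NodeJoinController (and LocalDiscovery) take AllocationService
// directly, so RoutingService no longer needs to expose getAllocationService().
class NodeJoinControllerSketch {
    private final AllocationService allocationService;

    NodeJoinControllerSketch(AllocationService allocationService) { // simplified signature
        this.allocationService = allocationService;
    }
    // reroute("node_join") is then called on allocationService, as in the hunk above
}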

@ywelsch
Contributor

ywelsch commented Jun 17, 2016

Left 2 comments. I really like this change!

@bleskes
Contributor Author

bleskes commented Jun 17, 2016

@ywelsch I pushed another commit addressing your comments

-    @Override
-    public void setRoutingService(RoutingService routingService) {
-        this.routingService = routingService;
+    public void setAllocationService(AllocationService allocationService) {
Contributor


can you add the @Override back? (here and in all the other subclasses of Discovery)
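
i.e. something along these lines in each implementation (the field name is assumed for the example):

@Override
public void setAllocationService(AllocationService allocationService) {
    this.allocationService = allocationService;
}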

@ywelsch
Contributor

ywelsch commented Jun 17, 2016

Left one minor, no need for another iteration. LGTM. Thanks @bleskes!

@bleskes bleskes merged commit 46b40f7 into elastic:master Jun 17, 2016
@bleskes bleskes deleted the node_join_inline_reroute branch June 17, 2016 15:32
@bleskes bleskes restored the node_join_inline_reroute branch June 23, 2016 06:42
bleskes added a commit that referenced this pull request Jun 23, 2016
…8938)

There are secondary issues with async shard fetch going out to nodes before they have had a cluster state published to them that need to be solved first. For example:
- async fetch uses a transport nodes action that resolves nodes based on the cluster state (but that state is not yet exposed by ClusterService since we inline the reroute)
- after a disruption, nodes will respond with an allocated shard (they haven't cleaned up their shards yet), which throws off decisions on the master side.
- nodes need the index meta data in question, but they may not have it if they didn't receive the latest cluster state
@bleskes
Contributor Author

bleskes commented Jun 23, 2016

I reverted this one due to secondary problems with async shard fetch. I'm working on fixing those before re-committing this.

From the revert commit message:

There are secondary issues with async shard fetch going out to nodes before they have had a cluster state published to them that need to be solved first. For example:

  • async fetch uses a transport nodes action that resolves nodes based on the cluster state (but that state is not yet exposed by ClusterService since we inline the reroute)
  • after a disruption, nodes will respond with an allocated shard (they haven't cleaned up their shards yet), which throws off decisions on the master side.
  • nodes need the index meta data in question, but they may not have it if they didn't receive the latest cluster state

jasontedor added a commit that referenced this pull request Jun 23, 2016
* master: (416 commits)
  docs: removed obsolete information, percolator queries are not longer loaded into jvm heap memory.
  Upgrade JNA to 4.2.2 and remove optionality
  [TEST] Increase timeouts for Rest test client (#19042)
  Update migrate_5_0.asciidoc
  Add ThreadLeakLingering option to Rest client tests
  Add a MultiTermAwareComponent marker interface to analysis factories. #19028
  Attempt at fixing IndexStatsIT.testFilterCacheStats.
  Fix docs build.
  Move templates out of the Search API, into lang-mustache module
  revert - Inline reroute with process of node join/master election (#18938)
  Build valid slices in SearchSourceBuilderTests
  Docs: Convert aggs/misc to CONSOLE
  Docs: migration notes for _timestamp and _ttl
  Group client projects under :client
  [TEST] Add client-test module and make client tests use randomized runner directly
  Move upgrade test to upgrade from version 2.3.3
  Tasks: Add completed to the mapping
  Fail to start if plugin tries broken onModule
  Remove duplicated read byte array methods
  Rename `fields` to `stored_fields` and add `docvalue_fields`
  ...
bleskes added a commit that referenced this pull request Jun 27, 2016
…oth on master and non data nodes (#19044)

#18938 changed the timing at which we reach out to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting from the node's join was published, #18938 sent the fetches concurrently with the publishing process. This revealed a couple of places where shard store fetching depends on the current state of the cluster state, both on the master and on the data nodes. The problems discovered were already present without #18938, but required failures or extreme situations to surface. This PR tries to remove as many of these dependencies as possible, making shard store fetching simpler and paving the way to re-introduce #18938, which was reverted.

These are the notable changes:
1) Allow TransportNodesAction (from which shard store fetching is derived) callers to supply concrete disco nodes, so it won't need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet made available through ClusterService. Note that long term we can expect the rest layer to resolve node ids to concrete nodes, making this mode the only one needed.
2) The data node relied on the cluster state to have the relevant index meta data so it can find data when custom paths are used. We now fall back to read the meta data from disk if needed.
3) The data node was relying on its own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is of course something it cannot know until it has received (and processed) the new cluster state from the master. This flag in the response is now removed. This is not a problem because we used that flag to protect against double-assigning a shard to the same node, but we are already protected from that by the allocation deciders.
4) I removed the redundant filterNodeIds method in TransportNodesAction - if people want to filter they can override resolveRequest.
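
A rough illustration of points 1 and 4, with invented names (this is not the actual BaseNodesRequest / TransportNodesAction API): the request can carry concrete DiscoveryNode objects, node resolution only falls back to the cluster state when they are absent, and callers that want filtering override the resolution step rather than a separate filterNodeIds hook.

import java.util.Arrays;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.node.DiscoveryNode;

// Hypothetical sketch of a nodes-level request that does not require the cluster state
// when the caller already knows the target nodes (e.g. the nodes that just joined).
class NodesRequestSketch {
    private final String[] nodeIds;               // resolved against the cluster state
    private final DiscoveryNode[] concreteNodes;  // supplied directly by the caller, may be null

    NodesRequestSketch(String[] nodeIds, DiscoveryNode[] concreteNodes) {
        this.nodeIds = nodeIds;
        this.concreteNodes = concreteNodes;
    }

    DiscoveryNode[] resolveNodes(ClusterState state) {
        if (concreteNodes != null) {
            return concreteNodes;                 // no cluster state needed
        }
        // fallback in the spirit of the old behaviour: look each id up in the current cluster state
        return Arrays.stream(nodeIds)
                .map(id -> state.nodes().get(id))
                .filter(node -> node != null)
                .toArray(DiscoveryNode[]::new);
    }
}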
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jul 4, 2016
bleskes added a commit that referenced this pull request Jul 18, 2016
We currently have a concurrency issue between the static methods on the Store class and store changes that are done via a valid open store. An example of this is the async shard fetch, which can reach out to a node while a local shard copy is shutting down (the fetch does check if we have an open shard and tries to use that first, but if the shard is shutting down, it will not be available from IndexService).

Specifically, async shard fetching tries to read metadata from the store while, concurrently, the shard that is shutting down commits to Lucene, changing the segments_N file. This causes a file-not-found exception on the shard-fetching side, which in turn makes the master think the shard is unusable. In tests this can delay shard assignment (by up to 1m), which fails tests. See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/570 for details.

This is one of the things #18938 caused to bubble up.
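
A sketch of the kind of defensive read this implies, using plain Lucene APIs (illustrative only, not the code from the referenced commit): a segments_N file that disappears mid-read is reported as "no commit found" rather than treated as a corrupted or unusable shard.

import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;

final class StoreMetadataReadSketch {
    /**
     * Reads the latest commit point, returning null if the commit point was concurrently
     * removed (e.g. the shard is shutting down and just wrote a new segments_N), so the
     * caller can retry or answer "no data here" instead of failing the whole fetch.
     */
    static SegmentInfos readLatestCommitOrNull(Directory directory) throws IOException {
        try {
            return SegmentInfos.readLatestCommit(directory);
        } catch (FileNotFoundException | NoSuchFileException e) {
            return null;
        }
    }
}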
@clintongormley clintongormley added the :Distributed Indexing/Distributed label Feb 13, 2018
Labels

:Distributed Coordination/Discovery-Plugins, :Distributed Indexing/Distributed, >enhancement, v5.0.0-alpha5
