Skip to content

Conversation

@jasontedor
Copy link
Member

In Elasticsearch 5.3.0 a bug was introduced in the merging of default settings when the target setting existed as an array. When this bug concerns path.data and default.path.data, we ended up in a situation where the paths specified in both settings would be used to write index data. Since our packaging sets default.path.data, users that configure multiple data paths via an array and use the packaging are subject to having shards land in paths in default.path.data when that is very likely not what they intended.

This commit is an attempt to rectify this situation. If path.data and default.path.data are configured, we check for the presence of indices there. If we find any, we log messages explaining the situation and fail the node.

Closes #23981, supersedes #24052, relates #24074, relates #24093

In Elasticsearch 5.3.0 a bug was introduced in the merging of default
settings when the target setting existed as an array. When this bug
concerns path.data and default.path.data, we ended up in a situation
where the paths specified in both settings would be used to write index
data. Since our packaging sets default.path.data, users that configure
multiple data paths via an array and use the packaging are subject to
having shards land in paths in default.path.data when that is very
likely not what they intended.

This commit is an attempt to rectify this situation. If path.data and
default.path.data are configured, we check for the presence of indices
there. If we find any, we log messages explaining the situation and fail
the node.
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this change is OK and I would like to discuss different solutions for this down the road. I'd like to not add the permission and find the possible issue earlier in the game. I think we can only do that if we ignore the nodeLockID which I think we should. It is an edgecase we are trying to solve. if we'd not look at that and only look if there is data in default.path.data we would do as good as this solution IMO but with much less trouble. I think this is over engineering we are trying to do here but I am ok with compromising on dropping the sophisticated method here in 6.0 since our recommendation is to upgrade to a latest version first before you upgrade even if it's a full cluster restart. We can still have a simpler check in 6.0 then. @rjernst WDYT can you review this as well?

return nodePaths;
}

public int nodeLockId() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this getNodeLockId() pls

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b21702e.

* @param logger a logger where messages regarding the detection will be logged
* @throws IOException if an I/O exception occurs reading the directory structure
*/
static void checkForIndexDataInDefaultPathData(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have 140 chars use them :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b21702e.

final Settings settings,
final NodeEnvironment nodeEnv,
final Logger logger) throws IOException {
if (!Environment.PATH_DATA_SETTING.exists(settings) || !Environment.DEFAULT_PATH_DATA_SETTING.exists(settings)) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please wrap with {}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b21702e.

}
}

if (clean) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please wrap {}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b21702e.

boolean clean = true;
for (final String defaultPathData : Environment.DEFAULT_PATH_DATA_SETTING.get(settings)) {
final Path nodeDirectory = NodeEnvironment.resolveNodePath(getPath(defaultPathData), nodeEnv.nodeLockId());
if (!Files.exists(nodeDirectory)) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please wrap in {} and use == false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b21702e.

if (!Files.exists(nodeDirectory)) continue;
final NodeEnvironment.NodePath nodePath = new NodeEnvironment.NodePath(nodeDirectory);
final Set<String> availableIndexFolders = nodeEnv.availableIndexFoldersForPath(nodePath);
if (availableIndexFolders.isEmpty()) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please wrap in {}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed b21702e.

@s1monw
Copy link
Contributor

s1monw commented Apr 17, 2017

@jasontedor lets merge this and I will look into improving this in master. I think the solution is good enough and we should make progress over perfection

@jasontedor jasontedor merged commit 8033c57 into elastic:master Apr 17, 2017
jasontedor added a commit that referenced this pull request Apr 17, 2017
In Elasticsearch 5.3.0 a bug was introduced in the merging of default
settings when the target setting existed as an array. When this bug
concerns path.data and default.path.data, we ended up in a situation
where the paths specified in both settings would be used to write index
data. Since our packaging sets default.path.data, users that configure
multiple data paths via an array and use the packaging are subject to
having shards land in paths in default.path.data when that is very
likely not what they intended.

This commit is an attempt to rectify this situation. If path.data and
default.path.data are configured, we check for the presence of indices
there. If we find any, we log messages explaining the situation and fail
the node.

Relates #24099
jasontedor added a commit that referenced this pull request Apr 17, 2017
In Elasticsearch 5.3.0 a bug was introduced in the merging of default
settings when the target setting existed as an array. When this bug
concerns path.data and default.path.data, we ended up in a situation
where the paths specified in both settings would be used to write index
data. Since our packaging sets default.path.data, users that configure
multiple data paths via an array and use the packaging are subject to
having shards land in paths in default.path.data when that is very
likely not what they intended.

This commit is an attempt to rectify this situation. If path.data and
default.path.data are configured, we check for the presence of indices
there. If we find any, we log messages explaining the situation and fail
the node.

Relates #24099
jasontedor added a commit that referenced this pull request Apr 17, 2017
In Elasticsearch 5.3.0 a bug was introduced in the merging of default
settings when the target setting existed as an array. When this bug
concerns path.data and default.path.data, we ended up in a situation
where the paths specified in both settings would be used to write index
data. Since our packaging sets default.path.data, users that configure
multiple data paths via an array and use the packaging are subject to
having shards land in paths in default.path.data when that is very
likely not what they intended.

This commit is an attempt to rectify this situation. If path.data and
default.path.data are configured, we check for the presence of indices
there. If we find any, we log messages explaining the situation and fail
the node.

Relates #24099
@jasontedor jasontedor deleted the default-path-data-detection branch April 17, 2017 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants