-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Detect remnants of path.data/default.path.data bug #24099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect remnants of path.data/default.path.data bug #24099
Conversation
In Elasticsearch 5.3.0 a bug was introduced in the merging of default settings when the target setting existed as an array. When this bug concerns path.data and default.path.data, we ended up in a situation where the paths specified in both settings would be used to write index data. Since our packaging sets default.path.data, users that configure multiple data paths via an array and use the packaging are subject to having shards land in paths in default.path.data when that is very likely not what they intended. This commit is an attempt to rectify this situation. If path.data and default.path.data are configured, we check for the presence of indices there. If we find any, we log messages explaining the situation and fail the node.
s1monw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this change is OK and I would like to discuss different solutions for this down the road. I'd like to not add the permission and find the possible issue earlier in the game. I think we can only do that if we ignore the nodeLockID which I think we should. It is an edgecase we are trying to solve. if we'd not look at that and only look if there is data in default.path.data we would do as good as this solution IMO but with much less trouble. I think this is over engineering we are trying to do here but I am ok with compromising on dropping the sophisticated method here in 6.0 since our recommendation is to upgrade to a latest version first before you upgrade even if it's a full cluster restart. We can still have a simpler check in 6.0 then. @rjernst WDYT can you review this as well?
| return nodePaths; | ||
| } | ||
|
|
||
| public int nodeLockId() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call this getNodeLockId() pls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b21702e.
| * @param logger a logger where messages regarding the detection will be logged | ||
| * @throws IOException if an I/O exception occurs reading the directory structure | ||
| */ | ||
| static void checkForIndexDataInDefaultPathData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have 140 chars use them :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b21702e.
| final Settings settings, | ||
| final NodeEnvironment nodeEnv, | ||
| final Logger logger) throws IOException { | ||
| if (!Environment.PATH_DATA_SETTING.exists(settings) || !Environment.DEFAULT_PATH_DATA_SETTING.exists(settings)) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please wrap with {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b21702e.
| } | ||
| } | ||
|
|
||
| if (clean) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please wrap {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b21702e.
| boolean clean = true; | ||
| for (final String defaultPathData : Environment.DEFAULT_PATH_DATA_SETTING.get(settings)) { | ||
| final Path nodeDirectory = NodeEnvironment.resolveNodePath(getPath(defaultPathData), nodeEnv.nodeLockId()); | ||
| if (!Files.exists(nodeDirectory)) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please wrap in {} and use == false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b21702e.
| if (!Files.exists(nodeDirectory)) continue; | ||
| final NodeEnvironment.NodePath nodePath = new NodeEnvironment.NodePath(nodeDirectory); | ||
| final Set<String> availableIndexFolders = nodeEnv.availableIndexFoldersForPath(nodePath); | ||
| if (availableIndexFolders.isEmpty()) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please wrap in {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b21702e.
|
@jasontedor lets merge this and I will look into improving this in master. I think the solution is good enough and we should make progress over perfection |
In Elasticsearch 5.3.0 a bug was introduced in the merging of default settings when the target setting existed as an array. When this bug concerns path.data and default.path.data, we ended up in a situation where the paths specified in both settings would be used to write index data. Since our packaging sets default.path.data, users that configure multiple data paths via an array and use the packaging are subject to having shards land in paths in default.path.data when that is very likely not what they intended. This commit is an attempt to rectify this situation. If path.data and default.path.data are configured, we check for the presence of indices there. If we find any, we log messages explaining the situation and fail the node. Relates #24099
In Elasticsearch 5.3.0 a bug was introduced in the merging of default settings when the target setting existed as an array. When this bug concerns path.data and default.path.data, we ended up in a situation where the paths specified in both settings would be used to write index data. Since our packaging sets default.path.data, users that configure multiple data paths via an array and use the packaging are subject to having shards land in paths in default.path.data when that is very likely not what they intended. This commit is an attempt to rectify this situation. If path.data and default.path.data are configured, we check for the presence of indices there. If we find any, we log messages explaining the situation and fail the node. Relates #24099
In Elasticsearch 5.3.0 a bug was introduced in the merging of default settings when the target setting existed as an array. When this bug concerns path.data and default.path.data, we ended up in a situation where the paths specified in both settings would be used to write index data. Since our packaging sets default.path.data, users that configure multiple data paths via an array and use the packaging are subject to having shards land in paths in default.path.data when that is very likely not what they intended. This commit is an attempt to rectify this situation. If path.data and default.path.data are configured, we check for the presence of indices there. If we find any, we log messages explaining the situation and fail the node. Relates #24099
In Elasticsearch 5.3.0 a bug was introduced in the merging of default settings when the target setting existed as an array. When this bug concerns path.data and default.path.data, we ended up in a situation where the paths specified in both settings would be used to write index data. Since our packaging sets default.path.data, users that configure multiple data paths via an array and use the packaging are subject to having shards land in paths in default.path.data when that is very likely not what they intended.
This commit is an attempt to rectify this situation. If path.data and default.path.data are configured, we check for the presence of indices there. If we find any, we log messages explaining the situation and fail the node.
Closes #23981, supersedes #24052, relates #24074, relates #24093