HDDS-5916. Datanodes stuck in leader election in Kubernetes #3186
```diff
@@ -125,7 +125,7 @@ private void persistContainerDatanodeDetails() {
     File idPath = new File(dataNodeIDPath);
     DatanodeDetails datanodeDetails = this.context.getParent()
         .getDatanodeDetails();
-    if (datanodeDetails != null && !idPath.exists()) {
+    if (datanodeDetails != null) {
       try {
         ContainerUtils.writeDatanodeDetailsTo(datanodeDetails, idPath);
       } catch (IOException ex) {
```

**Contributor:** What's the motivation for dropping this check?

**Contributor (author):** When the datanode is restarted in k8s, its IP changes, so the original info in this file is no longer accurate. Dropping the check makes sure we always update the file with the latest info. And when we are not using k8s, it is not harmful to rewrite this file whenever the node restarts.
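As a side note, here is a minimal, self-contained sketch of the "always rewrite on startup" pattern this hunk adopts. The `IdFileDemo` class, file path, and string payload are hypothetical; the real code writes a `DatanodeDetails` object via `ContainerUtils.writeDatanodeDetailsTo`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch of unconditionally rewriting an ID file on every start. */
public final class IdFileDemo {

  static void rewriteIdFile(Path idFile, String currentAddress)
      throws IOException {
    // Write to a temp file first, then move it into place, so a crash
    // mid-write cannot leave a truncated ID file behind.
    Path tmp = idFile.resolveSibling(idFile.getFileName() + ".tmp");
    Files.writeString(tmp, currentAddress);
    Files.move(tmp, idFile, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
  }

  public static void main(String[] args) throws IOException {
    Path idFile = Path.of("datanode.id");
    // Rewriting on every start means a changed address (e.g. a rescheduled
    // k8s pod) can never survive a restart in the persisted file.
    rewriteIdFile(idFile, "10.0.3.17");
    System.out.println(Files.readString(idFile));
  }
}
```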
```diff
@@ -26,6 +26,9 @@
 import java.util.stream.Collectors;
 import java.util.stream.Stream;

+import com.google.gson.ExclusionStrategy;
+import com.google.gson.FieldAttributes;
+import org.apache.hadoop.hdds.scm.net.NodeImpl;
 import org.apache.hadoop.util.StringUtils;
 import org.apache.hadoop.util.Time;
```

```diff
@@ -58,10 +61,29 @@ public class EventQueue implements EventPublisher, AutoCloseable {

   private boolean isRunning = true;

-  private static final Gson TRACING_SERIALIZER = new GsonBuilder().create();
+  private static final Gson TRACING_SERIALIZER = new GsonBuilder()
+      .setExclusionStrategies(new DatanodeDetailsGsonExclusionStrategy())
+      .create();

   private boolean isSilent = false;

+  // The field parent in DatanodeDetails class has the circular reference
+  // which will result in Gson infinite recursive parsing. We need to exclude
+  // this field when generating json string for DatanodeDetails object
+  static class DatanodeDetailsGsonExclusionStrategy
+      implements ExclusionStrategy {
+    @Override
+    public boolean shouldSkipField(FieldAttributes f) {
+      return f.getDeclaringClass() == NodeImpl.class
+          && f.getName().equals("parent");
+    }
+
+    @Override
+    public boolean shouldSkipClass(Class<?> aClass) {
+      return false;
+    }
+  }

   /**
    * Add new handler to the event queue.
    * <p>
```

**Contributor** (on `DatanodeDetailsGsonExclusionStrategy`): This change can be merged as a quick PR rather than waiting on this one.
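To see why the exclusion strategy is needed, here is a small, self-contained Gson example. The `Node` class is a hypothetical stand-in for `NodeImpl`/`DatanodeDetails`; the strategy logic mirrors the patch.

```java
import com.google.gson.ExclusionStrategy;
import com.google.gson.FieldAttributes;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

/** Demonstrates the cycle-breaking exclusion strategy used in the patch. */
public final class CircularGsonDemo {
  /** Hypothetical stand-in for a node type with a parent back-reference. */
  static final class Node {
    String name;
    Node parent; // back-reference: serializing it naively never terminates
    Node(String name, Node parent) {
      this.name = name;
      this.parent = parent;
    }
  }

  public static void main(String[] args) {
    Node root = new Node("root", null);
    Node child = new Node("child", root);
    root.parent = child; // close the cycle: child -> root -> child -> ...

    Gson safe = new GsonBuilder()
        .setExclusionStrategies(new ExclusionStrategy() {
          @Override
          public boolean shouldSkipField(FieldAttributes f) {
            // Cut the back-reference, as the patch does for NodeImpl#parent.
            return f.getDeclaringClass() == Node.class
                && f.getName().equals("parent");
          }

          @Override
          public boolean shouldSkipClass(Class<?> clazz) {
            return false;
          }
        })
        .create();

    // new Gson().toJson(child) would recurse until StackOverflowError;
    // with the exclusion strategy it prints {"name":"child"}.
    System.out.println(safe.toJson(child));
  }
}
```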
```diff
@@ -56,6 +56,7 @@ public NewNodeHandler(PipelineManager pipelineManager,
   public void onMessage(DatanodeDetails datanodeDetails,
       EventPublisher publisher) {
     try {
+      pipelineManager.closeStalePipelines(datanodeDetails);
       serviceManager.notifyEventTriggered(Event.NEW_NODE_HANDLER_TRIGGERED);

       if (datanodeDetails.getPersistedOpState()
```

**Contributor:** Is `closeStalePipelines` necessary here? When SCM processes the register command, it should be able to distinguish a new node from an updated node, and this handler should only be responsible for the new-node case.

**Contributor (author):** Yes, this is necessary. I believe from my testing that if a datanode is dead for a long time, SCM will remove it from the registration list. When the node comes up with a different IP, it first registers with SCM, and SCM treats it as a new node, but the old pipelines with the old IPs may still be there. Another way to achieve this would be to delete the pipelines when SCM removes the dead nodes, but I am not that familiar with that part of the code and may need to take a further look.

**Contributor:** According to your implementation, when the node comes up with a different IP, it registers first with SCM, and the SCM node manager will find it as long as its UUID is unchanged.

**Contributor (author):** Sorry, I misstated my case. You are right: when a datanode is dead for a long time, SCM actually won't remove it from its registration list, so when the node with the same UUID comes up again with a different IP, it falls into the update-address path instead of registering as a new node. However, there is another case: if SCM also restarts, it loses its in-memory node registration map, but it still has all the old pipelines, since pipelines are read from persistent storage. In that case, if the datanode changes its IP and registers with SCM, SCM treats it as a new node instead of a known node with a different IP, so we still need to close all the stale pipelines that carry the old IPs for this datanode. Please let me know if this makes sense. Thanks!

**Contributor:** Makes sense, thanks for the explanation.
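A self-contained sketch of the staleness check the thread describes: a pipeline is stale for a re-registered datanode when it contains the node's UUID under a different address. The `Dn` and `Pipeline` records below are stand-ins, not the Ozone types, and this is not the actual `closeStalePipelines` implementation.

```java
import java.util.List;
import java.util.UUID;

/** Sketch of the stale-pipeline check (Java 16+ records for brevity). */
public final class StalePipelineDemo {
  record Dn(UUID uuid, String ip) {}            // stand-in for DatanodeDetails
  record Pipeline(String id, List<Dn> nodes) {} // stand-in for Pipeline

  /** True if the pipeline references this datanode under an old address. */
  static boolean isStaleFor(Pipeline p, Dn reported) {
    return p.nodes().stream().anyMatch(member ->
        member.uuid().equals(reported.uuid())
            && !member.ip().equals(reported.ip()));
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    Dn recorded = new Dn(id, "10.0.0.5");  // address stored in the pipeline
    Dn reported = new Dn(id, "10.0.0.9");  // address after the pod restart
    Pipeline p = new Pipeline("p1", List.of(recorded));
    System.out.println(isStaleFor(p, reported)); // true -> close the pipeline
  }
}
```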
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| /** | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with this | ||
| * work for additional information regarding copyright ownership. The ASF | ||
| * licenses this file to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * <p> | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * <p> | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
| * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
| * License for the specific language governing permissions and limitations under | ||
| * the License. | ||
| */ | ||
| package org.apache.hadoop.hdds.scm.node; | ||
|
|
||
| import org.apache.hadoop.hdds.protocol.DatanodeDetails; | ||
| import org.apache.hadoop.hdds.scm.ha.SCMService; | ||
| import org.apache.hadoop.hdds.scm.ha.SCMServiceManager; | ||
| import org.apache.hadoop.hdds.scm.node.states.NodeNotFoundException; | ||
| import org.apache.hadoop.hdds.scm.pipeline.PipelineManager; | ||
| import org.apache.hadoop.hdds.server.events.EventHandler; | ||
| import org.apache.hadoop.hdds.server.events.EventPublisher; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| /** | ||
| * Handles datanode ip or hostname change event. | ||
| */ | ||
| public class NodeAddressUpdateHandler | ||
| implements EventHandler<DatanodeDetails> { | ||
| private static final Logger LOG = | ||
| LoggerFactory.getLogger(NodeAddressUpdateHandler.class); | ||
|
|
||
| private final PipelineManager pipelineManager; | ||
| private final NodeDecommissionManager decommissionManager; | ||
| private final SCMServiceManager serviceManager; | ||
|
|
||
| public NodeAddressUpdateHandler(PipelineManager pipelineManager, | ||
| NodeDecommissionManager | ||
| decommissionManager, | ||
| SCMServiceManager serviceManager) { | ||
| this.pipelineManager = pipelineManager; | ||
| this.decommissionManager = decommissionManager; | ||
| this.serviceManager = serviceManager; | ||
| } | ||
|
|
||
| @Override | ||
| public void onMessage(DatanodeDetails datanodeDetails, | ||
| EventPublisher publisher) { | ||
| try { | ||
| LOG.info("Closing stale pipelines for datanode: {}", datanodeDetails); | ||
| pipelineManager.closeStalePipelines(datanodeDetails); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is closing pipelines necessary even if using datanode hostname instead of IP? If two or three datanodes of some pipeline are restarted, a new pipeline is created as they register, and the first one or two pipelines are then closed very soon.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The problem is that pipeline is using IP address. The datanode host name config only impact the ratis address. Ideally I think we should use hostname in pipeline. However, pipeline info is kept in persistent, if we we change the code to only support the hostname, then it will be not backward compatible with the pipelines already in the DB. If we decide to support both hostname and IP address, then the logic would be more complex: we should figure out one IP is corresponding to one hostname so that pipeline will not be created duplicately. Use hostname in pipeline is a big change. We need to consider more about it. Therefore, I think it should be another PR if we decide to make the change.
Is this a question or statement?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a statement based on behavior I saw while checking kubernetes test results where 3 datanodes are restarted. See the log above. Pipeline But I guess quick restarts are less realistic in non-test environment, so it may be OK for now.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the above log is before the restarting. After restarting, the right order is that: first, datanode update its IP address in SCM; second, SCM closes all pipelines belonging to this datanode's old IP address; third, SCM create new pipeline(s) for this datanode with the new IP
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, pipeline
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I got what you mean. This is because the restarts are so quick that SCM thinks these 3 datanodes keep alive. So when the first DN re-registers and tries to create the new pipeline, SCM thinks the other two DN are still good, and thus uses the old info (because the other two DNs have not re-registered yet). Same thing happens when the 2nd DN re-registeres, where SCM thinks the 3rd DN is good and uses its old info. For quick restarts, I do not think there is a good way for SCM to recognize these DN's are dead/restarted. Therefore, we rely on the logic of this PR for waiting ALLOCATED pipelines to be OPEN. In such way, the block allocation can still get the correct pipeline eventually. |
||
| serviceManager.notifyEventTriggered(SCMService.Event | ||
| .NODE_ADDRESS_UPDATE_HANDLER_TRIGGERED); | ||
|
|
||
| decommissionManager.continueAdminForNode(datanodeDetails); | ||
| } catch (NodeNotFoundException e) { | ||
| // Should not happen, as the node has just registered to call this event | ||
| // handler. | ||
| LOG.error( | ||
| "NodeNotFound when updating the node Ip or host name to the " + | ||
| "decommissionManager", | ||
| e); | ||
| } | ||
| } | ||
| } | ||
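For context, a hedged sketch of how such a handler would be wired into the `EventQueue` shown earlier. The event name and the surrounding variables (`pipelineManager`, `decommissionManager`, `serviceManager`, `reRegisteredDn`) are assumptions for illustration; the actual registration happens during SCM startup, which is not part of this diff.

```java
// Sketch only: assumed names throughout, not the PR's actual wiring.
EventQueue eventQueue = new EventQueue();

// Hypothetical typed event for "known datanode re-registered with a new
// ip/hostname"; the constant the PR actually adds may be named differently.
TypedEvent<DatanodeDetails> addressUpdateEvent =
    new TypedEvent<>(DatanodeDetails.class, "Node_Address_Update");

eventQueue.addHandler(addressUpdateEvent,
    new NodeAddressUpdateHandler(pipelineManager, decommissionManager,
        serviceManager));

// Per the review thread, the node manager picks the event at registration:
//   unknown UUID            -> NEW_NODE (NewNodeHandler, which now also
//                              closes stale pipelines, covering the
//                              SCM-restart case)
//   known UUID, new address -> the address-update event handled above
eventQueue.fireEvent(addressUpdateEvent, reRegisteredDn);
```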