MAPREDUCE-7407. Avoid stopContainer() on dead node#4779
Conversation
|
🎊 +1 overall
This message was automatically generated. |
| // it is not necessary and could cost 15 minutes delay if the node is dead. | ||
| if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP && | ||
| !containers.containsKey(containerID)) { | ||
| LOG.info("Skip cleanup of already-removed container " + containerID); |
There was a problem hiding this comment.
As we are at it, fix this in line 382 too.
There was a problem hiding this comment.
Ack, Will make this change.
| // CONTAINER_REMOTE_LAUNCH event handling. | ||
| // Skip kill() such container during CONTAINER_REMOTE_CLEANUP as | ||
| // it is not necessary and could cost 15 minutes delay if the node is dead. | ||
| if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP && |
There was a problem hiding this comment.
This should be done in the switch case CONTAINER_REMOTE_CLEANUP
There was a problem hiding this comment.
I am doing it outside the switch case because in this case I dont want it to go to getContainer code which is before the switch case check and getContainer will end up giving the a new container and which will evenually fail this check - !containers.containsKey(containerID)
Container c = getContainer(event);
private Container getContainer(ContainerLauncherEvent event) {
ContainerId id = event.getContainerID();
Container c = containers.get(id);
if(c == null) {
System.out.println("entering null");
c = new Container(event.getTaskAttemptID(), event.getContainerID(),
event.getContainerMgrAddress());
Container old = containers.putIfAbsent(id, c);
if(old != null) {
c = old;
}
}
return c;
}
There was a problem hiding this comment.
I have made small changes to handle the above case and addressing your comment by moving it inside switch case CONTAINER_REMOTE_CLEANUP
|
🎊 +1 overall
This message was automatically generated. |
| @Override | ||
| public void run() { | ||
| LOG.info("Processing the event " + event.toString()); | ||
| LOG.info("Processing the event {}", event.toString()); |
There was a problem hiding this comment.
You don't need the toString() as logger does it for you.
There was a problem hiding this comment.
Thanks. Made the change.
| verify(mockCM).startContainers(any(StartContainersRequest.class)); | ||
|
|
||
| LOG.info("inserting cleanup event"); | ||
| ContainerLauncherEvent mockCleanupEvent2 = |
There was a problem hiding this comment.
Another nit, now the line char limit is 100 characters; we can make these lines into one.
There was a problem hiding this comment.
Makes sense, addressed.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
Thanks @goiri for final review and merging. |
Description of PR
If a container failed to launch earlier due to terminated instances, it has already been removed from the container hash map. Avoiding the kill() for CONTAINER_REMOTE_CLEANUP will avoid wasting 15min per container on retries/timeout.
JIRA - MAPREDUCE-7407
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?