Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -428,6 +428,7 @@ protected void processContainer(ContainerInfo container,
* we have to resend close container command to the datanodes.
*/
if (state == LifeCycleState.CLOSING) {
setHealthStateForClosing(replicas, container, report);
for (ContainerReplica replica: replicas) {
if (replica.getState() != State.UNHEALTHY) {
sendCloseCommand(
Expand Down Expand Up @@ -1613,6 +1614,18 @@ private boolean isOpenContainerHealthy(
.allMatch(r -> compareState(state, r.getState()));
}

private void setHealthStateForClosing(Set<ContainerReplica> replicas,
ContainerInfo container,
ReplicationManagerReport report) {
if (replicas.size() == 0) {
report.incrementAndSample(HealthState.MISSING, container.containerID());
report.incrementAndSample(HealthState.UNDER_REPLICATED,
container.containerID());
report.incrementAndSample(HealthState.MIS_REPLICATED,
container.containerID());
Comment on lines +1622 to +1625
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the container categorized in all of these states? Isn't MISSING enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, MISSING could be enough but we wanted to have the same behavior as we have in Recon - this is way we have set all the states.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't MISSING enough?

This is also consistent with the replication manager detecting and reporting "MISSING" when the container is in the CLOSED state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see that we're also setting the container as mis replicated, under replicated and missing further down in that method for closed containers.

Looks like over time we've diverged from the original intention of the replication manager report:

 * This class is used by ReplicationManager. Each time ReplicationManager runs,
 * it creates a new instance of this class and increments the various counters
 * to allow for creating a report on the various container states within the
 * system. There is a counter for each LifeCycleState (open, closing, closed
 * etc) and the sum of each of the lifecycle state counters should equal the
 * total number of containers in SCM. Ie, each container can only be in one of
 * the Lifecycle states at any time.

It specifies that each container should only be in one state at a time.

I think we need to decide what will best help with debugging. For example, if a container is missing, it's naturally also mis replicated and under replicated. We can choose to count it only once as missing or we can count it in all three categories, but that needs to be done consistently everywhere.

The new RM does not count a missing container as mis replicated, but it does count it as under replicated in RatisReplicationCheckHandler. This is because it considers mis replication only when there is no under/over replication.

}
}

public boolean isContainerReplicatingOrDeleting(ContainerID containerID) {
return inflightReplication.containsKey(containerID) ||
inflightDeletion.containsKey(containerID);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -397,6 +397,62 @@ public void testClosingContainer() throws IOException, TimeoutException {
Assertions.assertEquals(1, report.getStat(LifeCycleState.CLOSING));
}

/**
* Create closing container with 1 replica.
* Expectation: Missing containers 0.
* Remove the only replica.
* Expectation: Missing containers 1.
*/
@Test
public void testClosingMissingContainer()
throws IOException, TimeoutException {
final ContainerInfo container = getContainer(LifeCycleState.CLOSING);
final ContainerID id = container.containerID();

containerStateManager.addContainer(container.getProtobuf());

// One replica in OPEN state
final Set<ContainerReplica> replicas = getReplicas(id, State.OPEN,
randomDatanodeDetails());

for (ContainerReplica replica : replicas) {
containerStateManager.updateContainerReplica(id, replica);
}

final int currentCloseCommandCount = datanodeCommandHandler
.getInvocationCount(SCMCommandProto.Type.closeContainerCommand);

replicationManager.processAll();
eventQueue.processAll(1000);
Assertions.assertEquals(currentCloseCommandCount + 1,
datanodeCommandHandler.getInvocationCount(
SCMCommandProto.Type.closeContainerCommand));

ReplicationManagerReport report = replicationManager.getContainerReport();
Assertions.assertEquals(1, report.getStat(LifeCycleState.CLOSING));
Assertions.assertEquals(0, report.getStat(
ReplicationManagerReport.HealthState.MISSING));

for (ContainerReplica replica : replicas) {
containerStateManager.removeContainerReplica(id, replica);
}

replicationManager.processAll();
eventQueue.processAll(1000);
Assertions.assertEquals(currentCloseCommandCount + 1,
datanodeCommandHandler.getInvocationCount(
SCMCommandProto.Type.closeContainerCommand));

report = replicationManager.getContainerReport();
Assertions.assertEquals(1, report.getStat(LifeCycleState.CLOSING));
Assertions.assertEquals(1, report.getStat(
ReplicationManagerReport.HealthState.MISSING));
Assertions.assertEquals(1, report.getStat(
ReplicationManagerReport.HealthState.UNDER_REPLICATED));
Assertions.assertEquals(1, report.getStat(
ReplicationManagerReport.HealthState.MIS_REPLICATED));
}

@Test
public void testReplicateCommandTimeout()
throws IOException, TimeoutException {
Expand Down