Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
8ba9e8d
HDDS-12421. ContainerReportHandler should not take the call to delete…
smengcl Feb 26, 2025
ff0f98f
If a DELETING or DELETED container has a non-empty replica, the conta…
siddhantsangwan Feb 27, 2025
ea69cb0
Revert "If a DELETING or DELETED container has a non-empty replica, t…
smengcl Feb 27, 2025
818e889
Address Ethan's comments:
smengcl Feb 27, 2025
aa76bc8
Checkstyle
smengcl Feb 27, 2025
f205a6d
Add test case for container state DELETED -> CLOSED transition.
smengcl Feb 27, 2025
647a400
Add test case for DELETED -> CLOSED in `TestContainerStateManager`, `…
smengcl Feb 27, 2025
dc9ac6b
Consider replica states OPEN, CLOSING, CLOSED, QUASI_CLOSED and UNHEA…
siddhantsangwan Feb 28, 2025
1bc4181
modify tests in TestContainerReportHandler
siddhantsangwan Feb 28, 2025
e5962e4
remove unused import
siddhantsangwan Feb 28, 2025
f1971c4
Address comments.
smengcl Feb 28, 2025
08d11c9
UT: Check if delete command is issued or not as expected at the end.
smengcl Mar 1, 2025
65a2876
Fix existing UT `ecContainerShouldTransitionFromDeletingToClosedWhenN…
smengcl Mar 1, 2025
fb77bae
Fix existing UT `testStaleReplicaOfDeletedContainer`, and added case …
smengcl Mar 1, 2025
c26e3df
Checkstyle
smengcl Mar 1, 2025
e854276
Fix `testResendDeleteCommand` (because with HDDS-12421, only empty co…
smengcl Mar 3, 2025
a132b80
Merge UT `ratisContainerShouldTransitionFromDeletingOrDeletedToClosed…
smengcl Mar 3, 2025
448a568
force delete for `UNKNOWN_CONTAINER_ACTION_DELETE`.
smengcl Mar 3, 2025
c7717d8
Merge remote-tracking branch 'asf/master' into HDDS-12421-crh-dont-de…
smengcl Mar 4, 2025
a027755
Fix `WhenEmptyReplica` test case, include EC test case parameterizati…
smengcl Mar 4, 2025
d1f96a5
Fix javadoc.
smengcl Mar 4, 2025
99a6601
empty test
swamirishi Mar 6, 2025
c94aff8
empty test
swamirishi Mar 6, 2025
f96df52
empty test
swamirishi Mar 7, 2025
849eb62
HDDS-12421. Remove ignored flag & restructure switch case
swamirishi Mar 6, 2025
e710734
HDDS-12421. Fix test case
swamirishi Mar 7, 2025
b221882
add test case to assert that DeletingContainerHandler does not send d…
siddhantsangwan Mar 7, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -245,7 +245,7 @@ private boolean updateContainerState(final DatanodeDetails datanode,

final ContainerID containerId = container.containerID();
boolean ignored = false;

boolean replicaIsEmpty = replica.hasIsEmpty() && replica.getIsEmpty();
switch (container.getState()) {
case OPEN:
/*
Expand Down Expand Up @@ -351,27 +351,29 @@ private boolean updateContainerState(final DatanodeDetails datanode,
* The container is already in closed state. do nothing.
*/
break;
case DELETED:
// If container is in DELETED state and the reported replica is empty, delete the empty replica.
// We should also do this for DELETING containers and currently DeletingContainerHandler does that
if (replicaIsEmpty) {
deleteReplica(containerId, datanode, publisher, "DELETED", false);
break;
}
case DELETING:
/*
HDDS-11136: If a DELETING container has a non-empty CLOSED replica, the container should be moved back to CLOSED
state.
* HDDS-11136: If a DELETING container has a non-empty CLOSED replica, the container should be moved back to
* CLOSED state.
*
* HDDS-12421: If a DELETING or DELETED container has a non-empty replica, the container should also be moved
* back to CLOSED state.
*/
boolean replicaStateAllowed = replica.getState() == State.CLOSED;
boolean replicaNotEmpty = replica.hasIsEmpty() && !replica.getIsEmpty();
Copy link
Contributor Author

@smengcl smengcl Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan I have a question, do we have cases where replica.hasIsEmpty() == false ?

Right now, replicaIsEmpty would be false when the replica doesn't have the isEmpty field. So !replicaIsEmpty becomes true below and we effectively treat such replicas as non-empty (which is fine).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like it is ok to omit the existence check for the field, but Siddhant should confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question, do we have cases where replica.hasIsEmpty() == false ?

I don't know either.

Right now, replicaIsEmpty would be false when the replica doesn't have the isEmpty field. So !replicaIsEmpty becomes true below and we effectively treat such replicas as non-empty (which is fine).

This behaviour seems fine to me, we shouldn't be deleting replicas if the isEmpty field is not present at all. Let's keep the current behaviour as it is.

if (replicaStateAllowed && replicaNotEmpty) {
logger.info("Moving DELETING container {} to CLOSED state, datanode {} reported replica with state={}, " +
"isEmpty={} and keyCount={}.", containerId, datanode.getHostName(), replica.getState(), false,
replica.getKeyCount());
containerManager.transitionDeletingToClosedState(containerId);
boolean replicaStateAllowed = (replica.getState() != State.INVALID && replica.getState() != State.DELETED);
Copy link
Contributor Author

@smengcl smengcl Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan btw this change failed ecContainerShouldTransitionFromDeletingToClosedWhenNonEmptyClosedReplica:

2025-02-28 16:17:50,365 [main] INFO  container.ContainerReportHandler (AbstractContainerReportHandler.java:updateContainerState(374)) - Moving container ContainerInfo{id=#7540283255775332556, state=DELETING, stateEnterTime=2025-03-01T00:17:50.311Z, pipelineID=PipelineID=198dc161-f620-4be8-83ce-8125701a2f59, owner=TEST} from DELETING to CLOSED state, datanode localhost-130.203.243.102 reported replica with state=CLOSING, isEmpty=false, bcsId=0, keyCount=0, and origin=ead52188-b54d-43aa-8cf4-e9bc63230b88

Pls double check if this is the desired behavior. (Transitions DELETING EC container straight to CLOSED after receiving a CLOSING replica report, which sounds fine to me.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have the same behavior for EC and Ratis. A non-empty replica moves the container back to CLOSED.

if (!replicaIsEmpty && replicaStateAllowed) {
logger.info("Moving container {} from {} to CLOSED state, datanode {} reported replica with state={}, " +
"isEmpty={}, bcsId={}, keyCount={}, and origin={}",
container, container.getState(), datanode.getHostName(), replica.getState(),
replica.getIsEmpty(), replica.getBlockCommitSequenceId(), replica.getKeyCount(), replica.getOriginNodeId());
containerManager.transitionDeletingOrDeletedToClosedState(containerId);
}

break;
case DELETED:
/*
* The container is deleted. delete the replica.
*/
deleteReplica(containerId, datanode, publisher, "DELETED");
ignored = true;
break;
default:
break;
Expand Down Expand Up @@ -453,9 +455,9 @@ protected ContainerManager getContainerManager() {
}

protected void deleteReplica(ContainerID containerID, DatanodeDetails dn,
EventPublisher publisher, String reason) {
EventPublisher publisher, String reason, boolean force) {
SCMCommand<?> command = new DeleteContainerCommand(
containerID.getId(), true);
containerID.getId(), force);
try {
command.setTerm(scmContext.getTermOfLeader());
} catch (NotLeaderException nle) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -135,14 +135,14 @@ void updateContainerState(ContainerID containerID,
throws IOException, InvalidStateTransitionException;

/**
* Bypasses the container state machine to change a container's state from DELETING to CLOSED. This API was
* Bypasses the container state machine to change a container's state from DELETING or DELETED to CLOSED. This API was
* introduced to fix a bug (HDDS-11136), and should be used with care otherwise.
*
* @see <a href="https://issues.apache.org/jira/browse/HDDS-11136">HDDS-11136</a>
* @param containerID id of the container to transition
* @throws IOException
*/
void transitionDeletingToClosedState(ContainerID containerID) throws IOException;
void transitionDeletingOrDeletedToClosedState(ContainerID containerID) throws IOException;

/**
* Returns the latest list of replicas for given containerId.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -290,12 +290,12 @@ public void updateContainerState(final ContainerID cid,
}

@Override
public void transitionDeletingToClosedState(ContainerID containerID) throws IOException {
public void transitionDeletingOrDeletedToClosedState(ContainerID containerID) throws IOException {
HddsProtos.ContainerID proto = containerID.getProtobuf();
lock.lock();
try {
if (containerExist(containerID)) {
containerStateManager.transitionDeletingToClosedState(proto);
containerStateManager.transitionDeletingOrDeletedToClosedState(proto);
} else {
throwContainerNotFoundException(containerID);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -234,7 +234,7 @@ private void processSingleReplica(final DatanodeDetails datanodeDetails,
UNKNOWN_CONTAINER_ACTION_DELETE)) {
final ContainerID containerId = ContainerID
.valueOf(replicaProto.getContainerID());
deleteReplica(containerId, datanodeDetails, publisher, "unknown");
deleteReplica(containerId, datanodeDetails, publisher, "unknown", true);
}
return;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -159,15 +159,15 @@ void updateContainerState(HddsProtos.ContainerID id,


/**
* Bypasses the container state machine to change a container's state from DELETING to CLOSED. This API was
* Bypasses the container state machine to change a container's state from DELETING or DELETED to CLOSED. This API was
* introduced to fix a bug (HDDS-11136), and should be used with care otherwise.
*
* @see <a href="https://issues.apache.org/jira/browse/HDDS-11136">HDDS-11136</a>
* @param id id of the container to transition
* @throws IOException
*/
@Replicate
void transitionDeletingToClosedState(HddsProtos.ContainerID id) throws IOException;
void transitionDeletingOrDeletedToClosedState(HddsProtos.ContainerID id) throws IOException;

/**
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -370,16 +370,16 @@ public void updateContainerState(final HddsProtos.ContainerID containerID,
}

@Override
public void transitionDeletingToClosedState(HddsProtos.ContainerID containerID) throws IOException {
public void transitionDeletingOrDeletedToClosedState(HddsProtos.ContainerID containerID) throws IOException {
final ContainerID id = ContainerID.getFromProtobuf(containerID);

try (AutoCloseableLock ignored = writeLock(id)) {
if (containers.contains(id)) {
final ContainerInfo oldInfo = containers.getContainerInfo(id);
final LifeCycleState oldState = oldInfo.getState();
if (oldState != DELETING) {
if (oldState != DELETING && oldState != DELETED) {
throw new InvalidContainerStateException("Cannot transition container " + id + " from " + oldState +
" back to CLOSED. The container must be in the DELETING state.");
" back to CLOSED. The container must be in the DELETING or DELETED state.");
}
ExecutionUtil.create(() -> {
containers.updateState(id, oldState, CLOSED);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
import org.apache.hadoop.hdds.scm.container.ContainerID;
import org.apache.hadoop.hdds.scm.container.ContainerInfo;
import org.apache.hadoop.hdds.scm.container.ContainerReplica;
import org.apache.hadoop.hdds.scm.container.replication.ContainerCheckRequest;
import org.apache.hadoop.hdds.scm.container.replication.ContainerReplicaOp;
import org.apache.hadoop.hdds.scm.container.replication.ReplicationManager;
Expand Down Expand Up @@ -88,6 +89,7 @@ public boolean handle(ContainerCheckRequest request) {
//resend deleteCommand if needed
request.getContainerReplicas().stream()
.filter(r -> !pendingDelete.contains(r.getDatanodeDetails()))
.filter(ContainerReplica::isEmpty)
.forEach(rp -> {
try {
replicationManager.sendDeleteCommand(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@

import static org.apache.hadoop.hdds.scm.exceptions.SCMException.ResultCodes.FAILED_TO_CHANGE_CONTAINER_STATE;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import com.google.common.collect.ImmutableSet;
import java.util.Collections;
Expand Down Expand Up @@ -93,6 +94,11 @@ public ContainerStateMap() {
this.replicaMap = new ConcurrentHashMap<>();
}

@VisibleForTesting
public static Logger getLogger() {
return LOG;
}

/**
* Adds a ContainerInfo Entry in the ContainerStateMap.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@
import org.apache.hadoop.hdds.protocol.proto.HddsProtos.LifeCycleState;
import org.apache.hadoop.hdds.protocol.proto.HddsProtos.ReplicationFactor;
import org.apache.hadoop.hdds.scm.container.replication.ContainerReplicaPendingOps;
import org.apache.hadoop.hdds.scm.container.states.ContainerStateMap;
import org.apache.hadoop.hdds.scm.ha.SCMHAManager;
import org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub;
import org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator;
Expand All @@ -53,10 +54,15 @@
import org.apache.hadoop.hdds.utils.db.DBStoreBuilder;
import org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException;
import org.apache.hadoop.ozone.container.common.SCMTestUtils;
import org.apache.ozone.test.GenericTestUtils;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.EnumSource;
import org.slf4j.event.Level;

/**
* Tests to verify the functionality of ContainerManager.
Expand All @@ -72,6 +78,12 @@ public class TestContainerManagerImpl {
private NodeManager nodeManager;
private ContainerReplicaPendingOps pendingOpsMock;

@BeforeAll
static void init() {
// Print container state transition logs
GenericTestUtils.setLogLevel(ContainerStateMap.getLogger(), Level.TRACE);
}

@BeforeEach
void setUp() throws Exception {
final OzoneConfiguration conf = SCMTestUtils.getConf(testDir);
Expand Down Expand Up @@ -131,9 +143,12 @@ void testUpdateContainerState() throws Exception {
assertEquals(LifeCycleState.CLOSED, containerManager.getContainer(cid).getState());
}

@Test
void testTransitionDeletingToClosedState() throws IOException, InvalidStateTransitionException {
// allocate OPEN Ratis and Ec containers, and do a series of state changes to transition them to DELETING
@ParameterizedTest
@EnumSource(value = HddsProtos.LifeCycleState.class,
names = {"DELETING", "DELETED"})
void testTransitionDeletingOrDeletedToClosedState(HddsProtos.LifeCycleState desiredState)
throws IOException, InvalidStateTransitionException {
// Allocate OPEN Ratis and Ec containers, and do a series of state changes to transition them to DELETING / DELETED
final ContainerInfo container = containerManager.allocateContainer(
RatisReplicationConfig.getInstance(
ReplicationFactor.THREE), "admin");
Expand Down Expand Up @@ -162,29 +177,39 @@ void testTransitionDeletingToClosedState() throws IOException, InvalidStateTrans
assertEquals(LifeCycleState.DELETING, containerManager.getContainer(cid).getState());
assertEquals(LifeCycleState.DELETING, containerManager.getContainer(ecCid).getState());

// DELETING -> CLOSED
containerManager.transitionDeletingToClosedState(cid);
containerManager.transitionDeletingToClosedState(ecCid);
if (desiredState == LifeCycleState.DELETED) {
// DELETING -> DELETED
containerManager.updateContainerState(cid, HddsProtos.LifeCycleEvent.CLEANUP);
containerManager.updateContainerState(ecCid, HddsProtos.LifeCycleEvent.CLEANUP);
assertEquals(LifeCycleState.DELETED, containerManager.getContainer(cid).getState());
assertEquals(LifeCycleState.DELETED, containerManager.getContainer(ecCid).getState());
}

// DELETING / DELETED -> CLOSED
containerManager.transitionDeletingOrDeletedToClosedState(cid);
containerManager.transitionDeletingOrDeletedToClosedState(ecCid);
// the containers should be back in CLOSED state now
assertEquals(LifeCycleState.CLOSED, containerManager.getContainer(cid).getState());
assertEquals(LifeCycleState.CLOSED, containerManager.getContainer(ecCid).getState());
}

@Test
void testTransitionDeletingToClosedStateAllowsOnlyDeletingContainers() throws IOException {
void testTransitionContainerToClosedStateAllowOnlyDeletingOrDeletedContainers() throws IOException {
// Negative test for Ratis/EC container OPEN -> CLOSED transition

// test for RATIS container
final ContainerInfo container = containerManager.allocateContainer(
RatisReplicationConfig.getInstance(
ReplicationFactor.THREE), "admin");
final ContainerID cid = container.containerID();
assertEquals(LifeCycleState.OPEN, containerManager.getContainer(cid).getState());
assertThrows(IOException.class, () -> containerManager.transitionDeletingToClosedState(cid));
assertThrows(IOException.class, () -> containerManager.transitionDeletingOrDeletedToClosedState(cid));

// test for EC container
final ContainerInfo ecContainer = containerManager.allocateContainer(new ECReplicationConfig(3, 2), "admin");
final ContainerID ecCid = ecContainer.containerID();
assertEquals(LifeCycleState.OPEN, containerManager.getContainer(ecCid).getState());
assertThrows(IOException.class, () -> containerManager.transitionDeletingToClosedState(ecCid));
assertThrows(IOException.class, () -> containerManager.transitionDeletingOrDeletedToClosedState(ecCid));
}

@Test
Expand Down
Loading