Conversation

@Gargi-jais11
Contributor

@Gargi-jais11 Gargi-jais11 commented Aug 7, 2025

What changes were proposed in this pull request?

Generally, the compressed tarball of a container is smaller than its real size, so reserving 2x the max container size is conservative enough in most cases. But there are over-allocated cases where the container size can be double or triple the max container size, as @siddhantsangwan saw in a user's environment. Having an accurate container size handles this case well, and that information can be carried in the replication request.

For backward compatibility, if that field is unset, fall back to getting the container size from the configuration.
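
A minimal sketch of that fallback, assuming a helper shaped like getSpaceToReserve (the name appears later in this discussion; the exact signature and multiplier in the patch may differ):

// Hypothetical sketch only, not the actual patch: shows the fallback when the
// replicate-size field is unset (e.g. the sender is an older datanode).
final class SpaceReservationSketch {
  private final long configuredMaxContainerSize;

  SpaceReservationSketch(long configuredMaxContainerSize) {
    this.configuredMaxContainerSize = configuredMaxContainerSize;
  }

  long getSpaceToReserve(Long replicateSize) {
    // Prefer the accurate size sent with the command; otherwise fall back to
    // the configured max container size.
    long base = (replicateSize != null) ? replicateSize : configuredMaxContainerSize;
    // Keep the conservative reservation: room for both the downloaded tarball
    // and the extracted replica.
    return 2 * base;
  }
}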

comments:

#8360 (comment)
#8360 (comment)
#8360 (comment)

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12998

How was this patch tested?

Updated existing UT.
TestPushReplicator
TestSendContainerRequestHandler
TestDownloadAndImportReplicator

@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review August 7, 2025 09:18
@Gargi-jais11
Contributor Author

@peterxcli can you please review it.

Member

@peterxcli peterxcli left a comment

Thanks @Gargi-jais11 for this patch, left some comments, PTAL.

Also, could you please try to add some test coverage for your change? Thanks!

@Gargi-jais11
Contributor Author

Thanks @Gargi-jais11 for this patch, left some comments, PTAL.

Also, could you please try to add some test coverage for your change? Thanks!

Ok

@peterxcli peterxcli self-requested a review August 28, 2025 10:20
Member

@peterxcli peterxcli left a comment

Sorry, my previous idea might be wrong.

Member

@peterxcli peterxcli left a comment

Thanks @Gargi-jais11 for the prompt update. I haven't looked into the test code yet, but the production code looks good.
Left some comments/suggestions/questions, please take a look.

@peterxcli peterxcli requested review from peterxcli and removed request for peterxcli September 4, 2025 05:24
@peterxcli
Member

please request my review whenever you think this is ready :)

@Gargi-jais11
Contributor Author

Gargi-jais11 commented Sep 5, 2025

please request my review whenever you think this is ready :)

@peterxcli you can review the patch whenever you are free.

Member

@peterxcli peterxcli left a comment

Thanks for updating the patch! Only some minor comments left.

Btw, could we add some IT coverage in TestContainerCoverage with sth like the below?

@ParameterizedTest
@EnumSource
void testPushWithReplicateSize(CopyContainerCompression compression) throws Exception {
  final int index = compression.ordinal();
  DatanodeDetails source = cluster.getHddsDatanodes().get(index).getDatanodeDetails();
  long containerID = createNewClosedContainer(source);
  DatanodeDetails target = selectOtherNode(source);
  ReplicateContainerCommand cmd = ReplicateContainerCommand.toTarget(containerID, target);
  cmd.setReplicateSize(2L * 1024 * 1024 * 1024); // example value
  queueAndWaitForCompletion(cmd, source, ReplicationSupervisor::getReplicationSuccessCount);
}

@ParameterizedTest
@EnumSource
void testPullWithReplicateSize(CopyContainerCompression compression) throws Exception {
  final int index = compression.ordinal();
  DatanodeDetails target = cluster.getHddsDatanodes().get(index).getDatanodeDetails();
  DatanodeDetails source = selectOtherNode(target);
  long containerID = createNewClosedContainer(source);
  ReplicateContainerCommand cmd =
      ReplicateContainerCommand.fromSources(containerID, ImmutableList.of(source));
  cmd.setReplicateSize(2L * 1024 * 1024 * 1024);
  queueAndWaitForCompletion(cmd, target, ReplicationSupervisor::getReplicationSuccessCount);
}

I guess the fallback path has been covered by the old tests in TestContainerReplication, so the "with replicate size" tests should be enough.

@peterxcli
Member

Then I think this is good to merge!

cc @ChenSammi @siddhantsangwan Would you like to take another look?

@Gargi-jais11
Contributor Author

Btw, could we add some IT coverage in TestContainerCoverage with sth like the below?

Do you mean adding in TestContainerReplication correct?

@peterxcli
Member

peterxcli commented Sep 10, 2025

Btw, could we add some IT coverage in TestContainerCoverage with sth like the below?

Do you mean adding in TestContainerReplication correct?

yes... sorry for the typo and thanks for the correction...

@Gargi-jais11
Contributor Author

Btw, could we add some IT coverage in TestContainerCoverage with sth like the below?

Do you mean adding in TestContainerReplication correct?

yes... sorry for the typo and thanks for the correction...

No issues, it happens.

Contributor

@siddhantsangwan siddhantsangwan left a comment

Any particular reason for sending the container's size from the SCM to the Datanode? It's simpler to just get the size from the Datanode's in-memory state.

@Gargi-jais11
Contributor Author

Any particular reason for sending the container's size from the SCM to the Datanode? It's simpler to just get the size from the Datanode's in-memory state.

But how can we get the size on the Datanode side while choosing a volume in the place below? So it's better to send the replicate size from SCM, since in chooseNextVolume we need to pass the size in order to reserve space.

spaceToReserve = containerImporter.getSpaceToReserve(task.getReplicateSize());
try {
  targetVolume = containerImporter.chooseNextVolume(spaceToReserve);
  // Wait for the download. This thread pool is limiting the parallel

@siddhantsangwan
Contributor

But how can we get the size on the Datanode side while choosing a volume in the place below? So it's better to send the replicate size from SCM, since in chooseNextVolume we need to pass the size in order to reserve space.

We don't need to make changes to pull replication (DownloadAndImportReplicator) because we don't use it anymore. We only need to support push replication. In push replication, the datanode that is sending the container can send the container's size to the datanode that's receiving the container. SCM doesn't need to send the size and that keeps this change simpler since it avoids making proto changes in the SCM to DN communication.

Moreover, datanode knows the correct size of the container. SCM's knowledge of the container's size is outdated if there have been block deletions and the size has reduced.

@Gargi-jais11
Contributor Author

But how can we get the size on the Datanode side while choosing a volume in the place below? So it's better to send the replicate size from SCM, since in chooseNextVolume we need to pass the size in order to reserve space.

We don't need to make changes to pull replication (DownloadAndImportReplicator) because we don't use it anymore. We only need to support push replication. In push replication, the datanode that is sending the container can send the container's size to the datanode that's receiving the container. SCM doesn't need to send the size and that keeps this change simpler since it avoids making proto changes in the SCM to DN communication.

Moreover, datanode knows the correct size of the container. SCM's knowledge of the container's size is outdated if there have been block deletions and the size has reduced.

Thank you @siddhantsangwan for this information. I will make the changes according to push replication.

@peterxcli
Member

Thanks @siddhantsangwan for the explanation!

I will make the changes according to push replication.

@Gargi-jais11 I think we're almost there; we just need to

  1. remove the newly introduced size from ReplicateContainerCommandProto
  2. inject ContainerController or OzoneContainer class instance into GrpcContainerUploader

@siddhantsangwan
Contributor

inject ContainerController or OzoneContainer class instance into GrpcContainerUploader

Another way is to get the container's size in ReplicateContainerCommandHandler#handle and use a setter to set that in ReplicationTask.

@Gargi-jais11
Contributor Author

inject ContainerController or OzoneContainer class instance into GrpcContainerUploader

Another way is to get the container's size in ReplicateContainerCommandHandler#handle and use a setter to set that in ReplicationTask.

Sorry, but I didn't see this comment before changing the code, @siddhantsangwan and @peterxcli.
Can you please check whether this way of bringing in the container's size is fine? Otherwise I will make changes accordingly.

@Gargi-jais11
Contributor Author

Here is an analysis of both approaches for getting the container size to the push replicator.

  1. inject ContainerController or OzoneContainer class instance into GrpcContainerUploader

  • We need to pass an instance of ContainerController to GrpcContainerUploader, which is used to determine the container size and pass it to SendContainerOutputStream.
  • In the first request, containerSize is added to the SendContainerRequest protobuf:
  @Override
  protected void sendPart(boolean eof, int length, ByteString data) {
    SendContainerRequest.Builder requestBuilder = SendContainerRequest.newBuilder()
        .setContainerID(getContainerId())
        .setData(data)
        .setOffset(getWrittenBytes())
        .setCompression(compression.toProto());
    
    // Include container size in the first request
    if (getWrittenBytes() == 0 && size != null) {
      requestBuilder.setSize(size);
    }
    getStreamObserver().onNext(requestBuilder.build());
  }
  • SendContainerRequestHandler can then simply pass the containerSize from the first request to chooseNextVolume:
// Use container size if available, otherwise fall back to default
spaceToReserve = importer.getSpaceToReserve(
    req.hasSize() ? req.getSize() : null);

volume = importer.chooseNextVolume(spaceToReserve);

Now considering the test analysis of this approach:
It only requires verifying in TestSendContainerRequestHandler that when a different container size is passed, space is properly allocated and released on the target DN.
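
For illustration, a minimal, self-contained sketch of the sender side of approach 1; LocalSizeSource here is a hypothetical stand-in for the injected ContainerController/OzoneContainer, since the real accessor for a container's on-disk size may differ:

import java.util.OptionalLong;

final class UploaderSizeSketch {
  /** Hypothetical stand-in for the injected ContainerController dependency. */
  interface LocalSizeSource {
    OptionalLong usedBytes(long containerID);
  }

  private final LocalSizeSource sizeSource;

  UploaderSizeSketch(LocalSizeSource sizeSource) {
    this.sizeSource = sizeSource;
  }

  /** Size to attach to the first SendContainerRequest, or null if unknown. */
  Long sizeForFirstRequest(long containerID) {
    OptionalLong used = sizeSource.usedBytes(containerID);
    if (used.isPresent()) {
      return used.getAsLong();  // accurate local size, autoboxed
    }
    return null;  // unknown: receiver falls back to the configured max size
  }
}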

@Gargi-jais11
Contributor Author

  1. Another way is to get the container's size in ReplicateContainerCommandHandler#handle and use a setter to set that in ReplicationTask.

  • Need to get the container size in ReplicateContainerCommandHandler using the OzoneContainer instance already present, and set it on the ReplicationTask.
  • Create a setter and getter for containerSize in ReplicationTask and use the getter in PushReplicator to obtain the size:
public void replicate(ReplicationTask task) {
  // rest of the code...
  try {
    Long containerSize = task.getContainerSize();
    output = new CountingOutputStream(
        uploader.startUpload(containerID, target, fut, compression, containerSize));
    // ...
  }
}
  • This requires modifying the ContainerUploader#startUpload interface to include containerSize.
  • Then we can use this containerSize in the startUpload method in GrpcContainerUploader; after that it follows the same path as the approach above.

So this approach adds three extra steps which can be avoided, and it would also require adding multiple test cases for the PushReplicator class.
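
For reference, a hedged sketch of the interface change implied by the call above; the parameter order follows the quoted snippet, while the interface name, type parameters, callback and return types shown here are assumptions rather than the actual ContainerUploader definition:

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CompletableFuture;

// Illustrative only: approach 2 appends a nullable containerSize so the
// uploader can forward it in the first SendContainerRequest.
interface ContainerUploaderSketch<TARGET, COMPRESSION> {
  OutputStream startUpload(long containerID, TARGET target,
      CompletableFuture<Void> callback, COMPRESSION compression,
      Long containerSize) throws IOException;
}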

@Gargi-jais11
Contributor Author

Gargi-jais11 commented Sep 22, 2025

@siddhantsangwan and @peterxcli
I have provided the analysis of both approaches above. I prefer the first approach, as adding the container size there is simple, with a single additional dependency (ContainerController).
Both approaches need to add containerSize through GrpcContainerUploader by passing it to SendContainerOutputStream, so I believe the first approach, adding a ContainerController instance to that class, is better than passing it via ReplicateContainerCommandHandler.
Please have a look and correct me if I am wrong anywhere.

@siddhantsangwan
Contributor

I have provided the analysis of both approaches above. I prefer the first approach, as adding the container size there is simple, with a single additional dependency (ContainerController).
Both approaches need to add containerSize through GrpcContainerUploader by passing it to SendContainerOutputStream, so I believe the first approach, adding a ContainerController instance to that class, is better than passing it via ReplicateContainerCommandHandler.
Please have a look and correct me if I am wrong anywhere.

I agree, based on this we can go ahead with the first approach.

@siddhantsangwan
Contributor

@Gargi-jais11 it'd be good to also have some kind of integration testing (whichever way is the easiest) that ensures this change works as intended across two Datanodes.

@Gargi-jais11
Contributor Author

@Gargi-jais11 it'd be good to also have some kind of integration testing (whichever way is the easiest) that ensures this change works as intended across two Datanodes.

Okay, sure. I will add an IT.

@siddhantsangwan
Contributor

@Gargi-jais11 please avoid force pushing when possible, because with a force push reviewers can't tell what has changed since the last review.

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, just a few minor comments.

Contributor

@siddhantsangwan siddhantsangwan left a comment

LGTM, pending green CI.

@siddhantsangwan siddhantsangwan merged commit bb98cc2 into apache:master Oct 7, 2025
43 checks passed