HDDS-4474. [Ozone-Streaming] Use WriteSmallFile to write small file. #2860
Conversation
@szetszwo @captainzmc please take a look. PutSmallFile contains writeChunk and putBlock, which are transmitted to the Datanode through a stream write; there is no applyTransaction going through ContainerStateMachine. Please check my PR; I don't know how to get the BCSID.
szetszwo left a comment:
@guohao-rosicky , thanks for the update. The change looks mostly good. Some comments:
- Let's move the common code in KeyValueStreamDataChannel and SmallFileStreamDataChannel to a common base class, say StreamDataChannelBase.
- Please add some tests.
See also https://issues.apache.org/jira/secure/attachment/13036532/2860_review.patch .
Move randomAccessFile, file, containerData, metrics, and the shared code to a new base class.
Let's call it StreamDataChannel.
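(Not part of the patch, just a minimal sketch of the shape such a base class could take, using plain java.io/java.nio types. The containerData and metrics fields mentioned above are omitted to keep it self-contained, and the real channel would implement the Ratis data-channel interface rather than WritableByteChannel.)

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Shared file handle and write path for the stream data channels. */
abstract class StreamDataChannel implements WritableByteChannel {
  private final File file;
  private final RandomAccessFile randomAccessFile;

  StreamDataChannel(File file) throws IOException {
    this.file = file;
    this.randomAccessFile = new RandomAccessFile(file, "rw");
  }

  protected File getFile() {
    return file;
  }

  @Override
  public int write(ByteBuffer src) throws IOException {
    // Both subclasses append the incoming buffer to the backing file.
    return randomAccessFile.getChannel().write(src);
  }

  @Override
  public boolean isOpen() {
    return randomAccessFile.getChannel().isOpen();
  }

  @Override
  public void close() throws IOException {
    randomAccessFile.close();
  }
}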
Why change data to optional?
Data in PutSmallFileRequestProto is null in setupStream:
ContainerProtos.PutSmallFileRequestProto putSmallFileRequest =
    ContainerProtos.PutSmallFileRequestProto.newBuilder()
        .setChunkInfo(chunk)
        .setBlock(createBlockRequest) // data is empty
        .build();
String id = xceiverClient.getPipeline().getFirstNode().getUuidString();
ContainerProtos.ContainerCommandRequestProto.Builder builder =
    ContainerProtos.ContainerCommandRequestProto.newBuilder()
        .setCmdType(ContainerProtos.Type.StreamInit)
        .setContainerID(blockID.get().getContainerID())
        .setDatanodeUuid(id)
        .setPutSmallFile(putSmallFileRequest);
I see. Thanks.
Is this backward compatible?
Yes, changing required to optional is backward compatible since the old code always provides the value and the new code can handle it.
However, it is not forward compatible since the old code may not be able to handle the case that the value is missing.
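(For illustration only: once the field is optional, readers should check presence instead of assuming the value is set. In proto2-generated Java code an optional field gets a has*() accessor; the import paths below are assumptions, and in Ozone the ByteString type may be the shaded ratis-thirdparty variant.)

import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos; // assumed package
import com.google.protobuf.ByteString; // possibly shaded in Ozone

final class SmallFileDataAccess {
  private SmallFileDataAccess() {
  }

  // Sketch: old clients still set 'data' inline; streaming clients leave it
  // unset and send the payload over the data stream instead.
  static ByteString getInlineData(
      ContainerProtos.PutSmallFileRequestProto request) {
    return request.hasData() ? request.getData() : ByteString.EMPTY;
  }
}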
@guohao-rosicky , the change looks good but the new TestSmallFileDataStreamOutput failed. Please take a look. Thanks.
Thanks @guohao-rosicky for the contribution,
@captainzmc , this change is an optimization of createStreamKey(..) and createStreamFile(..) in RpcClient for the case that the given size is smaller than the chunk size. Using the Async API will have fewer RPC calls, and we should implement it. However, the data has to go through the leader, so the network path won't be optimal and the leader may become a hotspot.
Thanks @szetszwo for your explanation, I agree with you. Let's use Streaming instead of async write.
szetszwo left a comment:
@guohao-rosicky , we need to support zero buffer copying, so we should not put the data inside PutSmallFileRequestProto. We should send the header first and then send the raw data in the stream.
This method copies the buffer twice, so it is not zero buffer copying.
We cannot call array() since the ByteBuffer may not have an array.
Also, using an array means there must be buffer copying inside the code. This is why we use ByteBuffer instead of byte[].
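(Illustration, not part of the patch: array() only works for accessible heap buffers, so the general way to get a byte[] out of an arbitrary ByteBuffer is an explicit copy, which is exactly the copy that zero-copy streaming tries to avoid.)

import java.nio.ByteBuffer;

final class ByteBufferExample {
  private ByteBufferExample() {
  }

  // ByteBuffer.array() throws UnsupportedOperationException for direct
  // buffers and ReadOnlyBufferException for read-only buffers, so the
  // only general way to obtain a byte[] is an explicit copy.
  static byte[] copyToArray(ByteBuffer buffer) {
    final byte[] bytes = new byte[buffer.remaining()];
    buffer.duplicate().get(bytes); // duplicate() keeps the caller's position untouched
    return bytes;
  }
}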
Do not set the data inside the proto. We should send only the header in the proto and then send the raw data to the stream.
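(A minimal sketch of that ordering under stated assumptions: the stream interface and all names below are hypothetical stand-ins, not the actual Ozone or Ratis API.)

import java.io.IOException;
import java.nio.ByteBuffer;

final class SmallFileStreamSketch {
  /** Hypothetical stand-in for the client-side raw data stream. */
  interface RawDataStream extends AutoCloseable {
    void write(ByteBuffer data) throws IOException;
    @Override
    void close() throws IOException;
  }

  private SmallFileStreamSketch() {
  }

  static void writeSmallFile(RawDataStream stream, ByteBuffer payload)
      throws IOException {
    // The header (chunk info, block data) travels in the proto sent at
    // stream setup, with the data field left unset, so no copy happens there.
    // The payload buffer then goes straight into the stream, untouched.
    stream.write(payload);
    // Closing the stream is what triggers putBlock/commit on the datanode.
    stream.close();
  }
}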
@guohao-rosicky , can you please rebase?
done
captainzmc left a comment:
Thanks @guohao-rosicky for the update. The change looks good.
@szetszwo @captainzmc please take a look.
@guohao-rosicky , the change grew from 13kB to 65kB, so it has become hard to review. The SmallFileDataStreamOutput class is really long. How about moving the refactoring to a separate JIRA?
How about splitting it into two parts, @szetszwo?
@guohao-rosicky , sure, please do it. Thanks.
@szetszwo please take a look. Thanks.
The CI run was successful. @szetszwo , could you help take another look?
@captainzmc , @guohao-rosicky , sure, I am reviewing this.
szetszwo left a comment:
With HDDS-6137, we may not need to add much code to implement WriteSmallFile. The data is buffered at the client side. When the data size is small, all the data can be sent in a single write call with close. What do you think?
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamRoutingTable.java (outdated review thread, resolved)
public static byte[] getFixedLengthBytes(int length) {
  byte[] bytes = new byte[length];
  Random random = new Random();
  random.nextBytes(bytes);
  return bytes;
}
Use ThreadLocalRandom and support non-random data as below:
public static byte[] generateData(int length, boolean random) {
  final byte[] data = new byte[length];
  if (random) {
    ThreadLocalRandom.current().nextBytes(data);
  } else {
    for (int i = 0; i < length; i++) {
      data[i] = (byte) i;
    }
  }
  return data;
}
private CompletableFuture<ContainerCommandResponseProto> link(
    LogEntryProto entry, SmallFileStreamDataChannel smallFileChannel) {
  return CompletableFuture.supplyAsync(() -> {
    final DispatcherContext context = new DispatcherContext.Builder()
        .setTerm(entry.getTerm())
        .setLogIndex(entry.getIndex())
        .setStage(DispatcherContext.WriteChunkStage.COMMIT_DATA)
        .setContainer2BCSIDMap(container2BCSIDMap)
        .build();

    return runCommand(smallFileChannel.getPutBlockRequest(), context);
  }, executor);
}
Pass ContainerCommandRequestProto instead and rename it to runCommandAsync(..)
private CompletableFuture<ContainerCommandResponseProto> runCommandAsync(
    ContainerCommandRequestProto requestProto, LogEntryProto entry) {
  return CompletableFuture.supplyAsync(() -> {
    final DispatcherContext context = new DispatcherContext.Builder()
        .setTerm(entry.getTerm())
        .setLogIndex(entry.getIndex())
        .setStage(DispatcherContext.WriteChunkStage.COMMIT_DATA)
        .setContainer2BCSIDMap(container2BCSIDMap)
        .build();
    return runCommand(requestProto, context);
  }, executor);
}
 * <p>
 * TODO: currently does not support multi-thread access.
 */
public class SmallFileDataStreamOutput implements ByteBufferStreamOutput
It seems that the code in this class is copied from BlockDataStreamOutputEntryPool and KeyDataStreamOutput. We should reuse the code, not copy it; otherwise it is very hard to maintain.
szetszwo left a comment:
@guohao-rosicky , have you seen this comment #2860 (review)?
@szetszwo
That's why KeyDataStreamOutput is more powerful than SmallFileStreamOutput since it works even if the data size is unknown.
@szetszwo OK, how can we better do the follow-up work? Does the code in this PR help us achieve this? I can split it.
Yes, I actually suggested splitting it and moving the common code to #3195 in this comment: #3195 (comment)
Hi @szetszwo , will you take this task? If so, I will close the PR.
@captainzmc , I was thinking of supporting a BufferedDataStreamOutput in Ratis (similar to java.io.BufferedOutputStream). Then Ozone could use it; it would require only a small change in Ozone. We may close this pull request and open a new pull request later.
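(A rough sketch of the idea, assuming a plain WritableByteChannel as the underlying stream; the class and method names are illustrative, not the eventual Ratis API. The point is that the small-file case falls out of buffering: if everything still fits in the buffer at close(), it is sent in one final write, matching the earlier HDDS-6137 comment.)

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/**
 * Illustrative sketch of a buffered stream output analogous to
 * java.io.BufferedOutputStream: small writes accumulate in a client-side
 * buffer and are sent as a single write when the buffer fills or on close.
 */
final class BufferedDataStreamOutputSketch implements WritableByteChannel {
  private final WritableByteChannel out;
  private final ByteBuffer buffer;

  BufferedDataStreamOutputSketch(WritableByteChannel out, int bufferSize) {
    this.out = out;
    this.buffer = ByteBuffer.allocateDirect(bufferSize);
  }

  @Override
  public int write(ByteBuffer src) throws IOException {
    final int n = src.remaining();
    if (n > buffer.remaining()) {
      flushBuffer(); // make room for the incoming data
    }
    if (n > buffer.capacity()) {
      // Too large to buffer at all: write it through directly.
      while (src.hasRemaining()) {
        out.write(src);
      }
      return n;
    }
    buffer.put(src);
    return n;
  }

  private void flushBuffer() throws IOException {
    buffer.flip();
    while (buffer.hasRemaining()) {
      out.write(buffer);
    }
    buffer.clear();
  }

  @Override
  public boolean isOpen() {
    return out.isOpen();
  }

  @Override
  public void close() throws IOException {
    // For a small file everything is still buffered here, so it goes out
    // in one final write before the close.
    flushBuffer();
    out.close();
  }
}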
OK, let's close this PR. |
Hi @szetszwo , is it this Ratis JIRA? https://issues.apache.org/jira/browse/RATIS-1157
jira: https://issues.apache.org/jira/browse/HDDS-4474