Skip to content

Conversation

@guohao-rosicky
Copy link
Contributor

@guohao-rosicky guohao-rosicky commented Nov 22, 2021

What changes were proposed in this pull request?

Added support for stream on S3Gateway write path

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5869

@guohao-rosicky
Copy link
Contributor Author

hi @szetszwo @captainzmc , can you help review this patch?

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guohao-rosicky , thanks for working on this.

This is tricky since we want to have zero buffer copying. Let's study how to have zero buffer copying with JAX-RS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? Why hadoop.conf but not hadoop.hdds.conf?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make streaming optional. Add a new QueryParam with DefaultValue not using streaming.

+      @QueryParam("streaming") @DefaultValue("false") boolean streaming,

Use streaming only if it is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo , when @QueryParam("streaming") is true, use streaming? aws s3 should not have this option.
see: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when @QueryParam("streaming") is true, use streaming?

Yes.

... aws s3 should not have this option.

Similar to partNumber and uploadId, this parameter is Ozone specific. We may change the default to true later on when the streaming feature is ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo I can add a configuration to s3 Gateway

Copy link
Contributor

@bharatviswa504 bharatviswa504 Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo and @guohao-rosicky I donot have much context on Streaming. But my comment is from the perspective of S3.

Similar to partNumber and uploadId, this parameter is Ozone specific. We may change the default to true later on >when the streaming feature is ready.

This is also part of AWS S3 protocol for multipart upload.

Adding query param streaming to this API and then passing query param from AWS S3 SDK will not be possible (As Ozone S3 talks S3 protocol only). I think based on the config selecting streaming or non-streaming would be the way to go here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also part of AWS S3 protocol for multipart upload.

@bharatviswa504 , thanks for the info. You are right that partNumber and uploadId are also in AWS; see https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html

Adding query param streaming to this API and then passing query param from AWS S3 SDK will not be possible ...

It is still better to add a query param and set the default to false (non-streaming) for now. After streaming is stable, we will change the default to true. We need this param for testing and branch mark without restarting the machines. We should set the param name to something like "internalOzoneParam-streaming" to emphasize that it is not in the AWS S3 protocol.

AWS S3 SDK will always use the default.

We may have a conf to override the default value if such conf is useful.

Comment on lines 271 to 298
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid buffer copying. I wonder if we should other body type (instead of InputStream) in order to support zero buffer copying; see https://docs.oracle.com/cd/E19798-01/821-1841/6nmq2cp2b/index.html#gkccg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo I looked at this document, Is it available java.io.File All media types (*/*)?

Reading files is similar to the implementation here: https://github.com/apache/ozone/blob/HDDS-4454/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/shell/keys/PutKeyHandler.java#L134

Copy link
Contributor Author

@guohao-rosicky guohao-rosicky Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo I submitted the new code, is it ok to change it like this, please take a look.

Such a change is a little big, and the previous UT needs to be changed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Is it available java.io.File All media types (/)?

We should try it.

Such a change is a little big, and the previous UT needs to be changed

I suggest we should change the case when uploadId == null (non-createMultipartKey) first. Then, change the createMultipartKey case in a separate JIRA.

Copy link
Contributor Author

@guohao-rosicky guohao-rosicky Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo
like this

public Response put(
      @PathParam("bucket") String bucketName,
      @PathParam("path") String keyPath,
      @HeaderParam("Content-Length") long length,
      @QueryParam("partNumber")  int partNumber,
      @QueryParam("uploadId") @DefaultValue("") String uploadID,
      File bodyFile) throws IOException, OS3Exception {

If change InputStream to File, all UT related to put cannot be used. Is there any way to solve this problem

Copy link
Contributor Author

@guohao-rosicky guohao-rosicky Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example here

org.apache.hadoop.ozone.s3.endpoint.TestObjectPut

@Test
  public void testPutObject() throws IOException, OS3Exception {
    //GIVEN
    HttpHeaders headers = Mockito.mock(HttpHeaders.class);
    ByteArrayInputStream body =
        new ByteArrayInputStream(CONTENT.getBytes(UTF_8));
    objectEndpoint.setHeaders(headers);

    //WHEN
    Response response = objectEndpoint.put(bucketName, keyName, CONTENT
        .length(), 1, null, body);


    //THEN
    OzoneInputStream ozoneInputStream =
        clientStub.getObjectStore().getS3Bucket(bucketName)
            .readKey(keyName);
    String keyContent =
        IOUtils.toString(ozoneInputStream, UTF_8);

    Assert.assertEquals(200, response.getStatus());
    Assert.assertEquals(CONTENT, keyContent);
  }

put method Parameters for the InputStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If streaming is enabled through configuration, do you need to retain the InputStream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may add another put method as below:

+++ b/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
@@ -155,6 +153,24 @@ public Response put(
       @HeaderParam("Content-Length") long length,
       @QueryParam("partNumber")  int partNumber,
       @QueryParam("uploadId") @DefaultValue("") String uploadID,
+      @QueryParam("ozoneInternalParam_streaming") @DefaultValue("false")
+          boolean ozoneInternalParam_streaming,
+      File body) throws IOException, OS3Exception {
+    if (!ozoneInternalParam_streaming) {
+      final BufferedInputStream in = new BufferedInputStream(new FileInputStream(body));
+      return put(bucketName, keyPath, length, partNumber, uploadID, in);
+    }
+
+    // new code for Ratis streaming
+    ...
+  }
+
+  public Response put(
+      String bucketName,
+      String keyPath,
+      long length,
+      int partNumber,
+      String uploadID,
       InputStream body) throws IOException, OS3Exception {
 
     OzoneOutputStream output = null;

Copy link
Contributor

@szetszwo szetszwo Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, the existing tests will remain working. Of course, we need to create new tests for the new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm going to implement a version, and you can take a look

@szetszwo
Copy link
Contributor

@guohao-rosicky , the compilation failed. Please take a looks. Thanks.

@guohao-rosicky
Copy link
Contributor Author

@szetszwo This PR is implemented using buffer copy, and then I'll turn on another PR to optimize for zero copy

https://issues.apache.org/jira/browse/HDDS-6133

@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch 3 times, most recently from 3523cd6 to 9d007cd Compare December 22, 2021 10:45
@guohao-rosicky
Copy link
Contributor Author

Added the enable Stream configuration on the Gateway side.

Please take a look. @szetszwo

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guohao-rosicky , thanks for the update. Some comments inlined.

Please add some new tests. Otherwise, we don't know if the new code is working.

Comment on lines 67 to 70
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reuse OzoneConfigKeys.DFS_CONTAINER_RATIS_DATASTREAM_ENABLE instead of adding a new conf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it "datastreamEnabled".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the new code to a new class, say ObjectEndpointStreaming. Otherwise, there are many changes here and it becomes hard to maintain.

  @PUT
  public Response put(
      @PathParam("bucket") String bucketName,
      @PathParam("path") String keyPath,
      @HeaderParam("Content-Length") long length,
      @QueryParam("partNumber")  int partNumber,
      @QueryParam("uploadId") @DefaultValue("") String uploadID,
      InputStream body) throws IOException, OS3Exception {

    if (datastreamEnabled) {
      return objectEndpointStreaming.put(bucketName, keyPath, length, partNumber, uploadID, body);
    }
    ...
  }

@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch 2 times, most recently from fecd0f4 to 9f781ef Compare January 24, 2022 06:20
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from 1b3af21 to 49bd7ba Compare February 8, 2022 02:23
@szetszwo szetszwo force-pushed the HDDS-4454 branch 2 times, most recently from f637b1e to ce1d880 Compare February 16, 2022 14:47
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch 2 times, most recently from 3bb41d4 to f6dd95f Compare February 22, 2022 08:48
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from f6dd95f to 6f307e3 Compare March 15, 2022 08:35
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from d61543a to 7ef985f Compare March 18, 2022 11:12
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from 7ef985f to bff6e56 Compare March 29, 2022 09:47
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from d8ddda7 to 2f9cbb3 Compare April 10, 2022 01:04
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from cd2d180 to 9f86764 Compare June 24, 2022 04:52
@captainzmc
Copy link
Member

Thanks @guohao-rosicky update this. @szetszwo Can you help take another look?

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guohao-rosicky , thanks a lot for working on this. Are the crypto changes related to S3Gateway? Could we separated it out since this is very big?

Also, please revert all the whitespace changes, especially for the files existing in the master branch. Otherwise, the merge will be harder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't have whitespace change like this. Since this is an existing file in the master branch, it will make the merge harder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the new fields final.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we should create an ObjectEndpointStreaming streaming object and use streaming != null for checking if streaming is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a local variable. Otherwise, it will affect other calls.

@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from 9f86764 to 473c302 Compare June 27, 2022 04:32
@guohao-rosicky guohao-rosicky force-pushed the HDDS-4454-gateway-stream branch from 473c302 to a6f5c40 Compare June 27, 2022 06:12
@guohao-rosicky
Copy link
Contributor Author

Are the crypto changes related to S3Gateway? Could we separated it out since this is very big?

Hi, @szetszwo I can work this in https://issues.apache.org/jira/browse/HDDS-5892

@guohao-rosicky guohao-rosicky requested a review from szetszwo June 28, 2022 02:03
@szetszwo
Copy link
Contributor

@guohao-rosicky , @captainzmc , this pull request is around for quite a long time. We should plan about how to get this committed. One problem is that this involves too much code change. Let's separate the individual methods put(..) and createMultipartKey(..) into different pull requests.

Also, we should refactor the ObjectEndpoint code in master first. Otherwise, merging becomes a pain. Let me work on the refactoring.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guohao-rosicky , really appreciated that you work on this over a long time! Some comments inlined.

Comment on lines +212 to +217
boolean enableEC = false;
if ((replicationConfig != null &&
replicationConfig.getReplicationType() == EC) ||
bucket.getReplicationConfig() instanceof ECReplicationConfig) {
enableEC = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect when replicationConfig != null && replicationConfig.getReplicationType() != EC. As an isEc() method as below.

  static boolean isEc(ReplicationConfig replication, OzoneBucket bucket) {
    if (replication != null) {
      return replication.getReplicationType() == EC;
    } else {
      return bucket.getReplicationConfig() instanceof ECReplicationConfig;
    }
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @szetszwo , I think we'd better add a separate switch for S3G like we did for OFS/O3FS. Instead of using DFS_CONTAINER_RATIS_DATASTREAM_ENABLE to control both S3G and HDDS. This allows both client and HDDS to have a switch that controls whether or not streaming is used.
Otherwise, it cannot meet the requirements of ofs/o3fs using Streaming and S3 using Async.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, a separated conf sounds good.


private List<String> customizableGetHeaders = new ArrayList<>();
private int bufferSize;
private int chunkSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move chunkSize to ObjectEndpointStreaming

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we should create an ObjectEndpointStreaming streaming object and use streaming != null for checking if streaming is enabled.

Comment on lines +730 to +757
String copyHeader = headers.getHeaderString(COPY_SOURCE_HEADER);
String storageType = headers.getHeaderString(STORAGE_CLASS_HEADER);
ReplicationConfig replicationConfig =
getReplicationConfig(ozoneBucket, storageType);

boolean enableEC = false;
if ((replicationConfig != null &&
replicationConfig.getReplicationType() == EC) ||
ozoneBucket.getReplicationConfig() instanceof ECReplicationConfig) {
enableEC = true;
}

try {
if (datastreamEnabled && !enableEC && copyHeader != null) {
Pair<String, String> result = parseSourceHeader(copyHeader);
String sourceBucket = result.getLeft();
String sourceKey = result.getRight();
OzoneBucket sourceOzoneBucket = getBucket(sourceBucket);
return ObjectEndpointStreaming
.copyMultipartKey(Pair.of(sourceOzoneBucket, sourceKey),
Pair.of(ozoneBucket, key), length, partNumber, uploadID,
chunkSize, headers);
} else if (datastreamEnabled && !enableEC) {
return ObjectEndpointStreaming
.createMultipartKey(ozoneBucket, key, length, partNumber,
uploadID, chunkSize, body);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the logic to ObjectEndpointStreaming.

@szetszwo
Copy link
Contributor

Filed https://issues.apache.org/jira/browse/HDDS-6973 for the refactoring.

@guohao-rosicky
Copy link
Contributor Author

Hi @szetszwo #3569 has been merged into master, please rebase HDDS-4454, I will modify this PR later, thanks.

@captainzmc
Copy link
Member

Hi @szetszwo #3569 has been merged into master, please rebase HDDS-4454, I will modify this PR later, thanks.

Hi @guohao-rosicky, I just had update HDDS-4454, you can continue this.

@kerneltime
Copy link
Contributor

@szetszwo should this PR be revisited?

@swamirishi
Copy link
Contributor

swamirishi commented Jan 28, 2023

@guohao-rosicky Can you rebase this branch with master? Close this PR & open a new PR with master if you want the changes to go in. HDDS-4454 has been merged with master. Correct me if I am wrong @szetszwo @kerneltime

@guohao-rosicky
Copy link
Contributor Author

@guohao-rosicky Can you rebase this branch with master? Close this PR & open a new PR with master if you want the changes to go in. HDDS-4454 has been merged with master. Correct me if I am wrong @szetszwo @kerneltime

sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants