Skip to content

Conversation

@ivandika3
Copy link
Contributor

@ivandika3 ivandika3 commented Mar 25, 2024

What changes were proposed in this pull request?

Currently, the MessageDigest instance is a thread local variable (one per S3G Jetty thread). MessageDigest requires the call to either MessageDigest#digest or MessageDigest#reset to reset the digest.

In normal ObjectEndpoint#put flow, MessageDigest#digest is called after the data has been written to the datanodes, before the key is committed. However, if an IOException happens (e.g. EOFException due to client cancelling during the write), the digest will not be reset and remains in the inconsistent state. This will affect the subsequent request that uses the same thread and therefore the ETag generated will be completely different from the md5 hash of the object causing AWS S3 SDK to detect inconsistent hash when downloading the object.

The issue can be replicated using an S3G with a few threads and doing three put-object operations for the same key and same payload. You can set the hadoop.http.max.threads in ozone-site.xml to a small value (e.g. 4) to increase the chance of the same thread handling the request.

  • 1st put-object: cancel the operation before it put-object operation can finish, ensure the EOFException is thrown in the S3Gateway logs

  • 2nd put-object: let the put-object finish. The resulting ETag will not be the same as the md5 digest of the payload (you might need to do this for a few time since the S3G thread might not be the same from the previous call)

  • 3rd put-object: also let the put-object finish. Since the previous put-object reset the digest, the resulting ETag will be correct.

This patch adds a call to MessageDigest#reset in ObjectEndpoint#put to reset the digest in case of exception. Another valid alternative is to call the MessageDigest#reset just after the DigestInputStream initialization.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10587

How was this patch tested?

Manual test from Ozone Intellij IDE setup as shown in the description.

Ref: https://cwiki.apache.org/confluence/display/OZONE/Run+Ozone+cluster+from+IDE

Clean CI run: https://github.com/ivandika3/ozone/actions/runs/8421982154

@ivandika3 ivandika3 marked this pull request as ready for review March 25, 2024 20:57
@ivandika3
Copy link
Contributor Author

@vtutrinov @myskov Could you help take a look when you have time?

@vtutrinov
Copy link
Contributor

@ivandika3 I'd like to see a set of unit tests to check that the digest message will be resetted in case of exception. An example is:

ObjectEndpoint endpoint = spy(...);
try (MockedStatic staticMock = mockStatic(ObjectEndpointStreaming.class)) {
    staticMock.when(() -> ObjectEndpointStreaming.put(...)).thenThrow(IOException.class);
    endpoint.put(...);
    verify(endpoint.getETagProvider()).reset();
}

@ivandika3
Copy link
Contributor Author

@vtutrinov Thank you for the review and unit test idea. I have added unit tests for put, copy, and MPU part upload cases. PTAL.

@vtutrinov
Copy link
Contributor

@ivandika3 thanks for the PR and the unit tests, LGTM, +1

@adoroszlai adoroszlai merged commit c6c611f into apache:master Mar 26, 2024
@adoroszlai
Copy link
Contributor

Thanks @ivandika3 for the patch, @vtutrinov for the review.

@ivandika3
Copy link
Contributor Author

Thanks @vtutrinov for the review and @adoroszlai for the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

s3 S3 Gateway

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants