fix(s3 dataplane): Fix transfer of empty objects #417

Merged
merged 15 commits into from
Aug 21, 2024

Conversation

rafaelmag110
Contributor

What this PR changes/adds

When a folder is created via the AWS console, a 0-byte object named after the folder is created.
This object is picked up by the S3 listObjects call and passed to transferParts as a valid part to be transferred. This creates an unnecessary Part to upload, which can be filtered out.

Also, the multipart upload logic wasn't able to upload 0-byte files, since an empty completedParts list cannot be used in a completedMultipartUpload request.
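
A minimal sketch of the empty-object case described above, in plain Java with no AWS SDK calls. `UploadPlanner`, `partSizes`, and `choose` are hypothetical names, and falling back to a single put when there are no parts is an assumption about one way to handle it, not necessarily what this PR does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the connector's code): plans how an object of a
// given size would be uploaded.
final class UploadPlanner {

    enum Strategy { SINGLE_PUT, MULTIPART }

    // Returns the sizes of the parts a multipart upload would send.
    // A 0-byte object produces no parts at all.
    static List<Long> partSizes(long contentLength, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> parts = new ArrayList<>();
        for (long offset = 0; offset < contentLength; offset += chunkSize) {
            parts.add(Math.min(chunkSize, contentLength - offset));
        }
        return parts;
    }

    // Assumption: with no parts to complete, use a plain single put instead.
    static Strategy choose(long contentLength, long chunkSize) {
        return partSizes(contentLength, chunkSize).isEmpty()
                ? Strategy.SINGLE_PUT
                : Strategy.MULTIPART;
    }
}
```

With a chunk size of 5, a 12-byte object is planned as three parts (5, 5, 2), while a 0-byte object yields an empty part list and is routed to the single-put path.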

Why it does that

To enable the transfer of 0-byte files and remove the unnecessary upload of the 0-byte folder marker.

Further notes

Some tests were adapted to better represent the test cases.

Linked Issue(s)

Closes #384

@github-actions github-actions bot left a comment

We are always happy to welcome new contributors ❤️ To make things easier for everyone, please make sure to follow our contribution guidelines, check if you have already signed the ECA, and relate this pull request to an existing issue or discussion.

@rafaelmag110 rafaelmag110 marked this pull request as ready for review August 19, 2024 21:27
@codecov-commenter
codecov-commenter commented Aug 20, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 65.53%. Comparing base (d177a98) to head (900004a).
Report is 63 commits behind head on main.

Files Patch % Lines
...pse/edc/connector/dataplane/aws/s3/S3DataSink.java 75.00% 0 Missing and 1 partial ⚠️
...e/edc/connector/dataplane/aws/s3/S3DataSource.java 50.00% 0 Missing and 1 partial ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #417      +/-   ##
============================================
+ Coverage     63.82%   65.53%   +1.70%     
- Complexity        0      117     +117     
============================================
  Files            26       28       +2     
  Lines           633      676      +43     
  Branches         30       32       +2     
============================================
+ Hits            404      443      +39     
+ Misses          222      218       -4     
- Partials          7       15       +8     

@paullatzelsperger paullatzelsperger added the bug Something isn't working label Aug 20, 2024
@rafaelmag110
Contributor Author

While discussing the changes with @bmg13, we identified another case where files in nested folders would break the tests. We accounted for that case in the last commit.

@paullatzelsperger This is ready for review. I can't properly request the review...

Member

@ndr-brt ndr-brt left a comment

just some nits, all good overall

@@ -116,8 +118,12 @@ private List<S3Object> fetchPrefixedS3Objects() {
    return s3Objects;
}

private Collection<S3Object> filterOutFolderFile(List<S3Object> contents) {
    return contents.stream().filter(object -> !object.key().endsWith("/")).collect(Collectors.toSet());
Member

why does it transform a List to a Set?

Contributor Author

The initial idea was to avoid duplicates. On second thought, a file's full name cannot be duplicated, so this transformation makes no sense.

Contributor Author

changed in 62ef21a
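
For illustration, a sketch of the filter collecting into a List rather than a Set. `StoredObject` is a hypothetical stand-in for the SDK's `S3Object`, and `FolderMarkerFilter` is an assumed name; the actual change is in the commit referenced above. Since object keys are unique within a bucket, no deduplication is needed, and a List keeps the order of the listing.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the SDK's S3Object; only the key matters here.
final class StoredObject {
    private final String key;

    StoredObject(String key) {
        this.key = key;
    }

    String key() {
        return key;
    }
}

final class FolderMarkerFilter {

    // Drops folder markers (keys ending in "/") and keeps the listing order.
    static List<StoredObject> withoutFolderMarkers(List<StoredObject> contents) {
        return contents.stream()
                .filter(object -> !object.key().endsWith("/"))
                .collect(Collectors.toList());
    }
}
```

The same check also covers the nested-folder case mentioned earlier in the conversation, because a marker such as `folder/nested/` ends in `/` as well.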

@@ -116,8 +118,12 @@ private List<S3Object> fetchPrefixedS3Objects() {
    return s3Objects;
}

private Collection<S3Object> filterOutFolderFile(List<S3Object> contents) {
    return contents.stream().filter(object -> !object.key().endsWith("/")).collect(Collectors.toSet());
Member

the filter could be extracted in a field/constant like:

Predicate<S3Object> isFile = object -> !object.key().endsWith("/");

and this method could be inlined

Contributor Author

changed in 62ef21a

@@ -105,6 +107,12 @@ void should_copy_using_destination_object_name_case_single_transfer(List<String>
    var objectNameInDestination = "object-name-in-destination";
    var objectContent = UUID.randomUUID().toString();

    //Put folder 0 byte size file marker. AWS does this when a folder is created via the console.
    if (!isSingleObject) {
Member

I know the test was already structured like this, but it's not good practice to have conditionals in a test; it's better to have 2 distinct tests. It could also be solved in a different PR.

Contributor Author

I agree this test class deserves some attention. The conditionals are there because the argument provider is the same for both test cases. I kept it that way to avoid refactoring, since that seems better done in another PR. I can create the issue.

@rafaelmag110 rafaelmag110 requested a review from ndr-brt August 20, 2024 14:15
@@ -107,7 +109,9 @@ private List<S3Object> fetchPrefixedS3Objects() {

    var response = client.listObjectsV2(listObjectsRequest);

    s3Objects.addAll(response.contents());
    Predicate<S3Object> isFile = object -> !object.key().endsWith("/");
Member

this can be a class field or constant, so it won't be recreated each time

Contributor Author

addressed in a6645ac
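
A rough sketch of the suggestion, with the predicate declared once as a constant and reused for every page of a listing. `PagedKeyCollector`, `IS_FILE`, and `collectFiles` are hypothetical names, and the pages are modelled as plain lists of keys instead of real `listObjectsV2` responses; the actual change is in the commit referenced above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PagedKeyCollector {

    // Declared once at class level and shared by every page iteration.
    private static final Predicate<String> IS_FILE = key -> !key.endsWith("/");

    // Each inner list stands in for the keys of one listing response page.
    static List<String> collectFiles(List<List<String>> pages) {
        List<String> files = new ArrayList<>();
        for (List<String> page : pages) {
            files.addAll(page.stream().filter(IS_FILE).collect(Collectors.toList()));
        }
        return files;
    }
}
```

Passing two pages, one with `dir/` and `dir/a` and one with `dir/sub/` and `dir/sub/b`, returns only `dir/a` and `dir/sub/b`, in listing order.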

@ndr-brt ndr-brt merged commit d0c0335 into eclipse-edc:main Aug 21, 2024
12 checks passed
@rafaelmag110 rafaelmag110 deleted the fix_empty_file_copy branch November 29, 2024 13:53
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

AWS S3 folder copy not working
5 participants