@ben-roling
Contributor

This started with HADOOP-16085-003.patch from the JIRA.

I'm switching over to a PR instead of using patch files attached to the JIRA. I expect that will make review easier.

I've addressed a few things since that patch:

  • copy exception handling: a 412 error on the copy response is now handled
  • addressed Gabor's comments on TestPathMetadataDynamoDBTranslation and TestDirListingMetadata
  • fixed a problem I introduced: an inconsistency between PathMetadata.isEmptyDir and the underlying S3AFileStatus.isEmptyDir, which showed up as failures to clean up files after tests
  • increased the default LocalMetadataStore cache timeout; the low 10-second default made debugging some failing tests confusing, because the outcome depended on how quickly I stepped through breakpoints
  • fixed the S3 Select test in ITestS3ARemoteFileChanged and added a test for copy/rename
  • improved documentation

I haven't actually run all the tests again since these changes. Also, I think there might be a couple more tests to add or alter. For example, I don't have an explicit integration test yet to read a file that has no ETag or versionId in S3Guard.

I'll make another pass through but figured it is worthwhile to post the progress.

uri);
}
return versionId;
}

whitespace:end of line


private final Mode mode;
private final boolean requireVersion;

whitespace:end of line

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 21 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 18 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1091 | trunk passed |
| +1 | compile | 30 | trunk passed |
| +1 | checkstyle | 23 | trunk passed |
| +1 | mvnsite | 35 | trunk passed |
| +1 | shadedclient | 764 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 49 | trunk passed |
| +1 | javadoc | 22 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 29 | the patch passed |
| +1 | compile | 28 | the patch passed |
| -1 | javac | 28 | hadoop-tools_hadoop-aws generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) |
| -0 | checkstyle | 19 | hadoop-tools/hadoop-aws: The patch generated 43 new + 57 unchanged - 2 fixed = 100 total (was 59) |
| +1 | mvnsite | 32 | the patch passed |
| -1 | whitespace | 0 | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 770 | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 52 | hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 22 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 282 | hadoop-aws in the patch passed. |
| -1 | asflicense | 25 | The patch generated 1 ASF License warnings. |
| | | 3377 | |

| Reason | Tests |
| --- | --- |
| FindBugs | module:hadoop-tools/hadoop-aws |
| | org.apache.hadoop.fs.s3a.S3LocatedFileStatus doesn't override org.apache.hadoop.fs.LocatedFileStatus.equals(Object) At S3LocatedFileStatus.java:At S3LocatedFileStatus.java:[line 1] |
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/artifact/out/Dockerfile
GITHUB PR #646
JIRA Issue HADOOP-16085
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 63e0c6f06812 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / ce4bafd
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/artifact/out/diff-compile-javac-hadoop-tools_hadoop-aws.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/artifact/out/whitespace-eol.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/artifact/out/new-findbugs-hadoop-tools_hadoop-aws.html
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/testReport/
asflicense https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 339 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/1/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

new Listing.AcceptFilesOnly(qualify(f))));
}

private static RemoteIterator<LocatedFileStatus> toLocatedFileStatusIterator(
@ben-roling (Contributor Author), Mar 27, 2019

I'm not sure if there is a better way to do this. I have RemoteIterator<S3LocatedFileStatus> but need to return RemoteIterator<LocatedFileStatus> from the public methods like listFiles() so I use this to do the conversion.
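
For readers following along, one way to do this kind of conversion is a thin wrapper that delegates to the typed iterator and relies on the implicit upcast of each element. The sketch below is only an illustration under stated assumptions, not the code from this patch: `RemoteIterator`, `LocatedFileStatus` and `S3LocatedFileStatus` are simplified stand-ins for the Hadoop classes, and `fromList` and `collectPaths` are hypothetical helpers that exist only for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simplified stand-in for org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;
  E next() throws IOException;
}

/** Simplified stand-in for LocatedFileStatus; it only carries a path. */
class LocatedFileStatus {
  private final String path;
  LocatedFileStatus(String path) { this.path = path; }
  String getPath() { return path; }
}

/** Simplified stand-in for the S3-specific subclass that adds an ETag. */
class S3LocatedFileStatus extends LocatedFileStatus {
  private final String eTag;
  S3LocatedFileStatus(String path, String eTag) {
    super(path);
    this.eTag = eTag;
  }
  String getETag() { return eTag; }
}

final class IteratorAdapters {
  private IteratorAdapters() { }

  /** Wraps an iterator over a subtype so that callers see the base type. */
  static RemoteIterator<LocatedFileStatus> toLocatedFileStatusIterator(
      final RemoteIterator<? extends LocatedFileStatus> source) {
    return new RemoteIterator<LocatedFileStatus>() {
      @Override
      public boolean hasNext() throws IOException {
        return source.hasNext();
      }

      @Override
      public LocatedFileStatus next() throws IOException {
        return source.next(); // implicit upcast to the base type
      }
    };
  }

  /** Hypothetical demo helper: a RemoteIterator backed by a list. */
  static <T> RemoteIterator<T> fromList(List<T> items) {
    final Iterator<T> it = items.iterator();
    return new RemoteIterator<T>() {
      @Override
      public boolean hasNext() { return it.hasNext(); }

      @Override
      public T next() { return it.next(); }
    };
  }

  /** Hypothetical demo helper: drains the iterator into a list of paths. */
  static List<String> collectPaths(RemoteIterator<LocatedFileStatus> it) {
    List<String> paths = new ArrayList<>();
    try {
      while (it.hasNext()) {
        paths.add(it.next().getPath());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return paths;
  }
}
```

A wrapper is needed because Java generics are invariant: a `RemoteIterator<S3LocatedFileStatus>` is not a `RemoteIterator<LocatedFileStatus>`, even though every element it yields is a `LocatedFileStatus`.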

}
});
RemoteIterator<? extends LocatedFileStatus> iterator =
once("listLocatedStatus", path.toString(),
Contributor Author

The GitHub diff makes this look like a bigger change than it is because the indentation changed. All that really changed in this block is the assignment of the once() return value to the iterator on line 3610, the call to toLocatedFileStatusIterator() on line 3644, and the use of S3AFileStatus instead of vanilla FileStatus on lines 3614 and 3629.

* @throws PathIOException raised on failure
* @throws RemoteFileChangedException if the remote file has changed.
*/
public void processResponse(final CopyResult copyResult)
Contributor Author

Maybe I shouldn't have even put this method in here given it doesn't do anything (at least right now). I added it before I realized the ETag and versionId wouldn't (necessarily) be the same on the copied object. With certain encryption algorithms, ETag will actually be the same and I suppose we could try to compare in those cases but it felt too awkward to bother with.

I'm curious for feedback. Should I remove this method altogether? Or leave it as a means to document that it wasn't just forgotten?

Contributor Author

Actually, one thing I can do here is enforce fs.s3a.change.detection.version.required (if set) to make sure the CopyResult has an ETag or versionId if one is expected. I'll add that.

Contributor

+1, and maybe add debug logging as well

Contributor Author

Updated
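
The check discussed in this thread could look roughly like the sketch below: pull the configured revision attribute out of the copy response, log it at debug level, and fail only when the attribute is missing and the policy requires it. This is a hedged illustration, not the patch's code. `CopyResult` here is a minimal stand-in for the AWS SDK class, `Source`, `CopyResponseCheck` and `MissingRevisionException` are hypothetical names, and `java.util.logging` stands in for whatever logger the real class uses.

```java
import java.io.IOException;
import java.util.logging.Logger;

/** Minimal stand-in for the AWS SDK CopyResult; only two fields are modelled. */
class CopyResult {
  private final String eTag;
  private final String versionId;
  CopyResult(String eTag, String versionId) {
    this.eTag = eTag;
    this.versionId = versionId;
  }
  String getETag() { return eTag; }
  String getVersionId() { return versionId; }
}

/** Hypothetical enum mirroring the two change detection sources. */
enum Source { ETAG, VERSION_ID }

/** Hypothetical exception for a response that lacks a required attribute. */
class MissingRevisionException extends IOException {
  MissingRevisionException(String message) { super(message); }
}

class CopyResponseCheck {
  private static final Logger LOG =
      Logger.getLogger(CopyResponseCheck.class.getName());

  private final Source source;
  private final boolean requireVersion;

  CopyResponseCheck(Source source, boolean requireVersion) {
    this.source = source;
    this.requireVersion = requireVersion;
  }

  /** Picks the attribute that the configured source refers to. */
  String getRevisionId(CopyResult copyResult) {
    return source == Source.ETAG
        ? copyResult.getETag()
        : copyResult.getVersionId();
  }

  /**
   * Does not compare against the source object's revision, since the copy
   * may legitimately get a different ETag or versionId. It only enforces
   * that the attribute is present when the policy requires it.
   */
  void processResponse(CopyResult copyResult) throws MissingRevisionException {
    if (copyResult == null) {
      return;
    }
    String revisionId = getRevisionId(copyResult);
    LOG.fine("Copy result " + source + ": " + revisionId);
    if (revisionId == null && requireVersion) {
      throw new MissingRevisionException(
          "Change detection policy requires " + source);
    }
  }
}
```

With `requireVersion` set to false the method only logs, which matches the earlier observation that there is nothing to compare on a copy.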

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 27 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 18 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1139 | trunk passed |
| +1 | compile | 36 | trunk passed |
| +1 | checkstyle | 24 | trunk passed |
| +1 | mvnsite | 37 | trunk passed |
| +1 | shadedclient | 757 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 45 | trunk passed |
| +1 | javadoc | 22 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 30 | the patch passed |
| +1 | compile | 27 | the patch passed |
| +1 | javac | 27 | the patch passed |
| -0 | checkstyle | 19 | hadoop-tools/hadoop-aws: The patch generated 23 new + 56 unchanged - 3 fixed = 79 total (was 59) |
| +1 | mvnsite | 32 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 781 | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 50 | the patch passed |
| +1 | javadoc | 25 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 271 | hadoop-aws in the patch passed. |
| +1 | asflicense | 26 | The patch does not generate ASF License warnings. |
| | | 3414 | |
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-646/2/artifact/out/Dockerfile
GITHUB PR #646
JIRA Issue HADOOP-16085
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 819ff5509e9b 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / b226958
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-646/2/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-646/2/testReport/
Max. process+thread count 340 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/2/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

"fs.s3a.s3guard.local.ttl";
public static final int DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL
- = 10 * 1000;
+ = 120 * 1000;
Contributor

seems to be a common problem, this was increased to 60s in #624 / #630

* @param blockSize block size
* @param owner owner
* @param group group
* @param permission persmission
Contributor

typo, persmission

* Get the change detection policy for this FS instance.
* @return the change detection policy
*/
@VisibleForTesting
Contributor

No longer needs @VisibleForTesting annotation now, since public

Contributor Author

I actually thought a little about this. I was thinking of @VisibleForTesting in this case as documenting that this method is public only to allow access from tests (in a different package). I know it is more typically used on protected or package-private methods.

I'm curious if there is any feedback about this being public in general? I'm accessing it across packages in a couple of tests (in the s3guard and select packages).

I can reinforce that it is only visible for tests by mentioning that explicitly in the javadoc.

Contributor Author

See the updated javadoc, which now mentions that the method is public only to allow access from tests in other packages.
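
As a rough illustration of the convention discussed here, the sketch below keeps the annotation on a public accessor and repeats the intent in the javadoc. It is an assumption-laden sketch rather than the patch itself: the `VisibleForTesting` annotation is declared locally as a stand-in for the Guava one, and `ChangeDetectionPolicy` and `FileSystemSketch` are hypothetical placeholder classes.

```java
/** Local stand-in for Guava's @VisibleForTesting marker annotation. */
@interface VisibleForTesting { }

/** Hypothetical placeholder for the change detection policy type. */
class ChangeDetectionPolicy {
  private final String source;
  ChangeDetectionPolicy(String source) { this.source = source; }
  String getSource() { return source; }
}

/** Hypothetical placeholder for the filesystem class that owns the policy. */
class FileSystemSketch {
  private final ChangeDetectionPolicy changeDetectionPolicy;

  FileSystemSketch(ChangeDetectionPolicy changeDetectionPolicy) {
    this.changeDetectionPolicy = changeDetectionPolicy;
  }

  /**
   * Get the change detection policy for this FS instance.
   * Only public to allow access in tests in other packages.
   * @return the change detection policy
   */
  @VisibleForTesting
  public ChangeDetectionPolicy getChangeDetectionPolicy() {
    return changeDetectionPolicy;
  }
}
```

The annotation has no effect on access control; it only signals intent to readers and static analysis, which is why the javadoc sentence carries the same message.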


@Override
public boolean equals(Object o) {
return super.equals(o);
Contributor

If the behavior isn't changing, there is no need to override the equals and hashCode methods here. The only valid reason might be to draw attention to the fact that they ignore the eTag and versionId; in that case, add a comment to explain why.

Contributor Author

Originally I didn't override this. FindBugs flagged it as an issue, which prompted me to add it. The base LocatedFileStatus equality is only based on Path and this implementation doesn't need to be different.

It looks like some would argue FindBugs shouldn't flag this:
https://sourceforge.net/p/findbugs/bugs/1379/

I'll just add a comment to explain why I'm implementing it and why it's OK not to include ETag and version ID.

Contributor

sounds good

Contributor Author

Added a non-javadoc comment to explain this.
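
The outcome of this thread can be sketched as follows: the subclass overrides `equals` and `hashCode` purely to delegate to the superclass, with a plain comment explaining that the extra fields are deliberately ignored. This is a minimal sketch under assumptions, not the committed code. Both classes are simplified stand-ins for the Hadoop types, and the base class's path-only equality is modelled on the description given above.

```java
import java.util.Objects;

/** Simplified stand-in for LocatedFileStatus with path-only equality. */
class LocatedFileStatus {
  private final String path;

  LocatedFileStatus(String path) { this.path = path; }

  String getPath() { return path; }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof LocatedFileStatus)) {
      return false;
    }
    return Objects.equals(path, ((LocatedFileStatus) o).path);
  }

  @Override
  public int hashCode() {
    return Objects.hashCode(path);
  }
}

/** Simplified stand-in for the S3 subclass that adds ETag and versionId. */
class S3LocatedFileStatus extends LocatedFileStatus {
  private final String eTag;
  private final String versionId;

  S3LocatedFileStatus(String path, String eTag, String versionId) {
    super(path);
    this.eTag = eTag;
    this.versionId = versionId;
  }

  String getETag() { return eTag; }

  String getVersionId() { return versionId; }

  // equals() and hashCode() are overridden only to stop FindBugs from
  // flagging a subclass that adds fields without overriding them.
  // Equality stays path-based, as in the superclass, so eTag and
  // versionId are intentionally left out of both methods.
  @Override
  public boolean equals(Object o) {
    return super.equals(o);
  }

  @Override
  public int hashCode() {
    return super.hashCode();
  }
}
```

Because both methods delegate, two statuses for the same path compare equal even when their ETags differ, which keeps the subclass consistent with the base class contract.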

private final Mode mode;
private final boolean requireVersion;

public abstract String getRevisionId(S3ObjectAttributes s3Attributes);
Contributor

add javadoc for these 2 new methods

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 23 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 21 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1061 | trunk passed |
| +1 | compile | 31 | trunk passed |
| +1 | checkstyle | 22 | trunk passed |
| +1 | mvnsite | 34 | trunk passed |
| +1 | shadedclient | 755 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 51 | trunk passed |
| +1 | javadoc | 22 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 33 | the patch passed |
| +1 | compile | 31 | the patch passed |
| +1 | javac | 31 | the patch passed |
| -0 | checkstyle | 19 | hadoop-tools/hadoop-aws: The patch generated 23 new + 57 unchanged - 3 fixed = 80 total (was 60) |
| +1 | mvnsite | 35 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 846 | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 54 | the patch passed |
| +1 | javadoc | 21 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 270 | hadoop-aws in the patch passed. |
| +1 | asflicense | 25 | The patch does not generate ASF License warnings. |
| | | 3427 | |
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-646/3/artifact/out/Dockerfile
GITHUB PR #646
JIRA Issue HADOOP-16085
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 2d88dc5cb77e 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 9cd6619
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-646/3/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-646/3/testReport/
Max. process+thread count 340 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/3/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

S3Guard metadata store (e.g. DynamoDB). On opening a file, S3AFileSystem
will look in S3 for the version of the file indicated by the ETag or object
version ID stored in the metadata store. If that version is unavailable,
`RemoteFileChangedException` is thrown. Whether ETag or version ID and

whitespace:end of line

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 26 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 22 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1008 | trunk passed |
| +1 | compile | 30 | trunk passed |
| +1 | checkstyle | 24 | trunk passed |
| +1 | mvnsite | 36 | trunk passed |
| +1 | shadedclient | 685 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 41 | trunk passed |
| +1 | javadoc | 20 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 29 | the patch passed |
| +1 | compile | 26 | the patch passed |
| +1 | javac | 26 | the patch passed |
| -0 | checkstyle | 17 | hadoop-tools/hadoop-aws: The patch generated 24 new + 62 unchanged - 3 fixed = 86 total (was 65) |
| +1 | mvnsite | 29 | the patch passed |
| -1 | whitespace | 0 | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 687 | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 48 | the patch passed |
| +1 | javadoc | 22 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 270 | hadoop-aws in the patch passed. |
| +1 | asflicense | 29 | The patch does not generate ASF License warnings. |
| | | 3096 | |
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-646/4/artifact/out/Dockerfile
GITHUB PR #646
JIRA Issue HADOOP-16085
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 98128039d9c4 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 9cd6619
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-646/4/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-646/4/artifact/out/whitespace-eol.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-646/4/testReport/
Max. process+thread count 444 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/4/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran added the "fs/s3" label (changes related to hadoop-aws; submitter must declare test endpoint) Mar 28, 2019
@ben-roling
Contributor Author

Really appreciate the review, @noslowerdna! I think I have addressed all of your comments now, and I have no remaining TODOs on my own list. @steveloughran and @bgaborg, I know this is lengthy, but if you get a chance to review it I will be really grateful. I'll be happy to address any suggestions you have for improvement.

I ran the full integration tests (minus -Ds3guard) yesterday and they passed with both fs.s3a.change.detection.source=etag and versionId. I don't think any changes I've made since would affect the result but I will run again anyway as well as with -Ds3guard.


The S3Guard metadata for a file can be corrected with the `s3guard import`
command as discussed above. The command can take a file URI instead of a
bucket URI to correct the metdata for a single file. For example:
Contributor

typo, met -> meta

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 8 | #646 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |
Subsystem Report/Notes
GITHUB PR #646
JIRA Issue HADOOP-16085
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/5/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 7 | #646 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |
Subsystem Report/Notes
GITHUB PR #646
JIRA Issue HADOOP-16085
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/6/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@ben-roling
Contributor Author

It looks like Yetus is confused. I merged in trunk and GitHub seems to indicate the PR would merge cleanly. @steveloughran @bgaborg are there any tricks to get Yetus to do another build? I guess I could push some meaningless commit (e.g. a whitespace change) and then another commit to revert it... Is there anything else I can or should do?

@bgaborg commented Apr 1, 2019

You can amend (e.g. edit your last commit message) and force push to trigger Yetus. I'll review your change shortly.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 7 | #646 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |
Subsystem Report/Notes
GITHUB PR #646
JIRA Issue HADOOP-16085
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-646/7/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@ben-roling
Contributor Author

I added another unit test so that should trigger another Yetus build.

@ben-roling
Contributor Author

> You can amend (e.g. edit your last commit message) and force push to trigger Yetus. I'll review your change shortly.

Thanks @bgaborg . I'm looking forward to your review and I'll keep that trick in mind in case it happens again.

@ben-roling
Contributor Author

> I added another unit test so that should trigger another Yetus build.

I failed to notice that Yetus had already failed again after my latest commit adding the test. I'll examine the logs more closely to see if I can determine how Yetus concludes that my PR does not apply even though GitHub indicates it is OK.

@ben-roling
Contributor Author

I can't articulate the exact cause, but I was able to figure out a bit more about how Yetus works and was able to reproduce the patch application failure. GitHub allows a PR to be downloaded as a patch. Yetus uses that here.

This PR can be seen by suffixing the PR URL with ".patch":
https://github.com/apache/hadoop/pull/646.patch

I downloaded that patch, and applying it to trunk fails. I don't know exactly why, but since the failures started after I merged trunk back into the branch, I'm guessing I will need to rebase on trunk instead of merging trunk in.

At this point I've opened a new PR (#675) with this same content merge-squashed on trunk. Yetus should build it cleanly. I'm closing this PR out, leaving it as-is to leave the review context unchanged for any reviewer that might like to refer back to previous discussions here.

@ben-roling closed this Apr 1, 2019