-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-16085: use object version or etags to protect against inconsistent read after replace/overwrite #646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Rebase of previous work after merge of HADOOP-15625.
| uri); | ||
| } | ||
| return versionId; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
|
||
| private final Mode mode; | ||
| private final boolean requireVersion; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
💔 -1 overall
This message was automatically generated. |
| new Listing.AcceptFilesOnly(qualify(f)))); | ||
| } | ||
|
|
||
| private static RemoteIterator<LocatedFileStatus> toLocatedFileStatusIterator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there is a better way to do this. I have RemoteIterator<S3LocatedFileStatus> but need to return RemoteIterator<LocatedFileStatus> from the public methods like listFiles() so I use this to do the conversion.
| } | ||
| }); | ||
| RemoteIterator<? extends LocatedFileStatus> iterator = | ||
| once("listLocatedStatus", path.toString(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GitHub diff makes this look like a bigger change than it is due to change in indentation. Really all that changed in this block is assignment of the once() return value to the iterator on line 3610, the call to toLocatedFileStatusIterator() on 3644, and use of S3AFileStatus instead of vanilla FileStatus on lines 3614, 3629.
| * @throws PathIOException raised on failure | ||
| * @throws RemoteFileChangedException if the remote file has changed. | ||
| */ | ||
| public void processResponse(final CopyResult copyResult) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I shouldn't have even put this method in here given it doesn't do anything (at least right now). I added it before I realized the ETag and versionId wouldn't (necessarily) be the same on the copied object. With certain encryption algorithms, ETag will actually be the same and I suppose we could try to compare in those cases but it felt too awkward to bother with.
I'm curious for feedback. Should I remove this method altogether? Or leave it as a means to document that it wasn't just forgotten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, one thing I can do here is enforce fs.s3a.change.detection.version.required (if set) to make sure the CopyResult has an ETag or versionId if one is expected. I'll add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, and maybe add debug logging as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
🎊 +1 overall
This message was automatically generated. |
| "fs.s3a.s3guard.local.ttl"; | ||
| public static final int DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL | ||
| = 10 * 1000; | ||
| = 120 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileStatus.java
Show resolved
Hide resolved
| * @param blockSize block size | ||
| * @param owner owner | ||
| * @param group group | ||
| * @param permission persmission |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, persmission
| * Get the change detection policy for this FS instance. | ||
| * @return the change detection policy | ||
| */ | ||
| @VisibleForTesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer needs @VisibleForTesting annotation now, since public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually thought a little about this. I was thinking of @VisibileForTesting in this case as documenting this method is only public to allow access in tests (in a different package). I know it is more typically used on protected or package-private methods.
I'm curious if there is any feedback about this being public in general? I'm accessing it across packages in a couple of tests (in the s3guard and select packages).
I can reinforce that it is only visible for tests by mentioning that explicitly in the javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See updated javadoc to mention only public to allow access in tests in other packages.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public boolean equals(Object o) { | ||
| return super.equals(o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not changing the behavior, no need to override equals and hashCode methods here - only valid reason might be to raise attention to the fact that the eTag and versionId are ignored by them? in that case, should add a comment to explain why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I didn't override this. FindBugs flagged it as an issue, which prompted me to add it. The base LocatedFileStatus equality is only based on Path and this implementation doesn't need to be different.
It looks like some would argue FindBugs shouldn't flag this:
https://sourceforge.net/p/findbugs/bugs/1379/
I'll just add a comment to explain why I'm implementing and why it's ok to not to include ETag and version ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a non-javadoc comment to explain this.
| private final Mode mode; | ||
| private final boolean requireVersion; | ||
|
|
||
| public abstract String getRevisionId(S3ObjectAttributes s3Attributes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add javadoc for these 2 new methods
|
🎊 +1 overall
This message was automatically generated. |
| S3Guard metadata store (e.g. DynamoDB). On opening a file, S3AFileSystem | ||
| will look in S3 for the version of the file indicated by the ETag or object | ||
| version ID stored in the metadata store. If that version is unavailable, | ||
| `RemoteFileChangedException` is thrown. Whether ETag or version ID and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
💔 -1 overall
This message was automatically generated. |
|
Really appreciate the review @noslowerdna ! I think I have addressed all of your comments now and I don't have any remaining TODOs on my own list. @steveloughran and @bgaborg I know this is kind of lengthy but if you can get a chance to review I will be really grateful! I'll be happy to address any suggestions you have for improvement. I ran the full integration tests (minus -Ds3guard) yesterday and they passed with both fs.s3a.change.detection.source=etag and versionId. I don't think any changes I've made since would affect the result but I will run again anyway as well as with -Ds3guard. |
|
|
||
| The S3Guard metadata for a file can be corrected with the `s3guard import` | ||
| command as discussed above. The command can take a file URI instead of a | ||
| bucket URI to correct the metdata for a single file. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, met -> meta
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
It looks like Yetus is confused. I merged in trunk and Github seems to indicate the PR would merge cleanly. @steveloughran @bgaborg are there any tricks to get Yetus to do another build? I guess I could push some meaningless commit (e.g. whitespace change) and then another commit to revert that... Is there anything else I can/should do? |
|
You can amend (e.g edit your last commit message) and force push to trigger yetus. I'll review your change shortly. |
|
💔 -1 overall
This message was automatically generated. |
|
I added another unit test so that should trigger another Yetus build. |
Thanks @bgaborg . I'm looking forward to your review and I'll keep that trick in mind in case it happens again. |
I failed to notice that yetus had already failed again after my latest commit adding the test. I'll examine the logs more closely to see if I can determine how yetus is coming to the conclusion that my PR does not apply despite the fact that GitHub seems to be indicating it is ok. |
|
I can't articulate the exact cause, but I was able to figure out a bit more about how Yetus works and was able to reproduce the patch application failure. GitHub allows a PR to be downloaded as a patch. Yetus uses that here. This PR can be seen by suffixing the PR URL with ".patch": I downloaded that and when I try to apply it to trunk it fails. I don't know quite why exactly but since it started after I merged trunk back into the branch I'm guessing I will probably need to do a rebase on trunk instead of merging in trunk. At this point I've opened a new PR (#675) with this same content merge-squashed on trunk. Yetus should build it cleanly. I'm closing this PR out, leaving it as-is to leave the review context unchanged for any reviewer that might like to refer back to previous discussions here. |
…ke shared context changes easier This replaces apache/samza#638, I accidentally messed up that branch. The difference between this PR and the last review by prateekm is apache/samza@5d55299 Author: Cameron Lee <[email protected]> Reviewers: Prateek Maheshwari <[email protected]> Closes apache#646 from cameronlee314/refactor_unit_tests_for_shared_context_new
This started with HADOOP-16085-003.patch from the JIRA.
I'm switching over to a PR instead of using patch files attached to the JIRA. I expect that will make review easier.
I've addressed a few things since that patch:
TestPathMetadataDynamoDBTranslation, TestDirListingMetadata
I haven't actually run all the tests again since these changes. Also, I think there might be a couple more tests to add or alter. For example, I don't have an explicit integration test yet to read a file that has no ETag or versionId in S3Guard.
I'll make another pass through but figured it is worthwhile to post the progress.