-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-16190. S3A copyFile operation to include source versionID or etag in the copy request #606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-16190. S3A copyFile operation to include source versionID or etag in the copy request #606
Conversation
|
Initial PR: untested |
|
💔 -1 overall
This message was automatically generated. |
|
no, no new tests. Could add one to check exception translation though |
…tag in the copy request This patch adds the constraints on the request, and maps a 412 response to a RemoteFileChangedException. No obvious test for this. The way to do it would be to get an invalid etag/version in to the request and see what happens, which would complicate the copy API a bit -but is something we will need for etag/version tracking in s3guard anyway.... Change-Id: I4b229336ba2d57018bd8b66888b807074419598e
b19d3fe to
8ca29e7
Compare
|
💔 -1 overall
This message was automatically generated. |
|
full test run against an unversioned bucket with s3guard, dynamo and scale options (i.e..uses etags) All tests worked except for rerunning against a versioned bucket to make sure this codepath is good too.
|
…ry rule. Includes a PNG file of the form to fill in Change-Id: I7cce8d954da08a9f9e39701e6abaf6b99bdf6cea
FWIW, I'll set up different buckets with different policies, and have my dev hadoop-trunk source tree running unversioned, but the separate branch I use for committing work set up to test against versioned. At least once we've got the version-aware-delete-fake-dirs-code in |
|
|
||
| AWS S3 and some third party stores support versioned buckets. | ||
|
|
||
| Hadoop is adding awareness of this, including |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
💔 -1 overall
This message was automatically generated. |
| * @throws AmazonClientException on failures inside the AWS SDK | ||
| * @throws InterruptedIOException the operation was interrupted | ||
| * @throws IOException Other IO problems | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add @return javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done; will review rest of comment.
…the exception, add a test for that and an entry in the troubleshooting_s3a doc Change-Id: I71a13024a7efe32b56ecfe49fbe79529627ab126
|
I've just updated the patch to address the PR and some other changes it needed: a test and a doc entry. Also my first attempt at using vs.code for an in-IDE review tool. Mixed opinions there; will need to explore it more —but worth trying out |
Change-Id: I46b3fc65711856b390e1a4d3b4d860113061eac8
|
|
||
| /** | ||
| * Error message used when mapping a 412 to this exception. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
|
||
| AWS S3 and some third party stores support versioned buckets. | ||
|
|
||
| Hadoop is adding awareness of this, including |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
💔 -1 overall
This message was automatically generated. |
| copyObjectRequest.setCannedAccessControlList(cannedACL); | ||
| copyObjectRequest.setNewObjectMetadata(dstom); | ||
| String id = srcom.getVersionId(); | ||
| if (id != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I have been approaching this in my work for HADOOP-16085 is to use ChangeDetectionPolicy and ChangeTracker. I was adding new ChangeTracker.maybeApplyConstraint(CopyObjectRequest) and ChangeDetectionPolicy.applyRevisionConstraint(CopyObjectRequest, revisionId) methods to support that.
I call ChangeTracker.processMetadata(srcom) and then ChangeTracker.maybeApplyConstraint(copyObjectRequest). Resulting behavior depends on ChangeDetectionPolicy, but would generally be similar to what you are doing here. A couple of differences:
- if change.detection.mode=none, my code doesn't add the constraint
- if change.detection.version.required=true, my code will throw NoVersionAttributeException when the attribute defined by change.detection.source is unavailable
What do you think about that?
Side note: my branch doesn't currently handle exceptions on copy (the 412 error condition) properly. This PR helped remind me of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something else is I wonder about involving ChangeTracker in exception handling such that it can see this condition and increment the version mismatch counter.
| } | ||
|
|
||
| @Test | ||
| public void test413isPreconditions() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noslowerdna noted this somewhere else, but it looks like 413 should be 412 here.
|
I like the documentation updates. Thanks for adding those :) |
| enabled. | ||
|
|
||
| A full `hadoop-aws` test run implicitly cleans up all files in the bucket | ||
| in `ITestS3AContractRootDir`, so all every test run creates a large set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all every -> every
| A full `hadoop-aws` test run implicitly cleans up all files in the bucket | ||
| in `ITestS3AContractRootDir`, so all every test run creates a large set of | ||
| old (deleted) file versions. To avoid large bills, you must | ||
| create a lifecycle rule on the bucket to purge the old versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind, given the specific steps listed below
|
|
||
| /** | ||
| * Error message used when mapping a 412 to this exception. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
💔 -1 overall
This message was automatically generated. |
…ream read version mismatch counter Change-Id: I0e3e5ac94592495d078202a0881aef4d89236e81
|
S3 ireland, s3guard+ ddb + auth. Apart from the DynamoDB per-request billing test failures covered in #647, all good |
|
|
||
| /** | ||
| * Error message used when mapping a 412 to this exception. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
💔 -1 overall
This message was automatically generated. |
|
Hey @steveloughran - what would you like to do with this PR? As I mentioned above, I did things a little differently in #646 . Would you like this to go in as-is and get replaced by strategy when #646 is merged? Or should we pull the bit of #646 relevant to this out into this PR? Or, do you have any reservations about how this is currently implemented in #646? |
| break; | ||
|
|
||
| // version/etag id cannot be met in copy. | ||
| case 412: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public final static int PRECONDITION_FAILED = 412;|
#646 is the big one, which does a lot more than this, This one just hardens copy even without s3guard, when you have a large enough file that the transfer manager decides to do bits in parallel. I think this one is probably the "straightforward to backport version" which we can pull back into 3.x line without risk. The #646 patch is something big enough it's going to have to go into trunk, and we'll worry about whether to put it into 3.2.x when and only when we are all happy with it working in our day-to-day experience, tests etc. Now, if this one went in, and I pulled it back to the older versions, would this change be a problem for you, even if your patch just completely stamps on it? Because once the big patch is in, I'm not going to worry about this one -except in those older releases. |
|
Given you don't have a problem with what I'm doing in #646 going over top of it later, I have no problem with this going in as-is or being patched into older versions. I will also note though that this seems somewhat inconsistent with changes already in S3AInputStream from HADOOP-15625. There we only apply constraints indicated by the fs.s3a.change.detection configuration whereas here you apply constraints regardless of that config. You probably already realize this, but just wanted to be sure. Still, I'm not bothered enough by the inconsistency to say you shouldn't go ahead with it if you like. |
|
bq. . There we only apply constraints indicated by the fs.s3a.change.detection configuration whereas here you apply constraints regardless of that config. You probably already realize this, but just wanted to be sure. Still, I'm not bothered enough by the inconsistency to say you shouldn't go ahead with it if you like. I couldn't see way of turning not making this mandatory, and at the same time, why anyone wouldn't want a single copy to be of a single file, not a mixture of blocks. If I'd known that this problem existed and could be stopped, I'd had have this patch in a long time ago. I'll probably backport all the way to hadoop 2.8+ for that reason -which is not something I was planning for the input stream code. |
|
Yep, no arguments here. |
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Show resolved
Hide resolved
|
I don't see any integration tests added to test this feature, just a unit test for the Invoker. What is the reason for this? |
| /** | ||
| * Error message used when mapping a 412 to this exception. | ||
| */ | ||
| public static final String PRECONDITIONS_NOT_MET = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Http Response code is "Preconditions Failed" so we probably want to use the same string and keep the name.
| Check the URI. If using a third-party store, verify that you've configured | ||
| the client to talk to the specific server in `fs.s3a.endpoint`. | ||
|
|
||
| #### <a name="preconditions_unmet"></a> `RemoteFileChangedException`: `Constraints of request were unsatisfiable` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The http response is Precondition Failed, not "unmet". Even in AWS code, so I don't know from where unmet keeps popping up.
|
|
||
| // out of range. This may happen if an object is overwritten with | ||
| // a shorter one while it is being read. | ||
| case 416: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RANGE_NOT_SATISFIABLE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(argh, hitting cmd-enter turns this into a review and not a comment like it used to)
|
@ehiggs thanks for the comments, will get back to them |
|
closing as a WONTFIX; HADOOP-16085 shows up some issues in the AWS SDK which make this a trickier backport than you'd think |
…h- and low-level APIs in YARN and standalone environment This is the initial PR for SEP-13. High-lighted changes: - Define StreamApplication and TaskApplication with describe(ApplicationDescriptor) API to define processing logic of a Stream application - the objects instantiated and registered to ApplicationDescriptor in describe() method should be serializable - Define ApplicationRunner to have mandatory constructor parameter of ApplicationDescriptor - Define ProcessorLifecycleListenerFactory to allow user inject local logic and instantiate local objects in the processors in an application Author: Yi Pan (Data Infrastructure) <[email protected]> Author: Yi Pan (Data Infrastructure) <[email protected]> Author: Yi Pan (Data Infrastructure) <[email protected]> Author: Prateek Maheshwari <[email protected]> Author: Prateek Maheshwari <[email protected]> Author: prateekm <[email protected]> Reviewers: Prateek Maheshwari <[email protected]>, Cameron Lee <[email protected]> Closes apache#606 from nickpan47/app-runtime-with-processor-callbacks and squashes the following commits: 3e60d44a [Yi Pan (Data Infrastructure)] SAMZA-1789: final revision on ApplicationDescriptor and ApplicationRunner APIs bdb5b0fc [Yi Pan (Data Infrastructure)] SAMZA-1789: ApplicationRunner and ApplicationDescriptor final revision 66af5b70 [Yi Pan (Data Infrastructure)] SAMZA-1789: addressing Cameron's review comments. ec4bb1dc [Yi Pan (Data Infrastructure)] SAMZA-1789: merge with fix for SAMZA-1836 9c89c63d [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks 91fcd73a [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks 34ffda8a [Yi Pan (Data Infrastructure)] SAMZA-1789: disabling tests due to SAMZA-1836 02076c85 [Yi Pan (Data Infrastructure)] SAMZA-1789: fixed the modifier for the mandatory constructor of ApplicationRunner; Disabled three tests due to wrong configure for test systems 222abf21 [Yi Pan (Data Infrastructure)] SAMZA-1789: added a constructor to StreamProcessor to take a StreamProcessorListenerFactory 7a73992a [Yi Pan (Data Infrastructure)] SAMZA-1789: fixing checkstyle and javadoc errors 9997b98b [Yi Pan (Data Infrastructure)] SAMZA-1789: renamed all ApplicationDescriptor classes with full-spelling of Application f4b3d43a [Yi Pan (Data Infrastructure)] SAMZA-1789: Fxing TaskApplication examples and some checkstyle errors f2969f8d [Yi Pan (Data Infrastructure)] SAMZA-1789: fixed ApplicationDescriptor to use InputDescriptor and OutputDescriptor; addressed Prateek's comments. f04404cc [Yi Pan (Data Infrastructure)] SAMZA-1789: move createStreams out of the loop in prepareJobs 33753f72 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks 12c09af0 [Yi Pan (Data Infrastructure)] SAMZA-1789: Fix a merging error (with SAMZA-1813) a072118d [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks e7af6932 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks 8d4d3ffd [Yi Pan (Data Infrastructure)] Merge with master 055bd91e [Yi Pan (Data Infrastructure)] SAMZA-1789: fix unit test with ThreadJobFactory 247dcff4 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks 1621c4d0 [Yi Pan (Data Infrastructure)] SAMZA-1789: a few more fixes to address Cameron's reviews 6e446fe6 [Yi Pan (Data Infrastructure)] SAMZA-1789: address Cameron's review comments. 4382d45d [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks 3b2f04d5 [Yi Pan (Data Infrastructure)] SAMZA-1789: moved all impl classes from samza-api to samza-core. db96da83 [Yi Pan (Data Infrastructure)] SAMZA-1789: WIP - revision to address review feedbacks. 01433717 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks a82708bb [Yi Pan (Data Infrastructure)] SAMZA-1789: unify ApplicationDescriptor and ApplicationRunner for high- and low-level APIs in YARN and standalone environment c4bb0dce [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks f20cdcda [Yi Pan (Data Infrastructure)] WIP: adding unit tests. Pending update on StreamProcessorLifecycleListener, LocalContainerRunner, and SamzaContainerListener 973eb526 [Yi Pan (Data Infrastructure)] WIP: compiles, still working on LocalContainerRunner refactor fb1bc49e [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-spec-with-app-runtime-Jul-16-18 30a4e5f0 [Yi Pan (Data Infrastructure)] WIP: application runner refactor - proto-type for SEP-13 95577b74 [Yi Pan (Data Infrastructure)] WIP: trying to figure out the two interface classes for spec: a) spec builder in init(); b) spec reader in all other lifecycle methods 42782d81 [Yi Pan (Data Infrastructure)] Merge branch 'prateek-remove-app-runner-stream-spec' into app-spec-with-app-runtime-Jul-16-18 d43e9231 [Yi Pan (Data Infrastructure)] WIP: proto-type with ApplicationRunnable and no ApplicationRunner exposed to user f1cb8f0e [Yi Pan (Data Infrastructure)] Merge branch 'master' into single-app-api-May-21-18 7e71dc7e [Yi Pan (Data Infrastructure)] Merge with master 85619301 [Prateek Maheshwari] Merge branch 'master' into stream-spec-cleanup 7d7aa508 [Prateek Maheshwari] Updated with Cameron and Daniel's feedback. 8e6fc2da [prateekm] Remove all usages of StreamSpec and ApplicationRunner from the operator spec and impl layers.
HADOOP-16190. S3A copyFile operation to include source versionID or etag in the copy request
This patch adds the constraints on the request, and maps a 412 response to a RemoteFileChangedException.
No obvious test for this. The way to do it would be to get an invalid etag/version in to the request and see what happens, which would complicate the copy API a bit -but is something we will need for etag/version tracking in s3guard anyway....
Change-Id: I4b229336ba2d57018bd8b66888b807074419598e