HADOOP-16629: support copyFile in s3afilesystem #1591
Conversation
💔 -1 overall
This message was automatically generated.
Thanks @jnp for reviewing this. Merging now.
This reverts commit 4cf0b36.
Doroszlai, Attila <[email protected]>
…n the same host Closes apache#1551
…mpose .env files. Closes apache#1570.
Contributed by Steve Loughran.
Replaces the committer-specific terasort and MR test jobs with parameterization
of the (now single) tests and use of file:// over hdfs:// as the cluster FS.
The parameterization ensures that only one of the specific committer tests
runs at a time, so overloads of the test machines are less likely, and the
suites can be pulled back into the parallel phase.
There's also more detailed validation of the stage outputs of the terasorting;
if one test fails, the rest are all skipped. This, and the fact that job
output is stored under target/yarn-${timestamp}, means failures should
be more debuggable.
Change-Id: Iefa370ba73c6419496e6e69dd6673d00f37ff095
…uted by Prabhu Joseph
Contributed by Siddharth Wagle
Contributed by Steve Loughran.

This addresses two scale issues which have surfaced in large-scale benchmarks of the S3A Committers.

* Thread pools are not cleaned up. This now happens, with tests.
* OOM on job commit for jobs with many thousands of tasks, each generating tens of (very large) files.

Instead of loading all pending commits into memory as a single list, the list of files to load is the sole list which is passed around; .pendingset files are loaded and processed in isolation, and reloaded if necessary for any abort/rollback operation.

The parallel commit/abort/revert operations now work at the .pendingset level, rather than that of individual pending commit files. The existing parallelized Tasks API is still used to commit those files, but with a null thread pool, so as to serialize the operations.

Change-Id: I5c8240cd31800eaa83d112358770ca0eb2bca797
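The OOM fix described in that commit message boils down to iterating over the list of .pendingset file paths and loading each one in isolation, instead of materializing every pending commit in memory at once. A minimal sketch of the pattern — hypothetical names, not the real S3A committer classes:

```java
import java.io.IOException;
import java.util.List;

// Illustrative sketch of the "load one .pendingset at a time" pattern
// described in the commit message above; these names are hypothetical,
// not the actual S3A committer API.
class PendingSetCommitter {

  interface PendingSet {
    void commit() throws IOException;
  }

  interface PendingSetLoader {
    PendingSet load(String path) throws IOException;
  }

  /**
   * Commit each .pendingset file in turn. Only the list of file paths is
   * held across iterations; each pendingset is loaded, committed, and
   * released before the next one is touched, bounding peak memory use.
   */
  static void commitAll(List<String> pendingSetPaths, PendingSetLoader loader)
      throws IOException {
    for (String path : pendingSetPaths) {
      PendingSet pending = loader.load(path);  // load in isolation
      pending.commit();                        // commit its files serially
      // pending becomes unreachable here; nothing accumulates across iterations
    }
  }
}
```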
…). Contributed by Norbert Kalmár.
… Contributed by Prabhu Joseph" This reverts commit 4510970.
Contributed by Steve Loughran. Change-Id: Ife730b80057ddd43e919438cb5b2abbda990e636
…pache#1531). Contributed by Norbert Kalmár." This reverts commit 10bdc59.
…tis. Contributed by Tsz-wo Sze(apache#1517).
…ion in assert statements Signed-off-by: Akira Ajisaka <[email protected]>
Contributed by Bilahari T H. This also addresses HADOOP-16498: AzureADAuthenticator cannot authenticate in China. Change-Id: I2441dd48b50b59b912b0242f7f5a4418cf94a87c
💔 -1 overall
This message was automatically generated.
…y if build passes. Will remove HADOOP-14900 later from this patch" This reverts commit b149725.
💔 -1 overall
This message was automatically generated.
…Contributed by Xiaoyu Yao. (apache#1642)
…pache#1576). Contributed by Gabor Bota. Fixes HADOOP-16349: DynamoDBMetadataStore.getVersionMarkerItem() to log at info/warn on retry. Change-Id: Ia83e92b9039ccb780090c99c41b4f71ef7539d35
Thinking a bit about what a follow-up patch for cross-store copy would look like: I think it should go the way I think the Multipart Upload API needs to go. There'd be an abstract copier class you'd get an instance of from the destination FS to make one or more copies under a dest path from a given source. You'd then set options on it to build up the copy, configuring things like overwrite, FS permissions, etc., and then kick off the copy and await that future. If you are doing many copies, you'd put them in a set of futures and await them all to complete, in whatever order the store chooses, so you don't have to guess the optimal order (though a bit of randomisation is always handy). Like I said: a follow-up. What's interesting with that is you could implement a default copier which executes client-side in a thread pool. Slower than a rename, but viable.
…. Contributed by Adam Antal
…with multiple resource types. Contributed by Adam Antal
…tAdditionalTokenIssuers (apache#1556)
…uthenticator on Windows. Contributed by Kitti Nanasi.
… INFO instead of ERROR. Contributed by Shen Yinjie.
…d in master due to some native issues unrelated to this patch. Made minor edit to trigger build.)
…y if build passes. Will remove HADOOP-14900 later from this patch" This reverts commit b149725.
…to verify if build passes. Will remove HADOOP-14900 later from this patch"" This reverts commit 9094415.
…it failed in master due to some native issues unrelated to this patch. Made minor edit to trigger build.)" This reverts commit f14fd0a.
This reverts commit da36147.
This reverts commit 3954194.
💔 -1 overall
This message was automatically generated.
Please ignore the last wrong commit.
Sorry about the merge mess-up. I have created PR #1655 for this.
Can we close this PR?
Changes:
This is a subtask of HADOOP-16604, which aims to provide copy functionality for cloud-native applications. The intent of this PR is to provide copyFile(URI src, URI dst) functionality for S3AFileSystem (HADOOP-16629). Testing was done in region=us-west-2 on my local laptop. I observed a good number of tests timing out and a few of them throwing NPEs, e.g.
I will check if I can do a few more runs to reduce the error count.
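For context, the operation this PR adds would be invoked roughly as follows. This is a hedged sketch: copyFile(URI, URI) is not part of the released FileSystem API, and its exact signature and semantics are defined by the patch under review, so the call is shown commented out next to the client-side copy that exists today:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyFileExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI src = URI.create("s3a://bucket/source/data.bin");
    URI dst = URI.create("s3a://bucket/dest/data.bin");

    FileSystem fs = FileSystem.get(src, conf);

    // Proposed in HADOOP-16629: a copy within the store, avoiding a
    // download/upload round trip through the client.
    // fs.copyFile(src, dst);  // hypothetical; not in the released API

    // Today's equivalent is a client-side copy through FileUtil:
    FileUtil.copy(fs, new Path(src), fs, new Path(dst),
        false /* deleteSource */, conf);
  }
}
```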