
Conversation

@rbalamohan

Changes:
This is a subtask of HADOOP-16604, which aims to provide copy functionality for cloud-native applications. The intent of this PR is to provide copyFile(URI src, URI dst) functionality for S3AFileSystem (HADOOP-16629).
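For context, a minimal usage sketch of the proposed call, assuming it lands on S3AFileSystem with this signature (the bucket name and paths below are made up):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

public class CopyFileExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical source and destination objects in the same bucket.
    URI src = URI.create("s3a://example-bucket/input/data.csv");
    URI dst = URI.create("s3a://example-bucket/backup/data.csv");

    Configuration conf = new Configuration();
    // Bind to the S3A filesystem owning the destination.
    S3AFileSystem fs = (S3AFileSystem) FileSystem.get(dst, conf);

    // Proposed API: copy within the store without streaming the data
    // through the client.
    fs.copyFile(src, dst);
  }
}
```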

Testing was done in region=us-west-2 on my local laptop.

[ERROR] Tests run: 1090, Failures: 5, Errors: 22, Skipped: 318

I observed a good number of tests timing out and a few of them throwing NPEs, e.g.:

[ERROR] testPrefixVsDirectory(org.apache.hadoop.fs.s3a.ITestAuthoritativePath)  Time elapsed: 13.164 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.hadoop.fs.s3a.ITestAuthoritativePath.teardown(ITestAuthoritativePath.java:94)

I will check if I can do a few more runs to reduce the error count.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|-----------|--------:|---------|
| 0 | reexec | 51 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 5 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 69 | Maven dependency ordering for branch |
| +1 | mvninstall | 1415 | trunk passed |
| -1 | compile | 236 | root in trunk failed. |
| +1 | checkstyle | 158 | trunk passed |
| +1 | mvnsite | 115 | trunk passed |
| +1 | shadedclient | 1120 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 119 | trunk passed |
| 0 | spotbugs | 68 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 201 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 27 | Maven dependency ordering for patch |
| +1 | mvninstall | 91 | the patch passed |
| -1 | compile | 205 | root in the patch failed. |
| -1 | javac | 205 | root in the patch failed. |
| -0 | checkstyle | 172 | root: The patch generated 22 new + 106 unchanged - 0 fixed = 128 total (was 106) |
| +1 | mvnsite | 112 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 1 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 813 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 106 | the patch passed |
| +1 | findbugs | 179 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 512 | hadoop-common in the patch failed. |
| +1 | unit | 76 | hadoop-aws in the patch passed. |
| +1 | asflicense | 34 | The patch does not generate ASF License warnings. |
| | | 5816 | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.fs.TestHarFileSystem |
| | hadoop.fs.TestFilterFileSystem |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/artifact/out/Dockerfile |
| GITHUB PR | #1591 |
| JIRA Issue | HADOOP-16629 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 31f98978aff9 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b23bdaf |
| Default Java | 1.8.0_222 |
| compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/artifact/out/branch-compile-root.txt |
| compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/artifact/out/patch-compile-root.txt |
| javac | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/artifact/out/patch-compile-root.txt |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/testReport/ |
| Max. process+thread count | 1514 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/1/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

szetszwo and others added 26 commits October 4, 2019 17:50
Thanks @jnp for reviewing this. Merging now.
Contributed by Steve Loughran.

Replaces the committer-specific terasort and MR test jobs with parameterization
of the (now single) tests and use of file:// over hdfs:// as the cluster FS.

The parameterization ensures that only one of the specific committer tests
runs at a time, so overloads of the test machines are less likely, and the
suites can be pulled back into the parallel phase.

There's also more detailed validation of the stage outputs of the terasorting;
if one test fails the rest are all skipped. This and the fact that job
output is stored under target/yarn-${timestamp} means failures should
be more debuggable.

Change-Id: Iefa370ba73c6419496e6e69dd6673d00f37ff095
Contributed by Steve Loughran.

This addresses two scale issues which have surfaced in large-scale benchmarks
of the S3A Committers.

* Thread pools are not cleaned up.
  This now happens, with tests.

* OOM on job commit for jobs with many thousands of tasks,
  each generating tens of (very large) files.

Instead of loading all pending commits into memory as a single list, the list
of files to load is the sole list which is passed around; .pendingset files are
loaded and processed in isolation, and reloaded if necessary for any
abort/rollback operation.

The parallel commit/abort/revert operations now work at the .pendingset level,
rather than that of individual pending commit files. The existing parallelized
Tasks API is still used to commit those files, but with a null thread pool, so
as to serialize the operations.

Change-Id: I5c8240cd31800eaa83d112358770ca0eb2bca797
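As an aside, a rough sketch of the load-in-isolation pattern this commit message describes; the class and method names here are illustrative only, not the actual committer code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Illustrative only: keep just the list of .pendingset paths in memory and
 * load/commit each file one at a time, instead of materializing every
 * pending commit up front.
 */
class PendingsetCommitSketch {

  void commitAll(List<Path> pendingsetFiles) throws IOException {
    for (Path file : pendingsetFiles) {
      // Load a single .pendingset, commit it, and drop it before the next read.
      byte[] serialized = Files.readAllBytes(file);
      commitPendingSet(serialized);
    }
  }

  private void commitPendingSet(byte[] serializedPendingSet) {
    // Stand-in for deserializing the pendingset and committing its uploads.
    System.out.println("committing " + serializedPendingSet.length + " bytes");
  }
}
```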
… Contributed by Prabhu Joseph"

This reverts commit 4510970.
Contributed by Steve Loughran.

Change-Id: Ife730b80057ddd43e919438cb5b2abbda990e636
Contributed by Bilahari T H.

This also addresses HADOOP-16498: AzureADAuthenticator cannot authenticate
in China.

Change-Id: I2441dd48b50b59b912b0242f7f5a4418cf94a87c
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|-----------|--------:|---------|
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 10 | #1591 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

| Subsystem | Report/Notes |
|-----------|--------------|
| GITHUB PR | #1591 |
| JIRA Issue | HADOOP-16629 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/3/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

…y if build passes. Will remove HADOOP-14900 later from this patch"

This reverts commit b149725.
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|-----------|--------:|---------|
| 0 | reexec | 46 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 5 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 76 | Maven dependency ordering for branch |
| +1 | mvninstall | 1138 | trunk passed |
| +1 | compile | 1081 | trunk passed |
| +1 | checkstyle | 201 | trunk passed |
| +1 | mvnsite | 147 | trunk passed |
| +1 | shadedclient | 1253 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 132 | trunk passed |
| 0 | spotbugs | 79 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 220 | trunk passed |
| -0 | patch | 120 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 28 | Maven dependency ordering for patch |
| +1 | mvninstall | 96 | the patch passed |
| +1 | compile | 1188 | the patch passed |
| +1 | javac | 1188 | the patch passed |
| -0 | checkstyle | 194 | root: The patch generated 22 new + 106 unchanged - 0 fixed = 128 total (was 106) |
| +1 | mvnsite | 141 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 2 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 902 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 138 | the patch passed |
| +1 | findbugs | 237 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 545 | hadoop-common in the patch failed. |
| +1 | unit | 95 | hadoop-aws in the patch passed. |
| +1 | asflicense | 58 | The patch does not generate ASF License warnings. |
| | | 7875 | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.fs.TestFilterFileSystem |
| | hadoop.fs.TestHarFileSystem |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/4/artifact/out/Dockerfile |
| GITHUB PR | #1591 |
| JIRA Issue | HADOOP-16629 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux b4bb0bd0bf70 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9c72bf4 |
| Default Java | 1.8.0_222 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/4/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/4/testReport/ |
| Max. process+thread count | 1530 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/4/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

xiaoyuyao and others added 2 commits October 10, 2019 22:33
…pache#1576). Contributed by Gabor Bota.

Fixes HADOOP-16349. DynamoDBMetadataStore.getVersionMarkerItem() to log at info/warn on retry

Change-Id: Ia83e92b9039ccb780090c99c41b4f71ef7539d35
@steveloughran
Contributor

Thinking a bit about what a followup patch for cross-store copy would be: I think it'd go the way I think the Multipart Upload API needs to go. There'd be an abstract copier class you'd get an instance of from the destination FS, to make one or more copies under a destination path from a given source:

CopierBuilder initiateCopy(Path destination, FileSystem sourceFS, Path source)

on which you then set options to build up the copy:

CopyOperationBuilder builder = copier.copy()
  .setSource(sourceStatus) // or a path
  .setDest(destPath)
  .must("fs.option.overwrite", true);

where you could set up things like overwrite, FS permissions, etc.

And then kick off the copy

CompletableFuture<CopyOutcome> outcome = builder.build();

and await that future. If you are doing many copies, you'd put them in a set of futures and await them all, letting them complete in whatever order the store chooses, so you don't have to guess what the optimal order is (though a bit of randomisation is always handy).

Like I said: a followup.

What's interesting with that is you could implement a default one which executes client-side in a thread pool. Slower than a rename, but viable.
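To make that shape concrete, here's a rough sketch of what such an interface might look like; every name in it is a placeholder for discussion, not an agreed design or anything that exists in the codebase yet:

```java
import java.util.concurrent.CompletableFuture;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

/**
 * Placeholder sketch of the proposed cross-store copy API.
 */
public abstract class Copier {

  /** Start declaring one copy under a destination path from a given source. */
  public abstract CopyOperationBuilder copy();

  /** Builder for the source, destination and options of a single copy. */
  public abstract static class CopyOperationBuilder {
    public abstract CopyOperationBuilder setSource(FileStatus source);
    public abstract CopyOperationBuilder setSource(Path source);
    public abstract CopyOperationBuilder setDest(Path dest);
    public abstract CopyOperationBuilder must(String option, boolean value);

    /** Kick off the copy; completion is asynchronous. */
    public abstract CompletableFuture<CopyOutcome> build();
  }

  /** Result of a single copy, e.g. bytes copied and the destination's status. */
  public static class CopyOutcome {
  }
}
```

A default implementation along these lines could run each copy client-side in a shared thread pool, with object stores free to override it with a server-side copy.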

@steveloughran
Contributor

FYI @bgaborg @ehiggs

@bgaborg bgaborg self-requested a review October 11, 2019 13:06
Shweta Yakkali and others added 14 commits October 11, 2019 10:23
…uthenticator on Windows. Contributed by Kitti Nanasi.
… INFO instead of ERROR. Contributed by Shen Yinjie.
…d in master due to some native issues unrelated to this patch. Made minor edit to trigger build.)
…y if build passes. Will remove HADOOP-14900 later from this patch"

This reverts commit b149725.
…to verify if build passes. Will remove HADOOP-14900 later from this patch""

This reverts commit 9094415.
…it failed in master due to some native issues unrelated to this patch. Made minor edit to trigger build.)"

This reverts commit f14fd0a.
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 0 Docker mode activated.
-1 patch 12 #1591 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.
Subsystem Report/Notes
GITHUB PR #1591
JIRA Issue HADOOP-16629
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/5/console
versions git=2.17.1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@rbalamohan
Author

Please ignore the last wrong commit.

@rbalamohan
Author

Sorry about the merge mess-up.

I have created PR #1655 for this.

@bgaborg

bgaborg commented Oct 17, 2019

Can we close this PR?
