Skip to content

Conversation

@bgaborg
Copy link

@bgaborg bgaborg commented Mar 19, 2019

No description provided.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 26 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 983 trunk passed
+1 compile 34 trunk passed
+1 checkstyle 24 trunk passed
+1 mvnsite 34 trunk passed
+1 shadedclient 711 branch has no errors when building and testing our client artifacts.
+1 findbugs 45 trunk passed
+1 javadoc 25 trunk passed
_ Patch Compile Tests _
+1 mvninstall 30 the patch passed
+1 compile 29 the patch passed
+1 javac 29 the patch passed
+1 checkstyle 19 the patch passed
+1 mvnsite 33 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 741 patch has no errors when building and testing our client artifacts.
+1 findbugs 47 the patch passed
+1 javadoc 17 the patch passed
_ Other Tests _
+1 unit 279 hadoop-aws in the patch passed.
+1 asflicense 23 The patch does not generate ASF License warnings.
3162
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-624/1/artifact/out/Dockerfile
GITHUB PR #624
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux f4304e1c482e 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / c9e50c4
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-624/1/testReport/
Max. process+thread count 445 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-624/1/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor

This test is working for me standalone, but when run in a parallel set, its failing

[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.121 s - in org.apache.hadoop.fs.s3a.ITestS3AMiscOperations
[INFO] Running org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolLocal
[ERROR] Tests run: 12, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 54.988 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations
[ERROR] testListingSameLengthOverwrite[0](org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations)  Time elapsed: 2.906 s  <<< FAILURE!
java.lang.AssertionError: Authoritative is enabled, so metadata is not updated in ms, so mod_time won't match. Expecting different values for raw and guarded filestatus. Raw: S3AFileStatus{path=s3a://hwdev-steve-ireland-new/dir-5dc16452-d51a-4a34-8250-54e140797390/file-1-5dc16452-d51a-4a34-8250-54e140797390; isDirectory=false; length=8; replication=1; blocksize=33554432; modification_time=1553098862000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE Guarded: S3AFileStatus{path=s3a://hwdev-steve-ireland-new/dir-5dc16452-d51a-4a34-8250-54e140797390/file-1-5dc16452-d51a-4a34-8250-54e140797390; isDirectory=false; length=8; replication=1; blocksize=33554432; modification_time=1553098862000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE. Actual: 1553098862000
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failEquals(Assert.java:185)
	at org.junit.Assert.assertNotEquals(Assert.java:199)
	at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.overwriteFileInListing(ITestS3GuardOutOfBandOperations.java:349)
	at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.testListingSameLengthOverwrite(ITestS3GuardOutOfBandOperations.java:214)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:745)

[ERROR] testListingLongerLengthOverwrite[0](org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations)  Time elapsed: 2.897 s  <<< FAILURE!
java.lang.AssertionError: Authoritative is enabled, so metadata is not updated in ms, so mod_time won't match. Expecting different values for raw and guarded filestatus. Raw: S3AFileStatus{path=s3a://hwdev-steve-ireland-new/dir-d2e314e4-c20c-4228-9e4f-07f7f8a2a44b/file-1-d2e314e4-c20c-4228-9e4f-07f7f8a2a44b; isDirectory=false; length=15; replication=1; blocksize=33554432; modification_time=1553098875000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE Guarded: S3AFileStatus{path=s3a://hwdev-steve-ireland-new/dir-d2e314e4-c20c-4228-9e4f-07f7f8a2a44b/file-1-d2e314e4-c20c-4228-9e4f-07f7f8a2a44b; isDirectory=false; length=8; replication=1; blocksize=33554432; modification_time=1553098875000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE. Actual: 1553098875000
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failEquals(Assert.java:185)
	at org.junit.Assert.assertNotEquals(Assert.java:199)
	at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.overwriteFileInListing(ITestS3GuardOutOfBandOperations.java:349)
	at org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.testListingLongerLengthOverwrite(ITestS3GuardOutOfBandOperations.java:219)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:745)

Assumption: some FS sharing/caching is still going on

@steveloughran
Copy link
Contributor

and I've managed to do it in a single test run. There's some intermittent issue going on here. It could just be timestamp race conditions.

[INFO] 
[ERROR] Failures: 
[ERROR]   ITestS3GuardOutOfBandOperations.testListingLongerLengthOverwrite:236->overwriteFileInListing:366->Assert.assertNotEquals:199->Assert.failEquals:185->Assert.fail:88 Authoritative is enabled, so metadata is not updated in ms, so mod_time won't match. Expecting different values for raw and guarded filestatus. Raw: S3AFileStatus{path=s3a://hwdev-steve-ireland-new/dir-27967cf9-e744-4732-9d3a-3ae754684ae2/file-1-27967cf9-e744-4732-9d3a-3ae754684ae2; isDirectory=false; length=15; replication=1; blocksize=33554432; modification_time=1553108035000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE Guarded: S3AFileStatus{path=s3a://hwdev-steve-ireland-new/dir-27967cf9-e744-4732-9d3a-3ae754684ae2/file-1-27967cf9-e744-4732-9d3a-3ae754684ae2; isDirectory=false; length=8; replication=1; blocksize=33554432; modification_time=1553108035000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE. Actual: 1553108035000

@steveloughran
Copy link
Contributor

Given I've seen the failure in a standalone run, it's not FS caching somehow meaning raw has picked up a DDB table. Though I've added a way to query S3A FS instance for its auth status to make sure we know what's going on in inside

Possible causes then

  1. writes are so close together they share the same timestamp
  2. the rawFS.getFileStatus() call after a read is picking up the out of date value
  3. the check for modtime is a bit brittle, especially given for all writes using different length files, we can just check the file length.

See PR #630 for the patch addressing these issues

@bgaborg to get the changes I've done, look at the "command line instructions" at the bottom of the list, add my repo, do a git fetch and then cherry pick the commit 5aba1e4738b050fde1b2c45c21ed59320869be53 into this PR

* Add delays long enough for timestamps to be different
* Add delays for S3 to stabilize after writes/deletes, so that listings and HEAD calls will get the new value, not old ones
* probes for differences look for file lengths ahead of timestamps, for more tangible failures.
* and they validate the raw FS status acquired after the stabiliziation delay
* package private (currently) probe for S3A to verify that an FS instances considers its store to be authoritative.
Currently we've been checking the config, but to really know what's happening: lets query the internal state of FS.

Change-Id: Ib0184a2aacbec1e4b316cb8cad0265bd0b579bcd
(cherry picked from commit 5aba1e4738b050fde1b2c45c21ed59320869be53)
@bgaborg
Copy link
Author

bgaborg commented Mar 21, 2019

I'll do another check on your modifications, run some test with local and ddb, and get back with the results.

@bgaborg
Copy link
Author

bgaborg commented Mar 21, 2019

I never got any issues with running ITestS3GuardOutOfBandOperations (just with local and because of the reference on the cache), but running it with your commit gave me the following error:

[ERROR] ITestS3GuardOutOfBandOperations.testListingLongerLengthOverwrite:251->overwriteFileInListing:399->verifyFileStatusAsExpected:439->Assert.assertNotEquals:199->Assert.failEquals:185->Assert.fail:88 File length in authoritative table with
Raw: S3AFileStatus{path=s3a://cloudera-dev-gabor-ireland/dir-fcd509ca-0db4-426a-9c4f-b972a4949a5d/file-1-fcd509ca-0db4-426a-9c4f-b972a4949a5d; isDirectory=false; length=15; replication=1; blocksize=33554432; modification_time=1553182592000; access_time=0; owner=gaborbota; group=gaborbota; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE
Guarded: S3AFileStatus{path=s3a://cloudera-dev-gabor-ireland/dir-fcd509ca-0db4-426a-9c4f-b972a4949a5d/file-1-fcd509ca-0db4-426a-9c4f-b972a4949a5d; isDirectory=false; length=15; replication=1; blocksize=33554432; modification_time=1553182592000; access_time=0; owner=gaborbota; group=gaborbota; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE. Actual: 15

Maybe something still inconsistent?

@bgaborg
Copy link
Author

bgaborg commented Mar 21, 2019

The second run succeeded, no errors.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 28 Docker mode activated.
_ Prechecks _
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
0 mvndep 58 Maven dependency ordering for branch
+1 mvninstall 1063 trunk passed
+1 compile 1074 trunk passed
+1 checkstyle 213 trunk passed
+1 mvnsite 142 trunk passed
+1 shadedclient 1064 branch has no errors when building and testing our client artifacts.
+1 findbugs 148 trunk passed
+1 javadoc 87 trunk passed
_ Patch Compile Tests _
0 mvndep 23 Maven dependency ordering for patch
+1 mvninstall 72 the patch passed
+1 compile 928 the patch passed
+1 javac 928 the patch passed
-0 checkstyle 188 root: The patch generated 3 new + 6 unchanged - 0 fixed = 9 total (was 6)
+1 mvnsite 115 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 643 patch has no errors when building and testing our client artifacts.
+1 findbugs 169 the patch passed
+1 javadoc 86 the patch passed
_ Other Tests _
+1 unit 555 hadoop-common in the patch passed.
+1 unit 273 hadoop-aws in the patch passed.
+1 asflicense 39 The patch does not generate ASF License warnings.
6837
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-624/2/artifact/out/Dockerfile
GITHUB PR #624
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux cca3c80700fe 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 9f1c017
maven version: Apache Maven 3.3.9
Default Java 1.8.0_191
findbugs v3.1.0-RC1
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-624/2/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-624/2/testReport/
Max. process+thread count 1358 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-624/2/console
Powered by Apache Yetus 0.9.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor

I think there's still a race condition.

Can you use LambdaTestUtils.eventually() to give a longer interval for things to stabilise?

eventually(20_000,
  () -> assert(...)),
  new LambdaTestUtils.ProportionalRetryInterval(50, 1000));

that will exit fast on success, but retry until there's stability

@steveloughran steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Mar 26, 2019
@steveloughran
Copy link
Contributor

OK, updated the test to use eventually(); please check my branch for that patch.

If you can run it and the tests work, all is well. If they fail, then we may have more insight as to what is wrong

final long s3ModTime = s3AFileStatus.getModificationTime();

if(s3ModTime > msModTime) {
LOG.debug("S3Guard metadata for {} is outdated, updating it",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might include the 2 mod times in the debug log

@ben-roling
Copy link
Contributor

Looks like this should be closed since #630 was merged?

@bgaborg
Copy link
Author

bgaborg commented Apr 1, 2019

It's already committed so closing this.

@bgaborg bgaborg closed this Apr 1, 2019
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
Remove SimpleConsumer  and BrokerProxy from Samza's KafkaSystemConsumer implementation.
Instead use KafkaConsumerProxy with high-level kafka consumer.

Author: Boris S <[email protected]>
Author: Boris Shkolnik <[email protected]>

Reviewers: Shanthoosh Venktataraman <[email protected]>, Prateek Maheshwari <[email protected]>

Closes apache#624 from sborya/NewConsumer2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fs/s3 changes related to hadoop-aws; submitter must declare test endpoint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants