Conversation

@HeartSaVioR
Contributor

NOTE: WIP. DO-NOT-MERGE.

No meaningful tests have been added yet, as I'm not sure where to add them or how s3a is covered by integration tests.
(The tests in TestS3ABlockOutputStream only check simple things with everything mocked,
so they can't exercise a real write/upload.)

I received some review comments from @steveloughran on my commit and will address them.

Once it's done I'll remove WIP.

@HeartSaVioR
Contributor Author

I'll migrate @steveloughran's review comments onto the diff in this PR. That doesn't seem to happen automatically: the comments are shown as "normal review comments".

}

@Override
public void abort() {
Contributor Author

Migrating comment in HeartSaVioR@63c5588#r46507150

The new "cloud ready" API calls always return a CompletableFuture, to emphasise that the op may take time and to allow the caller to do something while waiting. Would we want to do this here? I'm not convinced it is appropriate. Instead we say:

  1. The call must guarantee that after it is invoked, close() will not materialize the file at its final path.
  2. It may communicate with the store to cancel an operation, which may retry. Errors will be stored.
  3. There may still/also be async IO to the store after the call returns, but this must maintain the "not visible" requirement.
  4. close() may do some IO to cancel.
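
To make point 1 of the list above concrete, here is a minimal sketch, assuming an in-memory stand-in for the store. `FakeStore` and `AbortableBufferStream` are hypothetical names used only for illustration; they are not the S3A classes, and the async IO and error handling from points 2 to 4 are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an object store: path -> bytes visible to readers. */
class FakeStore {
  final Map<String, byte[]> visible = new HashMap<>();
}

/** Hypothetical stream that buffers locally and only publishes data in close(). */
class AbortableBufferStream extends OutputStream {
  private final FakeStore store;
  private final String path;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean closed;

  AbortableBufferStream(FakeStore store, String path) {
    this.store = store;
    this.path = path;
  }

  @Override
  public synchronized void write(int b) throws IOException {
    if (closed) {
      throw new IOException("Stream is closed: " + path);
    }
    buffer.write(b);
  }

  /** After abort(), a later close() must not make the file visible. */
  public synchronized void abort() {
    if (closed) {
      return;
    }
    closed = true;   // the stream now behaves as if close() had been called
    buffer.reset();  // discard everything written so far
  }

  @Override
  public synchronized void close() {
    if (closed) {
      return;        // already closed or aborted: nothing is published
    }
    closed = true;
    store.visible.put(path, buffer.toByteArray());
  }
}
```

In this sketch, calling `abort()` and then `close()` leaves `store.visible` without an entry for the path, while a plain `close()` publishes the buffered bytes.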

return;
}

S3ADataBlocks.DataBlock block = getActiveBlock();
Contributor Author

Migrating comment in HeartSaVioR@63c5588#r46506593

I think we are going to have to worry about this a bit more, because we may have queued >1 block for upload in a separate thread. They'll maybe need interruption, or at least, when they finish, see if they should immediately cancel the upload. This won't make any difference in the semantics of abort() (the final upload has been killed), I just don't want to run up any bills.

S3ADataBlocks.DataBlock block = getActiveBlock();
try {
if (multiPartUpload != null) {
multiPartUpload.abort();
Contributor Author

Migrating comment in HeartSaVioR@63c5588#r46506765

OK, don't worry so much about my previous comment: multiPartUpload.abort() cancels all the outstanding futures.
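
As a rough illustration of "cancel all the outstanding futures", the sketch below tracks queued part uploads as `CompletableFuture`s and cancels whichever have not finished. `PartUploadTracker` is a hypothetical class, not the real multipart upload code, and it does not model telling the store to abort the upload itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical tracker for queued part uploads; not the real S3A class. */
class PartUploadTracker {
  private final List<CompletableFuture<String>> partFutures = new ArrayList<>();

  /** Register a part upload; the returned future completes with the part's tag. */
  synchronized CompletableFuture<String> queuePart() {
    CompletableFuture<String> future = new CompletableFuture<>();
    partFutures.add(future);
    return future;
  }

  /**
   * Cancel every part upload that has not completed yet.
   * Note: for CompletableFuture the boolean argument to cancel() has no effect;
   * cancellation only marks the future as cancelled.
   *
   * @return number of futures that are now cancelled
   */
  synchronized int abort() {
    int cancelled = 0;
    for (CompletableFuture<String> future : partFutures) {
      if (future.cancel(true)) {
        cancelled++;
      }
    }
    partFutures.clear();
    return cancelled;
  }
}
```

Parts that already completed are left untouched by `abort()`; only pending ones end up cancelled.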

"uploadId", 50000, 1024, inputStream, null, 0L));
}

@Test
Contributor Author

Migrating comment in HeartSaVioR@63c5588#r46506646

tests are good. We will need to do an ITest too, which can be done in ITestS3ABlockOutputArray


// This verification replaces testing various operations after calling abort:
// after calling abort, stream is closed like calling close().
intercept(IOException.class, () -> stream.checkOpen());
Contributor Author

Migrating comment in HeartSaVioR@63c5588#r46507220

We should also verify that stream.write() raises an IOException. We could raise a subclass of IOException to indicate that this was a checkOpen failure, which would allow a stricter test.
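
One way the stricter check could look is sketched below. `StreamClosedException` and `GuardedStream` are hypothetical names chosen for this example; the patch itself may use a different exception type or none at all.

```java
import java.io.IOException;

/** Hypothetical IOException subclass marking a "stream is closed" failure. */
class StreamClosedException extends IOException {
  StreamClosedException(String message) {
    super(message);
  }
}

/** Hypothetical stream skeleton showing checkOpen() guarding write(). */
class GuardedStream {
  private boolean closed;
  private int bytesWritten;

  void checkOpen() throws IOException {
    if (closed) {
      throw new StreamClosedException("Stream is closed");
    }
  }

  void write(int b) throws IOException {
    checkOpen();      // every write first verifies the stream is still open
    bytesWritten++;
  }

  /** After abort() the stream is closed, as if close() had been called. */
  void abort() {
    closed = true;
  }

  int getBytesWritten() {
    return bytesWritten;
  }
}
```

A test can then catch `StreamClosedException` specifically after `abort()`, instead of accepting any `IOException`.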

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 17s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 57s Maven dependency ordering for branch
+1 💚 mvninstall 23m 29s trunk passed
+1 💚 compile 24m 14s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 20m 47s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 4m 2s trunk passed
+1 💚 mvnsite 2m 16s trunk passed
+1 💚 shadedclient 22m 24s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 26s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 6s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 1m 12s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 28s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for patch
+1 💚 mvninstall 1m 27s the patch passed
+1 💚 compile 21m 32s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 21m 32s the patch passed
+1 💚 compile 19m 2s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 19m 2s the patch passed
-0 ⚠️ checkstyle 3m 51s /diff-checkstyle-root.txt root: The patch generated 4 new + 2 unchanged - 0 fixed = 6 total (was 2)
+1 💚 mvnsite 2m 13s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 15m 58s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 25s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 9s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
-1 ❌ findbugs 1m 21s /new-findbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 17m 18s hadoop-common in the patch passed.
+1 💚 unit 1m 53s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
209m 22s
Reason Tests
FindBugs module:hadoop-tools/hadoop-aws
Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3ABlockOutputStream.multiPartUpload; locked 50% of time. Unsynchronized access at S3ABlockOutputStream.java:[line 569]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/1/artifact/out/Dockerfile
GITHUB PR #2667
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 45196ae1217f 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1a205cc
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/1/testReport/
Max. process+thread count 2568 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/1/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@HeartSaVioR
Contributor Author

Addressed the basic UT, IT, and scaled IT. Once Yetus is happy with the change, I'll remove WIP and ping again for review.

@HeartSaVioR
Contributor Author

I don't understand the warning findbugs reported: the multiPartUpload field was already accessed both with and without synchronization before this change, so is this probably something we can ignore here?

https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/1/artifact/out/new-findbugs-hadoop-tools_hadoop-aws.html

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 5s Maven dependency ordering for branch
+1 💚 mvninstall 20m 9s trunk passed
+1 💚 compile 20m 28s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 17m 50s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 3m 41s trunk passed
+1 💚 mvnsite 2m 23s trunk passed
+1 💚 shadedclient 19m 56s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 41s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 23s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 1m 14s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 30s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for patch
+1 💚 mvninstall 1m 25s the patch passed
+1 💚 compile 19m 47s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 19m 47s the patch passed
+1 💚 compile 17m 49s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 17m 49s the patch passed
-0 ⚠️ checkstyle 3m 44s /diff-checkstyle-root.txt root: The patch generated 4 new + 2 unchanged - 0 fixed = 6 total (was 2)
+1 💚 mvnsite 2m 26s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 25s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 38s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 17s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
-1 ❌ findbugs 1m 25s /new-findbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 17m 14s hadoop-common in the patch passed.
+1 💚 unit 2m 4s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 55s The patch does not generate ASF License warnings.
193m 13s
Reason Tests
FindBugs module:hadoop-tools/hadoop-aws
Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3ABlockOutputStream.multiPartUpload; locked 50% of time. Unsynchronized access at S3ABlockOutputStream.java:[line 569]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/3/artifact/out/Dockerfile
GITHUB PR #2667
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 2cbffd8ea39d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 21a3fc3
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/3/testReport/
Max. process+thread count 3152 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/3/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@HeartSaVioR
Contributor Author

I'll remove the WIP tag: I don't have any idea about the findbugs failure, and I believe I've addressed everything except the "cloud-friendly" requirement on the API. That doesn't sound like something strictly bound to this PR, but please correct me if I'm missing something.

@HeartSaVioR HeartSaVioR marked this pull request as ready for review February 2, 2021 06:51
@HeartSaVioR HeartSaVioR changed the title from "WIP. HADOOP-16906. Add Abortable.abort() interface for streams to enable output stream to be terminated" to "HADOOP-16906. Add Abortable.abort() interface for streams to enable output stream to be terminated" Feb 2, 2021
@HeartSaVioR
Contributor Author

cc. @steveloughran Could you please review this PR? Thanks in advance!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 13s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 6s Maven dependency ordering for branch
+1 💚 mvninstall 20m 7s trunk passed
+1 💚 compile 20m 30s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 17m 50s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 3m 44s trunk passed
+1 💚 mvnsite 2m 28s trunk passed
+1 💚 shadedclient 20m 9s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 40s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 18s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 1m 17s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 32s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for patch
+1 💚 mvninstall 1m 27s the patch passed
+1 💚 compile 19m 49s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 19m 49s the patch passed
+1 💚 compile 17m 50s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 17m 50s the patch passed
+1 💚 checkstyle 3m 41s the patch passed
+1 💚 mvnsite 2m 21s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 13m 23s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 39s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 20s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
-1 ❌ findbugs 1m 24s /new-findbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 17m 19s hadoop-common in the patch passed.
+1 💚 unit 2m 7s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 56s The patch does not generate ASF License warnings.
193m 59s
Reason Tests
FindBugs module:hadoop-tools/hadoop-aws
Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3ABlockOutputStream.multiPartUpload; locked 50% of time. Unsynchronized access at S3ABlockOutputStream.java:[line 569]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/4/artifact/out/Dockerfile
GITHUB PR #2667
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 3f942e07395b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1b893e1
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/4/testReport/
Max. process+thread count 1489 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/4/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Contributor

@mehakmeet mehakmeet left a comment

+1, pending some nits. Also, maybe we should add a test that creates a file at the path first and then aborts an upload to it? There are no tests that check abort against a file with pre-existing data. We also have the object_multipart_aborted statistic in IOStatistics, added by @steveloughran, which could help in asserting the abort during a multipart upload.


import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;
Contributor

reposition in org.apache.* block below.

}

@Test
public void testAbortAfterTwoPartUpload() throws Throwable {
Contributor

javadoc or describe() to explain the test.

import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.PutObjectResult;
import com.amazonaws.services.s3.model.UploadPartRequest;
import org.apache.hadoop.fs.Abortable;
Contributor

reposition in org.apache.* block below.

Contributor

@mukund-thakur mukund-thakur left a comment

Prod code looks good.
Pending checkstyle as well.
Let me think about what can be done for better tests. I like mehakmeet's suggestion. I'll also think about the findbugs warning.


@Test
public void testAbortAfterWrite() throws Throwable {
Path dest = path("testAbortAfterWrite");
Contributor

use getMethodName()

@Test
public void testAbortAfterWrite() throws Throwable {
Path dest = path("testAbortAfterWrite");
describe(" testAbortAfterWrite");
Contributor

nit: make the description a bit more explanatory.

return true;

// S3A supports abort.
case StreamCapabilities.ABORTABLE:
Contributor

We can merge both case statements
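
A minimal sketch of what merging the two case labels could look like, assuming both capabilities simply return true. `CapabilityProbe` and the two string constants are hypothetical placeholders; the real method switches on `StreamCapabilities` constants, and the capability preceding `ABORTABLE` is not visible in the diff context here.

```java
import java.util.Locale;

/** Hypothetical capability probe illustrating merged case labels. */
class CapabilityProbe {
  // Placeholder capability names, assumed for illustration only.
  static final String IOSTATISTICS = "iostatistics";
  static final String ABORTABLE = "abortable";

  static boolean hasCapability(String capability) {
    switch (capability.toLowerCase(Locale.ENGLISH)) {
    // Both capabilities are supported, so the labels share one return.
    case IOSTATISTICS:
    case ABORTABLE:
      return true;
    default:
      return false;
    }
  }
}
```

Stacking the labels keeps one `return true` per group of supported capabilities, at the cost of losing a dedicated comment slot per case.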


S3ADataBlocks.DataBlock block = getActiveBlock();
try {
if (multiPartUpload != null) {
Contributor

Wondering what happens in case of non multipart upload

Contributor

Got it, we have already closed the stream before this point.

@mukund-thakur
Contributor

I don't understand the warning findbugs reported: the multiPartUpload field was already accessed both with and without synchronization before this change, so is this probably something we can ignore here?

https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2667/1/artifact/out/new-findbugs-hadoop-tools_hadoop-aws.html

This is the reason.
The number of unsynchronized field accesses (reads and writes) was no more than one third of all accesses, with writes being weighed twice as high as reads

I think we can ignore this. What do you think @steveloughran ?
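
For readers unfamiliar with this detector, the pattern it counts looks roughly like the sketch below: one field read or written under the object lock in some methods and without it in others. `UploadHolder` is a hypothetical class written for illustration; it is not the actual S3ABlockOutputStream code, and whether the real warning matters depends on how the field is used there.

```java
/** Hypothetical holder showing mixed synchronized/unsynchronized field access. */
class UploadHolder {
  // Accessed under the lock in some methods and without it in another.
  private Object multiPartUpload;

  /** Synchronized write: counts as a locked access. */
  synchronized void startUpload() {
    if (multiPartUpload == null) {
      multiPartUpload = new Object();
    }
  }

  /** Unsynchronized read: the kind of access the detector counts against the field. */
  boolean hasUploadUnsafe() {
    return multiPartUpload != null;
  }

  /** Synchronized read: consistent with the other locked accesses. */
  synchronized boolean hasUpload() {
    return multiPartUpload != null;
  }

  /** Synchronized write that clears the field. */
  synchronized void abort() {
    multiPartUpload = null;
  }
}
```

Routing every access through synchronized methods (as `hasUpload()` does) is one way to silence the warning; suppressing it is another option if the unsynchronized access is known to be safe.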

@steveloughran
Contributor

Added an extra patch to this; see PR #2684.

@HeartSaVioR
Contributor Author

Thanks @steveloughran for helping! I see you've addressed lots of points, with an especially impressive effort on the docs. Let me close this and jump over to your PR. Thanks again!

@HeartSaVioR HeartSaVioR closed this Feb 6, 2021