@abmo-x
Contributor

@abmo-x abmo-x commented Jul 20, 2022

The current implementation verifies the mock invocations only inside the catch block, that is, only when an exception is thrown. This makes the test unreliable: if no failure occurs, nothing is verified, but we want the test to confirm that the calls are actually made.
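A minimal sketch of the problem described above, using only the JDK rather than the project's mocking library. `RecordingClient`, `upload`, and both test-shaped helpers are hypothetical names made up for illustration; they are not the actual Iceberg test code.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class VerifyOutsideCatchSketch {
  /** Hypothetical stand-in for a mocked client; it only counts abort calls. */
  static final class RecordingClient {
    final AtomicInteger abortCalls = new AtomicInteger();

    void abort() {
      abortCalls.incrementAndGet();
    }
  }

  /** Hypothetical code under test: aborts and throws when the upload fails. */
  static void upload(RecordingClient client, boolean fail) {
    if (fail) {
      client.abort();
      throw new IllegalStateException("mock uploadPart failure");
    }
  }

  /** Old shape: the check lives in the catch block, so it is skipped when nothing throws. */
  static boolean weakTestPasses(boolean fail) {
    RecordingClient client = new RecordingClient();
    try {
      upload(client, fail);
    } catch (IllegalStateException e) {
      return client.abortCalls.get() == 1;
    }
    return true; // no exception: nothing was checked, yet the test "passes"
  }

  /** New shape: require the expected exception, then always check the interaction. */
  static boolean strictTestPasses(boolean fail) {
    RecordingClient client = new RecordingClient();
    boolean threwExpected = false;
    try {
      upload(client, fail);
    } catch (IllegalStateException e) {
      threwExpected = e.getMessage().contains("mock uploadPart failure");
    }
    return threwExpected && client.abortCalls.get() == 1;
  }

  public static void main(String[] args) {
    System.out.println(weakTestPasses(false));   // true: passes without verifying anything
    System.out.println(strictTestPasses(false)); // false: the missing failure is reported
  }
}
```

In a real test the same idea would typically be expressed with the assertion and mocking libraries the project already uses; the point here is only that the verification must not depend on the catch block being reached.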

@github-actions github-actions bot added the AWS label Jul 20, 2022
@abmo-x
Contributor Author

abmo-x commented Jul 20, 2022

Updated with the recommendations. Thanks, @RussellSpitzer!

Member

@RussellSpitzer RussellSpitzer left a comment

I think there are a few formatting issues remaining but other than that this is good to go. We need to revert all the changes which aren't directly related to the patch. I think the byte [] to byte[] is correct but we should separate that out into another patch. Since we are about to do a big programmatic style application soon it is probably best to just leave it as is for now.

@abmo-x
Contributor Author

abmo-x commented Jul 21, 2022

> I think there are a few formatting issues remaining but other than that this is good to go. We need to revert all the changes which aren't directly related to the patch. I think the byte [] to byte[] is correct but we should separate that out into another patch. Since we are about to do a big programmatic style application soon it is probably best to just leave it as is for now.

Reverted formatting which was not part of the PR. Thanks!

Member

@RussellSpitzer RussellSpitzer left a comment

Thanks for the test improvement!

@RussellSpitzer RussellSpitzer merged commit d5c0aa4 into apache:master Jul 26, 2022
@aokolnychyi
Contributor

aokolnychyi commented Jul 27, 2022

@RussellSpitzer @abmo-x, we have started seeing test failures in affected tests (I am not sure if they were there before). Could you help investigate?

org.apache.iceberg.aws.s3.TestS3OutputStream > testAbortAfterFailedPartUpload FAILED
    java.lang.AssertionError: 
    Expecting throwable message:
      "java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/s3fileio-test-3731532446800745900/s3fileio-2448969152692915450.tmp"
    to contain:
      "mock uploadPart failure"
    but did not.

@abmo-x
Contributor Author

abmo-x commented Jul 27, 2022

testAbortAfterFailedPartUpload

@aokolnychyi, it looks like the expected tmp staging file that gets created has disappeared; this failure occurs before the mock gets invoked.
https://github.com/apache/iceberg/runs/7530666701?check_suite_focus=true#step:6:239

Caused by: java.nio.file.NoSuchFileException: /tmp/s3fileio-test-3731532446800745900/s3fileio-2448969152692915450.tmp

I'm wondering what could be different in the test environment; the test works fine when run locally, both from the CLI and the IDE.

Do we run unit tests as part of the PR checks before it's merged? I wanted to check whether this test passed in that build.

@aokolnychyi
Contributor

Looks like it fails sporadically. I'll look into it tomorrow unless @RussellSpitzer gets to it first.

@RussellSpitzer
Member

RussellSpitzer commented Jul 27, 2022

Haven't taken a look yet but @abmo-x we do run the tests as you will see if you click the "details" on the merge note above. All tests passed at that time.

@abmo-x
Contributor Author

abmo-x commented Jul 27, 2022

It's possible this exception was occurring before and the test ignored it, since the test caught all exceptions before this change. Now we explicitly check that the exception is of a certain type with a certain message.

I'm not sure whether there is a race condition that causes the file to be deleted on exit, or a parallel test that is cleaning up the tmp dir.
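To illustrate how a catch-all could have hidden the earlier failure, here is a small JDK-only sketch. The `write` method, the `/tmp/staging.tmp` path, and both helper methods are hypothetical; only the message text "mock uploadPart failure" and the exception types come from the CI log above.

```java
import java.io.UncheckedIOException;
import java.nio.file.NoSuchFileException;

final class ExceptionMaskingSketch {
  static final String INJECTED = "mock uploadPart failure";

  /** Hypothetical code under test: fails with an unrelated I/O error or with the injected error. */
  static void write(boolean stagingFileMissing) {
    if (stagingFileMissing) {
      // Illustrative path only; it is not the path used by the real test.
      throw new UncheckedIOException(new NoSuchFileException("/tmp/staging.tmp"));
    }
    throw new RuntimeException(INJECTED);
  }

  /** Catch-all shape: any exception at all counts as the expected failure. */
  static boolean catchAllPasses(boolean stagingFileMissing) {
    try {
      write(stagingFileMissing);
      return false;
    } catch (Exception e) {
      return true;
    }
  }

  /** Strict shape: the exception message must contain the injected failure text. */
  static boolean strictPasses(boolean stagingFileMissing) {
    try {
      write(stagingFileMissing);
      return false;
    } catch (RuntimeException e) {
      return String.valueOf(e.getMessage()).contains(INJECTED);
    }
  }
}
```

With a missing staging file, `catchAllPasses(true)` still reports success while `strictPasses(true)` fails, which would be consistent with the failure only becoming visible after the assertion was tightened.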

@amogh-jahagirdar
Contributor

Yeah it's failing for me too. https://github.com/apache/iceberg/runs/7571542364?check_suite_focus=true

I'm also looking into it.

@singhpk234
Contributor

singhpk234 commented Jul 29, 2022

+1, I also observed this and have a possible root cause. Currently, if any CompletableFuture in uploadParts fails, we call abortUpload(), which in turn deletes all the staging files:

private void abortUpload() {
  if (multipartUploadId != null) {
    try {
      s3.abortMultipartUpload(
          AbortMultipartUploadRequest.builder()
              .bucket(location.bucket())
              .key(location.key())
              .uploadId(multipartUploadId)
              .build());
    } finally {
      // Runs even if the abort call itself throws.
      cleanUpStagingFiles();
    }
  }
}

Now, when another CompletableFuture starts reading its staging file to create a request:

UploadPartResponse response =
    s3.uploadPart(uploadRequest, RequestBody.fromFile(f));
return CompletedPart.builder()
    .eTag(response.eTag())
    .partNumber(uploadRequest.partNumber())
    .build();

it fails with a file-not-found exception (the NoSuchFileException in the CI log above), because the staging files have already been deleted by the earlier future. When we then join all the CompletableFutures, we can get a failure caused by the missing file, while the test is expecting our injected runtime failure.

Here is a gist that reproduces this consistently, with the complete stack trace: https://gist.github.com/singhpk234/4257ea980017db5704857c3c7cc2fd0b

I think this PR of mine can fix this flakiness as well: #5366
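The race described above can be sketched with the JDK alone. This is not the S3OutputStream code: the class, method names, and temp-file prefixes are hypothetical, and a CountDownLatch forces the ordering (cleanup first, read second) that in the real test only happens some of the time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;

final class StagingFileRaceSketch {
  static final String INJECTED = "mock uploadPart failure";

  static Path newStagingFile(String prefix) {
    try {
      return Files.createTempFile(prefix, ".tmp");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the failure observed by the second part upload once cleanup has run. */
  static Throwable failureSeenBySecondPart() {
    Path first = newStagingFile("part1-");
    Path second = newStagingFile("part2-");
    CountDownLatch cleanedUp = new CountDownLatch(1);

    // Part 1: the injected failure. Its completion handler deletes every staging
    // file, standing in for abortUpload() calling cleanUpStagingFiles().
    CompletableFuture.<Void>supplyAsync(() -> {
          throw new RuntimeException(INJECTED);
        })
        .whenComplete((ignored, error) -> {
          try {
            Files.deleteIfExists(first);
            Files.deleteIfExists(second);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          } finally {
            cleanedUp.countDown();
          }
        });

    // Part 2: reads its staging file only after the cleanup has happened.
    CompletableFuture<byte[]> part2 =
        CompletableFuture.supplyAsync(() -> {
          try {
            cleanedUp.await();
            return Files.readAllBytes(second);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
          }
        });

    try {
      part2.join();
      return null;
    } catch (CompletionException e) {
      // The cause is the missing-file error, not the injected failure.
      return e.getCause();
    }
  }

  public static void main(String[] args) {
    System.out.println(failureSeenBySecondPart());
  }
}
```

A test that joins on the second future and expects the message "mock uploadPart failure" would fail in this ordering, which matches the shape of the assertion error in the CI log.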
