-
Notifications
You must be signed in to change notification settings - Fork 3.4k
AWS: abort S3 input stream on close if not EOS #7262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
3939b55
cf247eb
492c419
ac2e5df
d365d91
69d8f1a
f881681
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -36,6 +36,7 @@ | |
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
| import software.amazon.awssdk.core.sync.ResponseTransformer; | ||
| import software.amazon.awssdk.http.Abortable; | ||
| import software.amazon.awssdk.services.s3.S3Client; | ||
| import software.amazon.awssdk.services.s3.model.GetObjectRequest; | ||
| import software.amazon.awssdk.services.s3.model.NoSuchKeyException; | ||
|
|
@@ -194,9 +195,29 @@ private void openStream() throws IOException { | |
| } | ||
| } | ||
|
|
||
| private void closeStream() throws IOException { | ||
| private void closeStream() { | ||
| if (stream != null) { | ||
| stream.close(); | ||
| // if we aren't at the end of the stream, and the stream is abortable, then | ||
| // call abort() so we don't read the remaining data with the Apache HTTP client | ||
| abortStream(); | ||
| try { | ||
| stream.close(); | ||
| } catch (Exception e) { | ||
| // log at trace level as closing an aborted stream will throw a content length | ||
| // check exception with the Apache HTTP client | ||
| LOG.trace("Error closing stream", e); | ||
| } | ||
| stream = null; | ||
| } | ||
| } | ||
|
|
||
| private void abortStream() { | ||
| try { | ||
| if (stream instanceof Abortable && stream.read() != -1) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why do we want to read one more byte here? It might cause one more request. I think it does not hurt to even abort when it's already fully read.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you abort the connection then it will be invalidated and removed from the pool, so it won't be reused. So this is an optimization to attempt to reuse the connection in cases where it has been fully read. |
||
| ((Abortable) stream).abort(); | ||
| } | ||
| } catch (Exception e) { | ||
| LOG.debug("An error occurred while aborting the stream", e); | ||
|
jackye1995 marked this conversation as resolved.
Outdated
|
||
| } | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means we will see exceptions for every stream close if trace is enabled. If it is expected to fail with the content length check, can we at least skip that? Then we don't need to log in trace, but we can log in maybe a warning level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is how I originally had it but the feedback was to log something. I'd be happy to remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singhpk234 @danielcweeks are you ok if I revert this back to a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question to ask before this is, why do we need to close it again if it's already aborted? If you see in the Trino logic I referenced, it's an if-else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work for the Apache HTTP client, the close doesn't do anything for that. But aborting the stream opened by URL connection client doesn't do anything, so we need to close it in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's interesting difference, thanks for the explanation!