Limit S3 concurrent downloads to fix Java Heap OOM #6402
Signed-off-by: jorgee <[email protected]>
Signed-off-by: jorgee <[email protected]>
I've made a small suggestion for the docs. It's non-blocking so I've approved.
Co-authored-by: Chris Hakkaart <[email protected]> Signed-off-by: Jorge Ejarque <[email protected]>
Signed-off-by: jorgee <[email protected]>
@claude review
Claude finished @pditommaso's task — Code Review: S3 Concurrent Download Memory Management

Task: Review the PR implementation for limiting S3 concurrent downloads to prevent Java Heap OOM.

Summary: This PR addresses a critical memory management issue in S3 concurrent downloads by implementing a semaphore-based solution that limits concurrent transfers based on heap buffer consumption. The implementation reverts the directory download from the AWS SDK's built-in handling to a Nextflow-managed approach.
plugins/nf-amazon/src/main/nextflow/cloud/aws/config/AwsS3Config.groovy
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Force-pushed from 4f99c77 to 5c163ca
Signed-off-by: Paolo Di Tommaso <[email protected]>
@claude review again
Claude finished @pditommaso's task — Re-review: S3 Concurrent Download Memory Management

Updated Code Review Analysis: After reviewing the current state of the PR and comparing it to the previous review feedback from September 22nd, the issues raised in that review have been addressed.
…eption

Replace logging with throwing InterruptedIOException when handling InterruptedException in S3 operations. This provides consistent exception propagation instead of silently logging and continuing execution, allowing proper interruption handling in calling code.

Changes:
- downloadFile: throw InterruptedIOException instead of logging
- downloadDirectory: throw InterruptedIOException in both file visitor and main method
- uploadFile: throw InterruptedIOException instead of logging
- uploadDirectory: throw InterruptedIOException instead of logging
- copyFile: throw InterruptedIOException instead of logging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
**InterruptedException Handling Improvements**

I've updated the S3Client to improve InterruptedException handling across all S3 operations (downloadFile, downloadDirectory, uploadFile, uploadDirectory, and copyFile).

**Approach Used**

When an `InterruptedException` is caught, the thread's interrupted status is restored and an `InterruptedIOException` is thrown, so that calling code can respond to the interruption instead of execution silently continuing.
**Previous vs. Current Behavior**

Before: `InterruptedException` was caught, logged at debug level, and execution continued normally:

```java
} catch (InterruptedException e) {
    log.debug("S3 download file: s3://{}/{} cancelled", source.getBucket(), source.getKey());
    Thread.currentThread().interrupt();
    // Execution continues without throwing
}
```

After: `InterruptedException` is caught, the thread's interrupted state is restored, and a proper `IOException` is thrown:

```java
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new InterruptedIOException(String.format("S3 download file: s3://%s/%s cancelled", source.getBucket(), source.getKey()));
}
```

This change ensures that thread interruptions in S3 operations are handled correctly and can be properly responded to by calling code.
Signed-off-by: Paolo Di Tommaso <[email protected]>
Cursor is annoying, but this could be a relevant point to address
…to s3 transfer manager Signed-off-by: jorgee <[email protected]>
Signed-off-by: jorgee <[email protected]>
Signed-off-by: jorgee <[email protected]>
@pditommaso I've fixed the Cursor comments and applied the relevant changes.
There is a commit from you with a missing sign-off: c7fad5f
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
There is a current issue in the S3 CRT client where concurrent downloads can consume a large amount of heap memory:
aws/aws-sdk-java-v2#6323

It was reproduced when benchmarking a directory download of several large files from S3 to an EC2 instance. I initially fixed it with `S3TransferManager.transferDirectoryMaxConcurrency`, recently included in the AWS SDK v2, but that only solves the case of a single directory download, not concurrent directory downloads. Moreover, the same problem occurs when increasing the executor pool size and downloading several big files concurrently.

To avoid the OOM, I have reverted the downloadDirectory method to be managed by Nextflow as in v1. A new class extends the S3 Transfer Manager, limiting the number of concurrent file downloads based on the buffer allocated by each transfer and the maximum heap memory we want to dedicate to download buffers. By default this limit is 400 MB, but it can be modified with `aws.client.maxDownloadBuffer`.

The extended transfer manager contains a semaphore to limit file downloads. The number of permits of this semaphore is `floor(maxDownloadBuffer / minimumPartSize)`. A file download consumes at most 10 permits (when its size exceeds `10 * minimumPartSize`), or one permit per part when it is smaller, which matches how the CRT client manages the per-file download buffer.