-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add support for multipart uploads in Azure FS #26225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Supersedes #26204 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for multipart uploads in the Azure filesystem implementation, allowing large files to be written in parallel blocks when enabled.
- Introduce
azure.multipart-write-enabledconfiguration and expose it throughAzureFileSystemConfig - Propagate the new setting and an upload executor through
AzureFileSystemFactoryandAzureFileSystem - Implement
AzureMultipartOutputStreamand updateAzureOutputFileto switch between single-part and multipart writes - Update tests and abstract test utilities to initialize and verify multipart behavior
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| TestAzureFileSystemGen2Hierarchical.java | Use initializeWithAccessKeyAndMultipartWrites in setup |
| TestAzureFileSystemGen2Flat.java | Use initializeWithAccessKeyAndMultipartWrites in setup |
| TestAzureFileSystemConfig.java | Add default and explicit mapping for multipartWriteEnabled |
| AbstractTestAzureFileSystem.java | Add initializeWithAccessKeyAndMultipartWrites and extend initialize signature |
| AzureUtils.java | Broaden exception handlers to accept Throwable |
| AzureOutputFile.java | Add executorService, multipartWriteEnabled, and switch output stream creation |
| AzureMultipartOutputStream.java | New class providing multipart upload logic |
| AzureFileSystemFactory.java | Create a shared uploadExecutor, propagate and pass multipart flag |
| AzureFileSystemConfig.java | Add setMultipartWriteEnabled config property |
| AzureFileSystem.java | Propagate uploadExecutor and multipartWriteEnabled to output files |
| pom.xml | Add dependency on io.airlift:concurrent |
Comments suppressed due to low confidence (3)
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemFactory.java:56
- [nitpick] Rename the field 'multipart' to 'multipartWriteEnabled' to align with the config property and improve clarity.
private final boolean multipart;
lib/trino-filesystem-azure/src/test/java/io/trino/filesystem/azure/TestAzureFileSystemConfig.java:42
- [nitpick] Add a test that verifies when multipartWriteEnabled=false, AzureOutputFile uses the single-part AzureOutputStream implementation, covering the non-multipart code path.
.setMultipartWriteEnabled(false));
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureUtils.java:34
- Catching Throwable can inadvertently catch Errors (e.g., OutOfMemoryError). It may be safer to narrow the parameter to Exception or RuntimeException to avoid handling unrecoverable errors.
public static IOException handleAzureException(Throwable exception, String action, AzureLocation location)
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemFactory.java
Show resolved
Hide resolved
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java
Show resolved
Hide resolved
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemFactory.java
Outdated
Show resolved
Hide resolved
...ino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureMultipartOutputStream.java
Outdated
Show resolved
Hide resolved
| checkArgument(writeBlockSizeBytes >= 0, "writeBlockSizeBytes is negative"); | ||
|
|
||
| this.location = location; | ||
| this.writeBlockSizeBytes = writeBlockSizeBytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s3.streaming.part-size default is 32MB. We've seen low values cause throttling in S3 (#25781).
Is the 4MB default here too low ?
...ino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureMultipartOutputStream.java
Outdated
Show resolved
Hide resolved
f979106 to
716030a
Compare
716030a to
f9cc4fa
Compare
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: