Support multipart upload to Azure #26204
This is modeled on how S3OutputStream works, except that there is no multipart abort: staged blocks are deleted automatically by Azure, and there is no way to clean them up explicitly.
083fcd7 to 12eaba8
Pull Request Overview
This PR adds support for multipart uploads by staging and committing blocks for Azure Blob Storage.
- Removed legacy `max-write-concurrency` and `max-single-upload-size` config options in favor of a unified `write-block-size` and multipart logic
- Increased default `write-block-size` from 4 MB to 8 MB
- Introduced a custom `AzureOutputStream` that buffers, stages, and commits blocks asynchronously using a thread pool
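The buffer, stage, and commit flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `BlockStore`, `InMemoryBlockStore`, and `BlockOutputStream` are hypothetical names, and the in-memory store stands in for the Azure block blob client so the example runs without the Azure SDK. It also omits guards a real stream would need, such as rejecting writes after close.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

final class BlockUploadSketch {
    // Hypothetical stand-in for a block blob client: stage blocks, then commit an ordered ID list.
    interface BlockStore {
        void stageBlock(String blockId, byte[] data);
        void commitBlockList(List<String> blockIds);
    }

    // In-memory store so the sketch runs without any Azure dependency.
    static final class InMemoryBlockStore implements BlockStore {
        private final ConcurrentHashMap<String, byte[]> staged = new ConcurrentHashMap<>();
        private volatile byte[] committed;

        @Override
        public void stageBlock(String blockId, byte[] data) {
            staged.put(blockId, data);
        }

        @Override
        public void commitBlockList(List<String> blockIds) {
            // Concatenate staged blocks in the order given by the committed ID list.
            ByteArrayOutputStream blob = new ByteArrayOutputStream();
            for (String blockId : blockIds) {
                byte[] data = staged.get(blockId);
                blob.write(data, 0, data.length);
            }
            committed = blob.toByteArray();
        }

        byte[] committed() { return committed; }
    }

    // Buffers writes into fixed-size blocks, stages each block on the executor, commits on close.
    static final class BlockOutputStream extends OutputStream {
        private final BlockStore store;
        private final Executor executor;
        private final byte[] buffer;
        private final List<String> blockIds = new ArrayList<>();
        private final List<CompletableFuture<Void>> pending = new ArrayList<>();
        private int position;

        BlockOutputStream(BlockStore store, Executor executor, int blockSize) {
            if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
            this.store = store;
            this.executor = executor;
            this.buffer = new byte[blockSize];
        }

        @Override
        public void write(int b) {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] bytes, int offset, int length) {
            while (length > 0) {
                if (position == buffer.length) flushBuffer();
                int chunk = Math.min(length, buffer.length - position);
                System.arraycopy(bytes, offset, buffer, position, chunk);
                position += chunk;
                offset += chunk;
                length -= chunk;
            }
        }

        private void flushBuffer() {
            if (position == 0) return;
            // Fixed-width, Base64-encoded IDs keep every block ID in the blob the same length.
            String blockId = Base64.getEncoder().encodeToString(
                    String.format("%08d", blockIds.size()).getBytes(StandardCharsets.UTF_8));
            byte[] data = Arrays.copyOf(buffer, position); // copy: the buffer is reused right away
            blockIds.add(blockId);
            pending.add(CompletableFuture.runAsync(() -> store.stageBlock(blockId, data), executor));
            position = 0;
        }

        @Override
        public void close() throws IOException {
            flushBuffer();
            try {
                // Wait for every staged block before committing the ordered block list.
                CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            } catch (CompletionException e) {
                throw new IOException("Staging a block failed", e.getCause());
            }
            store.commitBlockList(blockIds); // data becomes visible only after this commit
        }
    }
}
```

Writing 10 bytes with a block size of 4 stages three blocks (4, 4, and 2 bytes) and commits them in write order on close. Because staging runs on the executor, blocks may finish out of order; the ordered ID list passed to the commit is what fixes the final layout.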
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| lib/trino-filesystem-azure/src/test/java/io/trino/filesystem/azure/TestAzureFileSystemConfig.java | Updated default write-block-size expectation and removed deprecated config tests |
| lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureUtils.java | Expanded handleAzureException and isFileNotFoundException to accept all Exception types |
| lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureOutputStream.java | Replaced BlockBlobOutputStreamOptions with custom buffering, block staging, and commit logic |
| lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureOutputFile.java | Wired new Executor and block-size into AzureOutputStream |
| lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemFactory.java | Added a daemon thread pool for uploads and removed concurrency configs |
| lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java | Removed concurrency settings, added @DefunctConfig, bumped default write-block-size |
| lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystem.java | Injected upload executor, removed concurrency parameters, updated output file creation |
| lib/trino-filesystem-azure/pom.xml | Added dependency on io.airlift:concurrent for the upload executor |
Comments suppressed due to low confidence (2)
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureOutputStream.java:146
- The new multipart upload logic in `flushBuffer`, `stageBlock`, and `commitBlocksIfNeeded` is not covered by existing tests. Consider adding unit tests to verify behavior when the write spans multiple blocks and to confirm the correct block list is committed.
private void flushBuffer(boolean finished)
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java:26
- The properties `azure.max-write-concurrency` and `azure.max-single-upload-size` have been removed. Please update the release notes or user documentation to reflect these defunct configs and the new defaults.
@DefunctConfig({"azure.max-write-concurrency", "azure.max-single-upload-size"})
dain left a comment
Nice, I didn't know they supported multipart upload
12eaba8 to 2f4079a
@wendigo The
I'll expose it behind a config toggle so that there are two separate implementations for writes.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: