Skip to content

Conversation

@wendigo
Copy link
Contributor

@wendigo wendigo commented Jul 15, 2025

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@wendigo
Copy link
Contributor Author

wendigo commented Jul 15, 2025

This is modeled the same way S3OutputStream is working (except for the fact that there is no multipart abort since the staged blocks are automatically deleted by azure and there is no way to explicitly clean them)

@wendigo wendigo force-pushed the serafin/staged-upload branch 2 times, most recently from 083fcd7 to 12eaba8 Compare July 15, 2025 13:54
@wendigo wendigo requested a review from Copilot July 15, 2025 13:54
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for multipart uploads by staging and committing blocks for Azure Blob Storage.

  • Removed legacy max-write-concurrency and max-single-upload-size config options in favor of a unified write-block-size and multipart logic
  • Increased default write-block-size from 4 MB to 8 MB
  • Introduced a custom AzureOutputStream that buffers, stages, and commits blocks asynchronously using a thread pool

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
lib/trino-filesystem-azure/src/test/java/io/trino/filesystem/azure/TestAzureFileSystemConfig.java Updated default write-block-size expectation and removed deprecated config tests
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureUtils.java Expanded handleAzureException and isFileNotFoundException to accept all Exception types
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureOutputStream.java Replaced BlockBlobOutputStreamOptions with custom buffering, block staging, and commit logic
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureOutputFile.java Wired new Executor and block-size into AzureOutputStream
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemFactory.java Added a daemon thread pool for uploads and removed concurrency configs
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java Removed concurrency settings, added @DefunctConfig, bumped default write-block-size
lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystem.java Injected upload executor, removed concurrency parameters, updated output file creation
lib/trino-filesystem-azure/pom.xml Added dependency on io.airlift:concurrent for the upload executor
Comments suppressed due to low confidence (2)

lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureOutputStream.java:146

  • The new multipart upload logic in flushBuffer, stageBlock, and commitBlocksIfNeeded is not covered by existing tests. Consider adding unit tests to verify behavior when the write spans multiple blocks and to confirm the correct block list is committed.
    private void flushBuffer(boolean finished)

lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemConfig.java:26

  • The properties azure.max-write-concurrency and azure.max-single-upload-size have been removed. Please update the release notes or user documentation to reflect these defunct configs and the new defaults.
@DefunctConfig({"azure.max-write-concurrency", "azure.max-single-upload-size"})

Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I didn't know they supported multipart upload

@wendigo wendigo force-pushed the serafin/staged-upload branch from 12eaba8 to 2f4079a Compare July 16, 2025 09:45
@chenjian2664
Copy link
Contributor

@wendigo The Release Notes section has not been updated yet.

@wendigo
Copy link
Contributor Author

wendigo commented Jul 17, 2025

I'll expose it behind the config toggle to have two separate implementations for writes

@wendigo wendigo closed this Jul 17, 2025
@wendigo wendigo deleted the serafin/staged-upload branch August 1, 2025 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants