
wendigo commented Jul 17, 2025

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Azure Native FS
* Add the `azure.multipart-write-enabled` configuration property, which enables multipart uploads of large files ({issue}`issuenumber`)

cla-bot added the cla-signed label Jul 17, 2025

wendigo commented Jul 17, 2025

Supersedes #26204


Copilot AI left a comment


Pull Request Overview

This PR adds support for multipart uploads in the Azure filesystem implementation, allowing large files to be written in parallel blocks when enabled.

  • Introduce azure.multipart-write-enabled configuration and expose it through AzureFileSystemConfig
  • Propagate the new setting and an upload executor through AzureFileSystemFactory and AzureFileSystem
  • Implement AzureMultipartOutputStream and update AzureOutputFile to switch between single-part and multipart writes
  • Update tests and abstract test utilities to initialize and verify multipart behavior
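The overview describes large files being written as blocks uploaded in parallel on an upload executor. A minimal sketch of that staged-block pattern follows, assuming a hypothetical `BlockStore` interface in place of the Azure SDK client; it is not the PR's `AzureMultipartOutputStream`, whose code does not appear in this conversation. `MultipartOutputStreamSketch` and `BlockStore` are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical stand-in for a block blob client (not the Azure SDK API).
interface BlockStore {
    void stageBlock(String blockId, byte[] data);

    void commitBlockList(List<String> blockIds);
}

// Sketch: buffer writes, stage each full block asynchronously, commit the ordered block list on close.
class MultipartOutputStreamSketch extends OutputStream {
    private final BlockStore store;
    private final ExecutorService executor;
    private final int blockSize;
    private final List<String> blockIds = new ArrayList<>();
    private final List<Future<?>> uploads = new ArrayList<>();
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    MultipartOutputStreamSketch(BlockStore store, ExecutorService executor, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.store = store;
        this.executor = executor;
        this.blockSize = blockSize;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] bytes, int offset, int length) throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        while (length > 0) {
            // Fill the current block, staging it as soon as it reaches blockSize
            int chunk = Math.min(length, blockSize - buffer.size());
            buffer.write(bytes, offset, chunk);
            offset += chunk;
            length -= chunk;
            if (buffer.size() == blockSize) {
                stageBufferedBlock();
            }
        }
    }

    private void stageBufferedBlock() {
        byte[] data = buffer.toByteArray();
        buffer = new ByteArrayOutputStream();
        // Azure expects Base64 block IDs of equal length within a blob, hence the fixed-width counter
        String blockId = Base64.getEncoder().encodeToString(
                String.format("block-%08d", blockIds.size()).getBytes(StandardCharsets.UTF_8));
        blockIds.add(blockId);
        uploads.add(executor.submit(() -> store.stageBlock(blockId, data)));
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (buffer.size() > 0) {
            stageBufferedBlock();
        }
        // Wait for every staged block before committing
        for (Future<?> upload : uploads) {
            try {
                upload.get();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while staging blocks", e);
            }
            catch (ExecutionException e) {
                throw new IOException("Failed to stage block", e.getCause());
            }
        }
        // The staged blocks form the blob's content only once the ordered list is committed
        store.commitBlockList(blockIds);
    }
}
```

A production writer would also cap the number of in-flight blocks (each buffered block holds `blockSize` bytes in memory) and clean up after a failed upload; the sketch omits both.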

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| TestAzureFileSystemGen2Hierarchical.java | Use initializeWithAccessKeyAndMultipartWrites in setup |
| TestAzureFileSystemGen2Flat.java | Use initializeWithAccessKeyAndMultipartWrites in setup |
| TestAzureFileSystemConfig.java | Add default and explicit mapping for multipartWriteEnabled |
| AbstractTestAzureFileSystem.java | Add initializeWithAccessKeyAndMultipartWrites and extend initialize signature |
| AzureUtils.java | Broaden exception handlers to accept Throwable |
| AzureOutputFile.java | Add executorService, multipartWriteEnabled, and switch output stream creation |
| AzureMultipartOutputStream.java | New class providing multipart upload logic |
| AzureFileSystemFactory.java | Create a shared uploadExecutor, propagate and pass multipart flag |
| AzureFileSystemConfig.java | Add setMultipartWriteEnabled config property |
| AzureFileSystem.java | Propagate uploadExecutor and multipartWriteEnabled to output files |
| pom.xml | Add dependency on io.airlift:concurrent |
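The factory row describes a single `uploadExecutor` created once and passed, together with the multipart flag, down to each file system and output file. A minimal JDK-only sketch of that lifecycle follows; `UploadExecutorFactorySketch` and the `azure-upload-` thread-name prefix are hypothetical. The `io.airlift:concurrent` dependency added in pom.xml provides helpers such as `Threads.daemonThreadsNamed` for building thread factories like this one, although the conversation does not show which helper the PR actually uses.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: owns one upload pool and hands it, plus the flag, to every file system it creates.
class UploadExecutorFactorySketch implements AutoCloseable {
    private final ExecutorService uploadExecutor;
    private final boolean multipartWriteEnabled;

    UploadExecutorFactorySketch(boolean multipartWriteEnabled) {
        this.multipartWriteEnabled = multipartWriteEnabled;
        AtomicInteger threadCount = new AtomicInteger();
        ThreadFactory threadFactory = runnable -> {
            Thread thread = new Thread(runnable, "azure-upload-" + threadCount.getAndIncrement());
            // Daemon threads do not keep the JVM alive on their own
            thread.setDaemon(true);
            return thread;
        };
        this.uploadExecutor = Executors.newCachedThreadPool(threadFactory);
    }

    // Each file system (and each of its output files) would receive these two values
    ExecutorService uploadExecutor() {
        return uploadExecutor;
    }

    boolean multipartWriteEnabled() {
        return multipartWriteEnabled;
    }

    // Called once when the factory is torn down, not per file
    @Override
    public void close() throws InterruptedException {
        uploadExecutor.shutdown();
        uploadExecutor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Sharing one pool means threads are created per factory rather than per output file. Because a cached pool is unbounded, a fixed-size pool may be preferable when many large writes run at once.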
Comments suppressed due to low confidence (3)

lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureFileSystemFactory.java:56

  • [nitpick] Rename the field 'multipart' to 'multipartWriteEnabled' to align with the config property and improve clarity.
    private final boolean multipart;

lib/trino-filesystem-azure/src/test/java/io/trino/filesystem/azure/TestAzureFileSystemConfig.java:42

  • [nitpick] Add a test that verifies when multipartWriteEnabled=false, AzureOutputFile uses the single-part AzureOutputStream implementation, covering the non-multipart code path.
                .setMultipartWriteEnabled(false));

lib/trino-filesystem-azure/src/main/java/io/trino/filesystem/azure/AzureUtils.java:34

  • Accepting Throwable means Errors (e.g., OutOfMemoryError) can be passed in and handled like ordinary failures. It may be safer to narrow the parameter to Exception or RuntimeException so that unrecoverable errors are not handled here.
    public static IOException handleAzureException(Throwable exception, String action, AzureLocation location)
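Callbacks such as `CompletableFuture.handle` and `exceptionally` receive a `Throwable`, which is one plausible reason to widen the parameter type; the conversation does not say why the PR did so. A minimal sketch of a middle ground follows: accept `Throwable`, unwrap the async wrappers, rethrow any `Error` unchanged, and convert everything else to `IOException`. `ExceptionMappingSketch`, `toIOException` and `awaitUpload` are hypothetical names, not the PR's `handleAzureException`, and the sketch relies on Java 16+ pattern matching for `instanceof`.

```java
import java.io.IOException;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

final class ExceptionMappingSketch {
    private ExceptionMappingSketch() {}

    // Hypothetical helper: converts an async failure into IOException but never swallows an Error
    static IOException toIOException(Throwable throwable, String action) {
        Throwable cause = throwable;
        // Unwrap the wrappers that CompletableFuture.join() and Future.get() add around the real failure
        while ((cause instanceof CompletionException || cause instanceof ExecutionException) && cause.getCause() != null) {
            cause = cause.getCause();
        }
        if (cause instanceof Error error) {
            throw error;
        }
        if (cause instanceof IOException ioException) {
            return ioException;
        }
        return new IOException("Failed to " + action, cause);
    }

    // Usage: wait for an upload future and surface any failure as IOException
    static void awaitUpload(CompletableFuture<?> upload, String action) throws IOException {
        try {
            upload.join();
        }
        catch (CompletionException | CancellationException e) {
            throw toIOException(e, action);
        }
    }
}
```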

    checkArgument(writeBlockSizeBytes >= 0, "writeBlockSizeBytes is negative");

    this.location = location;
    this.writeBlockSizeBytes = writeBlockSizeBytes;

The `s3.streaming.part-size` default is 32MB. We've seen low values cause throttling in S3 (#25781).
Is the 4MB default here too low?
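The reviewer's question can be made concrete with some arithmetic. The sketch below assumes each write block becomes one staged block (one request), treats "4MB" and "32MB" as MiB, and uses Azure's documented limit of 50,000 committed blocks per block blob; `BlockSizeMath` and its methods are illustrative names only.

```java
final class BlockSizeMath {
    // Azure block blobs can hold at most 50,000 committed blocks
    static final long MAX_BLOCKS_PER_BLOB = 50_000L;
    static final long MIB = 1024L * 1024L;

    private BlockSizeMath() {}

    // Number of staged-block requests needed for a file of fileSizeBytes (ceiling division)
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("blockSizeBytes must be positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Largest file that fits in one blob when every block is blockSizeBytes
    static long maxBlobSize(long blockSizeBytes) {
        return MAX_BLOCKS_PER_BLOB * blockSizeBytes;
    }

    public static void main(String[] args) {
        long tenGib = 10L * 1024 * MIB;
        for (long blockMib : new long[] {4, 32}) {
            long block = blockMib * MIB;
            System.out.printf("%d MiB blocks: %d requests for 10 GiB, max blob %d GiB%n",
                    blockMib, blockCount(tenGib, block), maxBlobSize(block) / (1024 * MIB));
        }
    }
}
```

Running `main` prints 2560 requests and a 195 GiB ceiling for 4 MiB blocks, versus 320 requests and a 1562 GiB ceiling for 32 MiB blocks. Under these assumptions the smaller default means eight times as many requests for the same file and a lower maximum file size.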

wendigo force-pushed the serafin/staged-upload-v2 branch 3 times, most recently from f979106 to 716030a on July 18, 2025 10:00
wendigo merged commit 1879d98 into master Jul 21, 2025
103 of 104 checks passed
wendigo deleted the serafin/staged-upload-v2 branch July 21, 2025 09:51
github-actions added this to the 477 milestone Jul 21, 2025