
Conversation

@sopel39 (Member) commented May 13, 2025

Description

Default of 16MB causes S3 throttling issues with large clusters during inserts.

Release notes

(x) Release notes are required, with the following suggested text:

## Hive, Iceberg, Delta Lake
* Reduce S3 throttling failures by increasing the streaming upload part size and reducing the number of requests. ({issue}`25781`)

@pettyjamesm (Member) commented:

Agreed that 16MB seems too small, but 256MB might be too large for a default value, especially given that `S3FileSystemConfig` has 256MB as the maximum allowed value. Obviously this is a somewhat arbitrary value that partly depends on how much memory you are willing to set aside for S3 upload buffering, how close you run to the throttling limit on your S3 bucket, and so on. But there is a downside to going too large: if your average file size is smaller than 2x the part size, you lose upload performance by limiting the opportunity for parallelism. Would 64MB or 128MB make more sense here?

@sopel39 (Member, Author) commented May 13, 2025

> S3FileSystemConfig has 256MB as the maximum allowed value

The 256MB max is arbitrary. According to https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html, the actual maximum part size is 5GB.

> Would 64MB or 128MB make more sense here?

We already have writer scaling within a worker and across the cluster. We don't need extra parallelism here, but rather reliability. It's pretty easy to exhaust S3 limits even with a moderate cluster (5-10 nodes).

@pettyjamesm (Member) commented May 13, 2025

> 256MB max is arbitrary

I agree that the maximum configurable value was probably an arbitrary choice, but `S3FileSystemConfig#getStreamingPartSize()` is still annotated with `@MaxDataSize("256MB")`, so it would be weird to have the maximum and the default be the same in this instance, no?
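For context, the annotation in question sits on a bounded data-size config property; a minimal airlift-style sketch is shown below. Only `getStreamingPartSize()`, `@MaxDataSize("256MB")`, and the `s3.streaming.part-size` property name come from this discussion; the default shown, the minimum bound, and the class name are illustrative assumptions, not a copy of the actual `S3FileSystemConfig`.

```java
import io.airlift.configuration.Config;
import io.airlift.units.DataSize;
import io.airlift.units.MaxDataSize;
import io.airlift.units.MinDataSize;

import static io.airlift.units.DataSize.Unit.MEGABYTE;

// Sketch of a bounded data-size config property in the airlift style.
public class ExampleS3StreamingConfig
{
    private DataSize streamingPartSize = DataSize.of(16, MEGABYTE); // the default under discussion

    @MinDataSize("5MB")   // assumption: S3 multipart uploads reject parts below 5MB (except the last part)
    @MaxDataSize("256MB") // raising the default to 256MB would make it equal to this upper bound
    public DataSize getStreamingPartSize()
    {
        return streamingPartSize;
    }

    @Config("s3.streaming.part-size")
    public ExampleS3StreamingConfig setStreamingPartSize(DataSize streamingPartSize)
    {
        this.streamingPartSize = streamingPartSize;
        return this;
    }
}
```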

> We already have writer scaling within a worker and across the cluster. We don't need extra parallelism here, but rather reliability.

This will still hurt pipelined parallelism for each writer instance when the file size is smaller than 2 * partSize, which means worse overall performance on each of those file writers, observed as a long delay when attempting to close the upload stream. It also means that the memory cost for writer scaling will be higher, because the amount of memory buffered per writer increases by 16x, which is another way to hurt reliability here, this time via OOM.

> It's pretty easy to exhaust S3 limits even with a moderate cluster (5-10 nodes).

You should be able to push 3,500 operations per second through the API, and each stream has a maximum upload speed of ~100MB/s, so you would need a very large number of open writers on a cluster with that many nodes to bump up against the limit (and with a value of 256MB, you would now expect `#close()` to take ~2.5 seconds to complete in a best-case scenario when the buffer is full).
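To make those numbers concrete, here is a small back-of-the-envelope sketch. The ~100MB/s per-stream upload speed is the figure quoted above; the 512MB average file size and the code itself are only an assumed illustration of the tradeoff, not anything from the Trino codebase.

```java
// Back-of-the-envelope math: fewer, larger parts mean fewer upload requests per file
// (less throttling pressure) but a longer flush of the final buffer on close().
public class PartSizeMath
{
    public static void main(String[] args)
    {
        double uploadSpeedMBps = 100.0; // quoted per-stream S3 upload throughput
        long fileSizeMB = 512;          // assumed average output file size

        for (long partSizeMB : new long[] {16, 32, 64, 128, 256}) {
            long partUploads = (fileSizeMB + partSizeMB - 1) / partSizeMB; // upload requests per file
            double closeDelaySeconds = partSizeMB / uploadSpeedMBps;       // flushing one full buffer on close()
            System.out.printf(
                    "part=%3dMB -> %2d upload requests per %dMB file, ~%.2fs to flush a full buffer on close()%n",
                    partSizeMB, partUploads, fileSizeMB, closeDelaySeconds);
        }
    }
}
```

With a 256MB part size the last flush takes roughly 2.5 seconds, matching the estimate above, while a 512MB file needs only 2 upload requests instead of 32 at the old 16MB default.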

Again, I'm just saying that 256MB seems high as a default and that 64MB or 128MB seems like a more conservative increase, knowing that users can always configure larger values if it makes sense for their workload. If you still think 256MB is the right arbitrary default to pick, though, then you probably also want to increase the maximum allowed value that the config permits.

@raunaqmorarka (Member) left a comment

The default max row group size in the Parquet writer is 128MB, so raising `s3.streaming.part-size` beyond that by default does not make sense.
We had lots of customers complaining about increased memory requirements for writes when task-level writer scaling was rolled out, so I would rather not aggressively push up memory requirements for writes again.
I think the default config bump should be the minimal amount needed to get past throttling, like 32MB.
If we have to push beyond that, then I recommend adding a note about tuning this higher in the documentation and making it easier for users to observe whether they are being throttled.
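If such a documentation note materializes, a catalog-level override would presumably look something like the snippet below. The 128MB value and the file name are only examples of tuning above the new default, not a recommendation from this thread.

```properties
# Hypothetical catalog properties override (e.g. etc/catalog/hive.properties)
# for clusters that still hit S3 throttling at the 32MB default.
s3.streaming.part-size=128MB
```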

@sopel39 (Member, Author) commented May 14, 2025

Changed to 32MB. Empirically, 16MB is too small even for 10 nodes. With 256MB, throttling issues go away even with large clusters and no node cap is required. I'm not sure whether 32MB will solve the issue, though.

@sopel39 sopel39 requested a review from raunaqmorarka May 14, 2025 09:51
@raunaqmorarka raunaqmorarka requested a review from pettyjamesm May 14, 2025 10:12
@sopel39 sopel39 merged commit 82f3f03 into trinodb:master May 14, 2025
67 checks passed
@sopel39 sopel39 changed the title from "Increase streaming part size to 256MB" to "Increase streaming part size to 32MB" May 14, 2025
@sopel39 sopel39 deleted the ks/bump_size branch May 14, 2025 10:42
@github-actions github-actions bot added this to the 476 milestone May 14, 2025