Reduce memory usage in writer by freeing unused buffers #23724
sdruzkin merged 1 commit into prestodb:master
Conversation
Memory allocation is expensive and we added the pool for this reason. Please add performance benchmark results for a case comparing writing a 500MB file with 5-7 columns with the new option enabled and disabled.
Thanks for the release note! Just a nit or two suggested.
```java
bufferPool.clear();
System.setProperty("RESET_OUTPUT_BUFFER", "RESET_OUTPUT_BUFFER");
}
bufferPool.addAll(0, usedBuffers);
```
If you want to save on memory, do you really want to keep the last batch of chunks? I suppose you don't.
```java
usedBuffers.stream().mapToInt(b -> b.length).sum(),
bufferPool.size(),
bufferPool.stream().mapToInt(b -> b.length).sum());
bufferPool.clear();
```
This is a tricky place. If you look at the ChunkSupplier.get method you will see that ChunkSupplier scales up chunks from 256 bytes all the way to 16MB when it needs to produce a new chunk. If you reset the pool and start producing new chunks, you will eventually end up with only big 16MB chunks, which is probably the opposite of your goal.
I don't know what the best strategy is here, because it really depends on the data shapes, but I see a few options (a sketch of the second one follows this list):
- Have a chunk supplier that does not have a buffer and always produces smaller fixed-size chunks. Will perform best in terms of overhead, but will regress for small streams.
- Have a chunk supplier that does not have a buffer and always produces scaled-up chunks, with reset() resetting the current size to the min size. A good middle-ground option.
- Have a chunk supplier that does not have a buffer and always produces scaled-up chunks, with reset() resetting the current size to the min size AND using a smaller max chunk size. Will perform even better in terms of overhead, and won't regress as much as option 1 for small streams.
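A minimal sketch of the second option, assuming a hypothetical pool-free supplier; the class name, fields, and constants below are illustrative, not the actual ChunkSupplier code:

```java
// Pool-free chunk supplier sketch: chunk sizes still scale up geometrically,
// but reset() drops the current size back to the minimum instead of keeping
// previously allocated chunks alive in a pool.
public class ScalingChunkSupplier
{
    private static final int MIN_CHUNK_SIZE = 256;       // scales up from 256 bytes
    private static final int MAX_CHUNK_SIZE = 16 << 20;  // all the way to 16MB

    private int currentSize = MIN_CHUNK_SIZE;

    public byte[] get()
    {
        // Allocate a fresh chunk every time; nothing is pooled, so chunks
        // become garbage-collectable as soon as the caller releases them.
        byte[] chunk = new byte[currentSize];
        currentSize = Math.min(currentSize * 2, MAX_CHUNK_SIZE);
        return chunk;
    }

    public void reset()
    {
        // Restart the scale-up from the minimum so a small batch following a
        // large one does not immediately get 16MB chunks.
        currentSize = MIN_CHUNK_SIZE;
    }
}
```

For the third option, MAX_CHUNK_SIZE would simply be lowered, trading a few more allocations on large streams for a smaller worst-case chunk.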
Saved that user @chenyangfb is from Meta
Discussed with Sergii offline, ran the Validation Service (Vader); results look good, ~3.5k successful samples without failures: https://fburl.com/scuba/dwrf_reader_checksum/crm55ef8
Description
Currently, ChunkedSliceOutput keeps buffers in bufferPool and usedBuffers and never frees them, even on reset(), which leads to extra memory usage and OOMs.
This PR avoids that extra memory usage by freeing unused buffers in the chunk supplier during reset. The behavior is controlled by resetOutputBuffer in OrcWriterOptions and is disabled by default.
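For illustration, enabling the option might look like this; a hedged sketch, since only the resetOutputBuffer option itself comes from this PR and the builder setter name is my assumption:

```java
// Hypothetical usage sketch: the withResetOutputBuffer setter name is an
// assumption, not verified API; resetOutputBuffer is the option this PR adds.
OrcWriterOptions options = OrcWriterOptions.builder()
        .withResetOutputBuffer(true) // free unused pooled buffers during reset()
        .build();
```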
Example ChunkedSliceOutput behaviour before and after the change

Assume the first batch of output is large and uses 3.75M. After writing that output, and before we call reset(), usedBuffers contains a list of buffers with increasing sizes (assuming a 256k min buffer and a 1M max buffer for illustration purposes):
256k, 512k, 1M, 1M, 1M

The second batch of output is smaller and only uses 1.75M. After writing that output, and before we call reset():
usedBuffers has 256k, 512k, 1M
bufferPool has 1M, 1M

Before the change, we keep all 5 buffers and never free them. After the change, we keep the 3 buffers that were used (256k, 512k, 1M) and free the 2 buffers that were unused (1M, 1M).

In other words, the previous behaviour could only scale up the number of buffers; the new behaviour can also scale down the number of buffers based on memory usage.
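A simplified sketch of that scale-down behavior; the class and the reset(boolean) signature below are assumptions for illustration, not the actual ChunkedSliceOutput source:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of reset() with buffer freeing enabled. Field names mirror the
// description above (bufferPool, usedBuffers); everything else is assumed.
class BufferRecyclingOutput
{
    private final List<byte[]> bufferPool = new ArrayList<>();
    private final List<byte[]> usedBuffers = new ArrayList<>();

    void reset(boolean resetOutputBuffer)
    {
        if (resetOutputBuffer) {
            // Drop the pooled-but-unused buffers (the two trailing 1M buffers
            // in the example) so the GC can reclaim them.
            bufferPool.clear();
        }
        // Keep the buffers the last batch actually used; a following batch of
        // similar size can then be written without any new allocations.
        bufferPool.addAll(0, usedBuffers);
        usedBuffers.clear();
    }
}
```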
Impact
Reduces memory usage in the writer.
Test Plan
Tested with a Spark workload.
Example output showing unused buffers getting freed after the change