Split spooling partition output file into multiple files when size is large #11745
arhimondr merged 4 commits into trinodb:master
arhimondr
left a comment
Looks good, a couple of comments
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeConfig.java
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSink.java
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSink.java
...xchange/src/test/java/io/trino/plugin/exchange/local/TestLocalFileSystemExchangeManager.java
plugin/trino-exchange/src/test/java/io/trino/plugin/exchange/containers/MinioStorage.java
4eab2b4 to ddf51b0
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSink.java
ddf51b0 to 9181009
04cccf5 to a217e73
arhimondr
left a comment
nit: Missed a new line after the "Split partition output file into multiple files when size is large" commit title
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSink.java
nit: I was looking at Could you please convert these values into local variables in the test method?
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSink.java
b0c1e3f to c26c935
I would suggest softer naming: sink-target-file-size. Technically, we can overshoot the size if a single row is larger than the configured value. The "target" naming also leaves room for more flexible implementation tweaks without misleading users about what to expect.
If you change the name here, also update the in-code variable names.
I added checkArgument to make sure max page storage size is no larger than max file size
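A minimal sketch of the kind of check described in this comment. The class name, field names, and the use of plain byte counts are assumptions for illustration only, and a plain `IllegalArgumentException` stands in for Guava's `Preconditions.checkArgument` so the example has no dependencies; it is not the code from the PR.

```java
// Hypothetical holder for the two limits; not the PR's FileSystemExchangeConfig.
final class SinkSizeLimits {
    final long maxPageStorageSizeBytes;
    final long maxFileSizeBytes;

    SinkSizeLimits(long maxPageStorageSizeBytes, long maxFileSizeBytes) {
        // Plain-Java stand-in for checkArgument: a single page must always fit in one file.
        if (maxPageStorageSizeBytes > maxFileSizeBytes) {
            throw new IllegalArgumentException(String.format(
                    "maxPageStorageSize (%d bytes) must not be larger than maxFileSize (%d bytes)",
                    maxPageStorageSizeBytes, maxFileSizeBytes));
        }
        this.maxPageStorageSizeBytes = maxPageStorageSizeBytes;
        this.maxFileSizeBytes = maxFileSizeBytes;
    }
}
```

With a check like this in place, a configuration whose page limit exceeds the file limit is rejected at construction time rather than surfacing later as an oversized file.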
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeSink.java
Keeping track of currentFileSize feels like ExchangeStorageWriter responsibility. Consider moving it there and exposing long ExchangeStorageWriter.writtenBytes()
That's reasonable, but it would require us to duplicate the logic in LocalFileSystemExchangeStorage, S3FileSystemExchangeStorage, and, in the future, Azure and GCS too. So I feel it might be better to just track it here.
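To make the trade-off in this thread concrete, here is a rough sketch of tracking the current file size in the sink itself, independent of any storage backend. Everything in it is hypothetical: `RollingPartitionSink` is an in-memory stand-in, not the PR's FileSystemExchangeSink, and the choice to start a new file before a page would exceed the limit is an assumption about the roll-over rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a sink that counts bytes itself.
final class RollingPartitionSink {
    private final int partitionId;
    private final long maxFileSizeBytes;
    private final List<String> fileNames = new ArrayList<>();
    private final List<Long> fileSizes = new ArrayList<>();
    private long currentFileSize;

    RollingPartitionSink(int partitionId, long maxFileSizeBytes) {
        this.partitionId = partitionId;
        this.maxFileSizeBytes = maxFileSizeBytes;
    }

    // Records a page of the given size, opening a new file first if the page would not fit.
    void write(long pageSizeBytes) {
        if (fileNames.isEmpty() || currentFileSize + pageSizeBytes > maxFileSizeBytes) {
            // Assumed naming: {partitionId}_{partNumber}.data
            fileNames.add(partitionId + "_" + fileNames.size() + ".data");
            fileSizes.add(0L);
            currentFileSize = 0;
        }
        currentFileSize += pageSizeBytes;
        fileSizes.set(fileSizes.size() - 1, currentFileSize);
    }

    List<String> fileNames() {
        return List.copyOf(fileNames);
    }

    List<Long> fileSizes() {
        return List.copyOf(fileSizes);
    }
}
```

Because the counter lives in the sink, none of the storage implementations needs its own byte-tracking code; the alternative raised in the review would instead expose the count from each writer.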
plugin/trino-exchange/src/test/java/io/trino/plugin/exchange/AbstractTestExchangeManager.java
c26c935 to 57a5afc
Based on exchangeSinkMaxFileSize, we write multiple files to improve read parallelism. Spool files use the naming convention {partitionId}_{partNumber}.data.
E.g., suppose the max file size is 1GB. Where we previously wrote a single 3.2GB file 0.data, we now write four files: 0_0.data, 0_1.data, 0_2.data, 0_3.data.
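The arithmetic behind this example can be sketched as follows. The helper name `expectedFileNames` is hypothetical, and the sketch assumes every file except the last is filled exactly to the limit, which is only an approximation of what a page-based writer would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the {partitionId}_{partNumber}.data convention.
final class SpoolFileNames {
    private SpoolFileNames() {}

    static List<String> expectedFileNames(int partitionId, long totalBytes, long maxFileSizeBytes) {
        if (totalBytes <= 0 || maxFileSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division without risking overflow on large sizes.
        long fileCount = totalBytes / maxFileSizeBytes + (totalBytes % maxFileSizeBytes == 0 ? 0 : 1);
        List<String> names = new ArrayList<>();
        for (long partNumber = 0; partNumber < fileCount; partNumber++) {
            names.add(partitionId + "_" + partNumber + ".data");
        }
        return names;
    }
}
```

For partition 0 with 3.2GB of data and a 1GB limit (using decimal gigabytes for simplicity), the ceiling division gives four parts, matching the four file names in the example.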
57a5afc to 3ed9489
Description
Improvement.
trino-exchange plugin
In fault-tolerant execution, if a spooling file is large, we want it split into multiple reasonably sized files so that we can exploit more parallelism when reading the data. This PR splits large spooling files based on a target size.
Suppose the target size is 1GB. Where we previously wrote a single 3.2GB file named 0.data, after this change we write four files named 0_0.data, 0_1.data, 0_2.data, and 0_3.data, following the pattern {partitionId}_{partNum}.data, with sizes of approximately 1GB, 1GB, 1GB, and 0.2GB.
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: