Skip to content

Split spooling partition output file into multiple files when size is large #11745

Merged
arhimondr merged 4 commits intotrinodb:masterfrom
linzebing:split-large-spooling-file
Apr 2, 2022
Merged

Split spooling partition output file into multiple files when size is large #11745
arhimondr merged 4 commits intotrinodb:masterfrom
linzebing:split-large-spooling-file

Conversation

@linzebing
Copy link
Member

@linzebing linzebing commented Mar 31, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

trino-exchange plugin

How would you describe this change to a non-technical end user or system administrator?

In fault tolerant execution, if spooling file is large, we actually want it to be split into multiple reasonably sized files such that we can exploit more parallelism when reading the data. This PR splits large spooling files based on target size.

Suppose target size is 1GB, and previously we write a single file named 0.data of 3.2GB, after this change, we will write four files named 0_0.data, 0_1.data, 0_2.data, 0_3.data, in the form of {partitionId}_{partNum}.data, the sizes will be approximately 1GB, 1GB, 1GB, 0.2GB.

Related issues, pull requests, and links

Documentation

() No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
() Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 31, 2022
@linzebing linzebing requested review from arhimondr and losipiuk March 31, 2022 23:07
@github-actions github-actions bot added the docs label Mar 31, 2022
Copy link
Contributor

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple of comments

@linzebing linzebing force-pushed the split-large-spooling-file branch from 4eab2b4 to ddf51b0 Compare April 1, 2022 02:26
@linzebing linzebing force-pushed the split-large-spooling-file branch from ddf51b0 to 9181009 Compare April 1, 2022 02:38
@linzebing linzebing force-pushed the split-large-spooling-file branch 3 times, most recently from 04cccf5 to a217e73 Compare April 1, 2022 02:46
Copy link
Contributor

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Missed a new line after the Split partition output file into multiple files when size is large commit title

@arhimondr
Copy link
Contributor

nit: I was looking at testLargePages and I realized that we are allocating large strings and storing them as static fields. That's >21MB of memory wasted: https://github.com/trinodb/trino/blob/master/plugin/trino-exchange/src/test/java/io/trino/plugin/exchange/AbstractTestExchangeManager.java#L53

Could you please convert these values into local variables in the test method?

@linzebing linzebing force-pushed the split-large-spooling-file branch 2 times, most recently from b0c1e3f to c26c935 Compare April 1, 2022 07:09
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using softer naming: sink-target-file-size. Technically we can overshoot the size if a single row is larger than the configured value. "target" naming also allows for more elastic tweaks in implementation without mismanaging user expectations.
If you change name here update also in-code variable names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added checkArgument to make sure max page storage size is no larger than max file size

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping track of currentFileSize feels like ExchangeStorageWriter responsibility. Consider moving it there and exposing long ExchangeStorageWriter.writtenBytes()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's reasonable, but that will require us to deduplicate logic in LocalFileSystemExchangeStorage, S3FileSystemExchangeStorage, and in the future Azure and GCS too. So I feel it might be better to just track it here.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@findepi findepi changed the title Split partition output file into multiple files when size is large Split spooling's partition output file into multiple files when size is large Apr 1, 2022
@linzebing linzebing changed the title Split spooling's partition output file into multiple files when size is large Split spooling partition output file into multiple files when size is large Apr 1, 2022
@linzebing linzebing force-pushed the split-large-spooling-file branch from c26c935 to 57a5afc Compare April 1, 2022 18:21
Based on exchangeSinkMaxFileSize, we write multiple files to improve read parallelism. Spool file uses the naming convention {partitionId}_{partNumber}.data.
E.g., suppose max file size is 1GB, and previously we write a file 0.data of 3.2GB, now we will write four files 0_0.data, 0_1.data, 0_2.data, 0_3.data.
@linzebing linzebing force-pushed the split-large-spooling-file branch from 57a5afc to 3ed9489 Compare April 1, 2022 18:25
@arhimondr arhimondr merged commit c7bc38f into trinodb:master Apr 2, 2022
@github-actions github-actions bot added this to the 376 milestone Apr 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants