Evenly distribute exchange spooling data across different shards #11987
Merged
arhimondr merged 3 commits into trinodb:master on Apr 20, 2022
arhimondr
reviewed
Apr 18, 2022
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchange.java
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchange.java
eeeeb2c to bf7f12f
arhimondr
reviewed
Apr 18, 2022
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchange.java
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeManager.java
bf7f12f to e757d1f
arhimondr
approved these changes
Apr 18, 2022
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchange.java
bfff687 to c7223b5
losipiuk
reviewed
Apr 19, 2022
Member
nit: passing i as taskPartitionId is somewhat ugly, and it exploits an implementation detail of getExchangeDirectory (that it takes partitionId modulo the number of base directories to determine the final directory).
A cleaner approach would be to add an explicit method, List<URI> getAllExchangeDirectories(), and use it in both close and initialize.
Member
Author
With the last commit, this part has been changed to delete all task output directories. I will create a follow-up PR to delete the directories in batches.
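For illustration, the suggestion in this thread could look roughly like the sketch below. It is not the code from this PR: the class name, the constructor, the exchangeId field, and the way directories are resolved are all assumptions; only the method names getExchangeDirectory and getAllExchangeDirectories and the modulo behavior come from the review comment.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the relevant part of the exchange; not the actual Trino class.
final class ExchangeDirectoriesSketch {
    // Assumption: every base directory URI ends with "/" so that resolve() appends to it.
    private final List<URI> baseDirectories;
    private final String exchangeId;

    ExchangeDirectoriesSketch(List<URI> baseDirectories, String exchangeId) {
        this.baseDirectories = List.copyOf(baseDirectories);
        this.exchangeId = exchangeId;
    }

    // Behavior described in the comment: partition id modulo the number of base directories.
    URI getExchangeDirectory(int taskPartitionId) {
        int index = Math.floorMod(taskPartitionId, baseDirectories.size());
        return baseDirectories.get(index).resolve(exchangeId + "/");
    }

    // Suggested explicit method: one exchange directory per base directory,
    // so callers such as close and initialize need not loop over fake partition ids.
    List<URI> getAllExchangeDirectories() {
        List<URI> directories = new ArrayList<>();
        for (URI base : baseDirectories) {
            directories.add(base.resolve(exchangeId + "/"));
        }
        return directories;
    }
}
```

With an explicit method, callers no longer depend on the modulo mapping staying the way it is.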
losipiuk
reviewed
Apr 19, 2022
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchangeModule.java
losipiuk
reviewed
Apr 19, 2022
core/trino-main/src/test/java/io/trino/operator/TestDeduplicatingDirectExchangeBuffer.java
losipiuk
reviewed
Apr 19, 2022
plugin/trino-exchange/src/main/java/io/trino/plugin/exchange/FileSystemExchange.java
7f393c6 to dda1610
Add a randomized prefix to evenly distribute data into different S3 shards. Data output file path format: {randomizedPrefix}.{queryId}.{stageId}.{sinkPartitionId}/{attemptId}/{sourcePartitionId}_{splitId}.data
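As a rough illustration of the path format in this commit message, the sketch below assembles a data file path from its components. The class, both method names, the hex prefix, and its length are assumptions made for the example; the PR's actual prefix generation may differ.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; only the path layout is taken from the commit message.
final class SpoolingPathSketch {
    private SpoolingPathSketch() {}

    // Assumption: a short random lowercase hex string is enough to spread keys across prefixes.
    static String randomizedPrefix(int length) {
        StringBuilder prefix = new StringBuilder(length);
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < length; i++) {
            prefix.append(Character.forDigit(random.nextInt(16), 16));
        }
        return prefix.toString();
    }

    // Builds {randomizedPrefix}.{queryId}.{stageId}.{sinkPartitionId}/{attemptId}/{sourcePartitionId}_{splitId}.data
    static String outputFilePath(
            String randomizedPrefix,
            String queryId,
            int stageId,
            int sinkPartitionId,
            int attemptId,
            int sourcePartitionId,
            int splitId) {
        return randomizedPrefix + "." + queryId + "." + stageId + "." + sinkPartitionId
                + "/" + attemptId
                + "/" + sourcePartitionId + "_" + splitId + ".data";
    }

    public static void main(String[] args) {
        // Example values are made up; the prefix differs on every run.
        System.out.println(outputFilePath(randomizedPrefix(6), "exampleQueryId", 1, 2, 0, 3, 4));
    }
}
```

Because the random component leads the key, files from the same query no longer share a common key prefix.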
dda1610 to 8c6a4c1
arhimondr
approved these changes
Apr 19, 2022
losipiuk
approved these changes
Apr 20, 2022
Description
Improvement.
trino-exchange
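We sometimes get throttled by S3 when spooling exchange data. This change mitigates that by adding a randomized prefix so that the data is spread across different S3 shards.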
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: