[WIP] Implement exchange spooling #10376
linzebing wants to merge 14 commits into trinodb:master from linzebing:exchange-spooling
Are "spooling" and "spilling" different things? Do we reuse one for the other?
They are different and not closely related. "Spooling" is a one-word name for the implementation of persistent exchange buffers (a concept added in the task-level retries PR), which write exchange data to an external file system.
I realized that it might be better to implement parallel reads using a producer-consumer model instead of asynchronous reads via double buffering. I will test both.
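To make the producer-consumer option concrete, here is a minimal sketch and not the code in this PR. It assumes each exchange file can be modelled as an array of byte chunks. The names `ParallelReadSketch`, `readAll`, and `END_OF_FILE` are hypothetical. One producer thread per file pushes chunks into a bounded queue, and a single consumer drains it.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names come from the PR.
final class ParallelReadSketch
{
    // Sentinel a producer enqueues once its file is exhausted (compared by identity).
    private static final byte[] END_OF_FILE = new byte[0];

    private ParallelReadSketch() {}

    // Reads all "files" in parallel and returns the total number of bytes consumed.
    static long readAll(List<byte[][]> files, int queueCapacity)
    {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueCapacity);
        ExecutorService producers = Executors.newFixedThreadPool(Math.max(1, files.size()));
        try {
            for (byte[][] chunks : files) {
                producers.submit(() -> {
                    try {
                        for (byte[] chunk : chunks) {
                            queue.put(chunk); // blocks while the queue is full (back pressure)
                        }
                        queue.put(END_OF_FILE);
                    }
                    catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            long totalBytes = 0;
            int finishedProducers = 0;
            // Single consumer: stop once every producer has signalled end of file.
            while (finishedProducers < files.size()) {
                byte[] chunk = queue.take();
                if (chunk == END_OF_FILE) {
                    finishedProducers++;
                }
                else {
                    totalBytes += chunk.length;
                }
            }
            return totalBytes;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        }
        finally {
            producers.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        List<byte[][]> files = List.of(
                new byte[][] {new byte[3], new byte[5]},
                new byte[][] {new byte[7]});
        System.out.println(readAll(files, 2));
    }
}
```

The bounded queue caps how much data is buffered in memory at once, which is the main difference from double buffering, where exactly two buffers alternate between being filled and being consumed.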
Thanks. Looks good. I did not reread everything. I assume the changes in the old commits were (mostly) addressing comments. Please point me to a specific piece of code if you feel I should look at it. It would be nice if GitHub exposed a force-push timeline for PRs and made it possible to get a diff of all the changes since my last review.
Resolves #9936
Running out of heap space usually indicates a memory accounting problem. Also the number of partitions configured for those tests is only Also, I found a different memory issue related to S3 (#10464), and I disabled streaming upload for the MinIO-based tests for now (b555618). I wonder if it could be related somehow?
@arhimondr I will try to get a heap dump.
    try {
        List<CompletedPart> completedParts = uploadFutures.stream()
                .map(CompletableFuture::join)
This will cause the close method to block for a potentially long time, because it has to wait for all parts to finish uploading. If you look at the call stack, it is currently not designed to block for very long: the call to ExchangeSink#finish is made from OutputBuffer#noMorePages, which is executed on a rather small executor that can easily run out of threads.
This is actually my mistake. After looking carefully at the implementation of the other output buffers, it looks like ExchangeSink#finish and ExchangeSink#abort should be asynchronous and return a Future. I'm going to address this issue in the main PR.
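As a rough illustration of what a non-blocking variant could look like, the sketch below combines the per-part futures with `CompletableFuture.allOf` instead of joining each one on the calling thread. This is an assumption about one possible shape, not the change planned for the main PR. `AsyncFinishSketch` and `allAsList` are hypothetical names, and `String` stands in for the `CompletedPart` type from the snippet.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical sketch: the class and method names are not from the PR.
final class AsyncFinishSketch
{
    private AsyncFinishSketch() {}

    // Returns immediately; the returned future completes once every part has completed.
    static <T> CompletableFuture<List<T>> allAsList(List<CompletableFuture<T>> parts)
    {
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> parts.stream()
                        // join() does not block here: allOf guarantees every part is done
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args)
    {
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();

        CompletableFuture<List<String>> finished = allAsList(List.of(first, second));
        System.out.println(finished.isDone()); // the caller was not blocked

        second.complete("part-2");
        first.complete("part-1");
        System.out.println(finished.join());
    }
}
```

A caller such as a finish method could return the combined future directly, leaving it to the caller to decide on which executor to wait. If any part fails, the combined future completes exceptionally, so the error handling from the surrounding `try` block would have to move into a callback on that future.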
        }
    }
    try {
        exchangeStorage.createEmptyFile(outputDirectory.resolve(COMMITTED_MARKER_FILE_NAME));
Same here; I wonder if this should be non-blocking.
This PR adds the trino-exchange plugin to Trino, which contains a local file system implementation as well as an S3-based implementation.