Support exchange spooling on GCS by linzebing · Pull Request #12360 · trinodb/trino

linzebing · 2022-05-12T22:17:46Z

Description

Is this change a fix, improvement, new feature, refactoring, or other?

New feature.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

trino-exchange-filesystem

How would you describe this change to a non-technical end user or system administrator?

This PR adds support for exchange spooling on GCS. GCS is mostly S3-compatible, except for two minor incompatibilities.

An example exchange-manager.properties:

exchange-manager.name=filesystem
exchange.base-directories=gs://your-bucket-name
exchange.s3.region=us-west-1
exchange.s3.aws-access-key=your-google-access-key-id
exchange.s3.aws-secret-key=your-google-access-key-secret
exchange.s3.endpoint=https://storage.googleapis.com

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
(x) Documentation issue #issuenumber is filed, and can be handled later.
#12467

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Support exchange spooling on Google Cloud Storage.
* Dropped exchange spooling support for legacy S3 schemes s3n:// and s3a://.

...e-filesystem/src/main/java/io/trino/plugin/exchange/filesystem/FileSystemExchangeModule.java

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

mosabua · 2022-05-17T19:26:24Z

This will need docs .. please work with @colebow on adding this.

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

arhimondr · 2022-05-18T20:36:24Z

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

This can in theory grow indefinitely. In FileSystemExchange#close we call deleteRecursively in a loop for each task. This may result in huge spikes in the number of threads (hundreds or even thousands). I would recommend going with a bounded executor with the number of threads set to the desired concurrency, e.g.:

ThreadPoolExecutor executor = new ThreadPoolExecutor(
        maximumConcurrency,
        maximumConcurrency,
        10,
        SECONDS,
        new LinkedBlockingQueue<>(),
        threadsNamed("gcs-delete-%s"));

I don't know what value we want to pick for max concurrency though, maybe 50? 100?

It is also generally a good idea to call executor.allowCoreThreadTimeOut(true) to let inactive threads be reclaimed after a spike.

It might also be reasonable to improve FileSystemExchange#close to batch delete requests across multiple partitions.
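The bounded-executor suggestion above can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not the code that was merged: the class name BoundedDeleteExecutor and the method create are hypothetical, and a hand-written ThreadFactory stands in for the threadsNamed helper mentioned in the comment.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not code from this PR.
public class BoundedDeleteExecutor
{
    private BoundedDeleteExecutor() {}

    public static ThreadPoolExecutor create(int maximumConcurrency)
    {
        // Plain-JDK stand-in for a thread-naming factory such as threadsNamed("gcs-delete-%s").
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory threadFactory = runnable -> {
            Thread thread = new Thread(runnable, "gcs-delete-" + counter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        };

        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                maximumConcurrency, // core pool size: threads created before tasks start queueing
                maximumConcurrency, // maximum pool size: hard cap on concurrent deletes
                10,
                TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(), // unbounded queue: excess work waits instead of spawning threads
                threadFactory);
        // Let idle core threads exit after the keep-alive time, so a spike does not pin threads forever.
        executor.allowCoreThreadTimeOut(true);
        return executor;
    }
}
```

With this shape, a burst of delete tasks never uses more than maximumConcurrency threads, and the pool shrinks back to zero threads roughly 10 seconds after the burst ends.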

Yeah, I think it's better to batch delete requests. Changing it right now.

With batching, I think directly using cachedExecutor will be sufficient.

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

linzebing · 2022-05-19T22:17:42Z

I have changed this to batch the batch deletes, so as to minimize API calls to GCS.

I wonder if we should do the same for Azure and S3. Currently we delete one task output directory at a time. In theory, we can do something similar to GCS: collect all the objects into a list, and batch delete them. @arhimondr @losipiuk
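The idea of collecting all objects into one list and deleting them in batches can be sketched as follows. This is a minimal illustration, not the PR's implementation: DeleteBatcher and toBatches are hypothetical names, object keys are modeled as plain strings, and the batch size is left as a parameter because the per-request limit depends on the storage service.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not code from this PR.
public class DeleteBatcher
{
    private DeleteBatcher() {}

    // Flattens the object keys of several task output directories into one list,
    // then splits that list into batches of at most batchSize keys each.
    public static List<List<String>> toBatches(List<List<String>> objectsPerDirectory, int batchSize)
    {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> allObjects = new ArrayList<>();
        for (List<String> directoryObjects : objectsPerDirectory) {
            allObjects.addAll(directoryObjects);
        }

        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < allObjects.size(); start += batchSize) {
            int end = Math.min(start + batchSize, allObjects.size());
            // Copy the view so each batch is independent of the backing list.
            batches.add(new ArrayList<>(allObjects.subList(start, end)));
        }
        return batches;
    }
}
```

Each returned batch would then map to a single delete request, so the number of API calls depends on the total number of objects rather than on the number of directories.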

losipiuk · 2022-05-20T14:12:39Z

I have changed this to batch the batch deletes, so as to minimize API calls to GCS.

I wonder if we should do the same for Azure and S3. Currently we delete one task output directory at a time. In theory, we can do something similar to GCS: collect all the objects into a list, and batch delete them. @arhimondr @losipiuk

Would make sense IMO. Good catch.

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

...e-filesystem/src/main/java/io/trino/plugin/exchange/filesystem/FileSystemExchangeModule.java

arhimondr

LGTM % comment

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

linzebing · 2022-05-20T18:29:01Z

Addressed comments. On batching deletes for S3 and Azure, I decided to do it in a separate PR. It's a bit more complex than I thought, as I need to deal with multiple buckets.

arhimondr · 2022-05-20T18:42:09Z

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

nit: one parameter per line, static import listeningDecorator

Also set the core pool size to 100; otherwise it will keep running only a single thread until the queue is full.

Ah you are right. I'm using a SynchronousQueue here, which basically has a size of 0, and if concurrent tasks exceed 100, rejection will happen. Your suggestion above is better.
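The interaction between core pool size, maximum pool size, and queue type discussed in this thread can be reproduced with a small JDK-only experiment. The class PoolSizingDemo and its run method are hypothetical names used for illustration, and the pool sizes are scaled down from 100 to 4 to keep the example small.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical experiment for illustration; not code from this PR.
public class PoolSizingDemo
{
    private PoolSizingDemo() {}

    // Submits `tasks` blocking tasks and returns {threads created, tasks rejected}.
    public static int[] run(int corePoolSize, int maximumPoolSize, BlockingQueue<Runnable> queue, int tasks)
    {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                corePoolSize,
                maximumPoolSize,
                10,
                TimeUnit.SECONDS,
                queue);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                // Every task blocks until released, so no worker becomes idle during submission.
                executor.execute(() -> {
                    try {
                        release.await();
                    }
                    catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        int threads = executor.getPoolSize();
        release.countDown();
        executor.shutdown();
        return new int[] {threads, rejected};
    }

    public static void main(String[] args)
    {
        int[] unboundedQueue = run(1, 4, new LinkedBlockingQueue<>(), 10);
        int[] synchronousQueue = run(1, 4, new SynchronousQueue<>(), 10);
        int[] coreEqualsMax = run(4, 4, new LinkedBlockingQueue<>(), 10);
        System.out.println("core=1, max=4, LinkedBlockingQueue: threads=" + unboundedQueue[0] + ", rejected=" + unboundedQueue[1]);
        System.out.println("core=1, max=4, SynchronousQueue: threads=" + synchronousQueue[0] + ", rejected=" + synchronousQueue[1]);
        System.out.println("core=4, max=4, LinkedBlockingQueue: threads=" + coreEqualsMax[0] + ", rejected=" + coreEqualsMax[1]);
    }
}
```

With an unbounded LinkedBlockingQueue and a core size of 1, the pool stays at one thread because extra threads are only created once the queue refuses a task. With a SynchronousQueue the pool grows to its maximum of 4 and rejects the remaining 6 tasks. Setting the core size equal to the maximum with an unbounded queue yields 4 threads and no rejections, which is the configuration the review suggests.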

cla-bot bot added the cla-signed label May 12, 2022

linzebing requested review from arhimondr and losipiuk May 12, 2022 22:17

electrum reviewed May 12, 2022

...e-filesystem/src/main/java/io/trino/plugin/exchange/filesystem/FileSystemExchangeModule.java

electrum reviewed May 12, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 13, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

linzebing force-pushed the gcs branch from 53c289f to 2f33773 Compare May 18, 2022 01:24

github-actions bot added the tests:hive label May 18, 2022

linzebing force-pushed the gcs branch from 2f33773 to b241af8 Compare May 18, 2022 03:55

linzebing requested review from electrum and losipiuk May 18, 2022 06:54

losipiuk reviewed May 18, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 18, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 18, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 18, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 18, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

linzebing force-pushed the gcs branch 2 times, most recently from e02cbc1 to 75ff0a5 Compare May 18, 2022 20:11

linzebing mentioned this pull request May 18, 2022

Add documentation for exchange spooling on Azure and GCS #12467

Closed

arhimondr reviewed May 18, 2022


linzebing force-pushed the gcs branch from 75ff0a5 to b6459ce Compare May 18, 2022 23:22

linzebing mentioned this pull request May 19, 2022

Add docs on configuring Azure/GCS for exchange spooling #12472

Merged

linzebing requested review from arhimondr and losipiuk May 19, 2022 22:14

losipiuk reviewed May 20, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 20, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

losipiuk reviewed May 20, 2022

...e-filesystem/src/main/java/io/trino/plugin/exchange/filesystem/FileSystemExchangeModule.java

losipiuk approved these changes May 20, 2022

arhimondr approved these changes May 20, 2022

...system/src/main/java/io/trino/plugin/exchange/filesystem/s3/S3FileSystemExchangeStorage.java

linzebing force-pushed the gcs branch from b6459ce to a901205 Compare May 20, 2022 18:13

arhimondr approved these changes May 20, 2022

linzebing added 3 commits May 20, 2022 14:00

Support exchange spooling on GCS

fd5befe

Drop support for s3n/s3a in exchange spooling

a169208

Batch delete directories for GCS exchange spooling

924e170

linzebing force-pushed the gcs branch from a901205 to 924e170 Compare May 20, 2022 21:31

linzebing mentioned this pull request May 23, 2022

Batch delete requests for exchange spooling on S3 and Azure #12511

Merged

arhimondr merged commit 2726bdf into trinodb:master May 24, 2022

github-actions bot added this to the 382 milestone May 24, 2022

mosabua mentioned this pull request May 24, 2022

Add Trino 382 release notes #12440

Merged


5 participants
