Commit 3f0873d

Improve docs for Redshift parallel read
1 parent d09fac6 commit 3f0873d

1 file changed

+44
-37
lines changed

docs/src/main/sphinx/connector/redshift.md

@@ -64,43 +64,6 @@ documentation](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-configura
 ```{include} jdbc-authentication.fragment
 ```
 
-### UNLOAD configuration
-
-This feature enables using Amazon S3 to efficiently transfer data out of Redshift
-instead of the default single threaded JDBC based implementation.
-The connector automatically triggers the appropriate `UNLOAD` command
-on Redshift to extract the output from Redshift to the configured
-S3 bucket in the form of Parquet files. These Parquet files are read in parallel
-from S3 to improve latency of reading from Redshift tables. The Parquet
-files will be removed when Trino finishes executing the query. It is recommended
-to define a custom life cycle policy on the S3 bucket used for unloading the
-Redshift query results.
-This feature is supported only when the Redshift cluster and the configured S3
-bucket are in the same AWS region.
-
-The following table describes configuration properties for using
-`UNLOAD` command in Redshift connector. `redshift.unload-location` must be set
-to use `UNLOAD`.
-
-:::{list-table} UNLOAD configuration properties
-:widths: 30, 60
-:header-rows: 1
-
-* - Property value
-  - Description
-* - `redshift.unload-location`
-  - A writeable location in Amazon S3, to be used for temporarily unloading
-    Redshift query results.
-* - `redshift.unload-iam-role`
-  - Optional. Fully specified ARN of the IAM Role attached to the Redshift cluster.
-    Provided role will be used in `UNLOAD` command. IAM role must have access to
-    Redshift cluster and write access to S3 bucket. The default IAM role attached to
-    Redshift cluster is used when this property is not configured.
-:::
-
-Additionally, define appropriate [S3 configurations](/object-storage/file-system-s3)
-except `fs.native-s3.enabled`, required to read Parquet files from S3 bucket.
-
 ### Multiple Redshift databases or clusters
 
 The Redshift connector can only access a single database within
@@ -255,3 +218,47 @@ FROM
 
 ```{include} query-table-function-ordering.fragment
 ```
+
+## Performance
+
+The connector includes a number of performance improvements, detailed in the
+following sections.
+
+### Parallel read via S3
+
+The connector supports the Redshift `UNLOAD` command to transfer data to Parquet
+files on S3. This enables parallel read of the data in Trino instead of the
+default, single-threaded JDBC-based connection to Redshift, used by the
+connector.
+
+Configure the required S3 location with `redshift.unload-location` to enable the
+parallel read. Parquet files are automatically removed with query completion.
+The Redshift cluster and the configured S3 bucket must use the same AWS region.
+
+:::{list-table} Parallel read configuration properties
+:widths: 30, 60
+:header-rows: 1
+
+* - Property value
+  - Description
+* - `redshift.unload-location`
+  - A writeable location in Amazon S3 in the same AWS region as the Redshift
+    cluster. Used for temporary storage during query processing using the
+    `UNLOAD` command from Redshift. To ensure cleanup even for failed automated
+    removal, configure a life cycle policy to auto clean up the bucket
+    regularly.
+* - `redshift.unload-iam-role`
+  - Optional. Fully specified ARN of the IAM Role attached to the Redshift
+    cluster to use for the `UNLOAD` command. The role must have read access to
+    the Redshift cluster and write access to the S3 bucket. Defaults to use the
+    default IAM role attached to the Redshift cluster.
+
+:::
+
+Use the `unload_enabled` [catalog session property](/sql/set-session) to
+deactivate the parallel read during a client session for a specific query, and
+potentially re-activate it again afterwards.
+
+Additionally, define further required [S3 configuration such as IAM key, role,
+or region](/object-storage/file-system-s3), except `fs.native-s3.enabled`,
+
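
For orientation, the properties documented in this change might be combined in a catalog file roughly as follows. This is a minimal sketch, not part of the commit: the file name, JDBC endpoint, credentials, bucket, account ID, role name, and region are all placeholder assumptions, and only `redshift.unload-location` and `redshift.unload-iam-role` come from the diff itself.

```properties
# etc/catalog/example.properties (hypothetical catalog named "example")
connector.name=redshift
# Placeholder endpoint and credentials; replace with real values.
connection-url=jdbc:redshift://example.net:5439/database
connection-user=root
connection-password=secret

# Setting an unload location enables the parallel read via S3.
# The bucket is a placeholder and must be in the same AWS region as the cluster.
redshift.unload-location=s3://example-bucket/trino-unload/
# Optional; placeholder ARN. When omitted, the cluster's default IAM role is used.
redshift.unload-iam-role=arn:aws:iam::123456789012:role/example-unload-role

# Further S3 file system settings so Trino can read the Parquet files.
# Placeholder region; fs.native-s3.enabled is deliberately not set here.
s3.region=us-east-1
```

With a catalog named `example` as assumed above, the `unload_enabled` session property described in the diff would be toggled with a statement such as `SET SESSION example.unload_enabled = false`, and set back to `true` afterwards.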
