Add docs on configuring Azure/GCS for exchange spooling #12472

The following configuration properties control the behavior of fault-tolerant
execution on a Trino cluster:

.. list-table:: Fault-tolerant execution configuration properties
  :widths: 30, 50, 20
  :header-rows: 1

  * - Property name

The following configuration properties control the thresholds at which
queries/tasks are no longer retried in the event of repeated failures:

.. list-table:: Fault tolerance retry limit configuration properties
  :widths: 30, 50, 20, 30
  :header-rows: 1

  * - Property name

... configuration properties to manually control task sizing. These configuration
properties only apply to a ``TASK`` retry policy.

.. list-table:: Task sizing configuration properties
  :widths: 30, 50, 20
  :header-rows: 1

  * - Property name

... the ``fault-tolerant-task-memory`` configuration property. This property only
applies to a ``TASK`` retry policy.

.. list-table:: Node allocation configuration properties
  :widths: 30, 50, 20
  :header-rows: 1

  * - Property name

The following additional configuration property can be used to manage
fault-tolerant execution:

.. list-table:: Other fault-tolerant execution configuration properties
  :widths: 30, 50, 20, 30
  :header-rows: 1

  * - Property name

... property to ``filesystem``, and additional configuration properties as needed
for your storage solution.

.. list-table:: Exchange manager configuration properties
  :widths: 30, 50, 20, 30
  :header-rows: 1

  * - Property name
    - Description
    - Default value
    - Filesystem
  * - ``exchange.base-directories``
    - Comma-separated list of URI locations that the exchange manager uses to
      store spooling data. Supports AWS S3, S3-compatible storage including
      GCS, Azure Blob Storage, and local filesystems.
    -
    - All
  * - ``exchange.encryption-enabled``
    - Enable encryption of spooling data.
    - ``true``
    - All
  * - ``exchange.sink-buffer-pool-min-size``
    - The minimum buffer pool size for an exchange sink. The larger the buffer
      pool size, the larger the write parallelism and memory usage.
    - ``10``
    - All
  * - ``exchange.sink-buffers-per-partition``
    - The number of buffers per partition in the buffer pool. The larger the
      buffer pool size, the larger the write parallelism and memory usage.
    - ``2``
    - All
  * - ``exchange.sink-max-file-size``
    - Maximum size of files written by exchange sinks.
    - ``1GB``
    - All
  * - ``exchange.source-concurrent-reader``
    - The number of concurrent readers to read from spooling storage. The
      larger the number of concurrent readers, the larger the read parallelism
      and memory usage.
    - ``4``
    - All
  * - ``exchange.s3.aws-access-key``
    - AWS access key to use. Required for a connection to AWS S3 and GCS; can
      be ignored for other S3-compatible storage systems.
    -
    - AWS S3 and GCS
  * - ``exchange.s3.aws-secret-key``
    - AWS secret key to use. Required for a connection to AWS S3 and GCS; can
      be ignored for other S3-compatible storage systems.
    -
    - AWS S3 and GCS
  * - ``exchange.s3.region``
    - Region of the S3 bucket.
    -
    - AWS S3 and GCS
  * - ``exchange.s3.endpoint``
    - S3 storage endpoint server if using an S3-compatible storage system that
      is not AWS. If using AWS S3, this can be ignored. If using GCS, set it
      to ``https://storage.googleapis.com``.
    -
    - S3-compatible storage
  * - ``exchange.s3.max-error-retries``
    - Maximum number of times the exchange manager's S3 client should retry
      a request.
    - ``3``
    - S3-compatible storage
  * - ``exchange.s3.upload.part-size``
    - Part size for S3 multi-part upload.
    - ``5MB``
    - S3-compatible storage
  * - ``exchange.gcs.json-key-file-path``
    - Path to the JSON file that contains your GCP service account key. Only
      applicable when using GCS as exchange spooling storage.
    -
    - GCS
  * - ``exchange.azure.connection-string``
    - The connection string used to access the spooling container.
    -
    - Azure Blob Storage
  * - ``exchange.azure.block-size``
    - Block size for Azure block blob parallel upload.
    - ``4MB``
    - Azure Blob Storage
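
The following sketch shows where these tuning properties would go, assuming
they are set in ``exchange-manager.properties`` alongside the storage settings.
It only spells out the documented default values; the bucket name is a
placeholder and storage credentials are omitted for brevity.

.. code-block:: properties

    exchange-manager.name=filesystem
    # Placeholder bucket; credentials and region omitted in this sketch
    exchange.base-directories=s3://exchange-spooling-bucket
    exchange.encryption-enabled=true
    # Write side: a larger buffer pool raises write parallelism and memory usage
    exchange.sink-buffer-pool-min-size=10
    exchange.sink-buffers-per-partition=2
    exchange.sink-max-file-size=1GB
    # Read side: more concurrent readers raise read parallelism and memory usage
    exchange.source-concurrent-reader=4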

It is also recommended to set the ``exchange.compression-enabled`` property to
``true`` in the cluster's ``config.properties`` file to reduce the exchange
manager's overall I/O load.
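
As a minimal sketch, the recommended setting is a single line in
``config.properties``, not in ``exchange-manager.properties``. Deploying the
same line on every node of the cluster is an assumption of this sketch.

.. code-block:: properties

    # config.properties (cluster configuration, not exchange-manager.properties)
    exchange.compression-enabled=true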

It is recommended to configure a bucket lifecycle rule to automatically expire
abandoned objects in the event of a node crash.

AWS S3
~~~~~~

The following example ``exchange-manager.properties`` configuration specifies an
AWS S3 bucket as the spooling storage destination. Note that the destination
does not have to be in AWS, but can be any S3-compatible storage system.

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=s3://exchange-spooling-bucket
    exchange.s3.region=us-west-1
    exchange.s3.aws-access-key=example-access-key
    exchange.s3.aws-secret-key=example-secret-key
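
A sketch for an S3-compatible system outside AWS might look as follows. The
endpoint URL and region are hypothetical placeholders, and whether access and
secret keys are required depends on the storage system in use. The retry and
part-size lines only restate the documented defaults.

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=s3://exchange-spooling-bucket
    # Hypothetical endpoint of a self-hosted S3-compatible storage service
    exchange.s3.endpoint=https://s3.example.internal:9000
    # Placeholder region; use whatever value your storage system expects
    exchange.s3.region=us-east-1
    # Optional tuning, shown with the documented default values
    exchange.s3.max-error-retries=3
    exchange.s3.upload.part-size=5MB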

... load:

.. code-block:: properties

    exchange.base-directories=s3://exchange-spooling-bucket-1,s3://exchange-spooling-bucket-2

Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~

GCS is mostly S3-compatible, with a few caveats. To enable exchange spooling on
GCS in Trino, set the request endpoint ``exchange.s3.endpoint`` to the Google
Cloud Storage URI ``https://storage.googleapis.com``, and set
``exchange.s3.aws-access-key`` and ``exchange.s3.aws-secret-key`` to your GCS
HMAC keys. If you deploy Trino on GCP, create a service account with access
permission to your spooling bucket and associate it with the deployment;
default authentication then works without further configuration. Otherwise,
set ``exchange.gcs.json-key-file-path`` to the path of your GCS credential file.
For more information on GCS's S3 compatibility, see `Google Cloud's
documentation on S3 migration <https://cloud.google.com/storage/docs/aws-simple-migration>`_.

The following example ``exchange-manager.properties`` configuration specifies a
GCS bucket as the spooling storage destination.

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=gs://exchange-spooling-bucket
    exchange.s3.region=us-west-1
    exchange.s3.aws-access-key=example-access-key
    exchange.s3.aws-secret-key=example-secret-key
    exchange.s3.endpoint=https://storage.googleapis.com
    exchange.gcs.json-key-file-path=/path/to/gcs_keyfile.json
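
When Trino runs on GCP under a service account with access to the spooling
bucket, ``exchange.gcs.json-key-file-path`` can be left out so that default
authentication applies. A minimal sketch of that variant, with a placeholder
bucket name and placeholder HMAC keys:

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=gs://exchange-spooling-bucket
    exchange.s3.region=us-west-1
    # Placeholder GCS HMAC keys, used through the S3-compatible API
    exchange.s3.aws-access-key=example-access-key
    exchange.s3.aws-secret-key=example-secret-key
    exchange.s3.endpoint=https://storage.googleapis.com
    # exchange.gcs.json-key-file-path is omitted: default authentication
    # relies on the service account associated with the GCP deployment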

Azure Blob Storage
~~~~~~~~~~~~~~~~~~

The following example ``exchange-manager.properties`` configuration specifies an
Azure Blob Storage container as the spooling storage destination.

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=abfs://container_name@account_name.dfs.core.windows.net
    exchange.azure.connection-string=connection-string
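
The ``exchange.azure.connection-string`` value is an Azure Storage connection
string. The sketch below fills in the placeholders with a hypothetical storage
account ``examplestorage`` and container ``exchange-spooling``, leaves the
account key as a placeholder, and restates the documented ``4MB`` default for
``exchange.azure.block-size`` to show where upload tuning goes.

.. code-block:: properties

    exchange-manager.name=filesystem
    # Hypothetical container "exchange-spooling" in storage account "examplestorage"
    exchange.base-directories=abfs://exchange-spooling@examplestorage.dfs.core.windows.net
    # Azure Storage connection string; replace <account-key> with a real key
    exchange.azure.connection-string=DefaultEndpointsProtocol=https;AccountName=examplestorage;AccountKey=<account-key>;EndpointSuffix=core.windows.net
    # Block size for parallel block blob uploads (documented default)
    exchange.azure.block-size=4MB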

Local filesystem storage
~~~~~~~~~~~~~~~~~~~~~~~~