99 changes: 80 additions & 19 deletions docs/src/main/sphinx/admin/fault-tolerant-execution.rst
The following configuration properties control the behavior of fault-tolerant
execution on a Trino cluster:

.. list-table:: Fault-tolerant execution configuration properties
  :widths: 30, 50, 20
  :header-rows: 1

  * - Property name

The following configuration properties control the thresholds at which
queries/tasks are no longer retried in the event of repeated failures:

.. list-table:: Fault tolerance retry limit configuration properties
  :widths: 30, 50, 20, 30
  :header-rows: 1

  * - Property name

configuration properties to manually control task sizing. These configuration
properties only apply to a ``TASK`` retry policy.

.. list-table:: Task sizing configuration properties
  :widths: 30, 50, 20
  :header-rows: 1

  * - Property name

the ``fault-tolerant-task-memory`` configuration property. This property only
applies to a ``TASK`` retry policy.
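
As a minimal sketch, the ``config.properties`` entries for this setup could look
like the following. It assumes the ``retry-policy`` property is used to select
the ``TASK`` policy, and the memory value is a placeholder rather than a sizing
recommendation.

.. code-block:: properties

    # Select task-level retries; the node allocation setting only applies to TASK
    retry-policy=TASK
    # Placeholder value; size according to your workload
    fault-tolerant-task-memory=5GB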

.. list-table:: Node allocation configuration properties
  :widths: 30, 50, 20
  :header-rows: 1

  * - Property name

The following additional configuration property can be used to manage
fault-tolerant execution:

.. list-table:: Other fault-tolerant execution configuration properties
  :widths: 30, 50, 20, 30
  :header-rows: 1

  * - Property name

property to ``filesystem``, and additional configuration properties as needed
for your storage solution.

.. list-table:: Exchange manager configuration properties
  :widths: 30, 50, 20, 30
  :header-rows: 1

  * - Property name
    - Description
    - Default value
    - Filesystem
  * - ``exchange.base-directories``
    - Comma-separated list of URI locations that the exchange manager uses to
      store spooling data. Supports S3-compatible storage (including GCS),
      Azure Blob Storage, and local filesystems.
    -
    - All
  * - ``exchange.encryption-enabled``
    - Enable encryption of spooling data.
    - ``true``
    - All
  * - ``exchange.sink-buffer-pool-min-size``
    - The minimum buffer pool size for an exchange sink. The larger the buffer
      pool size, the larger the write parallelism and memory usage.
    - ``10``
    - All
  * - ``exchange.sink-buffers-per-partition``
    - The number of buffers per partition in the buffer pool. The larger the
      buffer pool size, the larger the write parallelism and memory usage.
    - ``2``
    - All
  * - ``exchange.sink-max-file-size``
    - Maximum size of files written by exchange sinks.
    - ``1GB``
    - All
  * - ``exchange.source-concurrent-reader``
    - The number of concurrent readers to read from spooling storage. The
      larger the number of concurrent readers, the larger the read parallelism
      and memory usage.
    - ``4``
    - All
  * - ``exchange.s3.aws-access-key``
    - AWS access key to use. Required for a connection to AWS S3 and GCS; can
      be ignored for other S3-compatible storage systems.
    -
    - AWS S3 and GCS
  * - ``exchange.s3.aws-secret-key``
    - AWS secret key to use. Required for a connection to AWS S3 and GCS; can
      be ignored for other S3-compatible storage systems.
    -
    - AWS S3 and GCS
  * - ``exchange.s3.region``
    - Region of the S3 bucket.
    -
    - AWS S3 and GCS
  * - ``exchange.s3.endpoint``
    - S3 storage endpoint server if using an S3-compatible storage system that
      is not AWS. If using AWS S3, this can be ignored. If using GCS, set it
      to ``https://storage.googleapis.com``.
    -
    - S3-compatible storage
  * - ``exchange.s3.max-error-retries``
    - Maximum number of times the exchange manager's S3 client should retry
      a request.
    - ``3``
    - S3-compatible storage
  * - ``exchange.s3.upload.part-size``
    - Part size for S3 multi-part upload.
    - ``5MB``
    - S3-compatible storage
  * - ``exchange.gcs.json-key-file-path``
    - Path to the JSON file that contains your GCP service account key. Only
      applicable when using GCS as exchange spooling storage.
    -
    - GCS
  * - ``exchange.azure.connection-string``
    - The connection string used to access the spooling container.
    -
    - Azure Blob Storage
  * - ``exchange.azure.block-size``
    - Block size for Azure block blob parallel upload.
    - ``4MB``
    - Azure Blob Storage
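
As a sketch, the sink and source properties from the table above can be set
explicitly in ``exchange-manager.properties`` next to the storage-specific
settings. The following lines keep the documented sink defaults and raise the
reader count to an illustrative value of ``8``, trading more memory for higher
read parallelism. The value is an assumption, not a tested recommendation.

.. code-block:: properties

    # Documented defaults, written out explicitly
    exchange.sink-buffer-pool-min-size=10
    exchange.sink-buffers-per-partition=2
    exchange.sink-max-file-size=1GB
    # Illustrative value: more concurrent readers, more read parallelism and memory
    exchange.source-concurrent-reader=8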

It is also recommended to set the ``exchange.compression-enabled`` property to
``true`` in the cluster's ``config.properties`` file, to reduce the exchange
manager's overall I/O load.
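
For example, add the following line to ``config.properties``:

.. code-block:: properties

    exchange.compression-enabled=true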

It is recommended to configure a bucket lifecycle rule to automatically expire
abandoned objects in the event of a node crash.

AWS S3
~~~~~~

The following example ``exchange-manager.properties`` configuration specifies an
AWS S3 bucket as the spooling storage destination. Note that the destination
does not have to be in AWS, but can be any S3-compatible storage system.

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=s3://exchange-spooling-bucket
    exchange.s3.region=us-west-1
    exchange.s3.aws-access-key=example-access-key
    exchange.s3.aws-secret-key=example-secret-key
Expand All @@ -409,7 +434,43 @@ load:

.. code-block:: properties

    exchange.base-directories=s3://exchange-spooling-bucket-1,s3://exchange-spooling-bucket-2
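
The exchange manager's S3 client can also be tuned with the
``exchange.s3.max-error-retries`` and ``exchange.s3.upload.part-size``
properties. A sketch with illustrative values, which are assumptions rather
than tested recommendations:

.. code-block:: properties

    # Retry failed S3 requests up to 10 times instead of the default 3
    exchange.s3.max-error-retries=10
    # Use 16MB parts for multi-part uploads instead of the default 5MB
    exchange.s3.upload.part-size=16MB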

Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~

GCS is mostly S3-compatible, with a few caveats. To enable exchange spooling on
GCS in Trino, set the request endpoint to the Google Storage URI
``https://storage.googleapis.com``, and configure the AWS access and secret keys
with your GCS HMAC keys. If you deploy Trino on GCP and the deployment is
associated with a service account that has access to your spooling bucket,
default authentication works without further configuration; otherwise, set
``exchange.gcs.json-key-file-path`` to the path of your GCS credential file. For
more information on GCS's S3 compatibility, refer to `Google Cloud's documentation
on S3 migration <https://cloud.google.com/storage/docs/aws-simple-migration>`_.

The following example ``exchange-manager.properties`` configuration specifies a GCS bucket
as the spooling storage destination.

.. code-block:: properties

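    exchange-manager.name=filesystem
    exchange.base-directories=gs://exchange-spooling-bucket
    exchange.s3.region=us-west-1
    exchange.s3.aws-access-key=example-access-key
    exchange.s3.aws-secret-key=example-secret-key
    exchange.s3.endpoint=https://storage.googleapis.com
    exchange.gcs.json-key-file-path=/path/to/gcs_keyfile.json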
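
If Trino runs on GCP under a service account that already has access to the
spooling bucket, default authentication applies and
``exchange.gcs.json-key-file-path`` can be omitted. A sketch of that variant,
where the bucket name, region, and HMAC keys are placeholders:

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=gs://exchange-spooling-bucket
    exchange.s3.region=us-west-1
    exchange.s3.aws-access-key=example-access-key
    exchange.s3.aws-secret-key=example-secret-key
    exchange.s3.endpoint=https://storage.googleapis.com
    # No JSON key file: credentials are assumed to come from the service
    # account attached to the Trino nodes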

Azure Blob Storage
~~~~~~~~~~~~~~~~~~

The following example ``exchange-manager.properties`` configuration specifies an Azure
Blob Storage container as the spooling storage destination.

.. code-block:: properties

    exchange-manager.name=filesystem
    exchange.base-directories=abfs://container_name@account_name.dfs.core.windows.net
    exchange.azure.connection-string=connection-string

Local filesystem storage
~~~~~~~~~~~~~~~~~~~~~~~~