diff --git a/docs/src/main/sphinx/connector/hive.rst b/docs/src/main/sphinx/connector/hive.rst
index 439d6fd54c09..69c5411e2d9f 100644
--- a/docs/src/main/sphinx/connector/hive.rst
+++ b/docs/src/main/sphinx/connector/hive.rst
@@ -326,6 +326,10 @@ Property Name Description
 
 ``hive.create-empty-bucket-files``                 Should empty files be created for buckets that have no data? ``false``
 
+``hive.validate-bucketing``                        Verify that data is in the correct bucket file when reading  ``true``
+                                                   bucketed tables. This check is enabled by default, as
+                                                   incorrect bucketing can cause incorrect query results.
+
 ``hive.partition-statistics-sample-size``          Specifies the number of partitions to analyze when           100
                                                    computing table statistics.
 
@@ -414,25 +418,27 @@ Property Name Description
                                                    with a leading 0. If set to 'skip', permissions of newly
                                                    created directories will not be set by Trino.
 
-``hive.fs.cache.max-size``                         Maximum number of cached file system objects.                 1000
+``hive.fs.cache.max-size``                         Maximum number of cached file system objects.                1000
 
-``hive.query-partition-filter-required``           Set to ``true`` to force a query to use a partition filter.   ``false``
+``hive.query-partition-filter-required``           Set to ``true`` to force a query to use a partition filter.  ``false``
                                                    You can use the ``query_partition_filter_required`` catalog
                                                    session property for temporary, catalog specific use.
 
-``hive.table-statistics-enabled``                  Enables :doc:`/optimizer/statistics`. The equivalent          ``true``
+``hive.table-statistics-enabled``                  Enables :doc:`/optimizer/statistics`. The equivalent         ``true``
                                                    :doc:`catalog session property ` is
                                                    ``statistics_enabled`` for session specific use.
                                                    Set to ``false`` to disable statistics.
                                                    Disabling statistics means that
                                                    :doc:`/optimizer/cost-based-optimizations` can not make
                                                    smart decisions about the query plan.
 
-``hive.auto-purge``                                Set the default value for the auto_purge table property for   ``false``
+``hive.auto-purge``                                Set the default value for the auto_purge table property for  ``false``
                                                    managed tables. See the :ref:`hive_table_properties` for
                                                    more information on auto_purge.
 
-``hive.partition-projection-enabled``              Enables Athena partition projection support                   ``false``
+``hive.partition-projection-enabled``              Enables Athena partition projection support                  ``false``
+
+``hive.single-statement-writes``                   Require auto-commit mode for individual DML statements       ``false``
 ================================================== ============================================================ ============
 
 ORC format configuration properties
@@ -571,7 +577,38 @@ Property Name Description
     - Hive metastore client keytab location.
   * - ``hive.metastore.thrift.delete-files-on-drop``
     - Actively delete the files for drop table operations, for cases when the
-      metastore does not delete the files. Default is ``false``.
+      metastore does not delete the files.
+      This setting can be considered as a fallback in the
+      case when the Hive metastore completed the drop operation
+      without removing the files of the table.
+      Default is ``false``.
+  * - ``hive.metastore.thrift.assume-canonical-partition-keys``
+    - Allow the metastore to assume that the values of partition
+      columns can be converted to string values. This can lead to
+      performance improvements in the queries which apply filters
+      on the partition columns.
+      Note that the partition keys of type ``timestamp`` do
+      not get canonicalized. Default is ``false``.
+  * - ``hive.metastore.thrift.client.socks-proxy``
+    - SOCKS proxy to use for the Thrift Hive metastore.
+  * - ``hive.metastore.thrift.client.max-retries``
+    - Maximum number of retry attempts for metastore requests.
+      Default is ``9``.
+  * - ``hive.metastore.thrift.client.backoff-scale-factor``
+    - Scale factor for metastore request retry delay.
+      Default is ``2.0``.
+  * - ``hive.metastore.thrift.client.max-retry-time``
+    - Total time limit for a metastore request to be retried.
+      Default is ``30`` seconds.
+  * - ``hive.metastore.thrift.client.min-backoff-delay``
+    - Minimum delay between metastore request retries.
+      Default is ``1`` second.
+  * - ``hive.metastore.thrift.client.max-backoff-delay``
+    - Maximum delay between metastore request retries.
+      Default is ``1`` second.
+  * - ``hive.metastore.thrift.txn-lock-max-wait``
+    - Maximum time to wait to acquire hive transaction lock.
+      Default is ``10`` minutes.
 
 .. _hive-glue-metastore:
 
@@ -582,63 +619,70 @@ In order to use a Glue catalog, ensure to configure the metastore with
 ``hive.metastore=glue`` and provide further details with the following
 properties:
 
-==================================================== ============================================================
-Property Name                                        Description
-==================================================== ============================================================
-``hive.metastore.glue.region``                       AWS region of the Glue Catalog. This is required when not
-                                                     running in EC2, or when the catalog is in a different region.
-                                                     Example: ``us-east-1``
+======================================================= ============================================================
+Property Name                                           Description
+======================================================= ============================================================
+``hive.metastore.glue.region``                          AWS region of the Glue Catalog. This is required when not
+                                                        running in EC2, or when the catalog is in a different region.
+                                                        Example: ``us-east-1``
+
+``hive.metastore.glue.endpoint-url``                    Glue API endpoint URL (optional).
+                                                        Example: ``https://glue.us-east-1.amazonaws.com``
 
-``hive.metastore.glue.endpoint-url``                 Glue API endpoint URL (optional).
-                                                     Example: ``https://glue.us-east-1.amazonaws.com``
+``hive.metastore.glue.pin-client-to-current-region``    Pin Glue requests to the same region as the EC2 instance
+                                                        where Trino is running, defaults to ``false``.
 
-``hive.metastore.glue.pin-client-to-current-region`` Pin Glue requests to the same region as the EC2 instance
-                                                     where Trino is running, defaults to ``false``.
+``hive.metastore.glue.max-connections``                 Max number of concurrent connections to Glue.
+                                                        Defaults to ``30``.
 
-``hive.metastore.glue.max-connections``              Max number of concurrent connections to Glue,
-                                                     defaults to ``30``.
+``hive.metastore.glue.max-error-retries``               Maximum number of error retries for the Glue client.
+                                                        Defaults to ``10``.
 
-``hive.metastore.glue.max-error-retries``            Maximum number of error retries for the Glue client,
-                                                     defaults to ``10``.
+``hive.metastore.glue.default-warehouse-dir``           Default warehouse directory for schemas created without an
+                                                        explicit ``location`` property.
 
-``hive.metastore.glue.default-warehouse-dir``        Default warehouse directory for schemas created without an
-                                                     explicit ``location`` property.
+``hive.metastore.glue.aws-credentials-provider``        Fully qualified name of the Java class to use for obtaining
+                                                        AWS credentials. Can be used to supply a custom credentials
+                                                        provider.
 
-``hive.metastore.glue.aws-credentials-provider``     Fully qualified name of the Java class to use for obtaining
-                                                     AWS credentials. Can be used to supply a custom credentials
-                                                     provider.
+``hive.metastore.glue.aws-access-key``                  AWS access key to use to connect to the Glue Catalog. If
+                                                        specified along with ``hive.metastore.glue.aws-secret-key``,
+                                                        this parameter takes precedence over
+                                                        ``hive.metastore.glue.iam-role``.
 
-``hive.metastore.glue.aws-access-key``               AWS access key to use to connect to the Glue Catalog. If
-                                                     specified along with ``hive.metastore.glue.aws-secret-key``,
-                                                     this parameter takes precedence over
-                                                     ``hive.metastore.glue.iam-role``.
+``hive.metastore.glue.aws-secret-key``                  AWS secret key to use to connect to the Glue Catalog. If
+                                                        specified along with ``hive.metastore.glue.aws-access-key``,
+                                                        this parameter takes precedence over
+                                                        ``hive.metastore.glue.iam-role``.
 
-``hive.metastore.glue.aws-secret-key``               AWS secret key to use to connect to the Glue Catalog. If
-                                                     specified along with ``hive.metastore.glue.aws-access-key``,
-                                                     this parameter takes precedence over
-                                                     ``hive.metastore.glue.iam-role``.
+``hive.metastore.glue.catalogid``                       The ID of the Glue Catalog in which the metadata database
+                                                        resides.
 
-``hive.metastore.glue.catalogid``                    The ID of the Glue Catalog in which the metadata database
-                                                     resides.
+``hive.metastore.glue.iam-role``                        ARN of an IAM role to assume when connecting to the Glue
+                                                        Catalog.
 
-``hive.metastore.glue.iam-role``                     ARN of an IAM role to assume when connecting to the Glue
-                                                     Catalog.
+``hive.metastore.glue.external-id``                     External ID for the IAM role trust policy when connecting
+                                                        to the Glue Catalog.
 
-``hive.metastore.glue.external-id``                  External ID for the IAM role trust policy when connecting
-                                                     to the Glue Catalog.
+``hive.metastore.glue.partitions-segments``             Number of segments for partitioned Glue tables.
+                                                        Defaults to ``5``.
 
-``hive.metastore.glue.partitions-segments``          Number of segments for partitioned Glue tables, defaults
-                                                     to ``5``.
+``hive.metastore.glue.get-partition-threads``           Number of threads for parallel partition fetches from Glue.
+                                                        Defaults to ``20``.
 
-``hive.metastore.glue.get-partition-threads``        Number of threads for parallel partition fetches from Glue,
-                                                     defaults to ``20``.
+``hive.metastore.glue.read-statistics-threads``         Number of threads for parallel statistic fetches from Glue.
+                                                        Defaults to ``5``.
 
-``hive.metastore.glue.read-statistics-threads``      Number of threads for parallel statistic fetches from Glue,
-                                                     defaults to ``5``.
+``hive.metastore.glue.write-statistics-threads``        Number of threads for parallel statistic writes to Glue.
+                                                        Defaults to ``5``.
 
-``hive.metastore.glue.write-statistics-threads``     Number of threads for parallel statistic writes to Glue,
-                                                     defaults to ``5``.
-==================================================== ============================================================
+``hive.metastore.glue.assume-canonical-partition-keys`` Allow the metastore to assume that the values of partition
+                                                        columns can be converted to string values. This can lead to
+                                                        performance improvements in the queries which apply filters
+                                                        on the partition columns.
+                                                        Note that the partition keys of type ``timestamp`` do
+                                                        not get canonicalized. Defaults to ``false``.
+======================================================= ============================================================
 
 .. _hive-google-cloud-storage-configuration:
 
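
For readers of this change, the Thrift metastore retry properties documented in the patch could be combined in a catalog properties file roughly as follows. This is only a sketch under stated assumptions: the file name ``etc/catalog/example.properties`` and the ``thrift://example.net:9083`` URI are placeholders, the values are simply the defaults quoted in the descriptions above, and the duration spellings (``1s``, ``30s``, ``10m``) assume these properties accept the usual duration syntax.

.. code-block:: text

    # etc/catalog/example.properties (hypothetical file name)
    connector.name=hive
    # Placeholder metastore address, not taken from the patch
    hive.metastore.uri=thrift://example.net:9083

    # Retry behaviour for metastore requests, set to the documented defaults
    hive.metastore.thrift.client.max-retries=9
    hive.metastore.thrift.client.backoff-scale-factor=2.0
    hive.metastore.thrift.client.min-backoff-delay=1s
    hive.metastore.thrift.client.max-backoff-delay=1s
    hive.metastore.thrift.client.max-retry-time=30s

    # Upper bound on waiting for a Hive transaction lock
    hive.metastore.thrift.txn-lock-max-wait=10m

    # Bucket validation on read, documented as enabled by default
    hive.validate-bucketing=true

Because every value above repeats a documented default, the file should behave the same as one that omits these lines; listing them explicitly mainly gives a starting point for tuning.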