140 changes: 92 additions & 48 deletions docs/src/main/sphinx/connector/hive.rst
@@ -326,6 +326,10 @@ Property Name Description

``hive.create-empty-bucket-files`` Should empty files be created for buckets that have no data? ``false``

``hive.validate-bucketing`` Verify that data is in the correct bucket file when reading ``true``
Member:

Here and in other places below .. only one space character before code or between sentences

Contributor Author:

In this specific case ``true`` is part of the "Default" column and not the description sentence.

bucketed tables. This check is enabled by default, as
incorrect bucketing can cause incorrect query results.

``hive.partition-statistics-sample-size`` Specifies the number of partitions to analyze when 100
computing table statistics.

@@ -414,25 +418,27 @@ Property Name Description
with a leading 0. If set to 'skip', permissions of newly
created directories will not be set by Trino.

``hive.fs.cache.max-size`` Maximum number of cached file system objects. 1000
``hive.fs.cache.max-size`` Maximum number of cached file system objects. 1000

``hive.query-partition-filter-required`` Set to ``true`` to force a query to use a partition filter. ``false``
``hive.query-partition-filter-required`` Set to ``true`` to force a query to use a partition filter. ``false``
You can use the ``query_partition_filter_required`` catalog
session property for temporary, catalog specific use.

``hive.table-statistics-enabled`` Enables :doc:`/optimizer/statistics`. The equivalent ``true``
``hive.table-statistics-enabled`` Enables :doc:`/optimizer/statistics`. The equivalent ``true``
:doc:`catalog session property </sql/set-session>`
is ``statistics_enabled`` for session specific use.
Set to ``false`` to disable statistics. Disabling statistics
means that :doc:`/optimizer/cost-based-optimizations` can
not make smart decisions about the query plan.

``hive.auto-purge`` Set the default value for the auto_purge table property for ``false``
``hive.auto-purge`` Set the default value for the auto_purge table property for ``false``
managed tables.
See the :ref:`hive_table_properties` for more information
on auto_purge.

``hive.partition-projection-enabled`` Enables Athena partition projection support ``false``
``hive.partition-projection-enabled`` Enables Athena partition projection support ``false``

``hive.single-statement-writes`` Require auto-commit mode for individual DML statements ``false``
================================================== ============================================================ ============

ORC format configuration properties
@@ -571,7 +577,38 @@ Property Name Description
- Hive metastore client keytab location.
* - ``hive.metastore.thrift.delete-files-on-drop``
- Actively delete the files for drop table operations, for cases when the
metastore does not delete the files. Default is ``false``.
metastore does not delete the files.
This setting can be considered as a fallback in the
case when the Hive metastore completed the drop operation
without removing the files of the table.
Default is ``false``.
* - ``hive.metastore.thrift.assume-canonical-partition-keys``
- Allow the metastore to assume that the values of partition
columns can be converted to string values. This can lead to
performance improvements in the queries which apply filters
on the partition columns.
Note that the partition keys of type ``timestamp`` do
not get canonicalized. Default is ``false``.
* - ``hive.metastore.thrift.client.socks-proxy``
- SOCKS proxy to use for the Thrift Hive metastore.
* - ``hive.metastore.thrift.client.max-retries``
- Maximum number of retry attempts for metastore requests.
Default is ``9``.
* - ``hive.metastore.thrift.client.backoff-scale-factor``
- Scale factor for metastore request retry delay.
Default is ``2.0``.
* - ``hive.metastore.thrift.client.max-retry-time``
- Total time limit for a metastore request to be retried.
Default is ``30`` seconds.
* - ``hive.metastore.thrift.client.min-backoff-delay``
- Minimum delay between metastore request retries.
Default is ``1`` second.
* - ``hive.metastore.thrift.client.max-backoff-delay``
- Maximum delay between metastore request retries.
Default is ``1`` second.
* - ``hive.metastore.thrift.txn-lock-max-wait``
- Maximum time to wait to acquire hive transaction lock.
Default is ``10`` minutes.
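
As a rough illustration of how the retry properties listed above fit together, a catalog properties file for a Thrift metastore might look like the following sketch. The metastore URI is a placeholder, and the duration strings (``30s``, ``1s``) are an assumption about how the documented defaults are written out. Every value shown is simply the documented default, so none of these lines is required.

```properties
# Hypothetical catalog file, for example etc/catalog/example.properties
connector.name=hive
# Placeholder host and port; replace with the real metastore address
hive.metastore.uri=thrift://metastore.example.net:9083

# Retry behavior for metastore requests (all values are the documented defaults)
hive.metastore.thrift.client.max-retries=9
hive.metastore.thrift.client.backoff-scale-factor=2.0
hive.metastore.thrift.client.max-retry-time=30s
hive.metastore.thrift.client.min-backoff-delay=1s
hive.metastore.thrift.client.max-backoff-delay=1s
```

With the minimum and maximum backoff delay both at one second, the scale factor has no room to grow the delay, so raising ``max-backoff-delay`` is presumably what lets the exponential backoff take effect.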

.. _hive-glue-metastore:

@@ -582,63 +619,70 @@ In order to use a Glue catalog, ensure to configure the metastore with
``hive.metastore=glue`` and provide further details with the following
properties:

==================================================== ============================================================
Property Name Description
==================================================== ============================================================
``hive.metastore.glue.region`` AWS region of the Glue Catalog. This is required when not
running in EC2, or when the catalog is in a different region.
Example: ``us-east-1``
======================================================= ============================================================
Property Name Description
======================================================= ============================================================
``hive.metastore.glue.region`` AWS region of the Glue Catalog. This is required when not
running in EC2, or when the catalog is in a different region.
Example: ``us-east-1``

``hive.metastore.glue.endpoint-url`` Glue API endpoint URL (optional).
Example: ``https://glue.us-east-1.amazonaws.com``

``hive.metastore.glue.endpoint-url`` Glue API endpoint URL (optional).
Example: ``https://glue.us-east-1.amazonaws.com``
``hive.metastore.glue.pin-client-to-current-region`` Pin Glue requests to the same region as the EC2 instance
where Trino is running, defaults to ``false``.
Member:

running. Defaults to

Member:

same for the various other ones like that below


``hive.metastore.glue.pin-client-to-current-region`` Pin Glue requests to the same region as the EC2 instance
where Trino is running, defaults to ``false``.
``hive.metastore.glue.max-connections`` Max number of concurrent connections to Glue.
Defaults to ``30``.

``hive.metastore.glue.max-connections`` Max number of concurrent connections to Glue,
defaults to ``30``.
``hive.metastore.glue.max-error-retries`` Maximum number of error retries for the Glue client.
Defaults to ``10``.

``hive.metastore.glue.max-error-retries`` Maximum number of error retries for the Glue client,
defaults to ``10``.
``hive.metastore.glue.default-warehouse-dir`` Default warehouse directory for schemas created without an
explicit ``location`` property.

``hive.metastore.glue.default-warehouse-dir`` Default warehouse directory for schemas created without an
explicit ``location`` property.
``hive.metastore.glue.aws-credentials-provider`` Fully qualified name of the Java class to use for obtaining
AWS credentials. Can be used to supply a custom credentials
Member:

Related question not to be answered here...

how do you add access to that class.. drop the related jars into the Hive connector plugin folder?

If this is a common use case we might need to document specifically .. or generically somewhere

Contributor Author:

private static AWSCredentialsProvider getCustomAWSCredentialsProvider(String providerClass)
{
    try {
        Object instance = Class.forName(providerClass).getConstructor().newInstance();
        if (!(instance instanceof AWSCredentialsProvider)) {
            throw new RuntimeException("Invalid credentials provider class: " + instance.getClass().getName());
        }
        return (AWSCredentialsProvider) instance;
    }
    catch (ReflectiveOperationException e) {
        throw new RuntimeException(format("Error creating an instance of %s", providerClass), e);
    }
}

I'm assuming that this setting was intended for one of the many credential providers which come with the aws-java-sdk-core library.

If the class is not in the aws-java-sdk-core library, then it would probably need to be added to the Hive connector's plugin folder.

Related PR: #1363

provider.

``hive.metastore.glue.aws-credentials-provider`` Fully qualified name of the Java class to use for obtaining
AWS credentials. Can be used to supply a custom credentials
provider.
``hive.metastore.glue.aws-access-key`` AWS access key to use to connect to the Glue Catalog. If
specified along with ``hive.metastore.glue.aws-secret-key``,
this parameter takes precedence over
``hive.metastore.glue.iam-role``.

``hive.metastore.glue.aws-access-key`` AWS access key to use to connect to the Glue Catalog. If
specified along with ``hive.metastore.glue.aws-secret-key``,
this parameter takes precedence over
``hive.metastore.glue.iam-role``.
``hive.metastore.glue.aws-secret-key`` AWS secret key to use to connect to the Glue Catalog. If
specified along with ``hive.metastore.glue.aws-access-key``,
this parameter takes precedence over
``hive.metastore.glue.iam-role``.

``hive.metastore.glue.aws-secret-key`` AWS secret key to use to connect to the Glue Catalog. If
specified along with ``hive.metastore.glue.aws-access-key``,
this parameter takes precedence over
``hive.metastore.glue.iam-role``.
``hive.metastore.glue.catalogid`` The ID of the Glue Catalog in which the metadata database
resides.

``hive.metastore.glue.catalogid`` The ID of the Glue Catalog in which the metadata database
resides.
``hive.metastore.glue.iam-role`` ARN of an IAM role to assume when connecting to the Glue
Catalog.

``hive.metastore.glue.iam-role`` ARN of an IAM role to assume when connecting to the Glue
Catalog.
``hive.metastore.glue.external-id`` External ID for the IAM role trust policy when connecting
to the Glue Catalog.

``hive.metastore.glue.external-id`` External ID for the IAM role trust policy when connecting
to the Glue Catalog.
``hive.metastore.glue.partitions-segments`` Number of segments for partitioned Glue tables.
Defaults to ``5``.

``hive.metastore.glue.partitions-segments`` Number of segments for partitioned Glue tables, defaults
to ``5``.
``hive.metastore.glue.get-partition-threads`` Number of threads for parallel partition fetches from Glue.
Defaults to ``20``.

``hive.metastore.glue.get-partition-threads`` Number of threads for parallel partition fetches from Glue,
defaults to ``20``.
``hive.metastore.glue.read-statistics-threads`` Number of threads for parallel statistic fetches from Glue.
Defaults to ``5``.

``hive.metastore.glue.read-statistics-threads`` Number of threads for parallel statistic fetches from Glue,
defaults to ``5``.
``hive.metastore.glue.write-statistics-threads`` Number of threads for parallel statistic writes to Glue.
Defaults to ``5``.

``hive.metastore.glue.write-statistics-threads`` Number of threads for parallel statistic writes to Glue,
defaults to ``5``.
==================================================== ============================================================
``hive.metastore.glue.assume-canonical-partition-keys`` Allow the metastore to assume that the values of partition
columns can be converted to string values. This can lead to
performance improvements in the queries which apply filters
on the partition columns.
Note that the partition keys of type ``timestamp`` do
not get canonicalized. Defaults to ``false``.
======================================================= ============================================================
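
To make the table above more concrete, here is a minimal sketch of a Glue-backed catalog configuration using only properties listed there. The role ARN, account ID, and S3 path are hypothetical placeholders, and the numeric values repeat the documented defaults. The sketch sets ``hive.metastore.glue.iam-role`` alone, since the access key and secret key take precedence over the IAM role when both are specified.

```properties
# Hypothetical catalog file, for example etc/catalog/example.properties
connector.name=hive
hive.metastore=glue

# Required when not running in EC2, or when the catalog is in another region
hive.metastore.glue.region=us-east-1

# Placeholder ARN and warehouse location; replace with real values
hive.metastore.glue.iam-role=arn:aws:iam::123456789012:role/example-glue-role
hive.metastore.glue.default-warehouse-dir=s3://example-bucket/warehouse/

# Optional tuning; these repeat the documented defaults
hive.metastore.glue.max-connections=30
hive.metastore.glue.max-error-retries=10
hive.metastore.glue.get-partition-threads=20
```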

.. _hive-google-cloud-storage-configuration:
