diff --git a/docs/src/main/sphinx/connector/hive-alluxio.rst b/docs/src/main/sphinx/connector/hive-alluxio.rst
index 744cf5588699..b084daf99970 100644
--- a/docs/src/main/sphinx/connector/hive-alluxio.rst
+++ b/docs/src/main/sphinx/connector/hive-alluxio.rst
@@ -7,85 +7,15 @@ System `_,
 leveraging Alluxio's distributed block-level read/write caching functionality.
 The tables must be created in the Hive metastore with the ``alluxio://``
 location prefix (see `Running Apache Hive with Alluxio
-`_
+`_
 for details and examples).
 
 Trino queries will then transparently retrieve and cache files or objects from
 a variety of disparate storage systems including HDFS and S3.
 
-Alluxio client-side configuration
----------------------------------
+Setting up Alluxio with Trino
+-----------------------------
 
-To configure Alluxio client-side properties on Trino, append the Alluxio
-configuration directory (``${ALLUXIO_HOME}/conf``) to the Trino JVM classpath,
-so that the Alluxio properties file ``alluxio-site.properties`` can be loaded as
-a resource. Update the Trino :ref:`jvm-config` file ``etc/jvm.config``
-to include the following:
-
-.. code-block:: text
-
-    -Xbootclasspath/a:
-
-The advantage of this approach is that all the Alluxio properties are set in
-the single ``alluxio-site.properties`` file. For details, see `Customize Alluxio Presto Properties
-`_.
-
-Alternatively, add Alluxio configuration properties to the Hadoop configuration
-files (``core-site.xml``, ``hdfs-site.xml``) and configure the Hive connector
-to use the `Hadoop configuration files <#hdfs-configuration>`__ via the
-``hive.config.resources`` connector property.
-
-Deploy Alluxio with Trino
---------------------------
-
-To achieve the best performance running Trino on Alluxio, it is recommended
-to collocate Trino workers with Alluxio workers. This allows reads and writes
-to bypass the network (*short-circuit*). See `Performance Tuning Tips for Presto with Alluxio
-`_
-for more details.
-
-.. _alluxio-catalog-service:
-
-Alluxio catalog service
------------------------
-
-An alternative way for Trino to interact with Alluxio is via the
-`Alluxio catalog service `_.
-The primary benefits for using the Alluxio catalog service are simpler
-deployment of Alluxio with Trino, and enabling schema-aware optimizations
-such as transparent caching and transformations. Currently, the catalog service
-supports read-only workloads.
-
-The Alluxio catalog service is a metastore that can cache the information
-from different underlying metastores. It currently supports the Hive metastore
-as an underlying metastore. In order for the Alluxio catalog to manage the metadata
-of other existing metastores, the other metastores must be "attached" to the
-Alluxio catalog. To attach an existing Hive metastore to the Alluxio
-catalog, simply use the
-`Alluxio CLI attachdb command `_.
-The appropriate Hive metastore location and Hive database name need to be
-provided.
-
-.. code-block:: text
-
-    ./bin/alluxio table attachdb hive thrift://HOSTNAME:9083 hive_db_name
-
-Once a metastore is attached, the Alluxio catalog can manage and serve the
-information to Trino. To configure the Hive connector for Alluxio
-catalog service, simply configure the connector to use the Alluxio
-metastore type, and provide the location to the Alluxio cluster.
-For example, your ``etc/catalog/alluxio.properties`` should include
-the following:
-
-.. code-block:: text
-
-    connector.name=hive
-    hive.metastore=alluxio-deprecated
-    hive.metastore.alluxio.master.address=HOSTNAME:PORT
-
-Replace ``HOSTNAME`` with the Alluxio master hostname, and replace ``PORT``
-with the Alluxio master port.
-An example of an Alluxio master address is ``master-node:19998``.
-Now, Trino queries can take advantage of the Alluxio catalog service, such as
-transparent caching and transparent transformations, without any modifications
-to existing Hive metastore deployments.
+For information on how to set up, configure, and use Alluxio, refer to `Alluxio's
+documentation on using their platform with Trino
+`_.