Add documentation for Hudi connector #13753
==============
Hudi connector
==============

.. raw:: html

  <img src="../_static/img/hudi.png" class="connector-logo">

The Hudi connector enables querying `Hudi <https://hudi.apache.org/docs/overview/>`_ tables.

Requirements
------------

To use the Hudi connector, you need:

* Network access from the Trino coordinator and workers to the Hudi storage.
* Access to the Hive metastore service (HMS).
* Network access from the Trino coordinator to the HMS.

Configuration
-------------

The connector requires a Hive metastore for table metadata and supports the same
metastore configuration properties as the :doc:`Hive connector
</connector/hive>`. At a minimum, ``hive.metastore.uri`` must be configured.
The connector recognizes Hudi tables synced to the metastore by the
`Hudi sync tool <https://hudi.apache.org/docs/syncing_metastore>`_.

To create a catalog that uses the Hudi connector, create a catalog properties file,
for example ``etc/catalog/example.properties``, that references the ``hudi``
connector. Update ``hive.metastore.uri`` with the URI of your Hive metastore
Thrift service:

.. code-block:: properties

    connector.name=hudi
    hive.metastore.uri=thrift://example.net:9083

Additionally, the following configuration properties can be set depending on the use case:

.. list-table:: Hudi configuration properties
   :widths: 30, 55, 15
   :header-rows: 1

   * - Property name
     - Description
     - Default
   * - ``hudi.metadata-enabled``
     - Fetch the list of file names and sizes from metadata rather than storage.
     - ``false``
   * - ``hudi.columns-to-hide``
     - List of column names that are hidden from the query output.
       It can be used to hide Hudi meta fields. By default, no fields are hidden.
     -
   * - ``hudi.parquet.use-column-names``
     - Access Parquet columns using names from the file. If disabled, then columns
       are accessed using the index. Only applicable to Parquet file format.
     - ``true``
   * - ``hudi.min-partition-batch-size``
     - Minimum number of partitions returned in a single batch.
     - ``10``
   * - ``hudi.max-partition-batch-size``
     - Maximum number of partitions returned in a single batch.
     - ``100``
   * - ``hudi.size-based-split-weights-enabled``
     - Unlike uniform splitting, size-based splitting ensures that each batch of splits
       has enough data to process. By default, it is enabled to improve performance.
     - ``true``
   * - ``hudi.standard-split-weight-size``
     - The split size corresponding to the standard weight (1.0)
       when size-based split weights are enabled.
     - ``128MB``
   * - ``hudi.minimum-assigned-split-weight``
     - Minimum weight that a split can be assigned
       when size-based split weights are enabled.
     - ``0.05``
   * - ``hudi.max-splits-per-second``
     - Rate at which splits are queued for processing.
       The queue is throttled if this rate limit is breached.
     - ``Integer.MAX_VALUE``
   * - ``hudi.max-outstanding-splits``
     - Maximum outstanding splits in a batch enqueued for processing.
     - ``1000``

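The size-based split weight properties interact as follows: each split receives a
weight proportional to its data size relative to ``hudi.standard-split-weight-size``,
clamped below by ``hudi.minimum-assigned-split-weight`` and above by the standard
weight of 1.0. The following Python snippet is an illustration of that behavior with
the default values, not Trino's actual implementation:

.. code-block:: python

    # Illustrative sketch of size-based split weighting using the defaults
    # of the properties above; not Trino's exact code.
    STANDARD_SPLIT_WEIGHT_SIZE = 128 * 1024 * 1024  # hudi.standard-split-weight-size
    MINIMUM_ASSIGNED_SPLIT_WEIGHT = 0.05            # hudi.minimum-assigned-split-weight

    def split_weight(file_size_bytes: int) -> float:
        """Weight proportional to data size, clamped to [minimum, 1.0]."""
        raw = file_size_bytes / STANDARD_SPLIT_WEIGHT_SIZE
        return min(1.0, max(MINIMUM_ASSIGNED_SPLIT_WEIGHT, raw))

    # A 128MB file gets the standard weight; tiny files are floored at the
    # minimum, so a batch of many small splits still carries some weight.
    print(split_weight(128 * 1024 * 1024))  # 1.0
    print(split_weight(64 * 1024 * 1024))   # 0.5
    print(split_weight(1024))               # 0.05
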
Supported file types
--------------------

The connector supports the Parquet file type.

SQL support
-----------

The connector provides read access to data in Hudi tables that have been synced to
the Hive metastore. The :ref:`globally available <sql-globally-available>`
and :ref:`read operation <sql-read-operations>` statements are supported.

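Synced tables can be explored with standard metadata statements. The catalog name
``example`` matches the catalog properties file shown earlier; the schema
``myschema`` and table ``stock_ticks_cow`` are illustrative:

.. code-block:: sql

    SHOW SCHEMAS FROM example;
    SHOW TABLES FROM example.myschema;
    DESCRIBE example.myschema.stock_ticks_cow;
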
Supported query types
^^^^^^^^^^^^^^^^^^^^^

Hudi supports `two types of tables <https://hudi.apache.org/docs/table_types>`_
depending on how the data is indexed and laid out on the file system. The following
table displays a support matrix of table types and query types for the connector.

=========================== =============================================
Table type                  Supported query type
=========================== =============================================
Copy on write               Snapshot queries
Merge on read               Read optimized queries
=========================== =============================================

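For merge-on-read tables, the Hudi sync tool typically registers a read optimized
view with an ``_ro`` suffix (and a real-time view with an ``_rt`` suffix) in the
metastore, so a read optimized query targets the ``_ro`` table. The table name
below is hypothetical:

.. code-block:: sql

    SELECT symbol, max(ts)
    FROM stock_ticks_mor_ro
    GROUP BY symbol;
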
Example queries
^^^^^^^^^^^^^^^

In the queries below, ``stock_ticks_cow`` is a Hudi copy-on-write table referred to
in the Hudi `quickstart <https://hudi.apache.org/docs/docker_demo/>`_ documentation.

Here are some sample queries:

.. code-block:: sql

    USE example.myschema;

    SELECT symbol, max(ts)
    FROM stock_ticks_cow
    GROUP BY symbol
    HAVING symbol = 'GOOG';

.. code-block:: text

      symbol   |        _col1         |
    -----------+----------------------+
     GOOG      | 2018-08-31 10:59:00  |
    (1 row)

.. code-block:: sql

    SELECT dt, symbol
    FROM stock_ticks_cow
    WHERE symbol = 'GOOG';

.. code-block:: text

        dt      | symbol |
    ------------+--------+
     2018-08-31 | GOOG   |
    (1 row)

.. code-block:: sql

    SELECT dt, count(*)
    FROM stock_ticks_cow
    GROUP BY dt;

.. code-block:: text

        dt      | _col1 |
    ------------+-------+
     2018-08-31 | 99    |
    (1 row)

Reading Hudi tables with the Hive connector
-------------------------------------------

Hudi tables can also be accessed with a catalog using the Hive connector. The supported
query types with the Hive connector are the same as those of the Hudi connector. To query
Hudi tables on Trino, place the
`hudi-trino-bundle <https://repo.maven.apache.org/maven2/org/apache/hudi/hudi-trino-bundle>`_
JAR file into the Hive connector installation, ``<trino_install>/plugin/hive``.
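
A minimal sketch of that installation step follows; the bundle version is hypothetical,
and ``<trino_install>`` is a placeholder for your actual installation directory:

.. code-block:: shell

    # Hypothetical version; choose the bundle matching your Hudi release.
    VERSION=0.11.1
    curl -O "https://repo.maven.apache.org/maven2/org/apache/hudi/hudi-trino-bundle/${VERSION}/hudi-trino-bundle-${VERSION}.jar"
    cp "hudi-trino-bundle-${VERSION}.jar" <trino_install>/plugin/hive/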