
Add hive iceberg support based on Trino created iceberg table.#10779

Closed
Imbruced wants to merge 7 commits into trinodb:master from Imbruced:add-hive-iceberg-trino-spark-integration

Conversation

@Imbruced

This PR aims to fix an issue with reading Trino-created Iceberg tables registered in HMS. The code changes are small, but I need more sophisticated tests. In particular, tests in the product-tests section should be created. As far as I know, you don't have a Dockerfile for Iceberg + Spark + Hive, and that integration should be verified. Should I add these changes here: https://github.com/trinodb/docker-images/blob/master/testing/spark3.0-iceberg/Dockerfile ? Or is it better to create a new one and, based on that image, create additional e2e and integration tests?

We need tests like (pseudocode):

OnTrino(CREATE TABLE table_a ...)
OnTrino(INSERT INTO table_a VALUES ...)
OnHive(SELECT * FROM table_a ...)
assert cnt_trino == cnt_hive
assert trino_elements == hive_elements

@findepi Hoping for your support.

@cla-bot

cla-bot bot commented Jan 25, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@Imbruced
Author

That's definitely a draft; I need to understand the approach for integration tests. IMHO the integration should be tested for Iceberg + Trino + Hive.


Member

@findepi left a comment


(haven't reviewed the test yet)

false))
.add(booleanProperty(
HIVE_ENABLED,
"Enable hive support",
Member

I'd call this "hive compatibility enabled".
However, to the best of my knowledge, the only downside of setting this flag is that it blows up completely unless HMS has the Iceberg jars added. Thus I don't see a need to make this a table property. Make it a config toggle: iceberg.hive-compatibility.enabled (boolean).

For the record, Iceberg has this on a per-table basis:
https://github.com/apache/iceberg/blob/7fcc71da65a47ca3c9f6eb6e862a238389b8bdc5/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L526-L547
@pvary @rdblue would you want to comment on the rationale?
Is it because HiveIcebergMetaHook is used as a vehicle for propagating the flag to the Catalog?

Contributor

The preference is for setting this as an environment property -- your Hive install should have the Iceberg Jar or not. But you may want to set it at the table level for some tables because you can add the Iceberg runtime Jar on the client side. I would probably not include this in Trino unless you need to later.

Author

What exactly do you mean by "I would probably not include this in Trino unless you need to later"? Keep the table property as it is, or change it to a config variable? I tried to do the same thing as the Spark Iceberg jars do: when creating a Hive-compatible table, the appropriate table property is assigned.
Or do you mean the change shouldn't be included in Trino at all?

Contributor

I wouldn't expose this in Trino unless it is necessary. People can set this in the HiveConf used by the catalog, so hopefully you can just ignore these settings and make that the way to control it. Hopefully Hive will have a better story around this soon as well, so we don't need to use fake InputFormat and Serde classes to avoid it breaking. Normally, I wouldn't expect other engines to need to do so much to keep Hive's user experience from breaking.

Author

Do you have any idea how to achieve that at the table level? At the moment the only way is to create the table using Spark or Hive to be able to use it across those three engines: Hive, Trino, Spark.

Author

Iceberg has the same logic in its code; I mean, it uses a table property.

Member

I wouldn't expose this in Trino unless it is necessary. People can set this in the HiveConf used by the catalog,

@rdblue do you mean hive-site.xml files (Trino's hive.config.resources config)?
Or do you mean some HMS configuration change?

Thanks for confirming we don't need this to be a table property.

Contributor

Yeah, Iceberg will also check for that property to be configured. I'd leave it in Hive config rather than using session config or table config.
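For context, the catalog-level switch discussed here is, as far as I know, Iceberg's iceberg.engine.hive.enabled HiveConf property (checked alongside the per-table engine.hive.enabled property); a hive-site.xml fragment along these lines, which Trino can pick up via hive.config.resources, would enable it globally:

```xml
<!-- Sketch of a hive-site.xml override; the property name comes from
     Iceberg's ConfigProperties, assumed here rather than taken from this PR. -->
<property>
  <name>iceberg.engine.hive.enabled</name>
  <value>true</value>
</property>
```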

ImmutableMap.Builder<String, String> propertiesBuilder = ImmutableMap.builderWithExpectedSize(2);
FileFormat fileFormat = IcebergTableProperties.getFileFormat(tableMetadata.getProperties());
propertiesBuilder.put(DEFAULT_FILE_FORMAT, fileFormat.toString());
propertiesBuilder.put(HIVE_ENABLED, isHiveEnabled.toString());
Member

Why store this information in the table properties?
It seems that what matters is the serde and the input/output format classes we use when registering the table in the metastore.

Author

That's right, but what's the other approach you are thinking of? I didn't find a better solution to make it happen at the table level.

Author

That's a similar approach to what is done on the Spark side when the Iceberg jar is added.

Member

That's right, but what's the other approach you are thinking of?

Add a boolean entry in some Iceberg config class.
This should be a new config class bound in IcebergHiveMetastoreCatalogModule, because it's going to be HMS-specific. See IcebergConfig & TestIcebergConfig for how to create a config class.
See #10845 for the upcoming Glue support, which won't need this.

Then pass the config value through HiveMetastoreTableOperationsProvider > HiveMetastoreTableOperations > AbstractMetastoreTableOperations and use it there.
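A minimal sketch of what such a config class might look like; the class name is hypothetical, the property name iceberg.hive-compatibility.enabled is the one proposed earlier in this review, and a real Trino config class would carry io.airlift.configuration's @Config annotation (shown here only as a comment) and be bound in IcebergHiveMetastoreCatalogModule:

```java
// Hypothetical HMS-specific config class, following the IcebergConfig pattern
// mentioned in the review; not code from this PR.
public class IcebergHiveMetastoreConfig
{
    private boolean hiveCompatibilityEnabled; // defaults to false

    public boolean isHiveCompatibilityEnabled()
    {
        return hiveCompatibilityEnabled;
    }

    // In the real class this setter would be annotated with
    // @Config("iceberg.hive-compatibility.enabled")
    public IcebergHiveMetastoreConfig setHiveCompatibilityEnabled(boolean hiveCompatibilityEnabled)
    {
        this.hiveCompatibilityEnabled = hiveCompatibilityEnabled;
        return this;
    }
}
```

Keeping the default false preserves today's behavior for deployments whose HMS lacks the Iceberg jars.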

public static final String METADATA_LOCATION = "metadata_location";
public static final String PREVIOUS_METADATA_LOCATION = "previous_metadata_location";
protected static final String METADATA_FOLDER_NAME = "metadata";
protected static final String HIVE_ENABLED_FLAG = "engine.hive.enabled";
Member

We should directly reuse org.apache.iceberg.TableProperties#ENGINE_HIVE_ENABLED.
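For illustration, a self-contained sketch of building the metastore table parameters around that constant; TablePropertiesSketch and buildTableProperties are hypothetical names, a plain Map stands in for Guava's ImmutableMap.Builder from the diff, and the string literal merely documents the constant's value:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TablePropertiesSketch
{
    // Same value as org.apache.iceberg.TableProperties#ENGINE_HIVE_ENABLED
    // (see the HIVE_ENABLED_FLAG constant in the diff above); production code
    // should reference the Iceberg constant instead of repeating the literal.
    static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";

    // Illustrative stand-in for the propertiesBuilder logic in the diff.
    static Map<String, String> buildTableProperties(String defaultFileFormat, boolean hiveEnabled)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("write.format.default", defaultFileFormat);
        properties.put(ENGINE_HIVE_ENABLED, Boolean.toString(hiveEnabled));
        return properties;
    }
}
```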

FileInputFormat.class.getName(),
FileOutputFormat.class.getName());

protected static final StorageFormat HIVE_ICEBERG_STORAGE_FORMAT = StorageFormat.create(
Member

Just ICEBERG_STORAGE_FORMAT,

and rename the above one: STORAGE_FORMAT -> DUMMY_STORAGE_FORMAT.

Also, add a code comment explaining why we cannot use ICEBERG_STORAGE_FORMAT unconditionally.

assertUpdate("CREATE TABLE test_iceberg_get_table_props (x BIGINT)");
assertThat(query("SELECT * FROM \"test_iceberg_get_table_props$properties\""))
.matches(format("VALUES (VARCHAR 'write.format.default', VARCHAR '%s')", format.name()));
dropTable("test_iceberg_get_table_props");
Member

not sure why this test is removed?

Author

By mistake; I will bring it back.

hive:
# Make hive server configuration invalid to make sure the hms_only tests do not accidentally depend on HS2.
jdbc_url: jdbc:hive2://${databases.hive.host}:12345
jdbc_url: jdbc:hive2://${databases.hive.host}:10000
Member

This directly contradicts the comment above. Why do you want the change here?

Author

Forgot to change that back to the default value; the jars added to the Hive runtime are by default served by Hive on port 10000. Do you have an idea where I should change that so it applies only in the singlenode-spark-iceberg env, or even at my test class level?

Member

tempto-configuration-for-hms-only shouldn't be used in singlenode-spark-iceberg if we're going to use HS2 there.
(And I don't think it is used there.)

echo "Applying hive-site configuration overrides for Spark"

wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-runtime/0.12.1/iceberg-hive-runtime-0.12.1.jar
cp iceberg-hive-runtime-0.12.1.jar /usr/hdp/current/hive-client/auxlib
Member

The echo above documents the apply-site-xml-override below; let them stay together.
Add the code above the echo statement, with its own documentation-like echo.

(see also #10879)

import static io.trino.tests.product.utils.QueryExecutors.onTrino;
import static java.lang.String.format;

public class TestHiveReadingTheTables
Member

TestIcebergHiveCompatibility

and add a javadoc describing what it does, to distinguish it from TestIcebergHiveTablesCompatibility

(see also #10880)

@mosabua
Member

mosabua commented Nov 3, 2022

👋 @Imbruced - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@colebow
Member

colebow commented Mar 30, 2023

Closing this one out due to inactivity, but please reopen if you would like to pick this back up.

@colebow closed this Mar 30, 2023