Skip to content

Filter out Iceberg Materialized view from Hive View Tables#15221

Closed
jklamer wants to merge 1 commit intotrinodb:masterfrom
jklamer:jklamer/ViewReaderUtilBugFix
Closed

Filter out Iceberg Materialized view from Hive View Tables#15221
jklamer wants to merge 1 commit intotrinodb:masterfrom
jklamer:jklamer/ViewReaderUtilBugFix

Conversation

@jklamer
Copy link
Copy Markdown
Member

@jklamer jklamer commented Nov 28, 2022

Description

The Iceberg plugin uses the Hive Utils when using with Iceberg with a hive metastore backed TrinoIcebergCatalog. The implementation to allow IcebergMaterialized views to be used in this implementation requires the putting the flag ICEBERG_MATERIALIZED_VIEW_COMMENT into the table comment. Hive metastore tables with this flag cannot be decoded into ConnectorViewDefinitions.

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Nov 28, 2022
@jklamer jklamer requested a review from findepi November 28, 2022 17:13
@jklamer jklamer force-pushed the jklamer/ViewReaderUtilBugFix branch from 4bb17f0 to de6c547 Compare November 28, 2022 17:25
@alexjo2144
Copy link
Copy Markdown
Member

Can you add a test to TestIcebergHiveViewsCompatibility?

@jklamer jklamer force-pushed the jklamer/ViewReaderUtilBugFix branch from de6c547 to ff1d4d3 Compare November 28, 2022 22:04
}

@Test(groups = {ICEBERG, STORAGE_FORMATS, HMS_ONLY})
public void testIcebergMaterializedViewHiveViewsCompatibility()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test succeeds without the above change. Please make sure that new tests fail without the change when fixing a bug.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! Does SHOW TABLES not delegate to getView or some derivative? Would I need to query from information schema?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHOW TABLES presents a different information that information_schema.tables, so it may or may not be following same exec path

Copy link
Copy Markdown
Member Author

@jklamer jklamer Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, I'm struggling to trigger this. Something that might be making it hard for me to trigger this is that the getViews method for HiveMetadata swallows all sorts of view errors

ImmutableMap.Builder<SchemaTableName, ConnectorViewDefinition> views = ImmutableMap.builder();
        for (SchemaTableName name : listViews(session, schemaName)) {
            try {
                getView(session, name).ifPresent(view -> views.put(name, view));
            }
            catch (TrinoException e) {
                if (e.getErrorCode().equals(HIVE_VIEW_TRANSLATION_ERROR.toErrorCode())) {
                    // Ignore hive views for which translation fails
                }
                else if (e.getErrorCode().equals(HIVE_INVALID_VIEW_DATA.toErrorCode())) {
                    // Ignore views that are not valid
                }
                else if (e.getErrorCode().equals(TABLE_NOT_FOUND.toErrorCode())) {
                    // Ignore view that was dropped during query execution (race condition)
                }
                else {
                    throw e;
                }
            }
        }
        return views.buildOrThrow();

And the condition that this change stops from throwing is one of the ones caught here: HIVE_INVALID_VIEW_DATA

.map(list -> row(list.toArray()))
.collect(toList());

//create base tables
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a space after //

Comment on lines +138 to +139
onTrino().executeQuery("CREATE VIEW hive.default.hive_view AS SELECT * FROM hive.default.hive_table");
onTrino().executeQuery("CREATE VIEW iceberg.default.iceberg_view AS SELECT * FROM iceberg.default.iceberg_table");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that the test purpose is materialized views and Trino views are already covered by testIcebergHiveViewsCompatibility. Can we remove these CREATE VIEW statements and focus on materialized views in this test?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we need to assert that views can be accessed and decoded and that materialized views are filtered out.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ive simplified the test though to reduce replicating too much of the above test

assertThat(onTrino().executeQuery("SELECT * FROM iceberg.default.iceberg_materialized_view")).containsOnly(row(2));
}
finally {
cleanupMaterializedViews();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline -- typically a test's setup and cleanup is done directly in the test.

// we can decode Hive or Presto view
return table.getTableType().equals(VIRTUAL_VIEW.name());
return table.getTableType().equals(VIRTUAL_VIEW.name()) &&
!isTrinoMaterializedView(table.getTableType(), table.getParameters());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I replied offline, I would recommend improving the error message when executing SELECT query. The past error message wasn't nice, but the latest error is definitely confusing to users.

SHOW TABLES IN hive.tpch;
                Table
-------------------------------------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 st_4895c7d3b34545bba114c36d756da40d
 supplier
 test

-- Before this change
SELECT * FROM hive.tpch.test;
Query 20221130_231311_00042_62rkt failed: View data missing prefix: /* Presto Materialized View: eyJvcmlnaW5hbFNxbCI6IlNFTEVDVCAxIGFcblxuIiwiY29sdW1ucyI6W3sibmFtZSI6ImEiLCJ0eXBlIjoiaW50ZWdlciJ9XX0= */

-- After this change
SELECT * FROM hive.tpch.test;
Query 20221130_232023_00030_9azzm failed: Table 'tpch.test' not found

@jklamer
Copy link
Copy Markdown
Member Author

jklamer commented Jan 19, 2023

Could not replicate issue in test .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants