Skip to content

Added custom materialization for materialized view#198

Closed
Jay-code0 wants to merge 3 commits intostarburstdata:masterfrom
Jay-code0:feature/new-materialized-view
Closed

Added custom materialization for materialized view#198
Jay-code0 wants to merge 3 commits intostarburstdata:masterfrom
Jay-code0:feature/new-materialized-view

Conversation

@Jay-code0
Copy link
Copy Markdown
Contributor

Overview

This new materialization will allow users to create models in dbt which will send the relevant command to starburst to create and schedule a materialized view, current stage of code will do the common config items and also requires the model to send a config item for dropping an existing item with the same name if not the same type (view.cte.table)

Checklist

  • [ x] I have run this code in development and it appears to resolve the stated issue
  • [ x] This PR includes tests, or tests are not required/relevant for this PR
  • [ x] README.md updated and added information about my change
  • [ x] I have run changie new to create a changelog entry

Copy link
Copy Markdown
Contributor

@mdesmet mdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add tests. You can have a look in the tests directory and the contribution guide on how to get started with testing.

Comment thread dbt/include/trino/macros/materializations/materialized_view.sql Outdated
Comment thread dbt/include/trino/macros/materializations/materialized_view.sql Outdated
{%-set existing_type = 'CTE'%}
{% endif %}

{%- if existing_type is not none and config_drop_any_existing == 'False' %}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config flags should use boolean type and not strings.

Comment thread dbt/include/trino/macros/materializations/materialized_view.sql Outdated
{%- set build_sql = build_materialized_view(target_relation,config_grace_period,config_max_import_duration,config_cron,config_refresh_interval) %}

{% elif existing_type is not none and config_drop_any_existing == 'True' and existing_type != 'table'%} -- theres an issue with starburst thinking that the materialised view is a table in the information schema, hence the logic on this line
{%- set drop_existing_sql = "DROP " ~ existing_type ~ " IF EXISTS " ~ existing_relation %}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me wonder if this should not be more generally applied. For example a MV can be replaced by a table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is something i need to look into further at some point, theres an issue where it thinks an existing mat view is a table coming from starburst, hence its excluding table on this line, so causes an error due to recieving the wrong type, the error flow looks like this

  • Finds an existing item
    -E xisting item is a materialised view, however schema tables in starburst have marked it as a table incorrectly
  • Decides to run this code to drop a table
  • Error occurs as it runs DROP TABLE against a MATERIALIZED VIEW

So this currently ignores table types and will display an error in console informing user a table already exists and they need to manually handle it (drop, move etc..), otherwise itll run fine if its a materialized view incorrectly typed, due to the create or replace part of the SQL statement further on

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all good cases that should be integration tested.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See following PR trinodb/trino#15350 where support in system table tables will be improved with the type.

Copy link
Copy Markdown
Member

@damian3031 damian3031 Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That mentioned PR was reverted here
issue about TABLE_TYPE for materialized views is reopened here


{%- set sqlcode= "CREATE OR REPLACE MATERIALIZED VIEW " ~ target_relation ~ " WITH (
" ~ schedule_to_use ~
"grace_period = \'" ~ config_grace_period ~ "\',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the single quotes are escaped? I would think it is not required.

Copy link
Copy Markdown
Contributor Author

@Jay-code0 Jay-code0 Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its so they land correct in starburst, with the escape it looks like this
WITH(
config_item = '1234',
)

without the single quotes it'll produce something like
WITH(
config_item=1234,
)

Ive chose this format due to it working consistently and noticed issues with different values when the code is run manually on starburst, for example a value of 5.33h, so thought its safer keeping the '' in every query sent

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is within a double quoted string, I don't think it's needed to escape single quotes. We should definitely include the single quotes based on the type of the property.

Comment thread dbt/include/trino/macros/materializations/materialized_view.sql Outdated
Comment thread .changes/unreleased/Features-20221215-112009.yaml Outdated
Comment thread .changes/unreleased/Features-20221215-112009.yaml Outdated
@mdesmet
Copy link
Copy Markdown
Contributor

mdesmet commented Dec 15, 2022

@damian3031, @hovaesco, @findinpath :

As a general question: should we also do a REFRESH of the materialized view within the model exectuion. I think it would be expected or at least should be configurable with a sensible default.

@hovaesco
Copy link
Copy Markdown
Contributor

Thanks @Jay-code0 for opening this PR.

As a general question: should we also do a REFRESH of the materialized view within the model exectuion. I think it would be expected or at least should be configurable with a sensible default.

Yes, just put it under a new config flag.

@findinpath
Copy link
Copy Markdown
Collaborator

@Jay-code0 thank you for your contribution.

It would be helpful to understand what problem are we trying to solve with this new materialization.

As far as I know, a materialized view is supposedly created once and then eventually refreshed, if needed, either explicitly by the user or on the fly while doing a SELECT on the materialized view by the engine in case the engine notices that the MV is stale (it happens in Iceberg).

If the problem that you're tackling is being able to refresh MVs at the end of the dbt flow, shouldn't this be part of a post hook in case that the dbt transformations completed successfully ? This could be modelled in airflow / Argo.

Another approach to refresh materialized views could be dbt hooks

https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook

In any case, let's think what kind of problem we are trying to solve and whether a new materialization is the answer for this problem before proceeding.

@mdesmet
Copy link
Copy Markdown
Contributor

mdesmet commented Dec 15, 2022

The use case @Jay-code0 mentioned on dbt slack is to being able to do MV refreshes without having to do it through an external orchestration framework.

I think in most cases people will want to refresh the MV when they do a dbt run, so I would still include it in the config in that case opposed to a post hook (which is rather for unexpected customizations).

@Jay-code0, @findinpath :

PTAL at https://github.com/dbt-labs/dbt-labs-experimental-features/tree/main/materialized-views for inspiration

@Jay-code0 Jay-code0 force-pushed the feature/new-materialized-view branch from d08d755 to 463ea85 Compare January 3, 2023 14:32
@hovaesco
Copy link
Copy Markdown
Contributor

Hi @Jay-code0, do you need any help to move this PR forward?

@Jay-code0
Copy link
Copy Markdown
Contributor Author

Hi @hovaesco, I cant progress at the moment due to lack of access to the tools on my side, id appreciate if someone can pickup and develop the tests for it though to move it forward :) (i believe only tests are left to be done)

@mdesmet
Copy link
Copy Markdown
Contributor

mdesmet commented Mar 9, 2023

We have added support for MV in current master and added you as a co-author. Thanks!

@mdesmet mdesmet closed this Mar 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants