get_column_schema_from_query macro #6986
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: Features | ||
body: get_column_schema_from_query_macro | ||
time: 2023-02-22T13:06:32.583743-05:00 | ||
custom: | ||
Author: jtcohen6 michelleark | ||
Issue: "6751" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,23 +17,55 @@ | |
{% endmacro %} | ||
|
||
|
||
{% macro get_empty_subquery_sql(select_sql) -%} | ||
{{ return(adapter.dispatch('get_empty_subquery_sql', 'dbt')(select_sql)) }} | ||
{% endmacro %} | ||
|
||
{# | ||
Builds a query that results in the same schema as the given select_sql statement, without necessitating a data scan. | ||
Useful for running a query in a 'pre-flight' context, such as model contract enforcement (assert_columns_equivalent macro). | ||
#} | ||
{% macro default__get_empty_subquery_sql(select_sql) %} | ||
|
||
select * from ( | ||
{{ select_sql }} | ||
) as __dbt_sbq | ||
where false | ||
limit 0 | ||
{% endmacro %} | ||
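To make the 'empty subquery' idea concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module rather than dbt itself. The helper names `empty_subquery_sql` and `columns_in_query` and the `orders` table are hypothetical, made up for illustration; only the SQL wrapper mirrors the macro above. It assumes a SQLite build that recognizes the `false` keyword (3.23 or later).

```python
import sqlite3


def empty_subquery_sql(select_sql):
    """Wrap a query the way default__get_empty_subquery_sql does (hypothetical helper)."""
    return (
        "select * from (\n"
        f"    {select_sql}\n"
        ") as __dbt_sbq\n"
        "where false\n"
        "limit 0"
    )


def columns_in_query(conn, select_sql):
    """Return the column names of select_sql without fetching any rows."""
    cursor = conn.execute(empty_subquery_sql(select_sql))
    # The cursor metadata is populated even though the result set is empty.
    names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()  # always empty because of 'where false limit 0'
    return names, rows


# Illustrative in-memory table; not part of the PR.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, amount real)")
conn.execute("insert into orders values (1, 9.5)")

names, rows = columns_in_query(conn, "select id, amount * 2 as doubled from orders")
print(names, rows)
```

The point of the sketch is that the column metadata comes back even though no rows do, which is what makes the wrapper usable as a 'pre-flight' check.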
|
||
|
||
{% macro get_empty_schema_sql(columns) -%} | ||
{{ return(adapter.dispatch('get_empty_schema_sql', 'dbt')(columns)) }} | ||
{% endmacro %} | ||
|
||
{% macro default__get_empty_schema_sql(columns) %} | ||
select | ||
{% for i in columns %} | ||
{%- set col = columns[i] -%} | ||
cast(null as {{ col['data_type'] }}) as {{ col['name'] }}{{ ", " if not loop.last }} | ||
Review thread on this line:
We talked about this. This can possibly lead to some weird type resolution outcomes...we think. This is so far the "best" option and so far looks promising. I just know SQL's typing mechanisms can get a mind of their own for the worse.
@VersusFacit I had the same concern, and discussed it with Michelle synchronously. On the plus side, this approach will automatically account for any new types which appear, so if it works in practice it will be a lot easier than trying to maintain our own list of type aliases. I like that. I was also reassured that this code path will only affect people using contracts, so there isn't much regression risk and we should hear pretty quickly if there are databases/drivers that this approach doesn't work for.
One more thing this approach has going for it is that there's a non-opaque definition of what
This discussion should be seen as a context dump for posterity and is not blocking.
||
{%- endfor -%} | ||
{% endmacro %} | ||
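As a rough illustration of what `default__get_empty_schema_sql` renders, the following Python sketch builds the same kind of `select cast(null as ...) as ...` statement from a columns mapping. The function name `empty_schema_sql` and the sample columns are hypothetical; the mapping is assumed to be keyed by column name with `name` and `data_type` entries, as the macro's loop implies.

```python
def empty_schema_sql(columns):
    """Build 'select cast(null as <type>) as <name>, ...' from a columns mapping.

    Hypothetical stand-in for the Jinja macro; iterating the values here is
    equivalent to the macro's 'for i in columns' followed by 'columns[i]'.
    """
    select_items = [
        f"cast(null as {col['data_type']}) as {col['name']}"
        for col in columns.values()
    ]
    return "select\n    " + ",\n    ".join(select_items)


# Made-up contract columns for illustration only.
columns = {
    "id": {"name": "id", "data_type": "integer"},
    "email": {"name": "email", "data_type": "text"},
}

sql = empty_schema_sql(columns)
print(sql)
```

Because the data types are passed through verbatim, how each `cast(null as ...)` resolves is left entirely to the database.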
|
||
{% macro get_column_schema_from_query(select_sql) -%} | ||
{% set columns = [] %} | ||
{# -- Using an 'empty subquery' here to get the same schema as the given select_sql statement, without necessitating a data scan.#} | ||
{% set sql = get_empty_subquery_sql(select_sql) %} | ||
{% set column_schema = adapter.get_column_schema_from_query(sql) %} | ||
{{ return(column_schema) }} | ||
{% endmacro %} | ||
|
||
-- here for back compat | ||
{% macro get_columns_in_query(select_sql) -%} | ||
{{ return(adapter.dispatch('get_columns_in_query', 'dbt')(select_sql)) }} | ||
{% endmacro %} | ||
|
||
{% macro default__get_columns_in_query(select_sql) %} | ||
{% call statement('get_columns_in_query', fetch_result=True, auto_begin=False) -%} | ||
- select * from ( | ||
- {{ select_sql }} | ||
- ) as __dbt_sbq | ||
- where false | ||
- limit 0 | ||
+ {{ get_empty_subquery_sql(select_sql) }} | ||
{% endcall %} | ||
|
||
{{ return(load_result('get_columns_in_query').table.columns | map(attribute='name') | list) }} | ||
{% endmacro %} | ||
|
||
|
||
{% macro alter_column_type(relation, column_name, new_column_type) -%} | ||
{{ return(adapter.dispatch('alter_column_type', 'dbt')(relation, column_name, new_column_type)) }} | ||
{% endmacro %} | ||
|
I was interested in seeing a tangible example of how data type codes map to a string representation for a database connector.
This table from the Snowflake docs was useful to me:
https://docs.snowflake.com/en/user-guide/python-connector-api#label-python-connector-type-codes
(Side note: I suspect there is a typo for code 8, and `TIMESTAMP_TZ` there should be `TIMESTAMP_NTZ` instead.)
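For a tangible picture of that mapping, here is a small Python sketch that turns DB-API style `cursor.description` tuples into `(name, data type)` pairs. The names `TYPE_CODE_TO_NAME`, `data_type_name`, and `column_schema` are hypothetical, the description rows are hand-written stand-ins rather than real connector output, and the code table is an illustrative subset that should be checked against the linked Snowflake table before being relied on.

```python
# Illustrative subset only (assumption): copy the authoritative values
# from the connector's own type-code table.
TYPE_CODE_TO_NAME = {
    0: "FIXED",
    1: "REAL",
    2: "TEXT",
    3: "DATE",
}


def data_type_name(type_code):
    """Map a numeric type code to a readable name, flagging unknown codes."""
    return TYPE_CODE_TO_NAME.get(type_code, f"UNKNOWN({type_code})")


def column_schema(description):
    """Turn DB-API description tuples into (column name, type name) pairs."""
    # Each tuple is (name, type_code, display_size, internal_size,
    # precision, scale, null_ok); only the first two fields are used.
    return [(col[0], data_type_name(col[1])) for col in description]


# Hand-written stand-in for cursor.description; not real query output.
fake_description = [
    ("ID", 0, None, None, 38, 0, False),
    ("NAME", 2, None, 16777216, None, None, True),
    ("MYSTERY", 99, None, None, None, None, True),
]

schema = column_schema(fake_description)
print(schema)
```

Surfacing unknown codes instead of raising is one possible design choice; it keeps a new database type from breaking the lookup outright.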