Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for bigger varchar field in surrogate key #14

Closed
wants to merge 1 commit into from

Conversation

jkock
Copy link

@jkock jkock commented Dec 30, 2020

Changed line 33 to allow for bigger varchar fields.
Original:

cast(" ~ field ~ " as " ~ dbt_utils.string() ~ ")

which translates to

CAST( {{ field }} AS varchar(30))

New

CAST( {{ field }} AS varchar(max))

@dataders
Copy link
Contributor

newb question, but length 30 is what is used by the other adapters (BigQuery Redshift and Snowflake)? My guess is these databases don't have a max option?

@jkock
Copy link
Author

jkock commented Dec 30, 2020

newb question, but length 30 is what is used by the other adapters (BigQuery Redshift and Snowflake)? My guess is these databases don't have a max option?

These databases generally do not have a length, since they are column store based and use compression which is independent of length

For SQL server, the default length is 30 when not specified when using CAST or CONVERT, see also this link.

To be even more precise, the length is not characters, but bytes, which with an encoding of for example UTF-8 can be different than the number of characters

I would have liked to change the dbt_utils.string() function, but it does not accept any parameters, so I think this is the best bet.

@dataders
Copy link
Contributor

thanks for the explanation!

Just curious what parameters would you like dbt_utils.string() to accept?

in a normal PR, I'd have you delete this this line so the test passes, but this won't work until we get dbt-labs/dbt-utils#312 through so that assert_equal is working.

test_surrogate_key: *disabled

@alittlesliceoftom
Copy link
Contributor

This change looks excellent in that it solves the problem we have.

It does slightly increase the complexity of the project as we're changing deep in the file, very customised.

We could instead consider setting in the datatypes of tsql-utils - https://github.com/dbt-msft/tsql-utils/blob/master/macros/dbt_utils/cross_db_utils/datatypes.sql

Simply changing VARCHAR to VARCHAR(MAX).

I'd slightly prefer at this point to set to a smaller number like 8000 (max without offering risk of going into LOB) or 900 (max whilst still allowing indexes to set on it.) - https://stackoverflow.com/questions/2091284/varcharmax-everywhere

What do you think of setting our default string to VARCHAR(900)

@alittlesliceoftom
Copy link
Contributor

Added the alternative for your consideration.
#15

@dataders
Copy link
Contributor

dataders commented Jan 5, 2021

moving forward with #15 instead, until we think of a better way to override dbt_utils.string()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants