-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Snowflake provider to build connection string #98
Use Snowflake provider to build connection string #98
Conversation
82e2741
to
19845a5
Compare
# which is necessary for temp tables. | ||
hook.schema = self.schema or hook.schema | ||
|
||
conn = hook.get_connection(self.conn_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different than the existing self.conn
? Same with hook.schema
above -- it's already the same as self.conn.schema if one isn't provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hook doesn't know about the operator's schema
argument that is why it needs to be explicitly set on line 264. Using the hook to get the connection just seemed more in line with the reasoning behind this change, which is to use the Snowflake provider and avoid duplicating the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really appreciate this PR, did have a few specific questions here I'd like addressed before merging. Test coverage looks good, have you tried this in a dev environment?
except ImportError: | ||
self.log.warning( | ||
( | ||
"Snowflake provider package not available, " | ||
"attempt to manually build connection. " | ||
"Key-based auth is not supported." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the Snowflake Provider isn't provided, a Snowflake connection can't be made, so the conn_type should not possibly be able to be Snowflake anyway.
This import error is also misleading, because the import error could also be one of the cryptography packages, which should be specified somewhere that they're needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left out the other packages from the warning message because cryptography
is a snowflake dependency.
It is technically possible to create a connection of any type without using its provider's package like so:
from airflow.models.connection import Connection
connection = Connection(
conn_type="snowflake",
conn_id="foo",
# ..etc
)
but, if you want to assume the provider package must be available. I can reduce the code and remove the entire except
block with the existing behaviour. Please let me know what you want me to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the try/except because of the example you provided, I'm convinced it's worthwhile. I just want to make sure that the block is in the right place, I still think it's incorrect to go into the try/except block first then ask whether the user is using a private key or not.
The ImportError warning message could also be a bit more clear: attempt to manually build connection.
could be something like attempting to build connection uri from connection components...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the log message and after moving the hook-logic into its own function, I unindented the original code. I think it is more clear now. Please let me know what you think.
snowflake_account = ( | ||
self.conn.extra_dejson.get("account") or self.conn.extra_dejson["extra__snowflake__account"] | ||
) | ||
snowflake_region = ( | ||
self.conn.extra_dejson.get("region") or self.conn.extra_dejson["extra__snowflake__region"] | ||
) | ||
snowflake_database = ( | ||
self.conn.extra_dejson.get("database") or self.conn.extra_dejson["extra__snowflake__database"] | ||
) | ||
snowflake_warehouse = ( | ||
self.conn.extra_dejson.get("warehouse") or self.conn.extra_dejson["extra__snowflake__warehouse"] | ||
) | ||
snowflake_role = self.conn.extra_dejson.get("role") or self.conn.extra_dejson["extra__snowflake__role"] | ||
|
||
uri_string = f"snowflake://{self.conn.login}:{self.conn.password}@{snowflake_account}.{snowflake_region}/{snowflake_database}/{self.schema}?warehouse={snowflake_warehouse}&role={snowflake_role}" # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this path only going to be taken if an ImportError occurs? I think the check for a private key file needs to happen first, then the try
for importing packages makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to use the SnowflakeHook
if it is available? This isn't just covering the private key scenario. There are other cases that the hook handles gracefully. For example, if the connection has no role defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for this case I do see that the url is different than the connection uri, so it does make sense to use SnowflakeHook. This is getting complicated enough that I think the Snowflake portion can be moved out to it's own function, but I don't think the logic has to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this into its own function.
Thank you for your feedback @denimalpaca . Yes this has been running on our development environments since I opened this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few changes and I think this is good to go
snowflake_account = ( | ||
self.conn.extra_dejson.get("account") or self.conn.extra_dejson["extra__snowflake__account"] | ||
) | ||
snowflake_region = ( | ||
self.conn.extra_dejson.get("region") or self.conn.extra_dejson["extra__snowflake__region"] | ||
) | ||
snowflake_database = ( | ||
self.conn.extra_dejson.get("database") or self.conn.extra_dejson["extra__snowflake__database"] | ||
) | ||
snowflake_warehouse = ( | ||
self.conn.extra_dejson.get("warehouse") or self.conn.extra_dejson["extra__snowflake__warehouse"] | ||
) | ||
snowflake_role = self.conn.extra_dejson.get("role") or self.conn.extra_dejson["extra__snowflake__role"] | ||
|
||
uri_string = f"snowflake://{self.conn.login}:{self.conn.password}@{snowflake_account}.{snowflake_region}/{snowflake_database}/{self.schema}?warehouse={snowflake_warehouse}&role={snowflake_role}" # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for this case I do see that the url is different than the connection uri, so it does make sense to use SnowflakeHook. This is getting complicated enough that I think the Snowflake portion can be moved out to it's own function, but I don't think the logic has to change.
except ImportError: | ||
self.log.warning( | ||
( | ||
"Snowflake provider package not available, " | ||
"attempt to manually build connection. " | ||
"Key-based auth is not supported." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the try/except because of the example you provided, I'm convinced it's worthwhile. I just want to make sure that the block is in the right place, I still think it's incorrect to go into the try/except block first then ask whether the user is using a private key or not.
The ImportError warning message could also be a bit more clear: attempt to manually build connection.
could be something like attempting to build connection uri from connection components...
19845a5
to
9d414d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok last two small nits then I think we're done! Appreciate your patience here.
self.log.warning( | ||
( | ||
"Snowflake provider package could not be imported, " | ||
"attempt to build connection uri from %s " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: attempt --> attempting
I think the "Key-based auth is not supported" should also come with a note of how to make it supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded the message.
9d414d0
to
8c08886
Compare
|
||
def test_great_expectations_operator__make_connection_string_snowflake(mocker): | ||
test_conn_conf = { | ||
"url": URL.create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@denimalpaca here is another reason why I used url
instead of connection_string
. The SQL Alchemy URL
allows to build the exact same string which makes the test less brittle. I could of course change this to {"connection_string": URL.create(...)}
if you want me to.
This change uses the SnowflakeHook from the
airflow.providers.snowflake.hooks.snowflake
module to build the connection string.Great Expectations expects a different configuration structure for a datasource that uses Snowflake and private key authentication. PR #93 does not cover all possible ways a Snowflake connection can be configured in Airflow.
I thought it would be best to optionally rely on the Snowflake provider package and fall back to the existing functionality if the provider package is not available (tho I doubt anyone would use Snowflake without the provider package)
Instead of using the
SnowflakeHook
, I could duplicate the behaviour from the hook into the GE Operator? It is up to the maintainer imho.Please let me know if you have any questions/concerns or would like me to change this in any way.