Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Snowflake provider to build connection string #98

Merged

Conversation

ivanstillfront
Copy link
Contributor

@ivanstillfront ivanstillfront commented Mar 1, 2023

This change uses the SnowflakeHook from the airflow.providers.snowflake.hooks.snowflake module to build the connection string.

Great Expectations expects a different configuration structure for a datasource that uses Snowflake and private key authentication. PR #93 does not cover all possible ways a Snowflake connection can be configured in Airflow.

I thought it would be best to optionally rely on the Snowflake provider package and fall back to the existing functionality if the provider package is not available (tho I doubt anyone would use Snowflake without the provider package)

Instead of using the SnowflakeHook, I could duplicate the behaviour from the hook into the GE Operator? It is up to the maintainer imho.

Please let me know if you have any questions/concerns or would like me to change this in any way.

@ivanstillfront ivanstillfront force-pushed the better_snowflake_connections branch 2 times, most recently from 82e2741 to 19845a5 Compare March 1, 2023 18:54
# which is necessary for temp tables.
hook.schema = self.schema or hook.schema

conn = hook.get_connection(self.conn_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different than the existing self.conn? Same with hook.schema above -- it's already the same as self.conn.schema if one isn't provided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hook doesn't know about the operator's schema argument that is why it needs to be explicitly set on line 264. Using the hook to get the connection just seemed more in line with the reasoning behind this change, which is to use the Snowflake provider and avoid duplicating the code.

Copy link
Contributor

@denimalpaca denimalpaca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate this PR, did have a few specific questions here I'd like addressed before merging. Test coverage looks good, have you tried this in a dev environment?

Comment on lines 303 to 266
except ImportError:
self.log.warning(
(
"Snowflake provider package not available, "
"attempt to manually build connection. "
"Key-based auth is not supported."
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Snowflake Provider isn't provided, a Snowflake connection can't be made, so the conn_type should not possibly be able to be Snowflake anyway.

This import error is also misleading, because the import error could also be one of the cryptography packages, which should be specified somewhere that they're needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left out the other packages from the warning message because cryptography is a snowflake dependency.

It is technically possible to create a connection of any type without using its provider's package like so:

from airflow.models.connection import Connection

connection = Connection(
  conn_type="snowflake",
  conn_id="foo",
  # ..etc
)

but, if you want to assume the provider package must be available. I can reduce the code and remove the entire except block with the existing behaviour. Please let me know what you want me to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the try/except because of the example you provided, I'm convinced it's worthwhile. I just want to make sure that the block is in the right place, I still think it's incorrect to go into the try/except block first then ask whether the user is using a private key or not.

The ImportError warning message could also be a bit more clear: attempt to manually build connection. could be something like attempting to build connection uri from connection components...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the log message and after moving the hook-logic into its own function, I unindented the original code. I think it is more clear now. Please let me know what you think.

Comment on lines 312 to 326
snowflake_account = (
self.conn.extra_dejson.get("account") or self.conn.extra_dejson["extra__snowflake__account"]
)
snowflake_region = (
self.conn.extra_dejson.get("region") or self.conn.extra_dejson["extra__snowflake__region"]
)
snowflake_database = (
self.conn.extra_dejson.get("database") or self.conn.extra_dejson["extra__snowflake__database"]
)
snowflake_warehouse = (
self.conn.extra_dejson.get("warehouse") or self.conn.extra_dejson["extra__snowflake__warehouse"]
)
snowflake_role = self.conn.extra_dejson.get("role") or self.conn.extra_dejson["extra__snowflake__role"]

uri_string = f"snowflake://{self.conn.login}:{self.conn.password}@{snowflake_account}.{snowflake_region}/{snowflake_database}/{self.schema}?warehouse={snowflake_warehouse}&role={snowflake_role}" # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this path only going to be taken if an ImportError occurs? I think the check for a private key file needs to happen first, then the try for importing packages makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to use the SnowflakeHook if it is available? This isn't just covering the private key scenario. There are other cases that the hook handles gracefully. For example, if the connection has no role defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for this case I do see that the url is different than the connection uri, so it does make sense to use SnowflakeHook. This is getting complicated enough that I think the Snowflake portion can be moved out to it's own function, but I don't think the logic has to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this into its own function.

@ivanstillfront
Copy link
Contributor Author

Really appreciate this PR, did have a few specific questions here I'd like addressed before merging. Test coverage looks good, have you tried this in a dev environment?

Thank you for your feedback @denimalpaca . Yes this has been running on our development environments since I opened this PR.

Copy link
Contributor

@denimalpaca denimalpaca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few changes and I think this is good to go

Comment on lines 312 to 326
snowflake_account = (
self.conn.extra_dejson.get("account") or self.conn.extra_dejson["extra__snowflake__account"]
)
snowflake_region = (
self.conn.extra_dejson.get("region") or self.conn.extra_dejson["extra__snowflake__region"]
)
snowflake_database = (
self.conn.extra_dejson.get("database") or self.conn.extra_dejson["extra__snowflake__database"]
)
snowflake_warehouse = (
self.conn.extra_dejson.get("warehouse") or self.conn.extra_dejson["extra__snowflake__warehouse"]
)
snowflake_role = self.conn.extra_dejson.get("role") or self.conn.extra_dejson["extra__snowflake__role"]

uri_string = f"snowflake://{self.conn.login}:{self.conn.password}@{snowflake_account}.{snowflake_region}/{snowflake_database}/{self.schema}?warehouse={snowflake_warehouse}&role={snowflake_role}" # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for this case I do see that the url is different than the connection uri, so it does make sense to use SnowflakeHook. This is getting complicated enough that I think the Snowflake portion can be moved out to it's own function, but I don't think the logic has to change.

Comment on lines 303 to 266
except ImportError:
self.log.warning(
(
"Snowflake provider package not available, "
"attempt to manually build connection. "
"Key-based auth is not supported."
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the try/except because of the example you provided, I'm convinced it's worthwhile. I just want to make sure that the block is in the right place, I still think it's incorrect to go into the try/except block first then ask whether the user is using a private key or not.

The ImportError warning message could also be a bit more clear: attempt to manually build connection. could be something like attempting to build connection uri from connection components...

Copy link
Contributor

@denimalpaca denimalpaca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok last two small nits then I think we're done! Appreciate your patience here.

self.log.warning(
(
"Snowflake provider package could not be imported, "
"attempt to build connection uri from %s "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: attempt --> attempting

I think the "Key-based auth is not supported" should also come with a note of how to make it supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworded the message.


def test_great_expectations_operator__make_connection_string_snowflake(mocker):
test_conn_conf = {
"url": URL.create(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denimalpaca here is another reason why I used url instead of connection_string. The SQL Alchemy URL allows to build the exact same string which makes the test less brittle. I could of course change this to {"connection_string": URL.create(...)} if you want me to.

@denimalpaca denimalpaca merged commit 0f41977 into astronomer:main Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants