Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to specify custom hostname for Snowflake connection #2109

Merged
merged 3 commits into from
Jul 11, 2024

Conversation

whummer
Copy link
Contributor

@whummer whummer commented Jun 26, 2024

Add ability to specify custom host/port for Snowflake connections.

Motivation: At LocalStack, we have recently started building a Snowflake emulator that allows running SF queries entirely on the local machine: https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake/ . As part of building out a repo with sample applications (see here), we are putting together an illustrative Soda application that connects to the local Snowflake emulator (running on localhost) and performs various data quality checks.

We have been able to get it working already by manually patching the soda-core library in the local Pyhon path.

$ cat configuration.yml
 data_source local_snowflake:
   type: snowflake
   connection:
     host: snowflake.localhost.localstack.cloud
     port: 4566
     username: test
     password: test
     ...

... and then:

$ soda scan --verbose -d local_snowflake -c configuration.yml checks.yml
...
[16:25:11] Scan summary:
[16:25:11] 2/2 queries OK
[16:25:11]   local_snowflake.test_table.aggregation[0] [OK] 0:00:00.034070
[16:25:11]   local_snowflake.test_table.schema[test_table] [OK] 0:00:00.040392
[16:25:11] 1/3 checks PASSED:
[16:25:11]     test_table in local_snowflake
[16:25:11]       No NULL values [checks.yml] [PASSED]
[16:25:11]         check_value: 0
[16:25:11] 1/3 checks FAILED:
[16:25:11]     test_table in local_snowflake
[16:25:11]       Dataset is unreasonably small [checks.yml] [FAILED]
[16:25:11]         check_value: 2
[16:25:11] 1/3 checks NOT EVALUATED:
[16:25:11]     test_table in local_snowflake
[16:25:11]       No changes to schema [checks.yml] [NOT EVALUATED]
...

It would be awesome if this integration was provided by Soda-core out of the box! Looking forward to getting your feedback.

Please let me know where is the best place to add a description for these new attributes (happy to add a short note in the docs), or if this change should be covered by a unit test. Thanks 馃檶

/cc @m1n0

@CLAassistant
Copy link

CLAassistant commented Jun 26, 2024

CLA assistant check
All committers have signed the CLA.

@m1n0
Copy link
Contributor

m1n0 commented Jun 27, 2024

hi @whummer, thanks for this, it's an interesting use case! It is a minor change and I don't feel like we need to cover it by test, but I see one issue - I would only pass the host/port if value is present, we do not want to force everyone to start specifying host with this change.

Copy link
Contributor

@m1n0 m1n0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass the host/port only if present to endure backward compatibility.

@whummer
Copy link
Contributor Author

whummer commented Jun 27, 2024

Good point, thanks @m1n0 (although I think the Snowflake Python connector would also be able to handle None values for host/port, i.e., if they are not specified by the user).

Updated the PR to pass in keyword arguments (kwargs) which are only set if values are specified by the user. 馃憤

@whummer
Copy link
Contributor Author

whummer commented Jul 2, 2024

Hi @m1n0 , wanted to briefly follow up on this PR here. Do the changes look good to you, anything else you'd like me to add to the PR? Would be awesome to get this one merged, so we can further promote Soda to our LocalStack user base! 馃檶 馃殌

@m1n0
Copy link
Contributor

m1n0 commented Jul 9, 2024

@whummer could you please rename the dict from kwargs to something else, like connection_parameters or something like that so that it does not get confused with the standard python kwargs?

@whummer
Copy link
Contributor Author

whummer commented Jul 9, 2024

Done @m1n0 - renamed to connection_parameters 馃憤

Copy link
Contributor

@m1n0 m1n0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the contrib!

@whummer
Copy link
Contributor Author

whummer commented Jul 11, 2024

Thanks for the review @m1n0 !

Looks like the last CI run failed with the following error:

ERROR    soda.scan:log.py:101 [20:04:27] Could not connect to data source "snowflake": 251005: 251005: User is empty
ERROR    soda.scan:log.py:105   | 251005: 251005: User is empty
ERROR    soda.scan:log.py:112   | Stacktrace:
ERROR    soda.scan:log.py:113   | Traceback (most recent call last):
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/soda/execution/data_source_manager.py", line 48, in get_data_source
  |     data_source.connect()
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/soda/data_sources/snowflake_data_source.py", line 110, in connect
  |     self.connection = connector.connect(
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/__init__.py", line 55, in Connect
  |     return SnowflakeConnection(**kwargs)
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/connection.py", line 451, in __init__
  |     self.connect(**kwargs)
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/connection.py", line 716, in connect
  |     self.__config(**kwargs)
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/connection.py", line 1207, in __config
  |     Error.errorhandler_wrapper(
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/errors.py", line 290, in errorhandler_wrapper
  |     handed_over = Error.hand_to_other_handler(
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/errors.py", line 348, in hand_to_other_handler
  |     connection.errorhandler(connection, cursor, error_class, error_value)
  |   File "/home/runner/work/soda-core/soda-core/.tox/py/lib/python3.9/site-packages/snowflake/connector/errors.py", line 221, in default_errorhandler
  |     raise error_class(
  | snowflake.connector.errors.ProgrammingError: 251005: 251005: User is empty

This error seems unrelated to the changes in this PR, 馃 do you have any advice? I've rebased the branch onto latest main - can you please approve the latest workflow run, to see if it is fixed now? Thanks!

@m1n0
Copy link
Contributor

m1n0 commented Jul 11, 2024

The reason snowflake and BQ tests failed is that non-mainteiner issued PRs do not have access to repository secrets for security reasons, it's fine I merged it

@m1n0 m1n0 merged commit 3ace4c9 into sodadata:main Jul 11, 2024
2 checks passed
@whummer whummer deleted the snowflake-host branch July 12, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants