Issue in connecting the GX operator with Snowflake data #87

Closed
Bowrna opened this issue Jan 30, 2023 · 12 comments

Comments

@Bowrna

Bowrna commented Jan 30, 2023

I am currently using the following version:
airflow-provider-great-expectations==0.2.0

I am trying to run the Great Expectations operator, passing it a Snowflake connection ID so that it executes the query.

My code fails with the following traceback:

AIRFLOW_CTX_DAG_RUN_ID=manual__2023-01-30T10:18:21.654701+00:00
[2023-01-30, 10:18:36 UTC] {great_expectations.py:470} INFO - Running validation with Great Expectations...
[2023-01-30, 10:18:36 UTC] {great_expectations.py:472} INFO - Instantiating Data Context...
[2023-01-30, 10:18:36 UTC] {base.py:71} INFO - Using connection ID 'snowflake_conn' for task execution.
[2023-01-30, 10:18:36 UTC] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 474, in execute
    self.build_runtime_datasources()
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 368, in build_runtime_datasources
    self.build_runtime_sql_datasource_config_from_conn_id()
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 304, in build_runtime_sql_datasource_config_from_conn_id
    "connection_string": self.make_connection_string(),
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 251, in make_connection_string
    uri_string = f"snowflake://{self.conn.login}:{self.conn.password}@{self.conn.extra_dejson['extra__snowflake__account']}.{self.conn.extra_dejson['extra__snowflake__region']}/{self.conn.extra_dejson['extra__snowflake__database']}/{self.conn.schema}?warehouse={self.conn.extra_dejson['extra__snowflake__warehouse']}&role={self.conn.extra_dejson['extra__snowflake__role']}"  # noqa
KeyError: 'extra__snowflake__account'

I logged the connection extras in Airflow with the code below:

  @task
  def test_conn():
    from airflow.hooks.base import BaseHook
    conn = BaseHook.get_connection(SNOWFLAKE_CONN_ID)
    task_logger.info(f"connection info {conn.extra_dejson} %s")
Below is the information logged:
[2023-01-30, 10:18:28 UTC] {base.py:71} INFO - Using connection ID 'snowflake_conn' for task execution.
[2023-01-30, 10:18:28 UTC] {test_gx_snowflake.py:265} INFO - connection info {'account': 'rtb82372.us-east-1', 'insecure_mode': False, 'database': 'SFSALES_SFC_SAMPLES_VA3_SAMPLE_DATA', 'warehouse': 'XLARGEWH'} %s
[2023-01-30, 10:18:28 UTC] {python.py:177} INFO - Done. Returned value was: None
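
For illustration, the mismatch can be reproduced without Airflow. The sketch below assumes a plain dict standing in for `conn.extra_dejson`, filled with the extras logged above; `lookup_prefixed_account` is a hypothetical name that only mirrors the subscript shown in the traceback.

```python
# Extras as logged above for the 'snowflake_conn' connection.
extras = {
    "account": "rtb82372.us-east-1",
    "insecure_mode": False,
    "database": "SFSALES_SFC_SAMPLES_VA3_SAMPLE_DATA",
    "warehouse": "XLARGEWH",
}


def lookup_prefixed_account(extra_dejson):
    # Hypothetical stand-in for the failing line: only the prefixed key is tried.
    return extra_dejson["extra__snowflake__account"]


try:
    lookup_prefixed_account(extras)
    error = None
except KeyError as exc:
    # The account is stored under "account", so the prefixed lookup fails.
    error = exc

print(repr(error))
```

The account is present in the connection, just under the unprefixed key, which is why the error appears even though the connection itself is configured.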

Related issue on the Airflow side:
apache/airflow#26764

@Bowrna
Author

Bowrna commented Jan 31, 2023

The same issue also occurs in the latest version of the Great Expectations operator.
Here is the traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 474, in execute
    self.build_runtime_datasources()
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 383, in build_runtime_datasources
    self.datasource = self.build_runtime_sql_datasource_config_from_conn_id()
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 320, in build_runtime_sql_datasource_config_from_conn_id
    "connection_string": self.make_connection_string(),
  File "/usr/local/lib/python3.8/site-packages/great_expectations_provider/operators/great_expectations.py", line 255, in make_connection_string
    "account", self.conn.extra_dejson["extra__snowflake__account"]
KeyError: 'extra__snowflake__account'

@denimalpaca
Contributor

#84 should have closed this issue - can you please confirm you're using the operator version 0.2.4?

@denimalpaca
Contributor

@Bowrna can you confirm this is still an issue with the newest version? Thanks

@Bowrna
Author

Bowrna commented Feb 8, 2023

@denimalpaca this issue persists in version airflow-provider-great-expectations==0.2.4.

I have verified it again.

@Bowrna
Author

Bowrna commented Feb 8, 2023

I checked the code at this link: https://github.com/astronomer/airflow-provider-great-expectations/blob/main/great_expectations_provider/operators/great_expectations.py#L253-L266

snowflake_account = self.conn.extra_dejson.get(
    "account", self.conn.extra_dejson["extra__snowflake__account"]
)

In the latest version of Airflow, the self.conn.extra_dejson dict doesn't have the key "extra__snowflake__account", so the code above raises a KeyError.

We could handle it by changing the code as follows:

snowflake_account = self.conn.extra_dejson.get(
    "extra__snowflake__account", self.conn.extra_dejson.get("account", None)
)
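
The two snippets above differ in more than key order. Python evaluates every argument before calling `dict.get`, so a subscript used as the default raises `KeyError` even when the first key is present, while a nested `.get()` never raises. A minimal sketch, with a plain dict standing in for `self.conn.extra_dejson` (the function names are made up for the comparison):

```python
extras = {"account": "rtb82372.us-east-1"}  # new-style key only


def current_lookup(extra_dejson):
    # The default argument is evaluated before .get() is called, so the
    # subscript raises KeyError even though "account" is present.
    return extra_dejson.get("account", extra_dejson["extra__snowflake__account"])


def proposed_lookup(extra_dejson):
    # Nested .get() calls never raise; the result is None when both keys are missing.
    return extra_dejson.get(
        "extra__snowflake__account", extra_dejson.get("account", None)
    )


try:
    current_lookup(extras)
    current_raised = False
except KeyError:
    current_raised = True

print(current_raised, proposed_lookup(extras))
```

With the proposed form, a connection that has no account at all yields `None` instead of an exception, so the caller would still need to check the value before using it.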

@denimalpaca
Contributor

I think that code should throw an error when no account is specified; do you think the wrong error is being thrown? The error KeyError: 'extra__snowflake__account' should now only occur if no account is given. I don't think your fix would provide the correct behavior, as the error would then be propagated to the Great Expectations libraries when an invalid URI is passed.

Are you unable to use the operator because of the current behavior?
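
One way to reconcile both concerns is to accept either key style and fail early, with an explicit message, before any URI is built. The following is only a sketch under assumptions: `pick_extra` and `make_snowflake_uri` are hypothetical names, not the provider's API and not the eventual fix, and the URI layout is copied from the f-string in the first traceback.

```python
from typing import Any, Mapping, Optional
from urllib.parse import quote_plus, urlencode

PREFIX = "extra__snowflake__"


def pick_extra(extras: Mapping[str, Any], name: str) -> Optional[Any]:
    """Return the extra stored under `name` or its prefixed form, else None."""
    return extras.get(name) or extras.get(PREFIX + name)


def make_snowflake_uri(
    login: str, password: str, schema: str, extras: Mapping[str, Any]
) -> str:
    """Hypothetical helper: build a Snowflake URI or raise a clear error."""
    account = pick_extra(extras, "account")
    database = pick_extra(extras, "database")
    missing = [n for n, v in (("account", account), ("database", database)) if not v]
    if missing:
        # Fail here rather than handing an invalid URI to Great Expectations.
        raise ValueError(
            f"Snowflake connection extras are missing: {', '.join(missing)}"
        )
    # Assumption: the region is optional and may already be part of the account.
    region = pick_extra(extras, "region")
    host = f"{account}.{region}" if region else account
    uri = (
        f"snowflake://{quote_plus(login)}:{quote_plus(password)}"
        f"@{host}/{database}/{schema}"
    )
    params = {k: pick_extra(extras, k) for k in ("warehouse", "role")}
    params = {k: v for k, v in params.items() if v}
    return f"{uri}?{urlencode(params)}" if params else uri
```

Calling `make_snowflake_uri` with extras that contain neither `account` nor `extra__snowflake__account` raises a `ValueError` naming the missing field, which keeps the "no account given" case an error while still supporting both key styles.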

@mpgreg
Contributor

mpgreg commented Feb 14, 2023

As Tamara noted, the Snowflake connector changed in 4.0.2: https://airflow.apache.org/docs/apache-airflow-providers-snowflake/stable/index.html#id2. I will post a PR here shortly.

@Bowrna
Author

Bowrna commented Feb 14, 2023

> I think that code should throw an error when no account is specified; do you think the wrong error is being thrown? The error KeyError: 'extra__snowflake__account' should now only occur if no account is given. I don't think your fix would provide the correct behavior, as the error would then be propagated to the Great Expectations libraries when an invalid URI is passed.
>
> Are you unable to use the operator because of the current behavior?

Yes, I am unable to use the operator because of the current behavior. I had to run the Snowflake query in a separate task and pass the result to the GX operator, rather than passing the Snowflake query directly to the GX operator.

@mpgreg
Contributor

mpgreg commented Feb 15, 2023

PR for this #95

denimalpaca pushed a commit that referenced this issue Feb 16, 2023
* Update great_expectations.py

* Update great_expectations.py

* Update great_expectations.py

* Update great_expectations.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@denimalpaca
Contributor

@Bowrna I'm releasing a fix from @mpgreg's PR; this should resolve the issue.

I don't feel we've come to a resolution about the account key error. I do think that as long as a Snowflake account is provided to the connection, this shouldn't be a problem in any Airflow version.

@Bowrna
Author

Bowrna commented Feb 17, 2023

Thanks @denimalpaca.
Yes, this fix should solve the issue.

@denimalpaca
Contributor

Hi @Bowrna I just released version 0.2.5 with this fix. Will close the issue!
