Add Athena Connection Support #91
Add an Athena URI builder to make_connection_string(), assuming for now that Athena is the only connection when an AWS connection type is given. This is an incorrect assumption, but we currently do not have asks for other use cases, and figuring out how to differentiate these may be a non-trivial issue. Signed-off-by: Benji Lampel <[email protected]>
for more information, see https://pre-commit.ci
…sure if this is correct use of params... Signed-off-by: Benji Lampel <[email protected]>
for more information, see https://pre-commit.ci
@diman82 let me know if this is working for you and I'll merge+release |
@denimalpaca no problem, it'll just take some time, as I'm facing another issue that blocks me from testing (and I need to create a new environment for testing) |
Hello, thank you for this PR! It came just when I needed to add Athena validation to a project. I have a particular use case that conflicts with my combination of parameters:

GreatExpectationsOperator(
    task_id="my_gx_validation",
    data_asset_name="some_data_asset_name_not_important",
    query_to_validate="SELECT * FROM my_db.my_table WHERE dt = '2023-02-07-03'",
    conn_id="aws_default",
    checkpoint_name="my_checkpoint",
    data_context_root_dir=ge_root_dir,
)

I have the Athena connection string specified in my datasources in the great_expectations config:

datasources:
  awsathena_datasource:
    module_name: great_expectations.datasource
    data_connectors:
      default_runtime_data_connector_name:
        module_name: great_expectations.datasource.data_connector
        batch_identifiers:
          - default_identifier_name
        class_name: RuntimeDataConnector
      default_inferred_data_connector_name:
        module_name: great_expectations.datasource.data_connector
        include_schema_name: true
        class_name: InferredAssetSqlDataConnector
    execution_engine:
      module_name: great_expectations.execution_engine
      connection_string: awsathena+rest://@athena.us-east-1.amazonaws.com?s3_staging_dir=s3://my-athena-results-bucket
      class_name: SqlAlchemyExecutionEngine
    class_name: Datasource

My checkpoint config:

name: my_checkpoint
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y%m%d-%H%M%S-my-run-name-template'
expectation_suite_name: my_expectation_suite_name
batch_request: {}
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: []
evaluation_parameters: {}
runtime_configuration: {}
validations:
  - batch_request:
      datasource_name: awsathena_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: my_data_asset_name
      data_connector_query:
        index: -1
    expectation_suite_name: my_expectation_suite_name
profilers: []
ge_cloud_id:
expectation_suite_ge_cloud_id:

So, if I try to remove the data_asset_name argument:

GreatExpectationsOperator(
    task_id="my_gx_validation",
    # data_asset_name="some_data_asset_name_not_important",
    query_to_validate="SELECT * FROM my_db.my_table WHERE dt = '2023-02-07-03'",
    conn_id="aws_default",
    checkpoint_name="my_checkpoint",
    data_context_root_dir=ge_root_dir,
)

in order to let the operator pick the existing datasource from the checkpoint instead, the operator fails at line 199 during the constructor validation:

# A data asset name is also used to determine if a runtime env will be used; if it is not passed in,
# then the data asset name is assumed to be configured in the data context passed in.
if (self.is_dataframe or self.query_to_validate or self.conn_id) and not self.data_asset_name:
    raise ValueError("A data_asset_name must be specified with a runtime_data_source or conn_id.")

Is it possible to have this issue addressed in this PR? Or maybe I'm not using the correct combination of parameters 😄 Thanks again for your work, denimalpaca! |
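To make the failing combination easier to see, the quoted check can be pulled out into a standalone function. This is only a sketch: `check_runtime_params` is a hypothetical name that does not exist in the provider, and only the condition and the error message are taken from the snippet quoted above.

```python
from typing import Optional


def check_runtime_params(
    is_dataframe: bool = False,
    query_to_validate: Optional[str] = None,
    conn_id: Optional[str] = None,
    data_asset_name: Optional[str] = None,
) -> None:
    """Hypothetical standalone copy of the constructor check quoted above."""
    # Any runtime input (dataframe, query, or connection id) requires a data asset name.
    if (is_dataframe or query_to_validate or conn_id) and not data_asset_name:
        raise ValueError(
            "A data_asset_name must be specified with a runtime_data_source or conn_id."
        )
```

Under this sketch, passing `query_to_validate` and `conn_id` without `data_asset_name` raises the `ValueError`, while passing none of the three runtime inputs, or passing them together with a `data_asset_name`, does not.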
@denimalpaca OK, so I've set up the following code in my DAG:
And I get the following error message:
|
@diman82 can you try installing the package |
Hey @deathwebo , in this case you should remove the Or, conversely, remove the |
@diman82 have you had a chance to test this? |
@denimalpaca Sorry, I was too busy the last 2 weeks. |
@diman82 any news here? Would love to merge this PR and do a release. |
…ovider-great-expectations into add_athena_support
Signed-off-by: Benji Lampel <[email protected]>
Signed-off-by: Benji Lampel <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Benji Lampel <[email protected]>
…ovider-great-expectations into add_athena_support
Add an Athena URI builder to make_connection_string(), assuming for now that Athena is the only connection when an AWS connection type is given. This is an incorrect assumption, but we currently do not have asks for other use cases, and figuring out how to differentiate these may be a non-trivial issue.
Closes: #90
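For readers unfamiliar with the URI shape, here is a rough sketch of what an Athena URI builder could look like. It is not the code added to make_connection_string() in this PR: `build_athena_uri` and its parameters are hypothetical, the output mirrors the `awsathena+rest://` connection string quoted in the conversation (credentials left empty), and URL-encoding the staging directory with `quote_plus` is an assumption based on PyAthena's documented convention.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_athena_uri(region: str, s3_staging_dir: str, schema: Optional[str] = None) -> str:
    """Hypothetical helper: assemble a PyAthena-style SQLAlchemy URI.

    The credentials part before '@' is left empty, as in the connection
    string quoted in the conversation.
    """
    host = f"athena.{region}.amazonaws.com"
    # The schema (database) is optional and goes in the URI path when given.
    path = f"/{schema}" if schema else ""
    # Assumption: the staging directory is URL-encoded before being used as a query value.
    return f"awsathena+rest://@{host}{path}?s3_staging_dir={quote_plus(s3_staging_dir)}"
```

For example, `build_athena_uri("us-east-1", "s3://my-athena-results-bucket")` yields the same host and query parameter as the datasource config above, with the bucket URL percent-encoded.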