-
Notifications
You must be signed in to change notification settings - Fork 17.3k
feat(db): add dynamic schema support for athena #36003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -21,6 +21,7 @@ | |||||
|
|
||||||
| from flask_babel import gettext as __ | ||||||
| from sqlalchemy import types | ||||||
| from sqlalchemy.engine.url import URL | ||||||
|
|
||||||
| from superset.constants import TimeGrain | ||||||
| from superset.db_engine_specs.base import BaseEngineSpec | ||||||
|
|
@@ -38,6 +39,7 @@ class AthenaEngineSpec(BaseEngineSpec): | |||||
| disable_ssh_tunneling = True | ||||||
| # Athena doesn't support IS true/false syntax, use = true/false instead | ||||||
| use_equality_for_boolean_filters = True | ||||||
| supports_dynamic_schema = True | ||||||
|
|
||||||
| _time_grain_expressions = { | ||||||
| None: "{col}", | ||||||
|
|
@@ -92,3 +94,41 @@ def _mutate_label(label: str) -> str: | |||||
| :return: Conditionally mutated label | ||||||
| """ | ||||||
| return label.lower() | ||||||
|
|
||||||
| @classmethod | ||||||
| def adjust_engine_params( | ||||||
| cls, | ||||||
| uri: URL, | ||||||
| connect_args: dict[str, Any], | ||||||
| catalog: str | None = None, | ||||||
| schema: str | None = None, | ||||||
| ) -> tuple[URL, dict[str, Any]]: | ||||||
| """ | ||||||
| Adjust the SQLAlchemy URI for Athena with a provided catalog and schema. | ||||||
|
|
||||||
| For AWS Athena the SQLAlchemy URI looks like this: | ||||||
|
|
||||||
| awsathena+rest://athena.{region_name}.amazonaws.com:443/{schema_name}?catalog_name={catalog_name}&s3_staging_dir={s3_staging_dir} | ||||||
| """ | ||||||
| if catalog: | ||||||
| uri = uri.update_query_dict({"catalog_name": catalog}) | ||||||
|
|
||||||
| if schema: | ||||||
| uri = uri.set(database=schema) | ||||||
|
|
||||||
| return uri, connect_args | ||||||
|
|
||||||
| @classmethod | ||||||
| def get_schema_from_engine_params( | ||||||
| cls, | ||||||
| sqlalchemy_uri: URL, | ||||||
| connect_args: dict[str, Any], | ||||||
| ) -> str | None: | ||||||
| """ | ||||||
| Return the configured schema. | ||||||
|
|
||||||
| For AWS Athena the SQLAlchemy URI looks like this: | ||||||
|
|
||||||
| awsathena+rest://athena.{region_name}.amazonaws.com:443/{schema_name}?catalog_name={catalog_name}&s3_staging_dir={s3_staging_dir} | ||||||
| """ | ||||||
| return sqlalchemy_uri.database | ||||||
|
||||||
| return sqlalchemy_uri.database | |
| return sqlalchemy_uri.database or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the same pattern over the codebase and don't think it's needed. Could you point me out to the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
get_schema_from_engine_paramsmethod should handle the case wheresqlalchemy_uri.databaseisNoneor an empty string. Without this check, accessing.databaseon a URL without a database component could lead to issues. Consider adding a guard to returnNoneifdatabaseis falsy, similar to how Presto and Snowflake handle this (they checkif \"/\" not in database).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case
sqlalchemy_uri.databaseisNonetheNoneis returned, no need to returnNoneexplicitly. No additional checks or guards needed.Presto and Snowflake checks whether database contains
/. Because they have different sqlalchemy connection strings and as a result adatabasecould look like{caatalog}/{schema}. In casesqlalchemy_uri.databasedoesn't contain/it means no schemas provided andNoneis returned.