Ensure local docs operators generate necessary files for datahub integration #655
Changes from all commits
564e3da
c118fc1
35b630e
9a6b346
6225db8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -538,13 +538,85 @@ class DbtDocsLocalOperator(DbtLocalBaseOperator): | |
|
|
||
| ui_color = "#8194E0" | ||
|
|
||
| required_files = ["index.html", "manifest.json", "graph.gpickle", "catalog.json"] | ||
| required_files = ["index.html", "manifest.json", "graph.gpickle", "catalog.json", "run_results.json"] | ||
|
|
||
| def __init__(self, **kwargs: Any) -> None: | ||
| super().__init__(**kwargs) | ||
| self.base_cmd = ["docs", "generate"] | ||
|
|
||
|
|
||
| class DbtFreshnessLocalOperator(DbtLocalBaseOperator): | ||
| """ | ||
| Executes `dbt source freshness` command. | ||
| Use the `callback` parameter to specify a callback function to run after the command completes. | ||
| """ | ||
|
|
||
| ui_color = "#8194E0" | ||
|
|
||
| def __init__(self, **kwargs: Any) -> None: | ||
| super().__init__(**kwargs) | ||
| self.base_cmd = ["source", "freshness"] | ||
|
|
||
|
|
||
| class DbtFreshnessS3LocalOperator(DbtFreshnessLocalOperator): | ||
|
Collaborator
Similar here, I wonder if
||
| """ | ||
| Executes the `dbt source freshness` command and uploads the result to S3 storage. Returns the S3 path to the generated documentation. | ||
|
|
||
| :param connection_id: S3's Airflow connection ID | ||
| :param bucket_name: S3's bucket name | ||
| :param folder_dir: This can be used to specify under which directory the generated DBT documentation should be | ||
| uploaded. | ||
| """ | ||
|
|
||
| ui_color = "#FF9900" | ||
|
|
||
| def __init__( | ||
| self, | ||
| connection_id: str, | ||
| bucket_name: str, | ||
| folder_dir: str | None = None, | ||
| **kwargs: Any, | ||
| ) -> None: | ||
| "Initializes the operator." | ||
| self.connection_id = connection_id | ||
| self.bucket_name = bucket_name | ||
| self.folder_dir = folder_dir | ||
|
|
||
| super().__init__(**kwargs) | ||
|
|
||
| # override the callback with our own | ||
| self.callback = self.upload_to_s3 | ||
|
|
||
| def upload_to_s3(self, project_dir: str) -> None: | ||
| "Uploads the generated documentation to S3." | ||
| logger.info( | ||
| 'Attempting to upload generated docs to S3 using S3Hook("%s")', | ||
| self.connection_id, | ||
| ) | ||
|
|
||
| from airflow.providers.amazon.aws.hooks.s3 import S3Hook | ||
|
|
||
| target_dir = f"{project_dir}/target" | ||
|
|
||
| hook = S3Hook( | ||
| self.connection_id, | ||
| extra_args={ | ||
| "ContentType": "text/html", | ||
| }, | ||
| ) | ||
|
|
||
| logger.info("Uploading %s to %s", "sources.json", f"s3://{self.bucket_name}/sources.json") | ||
|
|
||
| key = f"{self.folder_dir}/sources.json" if self.folder_dir else "sources.json" | ||
|
|
||
| hook.load_file( | ||
| filename=f"{target_dir}/sources.json", | ||
| bucket_name=self.bucket_name, | ||
| key=key, | ||
| replace=True, | ||
| ) | ||
|
|
||
|
|
||
| class DbtDocsCloudLocalOperator(DbtDocsLocalOperator, ABC): | ||
| """ | ||
| Abstract class for operators that upload the generated documentation to cloud storage. | ||
|
|
||
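The key selection in `upload_to_s3` above can be sketched in isolation. This is a minimal illustration only: `build_s3_key` and `build_s3_uri` are hypothetical helpers that are not part of Cosmos or this PR, and the default filename `sources.json` is taken from the diff. Deriving the logged URI from the same key is one possible way to keep the log message in step with the actual upload destination when `folder_dir` is set.

```python
from __future__ import annotations


def build_s3_key(folder_dir: str | None, filename: str = "sources.json") -> str:
    """Return the object key for the upload (hypothetical helper, not in Cosmos).

    Mirrors the conditional in the diff: prefix the filename with
    folder_dir when one is given, otherwise use the bare filename.
    """
    return f"{folder_dir}/{filename}" if folder_dir else filename


def build_s3_uri(bucket_name: str, folder_dir: str | None, filename: str = "sources.json") -> str:
    """Return an s3:// URI built from the same key (hypothetical helper)."""
    return f"s3://{bucket_name}/{build_s3_key(folder_dir, filename)}"
```

With these helpers, a call such as `build_s3_uri("test_bucket", "freshness")` yields a URI that includes the folder prefix, while passing `None` falls back to the bucket root.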
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -10,6 +10,7 @@ Many users choose to generate and serve these docs on a static website. This is | |||||||||||
| Cosmos offers two pre-built ways of generating and uploading dbt docs and a fallback option to run custom code after the docs are generated: | ||||||||||||
|
|
||||||||||||
| - :class:`~cosmos.operators.DbtDocsS3Operator`: generates and uploads docs to a S3 bucket. | ||||||||||||
| - :class:`~cosmos.operators.DbtFreshnessS3Operator`: generates and uploads the `sources.json <https://docs.getdbt.com/reference/artifacts/sources-json>`_ doc to an S3 bucket. | ||||||||||||
| - :class:`~cosmos.operators.DbtDocsAzureStorageOperator`: generates and uploads docs to an Azure Blob Storage. | ||||||||||||
| - :class:`~cosmos.operators.DbtDocsGCSOperator`: generates and uploads docs to a GCS bucket. | ||||||||||||
| - :class:`~cosmos.operators.DbtDocsOperator`: generates docs and runs a custom callback. | ||||||||||||
|
|
@@ -41,6 +42,15 @@ You can use the :class:`~cosmos.operators.DbtDocsS3Operator` to generate and upl | |||||||||||
| bucket_name="test_bucket", | ||||||||||||
| ) | ||||||||||||
|
|
||||||||||||
| generate_dbt_freshness_aws = DbtFreshnessS3Operator( | ||||||||||||
|
Collaborator
It may be worth adding a comment on why someone would want to send the source freshness files to S3, and perhaps a link to your blog post: https://parakeet.solutions/ingest-cosmos-dbt-into-datahub/, so people can have more
||||||||||||
| task_id="generate_dbt_freshness_aws", | ||||||||||||
| project_dir="path/to/jaffle_shop", | ||||||||||||
| profile_config=profile_config, | ||||||||||||
| # docs-specific arguments | ||||||||||||
| connection_id="test_aws", | ||||||||||||
| bucket_name="test_bucket", | ||||||||||||
| ) | ||||||||||||
|
|
||||||||||||
|
Collaborator
I just realised we're adding the code manually in this doc code. WDYT if we added a reference to the example DAG, similar to what we did in:
astronomer-cosmos/docs/getting_started/execution-modes.rst Lines 58 to 61 in 188fe56
This way, the documentation will be up-to-date if the operator changes without additional effort. |
||||||||||||
| Upload to Azure Blob Storage | ||||||||||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
Minor comment: what do you think about renaming `DbtFreshnessLocalOperator` to `DbtSourceFreshnessLocalOperator`?