
feat(operators): support dbt docs on Kubernetes via DbtDocsS3KubernetesOperator#2058

Merged
pankajastro merged 15 commits into astronomer:main from jx2lee:kube-upload-to-cloudstorage
Apr 15, 2026

Conversation

@jx2lee
Contributor

@jx2lee jx2lee commented Oct 27, 2025

Description

Add a new operator DbtDocsS3KubernetesOperator that runs dbt docs generate on Kubernetes and uploads the generated documentation to AWS S3.

Key changes

  • A new base class DbtDocsKubernetesOperator, extending DbtKubernetesBaseOperator, to handle dbt docs generate execution.
  • Added DbtDocsCloudKubernetesOperator as an abstract class for cloud-specific doc upload implementations.
  • Implemented DbtDocsS3KubernetesOperator, which:
    • Accepts connection_id, bucket_name, and optional folder_dir parameters.
    • Uses S3Hook to upload output files (index.html, manifest.json, catalog.json, static_index.html) to the specified S3 bucket.
  • Updated operator registry to expose the new operator.
  • Added comprehensive unit tests verifying:
    • Correct file selection and upload behavior.
    • Expected dbt docs command execution.
    • Validation of required fields and failure cases.
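
The file selection and key layout described above can be pictured with a minimal, self-contained sketch. This is not the operator's actual code: `DOCS_FILES`, `build_upload_plan`, and the `folder_dir/filename` key layout are assumptions made for illustration, and the sketch deliberately avoids importing Airflow or Cosmos. In the operator itself, each resulting pair would be handed to `S3Hook` for upload.

```python
from pathlib import Path
from typing import Optional

# File names taken from the PR description; the constant name is hypothetical.
DOCS_FILES = ("index.html", "manifest.json", "catalog.json", "static_index.html")


def build_upload_plan(target_dir: Path, folder_dir: Optional[str] = None) -> list[tuple[str, str]]:
    """Return (local_path, s3_key) pairs for the docs files present in target_dir."""
    plan = []
    for name in DOCS_FILES:
        local_path = target_dir / name
        if not local_path.is_file():
            # Skip files that were not generated instead of failing the upload.
            continue
        # Assumed layout: files sit directly under folder_dir when it is given.
        key = f"{folder_dir.strip('/')}/{name}" if folder_dir else name
        plan.append((str(local_path), key))
    return plan
```

Keeping the selection logic in a pure function like this makes the "correct file selection" behaviour easy to unit test without mocking S3.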

Example usage

generate_dbt_docs_aws = DbtDocsS3KubernetesOperator(
    task_id="generate_dbt_docs_aws",
    project_dir=DBT_ROOT_PATH / "jaffle_shop",
    profile_config=profile_config,
    connection_id="aws_s3_conn",
    bucket_name="cosmos-ci-docs",
    install_deps=True,
    image="dbt-jaffle-shop:1.0.0",
    get_logs=True,
    is_delete_operator_pod=False,
)

Related Issue(s)

closes #1906

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@codecov

codecov Bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 94.31818% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.04%. Comparing base (513f0ed) to head (2345af8).
⚠️ Report is 4 commits behind head on main.

Files with missing lines          Patch %   Lines
cosmos/operators/kubernetes.py    94.31%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2058      +/-   ##
==========================================
- Coverage   98.08%   98.04%   -0.05%     
==========================================
  Files         103      103              
  Lines        7484     7571      +87     
==========================================
+ Hits         7341     7423      +82     
- Misses        143      148       +5     

@tatiana tatiana added this to the Cosmos 1.12.0 milestone Oct 29, 2025
@jx2lee jx2lee marked this pull request as ready for review November 3, 2025 11:26
Collaborator

@tatiana tatiana left a comment

@jx2lee, this looks great, thanks for your contribution!

Would it be possible to update the PR description to make it more informative and include an example DAG/code showing how you tested the feature? Something similar to:

    generate_dbt_docs_aws = DbtDocsS3KubernetesOperator(
        # Cosmos arguments:
        task_id="generate_dbt_docs_aws",
        project_dir=DBT_ROOT_PATH / "jaffle_shop",
        profile_config=profile_config,
        connection_id="aws_s3_conn",
        bucket_name="cosmos-ci-docs",
        install_deps=True,
        # Specific to the K8s Pod Operator:
        image="dbt-jaffle-shop:1.0.0",
        get_logs=True,
        is_delete_operator_pod=False,
    )

Also, could you extend this example DAG:
https://github.com/astronomer/astronomer-cosmos/blob/main/dev/dags/jaffle_shop_kubernetes.py

So it also uses this newly introduced operator? You could use this example as an inspiration:
https://github.com/astronomer/astronomer-cosmos/blob/main/dev/dags/dbt_docs.py

This DAG is run in our CI, and it would help us validate the feature from an integration test perspective.

Comment thread cosmos/operators/kubernetes.py Outdated
@jx2lee jx2lee marked this pull request as draft November 5, 2025 13:37
@jx2lee jx2lee force-pushed the kube-upload-to-cloudstorage branch from 5c4c990 to 72670fc Compare November 8, 2025 10:09
@jx2lee jx2lee force-pushed the kube-upload-to-cloudstorage branch from e1a871f to 8867487 Compare November 8, 2025 10:56
@jx2lee
Contributor Author

jx2lee commented Nov 8, 2025

Thanks for pointing that out!
I added the DbtDocsS3KubernetesOperator example to jaffle_shop_kubernetes.py, since it already serves as the Kubernetes example DAG.
This should demonstrate the operator usage as intended. Is there anything else I should add to the example? 👀

@jx2lee jx2lee marked this pull request as ready for review November 8, 2025 11:13
@jx2lee jx2lee force-pushed the kube-upload-to-cloudstorage branch from 8867487 to 5ca0e0c Compare November 10, 2025 12:41
@netlify

netlify Bot commented Nov 10, 2025

Deploy Preview for sunny-pastelito-5ecb04 ready!

Name Link
🔨 Latest commit d2bb978
🔍 Latest deploy log https://app.netlify.com/projects/sunny-pastelito-5ecb04/deploys/691462b783c61b00095db0d2
😎 Deploy Preview https://deploy-preview-2058--sunny-pastelito-5ecb04.netlify.app
Comment thread dev/dags/jaffle_shop_kubernetes.py Outdated
Comment thread dev/dags/jaffle_shop_kubernetes.py Outdated
@tatiana
Collaborator

tatiana commented Nov 10, 2025

@jx2lee thanks a lot for addressing the feedback!

Currently the code coverage check is failing, but I believe that if you accept the changes I just proposed, the task will be run as part of our CI/CD and the coverage issue should be solved.

The details for the code coverage are here:
https://app.codecov.io/gh/astronomer/astronomer-cosmos/pull/2058?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer

[Screenshot 2025-11-10 at 13 20 13]

Happy for us to merge it after you apply the changes and the coverage check is green.

Comment thread dev/dags/jaffle_shop_kubernetes.py Outdated
@tatiana
Collaborator

tatiana commented Nov 18, 2025

Hi @jx2lee, while trying to run the example DAG, I noticed it is not working. Have you tried to run this example DAG and seen the files being uploaded to the S3 bucket from the K8s pod?

I was curious why the CI was still complaining about code coverage. Only today I found some time to run your code locally to understand what was going wrong.

There are a few issues:

  1. The CI reports the DAG run as successful when it is actually failing. Ideally, we'd investigate and create a separate PR to address this.

  2. Cosmos ExecutionMode.KUBERNETES does not work with Cosmos profile mappings. It relies on users providing a profiles.yml, which, in this case, retrieves sensitive database credentials from environment variables. The existing example of the operator was very misleading; we should fix that too. I just corrected your example to set the K8s secrets, which allows this task to run in K8s and access the Postgres database: https://github.com/astronomer/astronomer-cosmos/pull/2058/files#r2537014072

  3. After making these two changes, I noticed that the feature still doesn't work. If I delete all files in the S3 bucket, they are not being created after "successfully" running this DAG (the DAG runs with success state). The main issue is that the logic for uploading the files resides in the Airflow worker node, while the actual doc files are generated and stored in the Kubernetes Pod generated via the KubernetesPodOperator.

For (3), I can see two main approaches to solving it:

a) Defer to the K8s Pod operator to upload the files to S3. One way to accomplish this is to invoke aws s3 cp from within the pod, after running the dbt command. Cosmos could integrate this as part of the newly introduced operator logic. A downside of this approach is that you wouldn't be leveraging the Airflow S3 connection "automatically", so you'd need to find a way to expose those credentials to the K8s pod.

b) Copy the dbt docs files generated in the K8s Pod back to the Airflow worker, e.g., via XCom or by printing and parsing them, and then leverage the hook - like you're doing.
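
To make the two options more concrete, here is a rough sketch of each, under stated assumptions. The helper names (`build_in_pod_command`, `write_docs_for_xcom`), the `target` directory, and the base64-in-JSON payload are hypothetical and not part of Cosmos. For (a), the pod image is assumed to ship the AWS CLI and to receive credentials some other way. For (b), the sketch relies on the KubernetesPodOperator convention of reading `/airflow/xcom/return.json` when `do_xcom_push=True`; XCom backends usually have size limits, so a large `manifest.json` may not fit.

```python
import base64
import json
import shlex
from pathlib import Path

# Hypothetical constant; file names come from the PR description.
DOCS_FILES = ("index.html", "manifest.json", "catalog.json")


def build_in_pod_command(bucket_name: str, folder_dir: str = "", target_dir: str = "target") -> str:
    """Approach (a): chain `aws s3 cp` after `dbt docs generate` inside the pod."""
    prefix = f"s3://{bucket_name}/{folder_dir.strip('/')}".rstrip("/")
    uploads = []
    for name in DOCS_FILES:
        source = shlex.quote(f"{target_dir}/{name}")
        destination = shlex.quote(f"{prefix}/{name}")
        uploads.append(f"aws s3 cp {source} {destination}")
    # `&&` stops the chain at the first failing step, so a failed upload fails the pod.
    return " && ".join(["dbt docs generate", *uploads])


def write_docs_for_xcom(target_dir: Path, xcom_path: Path = Path("/airflow/xcom/return.json")) -> None:
    """Approach (b): run inside the pod; bundle the docs as JSON for the XCom sidecar."""
    payload = {
        name: base64.b64encode((target_dir / name).read_bytes()).decode("ascii")
        for name in DOCS_FILES
        if (target_dir / name).is_file()
    }
    xcom_path.parent.mkdir(parents=True, exist_ok=True)
    xcom_path.write_text(json.dumps(payload))
```

With (b), the worker-side operator would decode the XCom payload back into files and then call the hook as it does today; with (a), no files ever need to leave the pod.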

A final point to keep in mind is that Cosmos callbacks currently only work for ExecutionMode.LOCAL, ExecutionMode.VIRTUALENV, and ExecutionMode.WATCHER. Again, we need to improve Cosmos documentation to be upfront about this, in a dedicated PR.

We'd also be happy to accept contributions, in dedicated PRs, that add callback support to other execution modes, including ExecutionMode.KUBERNETES.

I'd highly encourage you to run locally following the steps described in:
https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html

For local development of the K8s features, I also find it particularly useful to run these commands:

hatch run tests.py3.12-3.0-1.10:test-integration-setup
hatch run tests.py3.12-3.0-1.10:test-kubernetes

By running these commands locally, we can use Python's debugger, via breakpoint(), which is very helpful.

Comment thread cosmos/operators/kubernetes.py Outdated
Comment thread cosmos/operators/kubernetes.py
@pankajastro
Contributor

Looks good to me. I tested it, and it worked as expected.
[Screenshot 2026-04-15 at 9 24 41 AM]

@pankajastro
Contributor

@jx2lee Could you please add documentation for this in a follow-up PR?

@pankajastro pankajastro force-pushed the kube-upload-to-cloudstorage branch from 2345af8 to 155de54 Compare April 15, 2026 04:03
@jx2lee
Contributor Author

jx2lee commented Apr 15, 2026

@pankajastro Sure, I'll open a PR soon. Thanks!

Development

Successfully merging this pull request may close these issues.

Generating Docs using DbtDocsS3Operator

4 participants