Skip to content

Run dbt deps before running the dbt ls command#481

Merged
tatiana merged 48 commits into
astronomer:mainfrom
DanMawdsleyBA:run-dbt-deps-before-dbt-ls
Aug 31, 2023
Merged

Run dbt deps before running the dbt ls command#481
tatiana merged 48 commits into
astronomer:mainfrom
DanMawdsleyBA:run-dbt-deps-before-dbt-ls

Conversation

@DanMawdsleyBA
Copy link
Copy Markdown
Contributor

@DanMawdsleyBA DanMawdsleyBA commented Aug 21, 2023

Description

Runs the dbt deps command before running the dbt ls command

Related Issue(s)

Closes #445

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@DanMawdsleyBA DanMawdsleyBA requested a review from a team as a code owner August 21, 2023 20:08
@DanMawdsleyBA DanMawdsleyBA requested a review from a team August 21, 2023 20:08
@netlify
Copy link
Copy Markdown

netlify Bot commented Aug 21, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 3103c30
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/64f07a9f6fe79a00081d0261

@pre-commit-ci pre-commit-ci Bot temporarily deployed to internal August 21, 2023 20:09 Inactive
@pre-commit-ci pre-commit-ci Bot temporarily deployed to internal August 21, 2023 20:21 Inactive
@DanMawdsleyBA DanMawdsleyBA temporarily deployed to internal August 22, 2023 06:36 — with GitHub Actions Inactive
Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks a lot for your first contribution, @DanMawdsleyBA ! I'm hopeful this could be part of the upcoming 1.1 release.

I can see two areas for improvement:

  1. A test that represents the original issue this PR solves, that breaks before the change and passes after this modification. Ideally, it would highlight what happens when a profile depends on a specific dbt adapter that is not installed and which error would be raised with dbt ls.
  2. Do you think all users would be happy for, during DAG parsing time, to have dbt deps installed? What would happen if they had a read-only filesystem? A possibility to address this could be to have a new parameter added to RenderConfig, something like install_deps, set to True by default, but users could opt out (@jlaneve may have other ideas/suggestions)

Please do reach out if you'd like support writing the test.

@DanMawdsleyBA
Copy link
Copy Markdown
Contributor Author

DanMawdsleyBA commented Aug 22, 2023

This looks great. Thanks a lot for your first contribution, @DanMawdsleyBA ! I'm hopeful this could be part of the upcoming 1.1 release.

I can see two areas for improvement:

  1. A test that represents the original issue this PR solves, that breaks before the change and passes after this modification. Ideally, it would highlight what happens when a profile depends on a specific dbt adapter that is not installed and which error would be raised with dbt ls.
  2. Do you think all users would be happy for, during DAG parsing time, to have dbt deps installed? What would happen if they had a read-only filesystem? A possibility to address this could be to have a new parameter added to RenderConfig, something like install_deps, set to True by default, but users could opt out (@jlaneve may have other ideas/suggestions)

Please do reach out if you'd like support writing the test.

@tatiana
I've added a dbt_deps into RenderConfig with a default value as True but can be set to False to not run dbt deps this only applies when loading via dbt ls . I've added a test to catch the failure by having a package file with dbt_utils and manually setting the dbt deps to False. I also improved the errors to catch this specific case. Let me know if you have any more thoughts. If you're happy if you could add it to a pre-release version I can test it on an actual airflow instance

@DanMawdsleyBA DanMawdsleyBA requested a review from tatiana August 22, 2023 14:20
@pre-commit-ci pre-commit-ci Bot temporarily deployed to internal August 22, 2023 14:31 Inactive
@harels harels added this to the 1.1.0 milestone Aug 22, 2023
@pre-commit-ci pre-commit-ci Bot temporarily deployed to internal August 22, 2023 15:11 Inactive
@DanMawdsleyBA
Copy link
Copy Markdown
Contributor Author

One thing I have realised is that this fix won't work on systems that have the read only issue as by default the packages are stored in dbt project path/ packages . There is a variable that can be set in the dbt_project.yml packages_install_path which may help however I don't think there is a way to specify that via a flag or env_var (Although happy to be wrong on that). It might be possible to do some kind of project.yml update to that variable and override the current but feels a janky solution. Not sure if @harels or @tatiana do you have any better ideas?
I'm starting to feel it might be easier to just copy the whole dbt project into a new directory for read only systems and have everything point to that new directory.

@tatiana
Copy link
Copy Markdown
Collaborator

tatiana commented Aug 23, 2023

One thing I have realised is that this fix won't work on systems that have the read only issue as by default the packages are stored in dbt project path/ packages . There is a variable that can be set in the dbt_project.yml packages_install_path which may help however I don't think there is a way to specify that via a flag or env_var (Although happy to be wrong on that). It might be possible to do some kind of project.yml update to that variable and override the current but feels a janky solution. Not sure if @harels or @tatiana do you have any better ideas? I'm starting to feel it might be easier to just copy the whole dbt project into a new directory for read only systems and have everything point to that new directory.

Hi @DanMawdsleyBA , that's a very important point.
I refactored a bit the logic in the dbt ls and hopefully it will be able to address the issue of running dbt deps on project directories that were originally read only: #488

Comment thread cosmos/dbt/graph.py Outdated
Comment thread dev/dags/dbt/jaffle_shop/packages.yml
Comment thread tests/dbt/test_graph.py Outdated
@tatiana tatiana force-pushed the run-dbt-deps-before-dbt-ls branch from f484d49 to f2c06d7 Compare August 31, 2023 09:32
@tatiana tatiana temporarily deployed to external August 31, 2023 09:33 — with GitHub Actions Inactive
@tatiana tatiana temporarily deployed to external August 31, 2023 09:48 — with GitHub Actions Inactive
@tatiana tatiana temporarily deployed to external August 31, 2023 10:34 — with GitHub Actions Inactive
@tatiana tatiana changed the title Added deps before ls command Run dbt deps before running the dbt ls command Aug 31, 2023
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 31, 2023

Codecov Report

Patch coverage: 92.85% and project coverage change: -0.08% ⚠️

Comparison is base (8c9e768) 91.59% compared to head (3103c30) 91.51%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #481      +/-   ##
==========================================
- Coverage   91.59%   91.51%   -0.08%     
==========================================
  Files          50       50              
  Lines        1856     1874      +18     
==========================================
+ Hits         1700     1715      +15     
- Misses        156      159       +3     
Files Changed Coverage Δ
cosmos/dbt/graph.py 98.67% <92.30%> (-1.33%) ⬇️
cosmos/config.py 92.85% <100.00%> (+0.08%) ⬆️
cosmos/converter.py 95.77% <100.00%> (-1.37%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tatiana tatiana temporarily deployed to external August 31, 2023 11:34 — with GitHub Actions Inactive
@tatiana
Copy link
Copy Markdown
Collaborator

tatiana commented Aug 31, 2023

@DanMawdsleyBA We managed to get all the tests to pass 🎉 Let's see after the rebase. Once this is passing, we'll merge this. It will be released as part of 1.1.

Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for these changes to be merged, I could test it locally.

It will help many Cosmos users with DAGs with no nodes due to dependencies not being installed. We'll improve the coverage in a follow up PR.

@tatiana tatiana merged commit 13087d5 into astronomer:main Aug 31, 2023
@tatiana tatiana mentioned this pull request Sep 6, 2023
tatiana added a commit that referenced this pull request Sep 6, 2023
**Features**

* Support dbt global flags (via dbt_cmd_global_flags in operator_args)
by @tatiana in #469
* Support parsing DAGs when there are no connections by @jlaneve in #489

**Enhancements**

* Hide sensitive field when using BigQuery keyfile_dict profile mapping
by @jbandoro in #471
* Consistent Airflow Dataset URIs, inlets and outlets with `Openlineage
package <https://pypi.org/project/openlineage-integration-common/>`_ by
@tatiana in #485. `Read more
<https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html>`_.
* Refactor ``LoadMethod.DBT_LS`` to run from a temporary directory with
symbolic links by @tatiana in #488
* Run ``dbt deps`` when using ``LoadMethod.DBT_LS`` by @DanMawdsleyBA in
#481
* Update Cosmos log color to purple by @harels in #494
* Change operators to log ``dbt`` commands output as opposed to
recording to XCom by @tatiana in #513

**Bug fixes**

* Fix bug on select node add exclude selector subset ids logic by
@jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only
file system errors during DAG parsing, by @tatiana in #414
* Fix profile_config arg in DbtKubernetesBaseOperator by @david-mag in
#505
* Fix SnowflakePrivateKeyPemProfileMapping private_key reference by
@nacpacheco in #501
* Fix incorrect temporary directory creation in VirtualenvOperator init
by @tatiana in #500
* Fix log propagation issue by @tatiana in #498
* Fix PostgresUserPasswordProfileMapping to retrieve port from
connection by @jlneve in #511

**Others**

* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
* Drop support to Python 3.7 in the CI test matrix by @harels in #490
* Add Airflow 2.7 to the CI test matrix by @tatiana in #487
* Add MyPy type checks to CI since we exceeded pre-commit disk quota
usage by @tatiana in #510
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Run dbt deps when using load_mode=LoadMode.DBT_LS

4 participants