Refactor dbt ls to run from a temporary directory #414
Merged
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #414      +/-   ##
==========================================
+ Coverage   91.43%   91.51%   +0.07%
==========================================
  Files          50       50
  Lines        1752     1768      +16
==========================================
+ Hits         1602     1618      +16
  Misses        150      150
☔ View full report in Codecov by Sentry.
jlaneve
reviewed
Jul 28, 2023
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory. `dbt ls` writes files to the directory it runs from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified. Depending on the deployment, the Airflow worker may not have write permissions to the dbt project directory. This PR changes the behavior of `dbt ls` so that it copies the original project directory into a temporary directory and runs `dbt ls` from there. Closes: #411
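The copy-then-run idea described in this commit message can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from this PR: the function name `run_from_temp_copy` and the fixed `project` subfolder name are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_from_temp_copy(project_dir, command):
    """Copy project_dir into a temporary directory and run command from the copy.

    Hypothetical helper: any files the command writes relative to its working
    directory end up in the temporary copy, not in the original project.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # copytree requires that the destination does not exist yet
        tmp_project = Path(tmp_dir) / "project"
        shutil.copytree(project_dir, tmp_project)
        result = subprocess.run(
            command,
            cwd=tmp_project,
            capture_output=True,
            text=True,
            check=False,
        )
    # The temporary copy is deleted when the with-block exits
    return result.returncode, result.stdout, result.stderr
```

Under these assumptions, a call such as `run_from_temp_copy("/path/to/dbt/project", ["dbt", "ls"])` would leave the original directory untouched, at the cost of copying the whole project on every invocation.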
…a separate dir
Unfortunately, this does not work in dbt 1.5 or previous versions.
Collaborator
Author
Thanks, @jlaneve, I think I addressed all the feedback!
tatiana
commented
Aug 10, 2023
Collaborator
Author
A different approach we could adopt to solve this issue is not to pass … Still, this change will not solve the limitation that ATM local operators run things locally. @jlaneve @harels is the scope of this ticket only …
harels
approved these changes
Aug 11, 2023
tatiana
added a commit
that referenced
this pull request
Aug 16, 2023
Feature (pending documentation!)
* Support dbt global flags (via dbt_cmd_global_flags in `operator_args`) by @tatiana in #469

Enhancements
* Hide sensitive field when using BigQuery keyfile_dict profile mapping by @jbandoro in #471

Bug fixes
* Fix bug on select node add exclude selector subset ids logic by @jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only file system errors during DAG parsing, by @tatiana in #414

Others
* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
Merged
tatiana
added a commit
that referenced
this pull request
Aug 29, 2023
Since Cosmos 1.0, `load_method.DBT_LS` is the default dbt project
parsing method, unless the user gives a manifest.
Using the original dbt project path has been a source of issues when
that path is Read-Only. This issue was faced when running commands that
generate `{project-dir}/target/` and `{project-dir}/logs/`, which was
solved as part of #414.
This issue is particularly problematic if we want to run `dbt deps` from
the original project directory, since dbt 1.6 saves packages to
`{project_dir}/dbt_packages` unless specified otherwise in the user's
`dbt_project.yml`. To our knowledge, dbt currently does not allow users
to define this directory via flags or environment variables, as
discussed in #481.
This change aims to solve these issues, by creating a temporary
directory and creating symbolic links to the original directory.
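A rough sketch of the symbolic-link approach, assuming a hypothetical helper `create_symlinks` and an assumed set of ignored folder names (the names `logs`, `target` and `dbt_packages` come from this discussion, but the exact list Cosmos uses is not stated here):

```python
import os
from pathlib import Path

# Assumption: folders that dbt writes to are not linked, so dbt creates
# fresh ones inside the temporary directory instead.
IGNORED_ENTRIES = {"logs", "target", "dbt_packages"}


def create_symlinks(project_dir, tmp_dir, ignored=IGNORED_ENTRIES):
    """Link every top-level entry of project_dir into tmp_dir, except ignored ones."""
    source_root = Path(project_dir).resolve()
    for entry in os.listdir(source_root):
        if entry not in ignored:
            os.symlink(source_root / entry, Path(tmp_dir) / entry)
```

Compared with copying the project, linking only touches the top-level entries, so the cost does not grow with the size of the project. The trade-off is that linked files are still the originals, so this only helps when dbt writes to the folders that were left out.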
Finally, during the development of this task, it was observed that when
running `dbt ls` in a project with `packages.yml`, dbt raises a
'Compilation Error'. Since dbt may report other errors in stdout, this PR
captures "Errors" more generically, making potential issues more evident
to end users.
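The generic error check mentioned above could look roughly like the sketch below. The function name `raise_on_dbt_error`, the exception type and the case-sensitive match on the word "Error" are assumptions made for illustration, not the check implemented in Cosmos.

```python
def raise_on_dbt_error(stdout):
    """Raise if any line of the captured stdout mentions an error.

    Assumption: a case-sensitive substring match on "Error" is enough to
    catch 'Compilation Error' as well as other error kinds.
    """
    error_lines = [line for line in stdout.splitlines() if "Error" in line]
    if error_lines:
        details = "\n".join(error_lines)
        raise RuntimeError(f"Unable to run dbt ls command due to the error:\n{details}")
```

Matching on a generic word rather than one specific message trades precision for coverage: it can surface unexpected failures, but it may also flag a model whose name happens to contain "Error".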
Merged
tatiana
added a commit
that referenced
this pull request
Sep 6, 2023
**Features**
* Support dbt global flags (via dbt_cmd_global_flags in operator_args) by @tatiana in #469
* Support parsing DAGs when there are no connections by @jlaneve in #489

**Enhancements**
* Hide sensitive field when using BigQuery keyfile_dict profile mapping by @jbandoro in #471
* Consistent Airflow Dataset URIs, inlets and outlets with `Openlineage package <https://pypi.org/project/openlineage-integration-common/>`_ by @tatiana in #485. `Read more <https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html>`_.
* Refactor ``LoadMethod.DBT_LS`` to run from a temporary directory with symbolic links by @tatiana in #488
* Run ``dbt deps`` when using ``LoadMethod.DBT_LS`` by @DanMawdsleyBA in #481
* Update Cosmos log color to purple by @harels in #494
* Change operators to log ``dbt`` commands output as opposed to recording to XCom by @tatiana in #513

**Bug fixes**
* Fix bug on select node add exclude selector subset ids logic by @jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only file system errors during DAG parsing, by @tatiana in #414
* Fix profile_config arg in DbtKubernetesBaseOperator by @david-mag in #505
* Fix SnowflakePrivateKeyPemProfileMapping private_key reference by @nacpacheco in #501
* Fix incorrect temporary directory creation in VirtualenvOperator init by @tatiana in #500
* Fix log propagation issue by @tatiana in #498
* Fix PostgresUserPasswordProfileMapping to retrieve port from connection by @jlneve in #511

**Others**
* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
* Drop support to Python 3.7 in the CI test matrix by @harels in #490
* Add Airflow 2.7 to the CI test matrix by @tatiana in #487
* Add MyPy type checks to CI since we exceeded pre-commit disk quota usage by @tatiana in #510
This was referenced Apr 16, 2025
tatiana
added a commit
that referenced
this pull request
Apr 16, 2025
…` during DAG parsing (#1668)

Support running `dbt deps` incrementally on a pre-calculated `dbt_packages` directory during DAG parsing. This was a use case requested by an Astro customer.

Before this change, Cosmos supported two types of configuration:
* If users chose `RenderConfig.dbt_deps=False` or `ProjectConfig.install_dbt_deps=False`, Cosmos would create a symbolic link to the user's pre-defined `dbt_packages` (background: #488, #600, #730)
* If users chose `RenderConfig.dbt_deps=True` or `ProjectConfig.install_dbt_deps=True` (default), Cosmos would ignore any user-predefined `dbt_packages` and run `dbt deps` from scratch in a temporary folder.

An Astronomer customer requested to reuse the initially defined `dbt_packages` directory and run `dbt deps` (incrementally).

We do not run dbt commands directly in the original dbt project folder with Cosmos because some users use read-only filesystems (#414). We also decided to use symbolic links instead of copying the directory due to performance issues (#488).

Since we did not want to introduce a breaking change in a minor Cosmos release by changing the existing Cosmos 1.x behaviour to meet this new use case, this PR supports:
* Copying the dbt deps related files (dbt packages folder and symbolic link) to the Cosmos temporary folder; and
* Running `dbt deps`.

So that this is not a breaking change, users must opt into this behaviour by either:
- Changing individual `DbtDag` or `DbtTaskGroup` instances, using `ProjectConfig.copy_dbt_packages=True` (new configuration) and `RenderConfig.dbt_deps=True` or `ProjectConfig.install_dbt_deps=True`; or
- Changing the behaviour globally, by setting the Airflow configuration either via the environment variable `AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES_VALUE=True` or via the `airflow.cfg`:

```
[cosmos]
default_copy_dbt_packages_value=True
```

The following two missing parts are being added as part of a separate PR:
- Mimic this behaviour during task execution;
- Update documentation to be representative of both changes.

Depends on #1669
Related to: #1630
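The three behaviours described in this commit message (link, ignore, copy) can be summarised in a short sketch. The function `prepare_dbt_packages` and its return values are hypothetical and only illustrate the decision logic; the actual `dbt deps` invocation is left to the caller.

```python
import os
import shutil
from pathlib import Path


def prepare_dbt_packages(project_dir, tmp_dir, install_deps, copy_dbt_packages):
    """Decide how the temporary folder gets its dbt_packages directory.

    Returns the strategy applied: "symlink", "copy", "ignore" or "missing".
    """
    source = Path(project_dir).resolve() / "dbt_packages"
    target = Path(tmp_dir) / "dbt_packages"
    if not source.exists():
        return "missing"
    if not install_deps:
        # Reuse the pre-defined packages as-is through a symbolic link
        os.symlink(source, target)
        return "symlink"
    if copy_dbt_packages:
        # Opt-in behaviour: copy the packages so dbt deps can update them
        # without writing to the original, possibly read-only, folder
        shutil.copytree(source, target, symlinks=True)
        return "copy"
    # Default: start from an empty folder and let dbt deps install from scratch
    return "ignore"
```

In this sketch the copy branch is only reached when both options are enabled, which mirrors the opt-in requirement stated above.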
tatiana
added a commit
that referenced
this pull request
Apr 16, 2025
…` during task execution (#1670)

Support running `dbt deps` incrementally on a pre-calculated `dbt_packages` directory during task execution. This was a use case requested by an Astro customer.

Before this change, Cosmos supported two types of configuration:
* If users chose `operator_args={"install_deps": False}` or `ProjectConfig.install_dbt_deps=False`, Cosmos would create a symbolic link to the user's pre-defined `dbt_packages` (background: #488, #600, #730)
* If users chose `operator_args={"install_deps": True}` or `ProjectConfig.install_dbt_deps=True` (default), Cosmos would ignore any user-predefined `dbt_packages` and run `dbt deps` from scratch in a temporary folder.

An Astronomer customer requested to reuse the initially defined `dbt_packages` directory and run `dbt deps` (incrementally).

# Implementation

We do not run dbt commands directly in the original dbt project folder with Cosmos because some users use read-only filesystems (#414). We also decided to use symbolic links instead of copying the directory due to performance issues (#488).

Since we did not want to introduce a breaking change in a minor Cosmos release by changing the existing Cosmos 1.x behaviour to meet this new use case, this PR supports:
* Copying the dbt deps related files (dbt packages folder and symbolic link) to the Cosmos temporary folder; and
* Running `dbt deps`.

So that this is not a breaking change, users must opt into this behaviour by using `ProjectConfig.copy_dbt_packages=True` (new configuration) and `operator_args={"install_deps": True}` or `ProjectConfig.install_dbt_deps=True` and one of the following:
- Changing the operator to receive the argument `copy_dbt_packages=True`
- Changing individual `DbtDag` or `DbtTaskGroup` instances to use the new configuration `ProjectConfig.copy_dbt_packages=True`
- Changing the behaviour globally, by setting the Airflow configuration either via the environment variable `AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES_VALUE=True` or via the `airflow.cfg`:

```
[cosmos]
default_copy_dbt_packages_value=True
```

# How this was tested

To validate the end-to-end behaviour, we run the following dag from `dev/dags`:

```
airflow dags test dbt_deps_example
```

# Related tickets

This is a follow-up to #1668 and #1669. I'll make a follow-up PR covering the documentation.

Closes: #1630
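To illustrate how a per-instance setting and the global default might interact, here is a small sketch. The helper `resolve_copy_dbt_packages` is hypothetical; the precedence (explicit value first, then the environment variable) and the set of accepted truthy strings are assumptions, and reading `airflow.cfg` is deliberately left out.

```python
import os

# Environment variable name taken from the commit message above
ENV_VAR = "AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES_VALUE"


def resolve_copy_dbt_packages(explicit=None, environ=None):
    """Return the effective copy_dbt_packages value.

    Assumption: an explicitly passed value wins; otherwise the global
    default is read from the environment and falls back to False.
    """
    if explicit is not None:
        return explicit
    environ = os.environ if environ is None else environ
    raw_value = environ.get(ENV_VAR, "False")
    return raw_value.strip().lower() in {"true", "t", "1", "yes", "y"}
```

For example, `resolve_copy_dbt_packages(environ={ENV_VAR: "True"})` would enable the behaviour globally, while `resolve_copy_dbt_packages(explicit=False, environ={ENV_VAR: "True"})` would keep it disabled for one instance under the assumed precedence.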
As of Cosmos 1.0.0, `LoadMode.DBT_LS` runs `dbt ls` from within the original dbt project directory.

The `dbt ls` outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified (as of dbt 1.6).

Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This can lead to an error message similar to the following:

This PR changes the behavior of `dbt ls` to try to make the `dbt ls` artifacts (logs and target directory) not be written to the original project directory.

In addition to the introduced test, this change was validated using airflow 2.6 and dbt 1.6, by following these steps:

(1) Delete folders `logs` and `target` from `astronomer-cosmos/dev/dags/dbt/jaffle_shop`

(2) Add a breakpoint after `stdout, stderr = process.communicate()` in `dbt/graph.py`

(3) Run a DAG that uses `astronomer-cosmos/dev/dags/dbt/jaffle_shop`, e.g.:

(4) When the breakpoint happens, check that no `target` or `logs` folder was created after running `dbt ls` in `astronomer-cosmos/dev/dags/dbt/jaffle_shop`

A limitation with the current approach is that, although `dbt ls` is not creating these directories in the given circumstances, if the user is using the local executor or an earlier version of `dbt`, the files will still be written to the project directory.

Closes: #411
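One way to apply the `DBT_LOG_PATH` and `DBT_TARGET_PATH` environment variables mentioned in this description is sketched below. The helper names `build_dbt_env` and `run_dbt_ls` are hypothetical, and this is not necessarily how the PR implements the change; it assumes a dbt version that honours both variables (1.6 according to the description).

```python
import os
import subprocess
import tempfile


def build_dbt_env(tmp_dir, base_env=None):
    """Return a copy of the environment that redirects dbt logs and target files."""
    env = dict(os.environ if base_env is None else base_env)
    env["DBT_LOG_PATH"] = os.path.join(tmp_dir, "logs")
    env["DBT_TARGET_PATH"] = os.path.join(tmp_dir, "target")
    return env


def run_dbt_ls(project_dir):
    """Run dbt ls from project_dir while writing its artifacts to a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        process = subprocess.Popen(
            ["dbt", "ls"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            cwd=project_dir,
            env=build_dbt_env(tmp_dir),
            text=True,
        )
        stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

With this shape, the command still runs inside the original project directory, which matches the limitation noted above: a dbt version that ignores these variables would keep writing `logs` and `target` next to the project files.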