Skip to content

Fix DbtVirtualenvBaseOperator to use correct virtualenv Python path#1252

Merged
tatiana merged 3 commits into
astronomer:mainfrom
kesompochy:fix-virtualenv-operator-bug
Oct 29, 2024
Merged

Fix DbtVirtualenvBaseOperator to use correct virtualenv Python path#1252
tatiana merged 3 commits into
astronomer:mainfrom
kesompochy:fix-virtualenv-operator-bug

Conversation

@kesompochy
Copy link
Copy Markdown
Contributor

Description

This PR addresses an issue where the DbtVirtualenvBaseOperator was executing dbt commands using the system-wide Python path instead of the virtualenv path. The root cause was that the self reference in the run_subprocess method was bound to a different instance than the one created during initialization, likely due to Airflow's DAG pickling mechanism.
To resolve this, we've refactored the invoke_dbt and handle_exception methods to be properties. This ensures that they dynamically reference the correct method of the current instance at runtime, rather than being bound to a potentially stale instance from initialization.

Related Issue(s)

fix #1246

This may be related to issue #958 in version 1.5.0

Breaking Change?

No

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

Additional Notes

I acknowledge that ideally, a test should be added to reproduce the original issue and verify the fix. However, I found it challenging to create an appropriate test, especially considering that this might require an integration test with Airflow to properly simulate Airflow behavior. If there are suggestions for how to effectively test this scenario, I would greatly appreciate the guidance.

I sincerely apologize for introducing this bug in the first place with PR #1200. I understand this has caused inconvenience, and I'm grateful for the opportunity to fix it. I kindly request a thorough review of these changes to ensure we've fully addressed the issue without introducing new problems.

- Log Python binary info before command execution when using virtualenv
- Reorder log statements for better readability and debugging
DAG pickling caused __init__-assigned methods to reference original
instance state instead of newly created instances. Switch to properties
for dynamic method assignment to ensure correct instance reference.
Update tests to cover new behavior.
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Oct 12, 2024
@dosubot dosubot Bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:testing Related to testing, like unit tests, integration tests, etc labels Oct 12, 2024
@netlify
Copy link
Copy Markdown

netlify Bot commented Oct 12, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 25f75ee
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/671238e96dbc1e0008bb23ae

@codecov
Copy link
Copy Markdown

codecov Bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.73%. Comparing base (ae4ec78) to head (25f75ee).
Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1252   +/-   ##
=======================================
  Coverage   95.73%   95.73%           
=======================================
  Files          67       67           
  Lines        3965     3967    +2     
=======================================
+ Hits         3796     3798    +2     
  Misses        169      169           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @kesompochy thank you very much for working on this.

When we write code, there is always the risk we'll be introducing bugs, and in this particular case, the reviewer is as accountable as the author. So, please, don't blame yourself.

As you mentioned, I really believe we need to write an integration test to try to reproduce the original issue - or a scenario closer to it, that fails before, and passes with this fix.

I'll take sometime to reproduce the issue and try to give ideas on how we could accomplish this. We'll be releasing this in a patch release of Cosmos, if not 1.7.1, then 1.7.2.

Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kesompochy I've added a test more representative of the change in #1286. I'll merge your PR and the test I created to validate this change.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Oct 29, 2024
@tatiana tatiana merged commit 191c635 into astronomer:main Oct 29, 2024
tatiana added a commit that referenced this pull request Oct 29, 2024
Bug fixes

* Fix ``DbtVirtualenvBaseOperator`` to use correct virtualenv Python path by kesompochy in #1252
* Fix displaying dbt docs as menu item in Astro by @tatiana in #1280
* Fix: Replace login by user for clickhouse profile by @petershenri in #1255

Enhancements

* Improve dbt Docs Hosting Debugging -- Update dbt_docs_not_set_up.html by @johnmcochran in #1250
* Minor refactor on VirtualenvOperators & add test for PR by @tatiana in #1253

Docs

* Add Welcome Section and "What Is Cosmos" Blurb to Home Page by @cmarteepants and @yanmastin-astro in #1251
* Update the URL for sample dbt docs hosted in Astronomer S3 bucket by @pankajkoti in #1283
* Add dedicated scarf tracking pixel to readme by @cmarteepants in #1256

Others

* Update ``CODEOWNERS`` to track all files by @pankajkoti in #1284
* Fix release after the ``raw`` rst directive disabled was disabled in PyPI by @tatiana in #1282
* Update issue template ``bug.yml`` - cosmos version update in the dropdown by @pankajkoti in #1275
* Pre-commit hook updates in #1285, #1274, #1254, #1244"
error: pathspec 'Is' did not match any file(s) known to git
error: pathspec 'Cosmos Blurb to Home Page by @cmarteepants and @yanmastin-astro in #1251
* Update the URL for sample dbt docs hosted in Astronomer S3 bucket by @pankajkoti in #1283
* Add dedicated scarf tracking pixel to readme by @cmarteepants in #1256

Others

* Update CODEOWNERS to track all files by @pankajkoti in #1284
* Fix release after the raw rst directive disabled was disabled in PyPI by @tatiana in #1282
* Update issue template bug.yml - cosmos version update in the dropdown by @pankajkoti in #1275
* Pre-commit hook updates in #1285, #1274, #1254, #1244
@tatiana tatiana mentioned this pull request Oct 29, 2024
tatiana added a commit that referenced this pull request Oct 29, 2024
)

Cosmos virtualenv operators are using the system dbt instead of the
virtualenv dbt.

Create a test case that illustrates issue #1246. This test fails with
Cosmos 1.7 (and the current main branch) and passes when using PR #1252.

This PR also introduces two refactors:
- Reuse the parent class method where applicable, as opposed to
re-writing it completely
- Force the Virtualenv invocation mode to be `SUBPROCESS ` since
Airflow/Cosmos are not able to import dbt as a library if it is not part
of the same Python virtualenv
tatiana added a commit that referenced this pull request Oct 29, 2024
Bug fixes

* Fix ``DbtVirtualenvBaseOperator`` to use correct virtualenv Python path by kesompochy in #1252
* Fix displaying dbt docs as menu item in Astro by @tatiana in #1280
* Fix: Replace login by user for clickhouse profile by @petershenri in #1255

Enhancements

* Improve dbt Docs Hosting Debugging -- Update dbt_docs_not_set_up.html by @johnmcochran in #1250
* Minor refactor on VirtualenvOperators & add test for PR by @tatiana in #1253

Docs

* Add Welcome Section and "What Is Cosmos" Blurb to Home Page by @cmarteepants and @yanmastin-astro in #1251
* Update the URL for sample dbt docs hosted in Astronomer S3 bucket by @pankajkoti in #1283
* Add dedicated scarf tracking pixel to readme by @cmarteepants in #1256

Others

* Update ``CODEOWNERS`` to track all files by @pankajkoti in #1284
* Fix release after the ``raw`` rst directive disabled was disabled in PyPI by @tatiana in #1282
* Update issue template ``bug.yml`` - cosmos version update in the dropdown by @pankajkoti in #1275
* Pre-commit hook updates in #1285, #1274, #1254, #1244"
error: pathspec 'Is' did not match any file(s) known to git
error: pathspec 'Cosmos Blurb to Home Page by @cmarteepants and @yanmastin-astro in #1251
* Update the URL for sample dbt docs hosted in Astronomer S3 bucket by @pankajkoti in #1283
* Add dedicated scarf tracking pixel to readme by @cmarteepants in #1256

Others

* Update CODEOWNERS to track all files by @pankajkoti in #1284
* Fix release after the raw rst directive disabled was disabled in PyPI by @tatiana in #1282
* Update issue template bug.yml - cosmos version update in the dropdown by @pankajkoti in #1275
* Pre-commit hook updates in #1285, #1274, #1254, #1244
tatiana added a commit that referenced this pull request Oct 29, 2024
Bug fixes

* Fix ``DbtVirtualenvBaseOperator`` to use correct virtualenv Python
path by @kesompochy in #1252
* Fix displaying dbt docs as menu item in Astro by @tatiana in #1280
* Fix: Replace login by user for clickhouse profile by @petershenri in
#1255

Enhancements

* Improve dbt Docs Hosting Debugging -- Update dbt_docs_not_set_up.html
by @johnmcochran in #1250
* Minor refactor on VirtualenvOperators & add test for PR by @tatiana in
#1253

Docs

* Add Welcome Section and "What Is Cosmos" Blurb to Home Page by
@cmarteepants and @yanmastin-astro in #1251
* Update the URL for sample dbt docs hosted in Astronomer S3 bucket by
@pankajkoti in #1283
* Add a dedicated scarf tracking pixel to readme by @cmarteepants in #1256

Others

* Update ``CODEOWNERS`` to track all files by @pankajkoti in #1284
* Fix release after the ``raw`` rst directive disabled was disabled in
PyPI by @tatiana in #1282
* Update issue template ``bug.yml`` - cosmos version update in the
dropdown by @pankajkoti in #1275
* Pre-commit hook updates in #1285, #1274, #1254, #1244
@kesompochy
Copy link
Copy Markdown
Contributor Author

@tatiana, Thank you for reviewing and for adding such a comprehensive test case. Using dag.test() in the integration test (#1286) is an excellent way to reproduce the pickling issue in a controlled environment.
Thanks to your help and the community's support, I'm glad the fix for the bug I introduced was properly verified. Really appreciate your thorough review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:testing Related to testing, like unit tests, integration tests, etc lgtm This PR has been approved by a maintainer size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] DbtVirtualenvBaseOperator uses system-wide dbt instead of virtualenv-specific in v1.7.0

2 participants