Skip to content

Avoid reading the connection during DAG parsing of the async BigQuery operator#1582

Merged
tatiana merged 5 commits into
astronomer:mainfrom
joppevos:stop_async_from_connecting_when_parsing
Mar 4, 2025
Merged

Avoid reading the connection during DAG parsing of the async BigQuery operator#1582
tatiana merged 5 commits into
astronomer:mainfrom
joppevos:stop_async_from_connecting_when_parsing

Conversation

@joppevos
Copy link
Copy Markdown
Contributor

@joppevos joppevos commented Mar 3, 2025

Description

Removes unused Bigquery async arguments that trigger a validation of the profile at parsing time, which we want to avoid.

Related Issue(s)

closes #1581

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Mar 3, 2025
@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 3, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit bb69886
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/67c6c125a29164000803a08b

@dosubot dosubot Bot added area:parsing Related to parsing DAG/DBT improvement, issues, or fixes profile:bigquery Related to BigQuery ProfileConfig labels Mar 3, 2025
@joppevos joppevos changed the title remove unused instance arguments from async Bigquery operator remove instance arguments from async Bigquery operator Mar 3, 2025
@joppevos
Copy link
Copy Markdown
Contributor Author

joppevos commented Mar 3, 2025

Hey @pankajkoti , I found the time to try out the new async operator, but encountered this issue. It seems these arguments aren't necessary, but please correct me if I'm wrong. Thanks!

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.35%. Comparing base (ddea39c) to head (bb69886).
Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1582   +/-   ##
=======================================
  Coverage   97.35%   97.35%           
=======================================
  Files          80       80           
  Lines        4920     4922    +2     
=======================================
+ Hits         4790     4792    +2     
  Misses        130      130           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I believe this breaks nothing, but it’s just that with this we would no longer see the project and dataset values in the rendered template fields in the UI.

Would like to hear if we feel those are crucial @tatiana @pankajastro ?

Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @joppevos!

Thanks a lot for trying out this feature and fixing this. After making the changes you shared in this PR, did ExecutionMode.AIRFLOW_ASYNC work as expected?

Is there any chance we could add some tests so that we don't have regressions in the future?

Copy link
Copy Markdown
Contributor

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joppevos instead of removing those template fields, would it be possible to store those values during the execution of the task, similar to how we store the compile_sql at

Probably we could rename the store_compile_sql method to store_template_fields that takes care of these fields as well

@joppevos
Copy link
Copy Markdown
Contributor Author

joppevos commented Mar 3, 2025

@tatiana The ASYNC works as expected. I'll have a look at the tests 👍

@joppevos
Copy link
Copy Markdown
Contributor Author

joppevos commented Mar 3, 2025

@joppevos instead of removing those template fields, would it be possible to store those values during the execution of the task, similar to how we store the compile_sql at

Probably we could rename the store_compile_sql method to store_template_fields that takes care of these fields as well

That seems like a reasonable approach. Thank you for the suggestion!

@tatiana tatiana changed the title remove instance arguments from async Bigquery operator Avoid reading the connection during DAG parsing of the async BigQuery operator Mar 3, 2025
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Mar 3, 2025
Copy link
Copy Markdown
Contributor

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One test is failing, I guess we need to update the method name in the patch for test_execute_complete test. Could you please check it? Thanks for the quick addressing of the suggestion.

@joppevos joppevos force-pushed the stop_async_from_connecting_when_parsing branch from f427f7a to a307a37 Compare March 3, 2025 13:55
Comment thread tests/operators/_asynchronous/test_bigquery.py
@tatiana tatiana merged commit 46912b6 into astronomer:main Mar 4, 2025
@tatiana tatiana added this to the Cosmos 1.9.1 milestone Mar 13, 2025
pankajkoti pushed a commit that referenced this pull request Mar 13, 2025
… operator (#1582)

Removes unused Bigquery async arguments that trigger a validation of the
profile at parsing time, which we want to avoid.

Closes #1581

(cherry picked from commit 46912b6)
@pankajkoti pankajkoti mentioned this pull request Mar 13, 2025
@tatiana tatiana mentioned this pull request Mar 13, 2025
tatiana added a commit that referenced this pull request Mar 17, 2025
Bug Fixes

* Fix import error in dbt bigquery adapter mock for ``dbt-bigquery<1.8``
for ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1548
* Fix ``operator_args`` override configuration by @ghjklw in #1558
* Fix missing ``install_dbt_deps`` in ``ProjectConfig`` ``__init__``
method by @ghjklw in #1556
* Fix dbt project parsing ``dbt_vars`` behavior passed via
``operator_args`` by @AlexandrKhabarov in #1543
* Avoid reading the connection during DAG parsing of the async BigQuery
operator by @joppevos in #1582
* Fix: Workaround to incorrectly raised ``gcsfs.retry.HttpError``
(Invalid Credentials, 401) by @tatiana in #1598
* Fix the async execution mode read sql files for dbt packages by
@pankajastro in #1588
* Improve BQ async error handling by @tatiana in #1597
* Fix path selector when ``manifest.json`` is created using MS Windows
by @tatiana in #1601
* Fix log that prints 'Total filtered nodes' by @tatiana in #1603
* Fix select behaviour using ``LoadMode.MANIFEST`` and a path with star
by @tatiana in #1602
* Support ``on_warning_callback`` with ``TestBehavior.BUILD`` and
``ExecutionMode.LOCAL`` by @corsettigyg in #1571
* Fix ``DbtRunLocalOperator.partial()`` support by @tatiana @ashb in
#1609
* fix: ``container_name`` is null for ecs integration by @nicor88 in
#1592

Docs

* Improve MWAA getting-started docs by removing unused imports by
@jx2lee in #1562

Others

* Disable ``example_cosmos_dbt_build.py`` DAG in CI by @pankajastro in
#1567
* Upgrade GitHub Actions Ubuntu version by @tatiana in #1561
* Update GitHub bug issue template by @pankajastro in #1586
* Enable DAG ``example_cosmos_dbt_build.py`` in CI by @pankajastro in
#1573
* Run async DAG in DAG without setup/teardown task by @pankajastro in
#1599
* Add test case that fully covers recent select issue by @tatiana in
#1604
* Add CI job to test multiple dbt versions for the async DAG by
@pankajkoti in #1535
* Improve unit tests speed from 89s to 14s by @tatiana in #1600
* Pre-commit updates: #1560, #1583, #1596


Closes: #1550

Mergeable version of
#1607

Co-authored-by: Pankaj Singh
<98807258+pankajastro@users.noreply.github.com>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:parsing Related to parsing DAG/DBT improvement, issues, or fixes profile:bigquery Related to BigQuery ProfileConfig size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] ASYNC execution mode tries to get connection during parsing

3 participants