
Release 1.9.1#1607

Closed
pankajkoti wants to merge 38 commits into main from release-1.9.1

Conversation

@pankajkoti
Contributor

@pankajkoti pankajkoti commented Mar 13, 2025

Bug Fixes

Docs

Others

Closes: #1550

pankajkoti and others added 28 commits February 19, 2025 16:16
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Breaking changes

* When using ``LoadMode.DBT_LS``, Cosmos will now attempt to use
``dbtRunner`` instead of a subprocess to run ``dbt ls``.
While this brings significant performance improvements (half the
vCPU usage and some reduction in memory consumption), it may not work in
scenarios where users rely on multiple Python virtual environments to manage
different versions of dbt and its adapters. In those cases,
please set ``RenderConfig(invocation_mode=InvocationMode.SUBPROCESS)``
to keep the behaviour Cosmos had in previous versions.
Additional information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_
and `here
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.

Features

* Use ``dbtRunner`` in the DAG Processor when using ``LoadMode.DBT_LS``
if ``dbt-core`` is available by @tatiana in #1484. Additional
information `here
<https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_.
* Allow users to opt-out of ``dbtRunner`` during DAG parsing with
``InvocationMode.SUBPROCESS`` by @tatiana in #1495. Check out the
`documentation
<https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.
* Add structure to support multiple db for async operator execution by
@pankajastro in #1483
* Support overriding the ``profile_config`` per dbt node or folder using
config by @tatiana in #1492. More information `here
<https://astronomer.github.io/astronomer-cosmos/profiles/#profile-customise-per-node>`_.
* Create and run accurate SQL statements when using
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1474
* Add AWS ECS task run execution mode by @CarlosGitto and @aoelvp94 in
* Add support for running ``DbtSourceOperator`` individually by
@victormacaubas in #1510
* Add setup task for async executions by @pankajastro in #1518
* Add teardown task for async executions by @pankajastro in #1529
* Add ``ProjectConfig.install_dbt_deps`` & change operator
``install_deps=True`` as default by @tatiana in #1521
* Extend Virtualenv operator and mock dbt adapters for setup & teardown
tasks in ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and
@pankajastro in #1544

Bug Fixes

* Fix select complex intersection of three tag-based graph selectors by
@tatiana in #1466
* Fix custom selector behaviour when the model name contains periods by
@yakovlevvs and @60098727 in #1499
* Filter dbt and non-dbt kwargs correctly for async operator by
@pankajastro in #1526

Enhancement

* Fix OpenLineage deprecation warning by @CorsettiS in #1449
* Move ``DbtRunner`` related functions into ``dbt/runner.py`` module by
@tatiana in #1480
* Add ``on_warning_callback`` to ``DbtSourceKubernetesOperator`` and
refactor previous operators by @LuigiCerone in #1501
* Gracefully error when users set incompatible ``RenderConfig.dbt_deps``
and ``operator_args`` ``install_deps`` by @tatiana in #1505
* Store compiled SQL as template field for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1534

Docs

* Improve ``RenderConfig`` arguments documentation by @tatiana in #1514
* Improve callback documentation by @tatiana in #1516
* Improve partial parsing docs by @tatiana in #1520
* Fix typo in selecting & excluding docs by @pankajastro in #1523
* Document ``async_py_requirements`` added in ``ExecutionConfig`` for
``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1545

Others

* Ignore dbt package tests when running Cosmos tests by @tatiana in
* Refactor to consolidate async dbt adapter code by @pankajkoti in #1509
* Log elapsed time for sql file(s) upload/download by @pankajastro in
* Remove the fallback operator for async task by @pankajastro in #1538
* GitHub Actions Dependabot: #1487
* Pre-commit updates: #1473, #1493, #1503, #1531

(cherry picked from commit c7de602)
…for `ExecutionMode.AIRFLOW_ASYNC` (#1548)

A user reported that, after testing `astronomer-cosmos==1.9.0a5`, they
got the error below:
```
    from dbt_common.clients.agate_helper import empty_table
ModuleNotFoundError: No module named 'dbt_common'
```

They are using `dbt-bigquery==1.7.2`

Upon debugging, I observed that the `dbt_common` module that we rely on
in the current mocking interface is only available in dbt-bigquery
adapter versions >= 1.8. In earlier versions, the equivalent helper
seems to be available in `dbt.clients`. I tested this on
dbt-bigquery versions 1.5, 1.6, 1.7 and 1.7.2, and the fix in this PR
seems to solve the issue.
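
The fallback described above can be sketched as a version-tolerant import. This is only an illustration and not the code from the PR: the helper name `import_first_available` is hypothetical, and the legacy module path shown in the usage comment is an assumption based on the description above.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(module_paths: Sequence[str]) -> ModuleType:
    """Return the first module in `module_paths` that can be imported."""
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{path}: {exc}")
    raise ImportError(
        "None of the candidate modules could be imported: " + "; ".join(errors)
    )


# Hypothetical usage (module paths are assumptions, newest layout first):
# agate_helper = import_first_available(
#     ["dbt_common.clients.agate_helper", "dbt.clients.agate_helper"]
# )
# empty_table = agate_helper.empty_table
```

Listing the newer location first keeps current adapter versions on the preferred path, while older installations fall through to the legacy module.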

closes: #1547
(cherry picked from commit 30019c6)
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.6 →
v0.9.7](astral-sh/ruff-pre-commit@v0.9.6...v0.9.7)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 4f793f1)
closes: #1564
**Fix unit test error**
```
tests/operators/_asynchronous/test_bigquery.py:6: in <module>
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.12-2.9/lib/python3.12/site-packages/airflow/providers/google/cloud/operators/bigquery.py:32: in <module>
    from airflow.providers.common.sql.operators.sql import (  # type: ignore[attr-defined] # for _parse_boolean
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.12-2.9/lib/python3.12/site-packages/airflow/providers/common/sql/operators/sql.py:29: in <module>
    from airflow.providers.common.sql.hooks.sql import DbApiHook, fetch_all_handler, return_single_query_results
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.12-2.9/lib/python3.12/site-packages/airflow/providers/common/sql/hooks/sql.py:37: in <module>
    from methodtools import lru_cache
E   ModuleNotFoundError: No module named 'methodtools'
```
**Disable DAG example_cosmos_dbt_build.py in CI because of error**
```
[2025-02-26 13:13:41,578] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/base.py", line 278, in execute
    self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/local.py", line 708, in build_and_run_cmd
    result = self.run_command(
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/local.py", line 556, in run_command
    self.handle_exception(result)
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/local.py", line 229, in handle_exception_dbt_runner
    return dbt_runner.handle_exception_if_needed(result)
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/dbt/runner.py", line 113, in handle_exception_if_needed
    raise CosmosDbtRunError(f"dbt invocation completed with errors: {error_message}")
cosmos.exceptions.CosmosDbtRunError: dbt invocation completed with errors: relationships_orders_customer_id__customer_id__ref_customers_: Database Error in test relationships_orders_customer_id__customer_id__ref_customers_ (models/schema.yml)
  relation "public.orders" does not exist
  LINE 13:     from "***"."public"."orders"
                    ^
  compiled code at target/run/altered_jaffle_shop/models/schema.yml/relationships_orders_customer_id__customer_id__ref_customers_.sql
```

Created a follow-up issue, #1568, to re-enable the
DAG example_cosmos_dbt_build.py

(cherry picked from commit 8630cae)
The Ubuntu 20.04 Actions runner image will begin deprecation on
2025-02-01 and will be fully unsupported by 2025-04-01:

actions/runner-images#11101
(cherry picked from commit 7df2fde)
The function `cosmos.converter.override_configuration` had a small
logical issue: the condition to override `install_deps`
could never be reached. This seems to be the root cause of #1557

Closes #1557

(cherry picked from commit dfbef5c)
`install_dbt_deps` is missing from the `ProjectConfig` `__init__`
method, which is inconsistent with the
[documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html)
and does not make sense.

Closes: #1555
(cherry picked from commit 0811e46)
`DbtToAirflowConverter` can pass `dbt_vars` to `DbtGraph` via either
`ProjectConfig` or `operator_args`. When `operator_args` is used in
`DbtToAirflowConverter`, however, the `dbt_vars` are missing from the
`dbt ls` command (I hit a rendering issue while using Cosmos
with project-level variables).

(cherry picked from commit 7016dd5)
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.7 →
v0.9.9](astral-sh/ruff-pre-commit@v0.9.7...v0.9.9)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 08b85b6)
Add the missing Execution mode in bug template

(cherry picked from commit b15c5c8)
… operator (#1582)

Removes unused BigQuery async arguments that trigger a validation of the
profile at parsing time, which we want to avoid.

Closes #1581

(cherry picked from commit 46912b6)
…d Credentials, 401) (#1598)

Workaround to fsspec/gcsfs#664

Since upgrading to `gcsfs==2025.3.0` from `2025.2.0`, we started facing
this issue:
```
File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 696, in _exists
    await self._info(path, **kwargs)
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 1024, in _info
    exact = await self._get_object(path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 557, in _get_object
    res = await self._call(
          ^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 477, in _call
    status, headers, info, contents = await self._request(
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/decorator.py", line 224, in fun
    return await caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/retry.py", line 165, in retry_request
    raise e
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/retry.py", line 135, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 461, in _request
    headers=self._get_headers(headers),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 438, in _get_headers
    self.credentials.apply(out)
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/credentials.py", line 212, in apply
    self.maybe_refresh()
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/credentials.py", line 203, in maybe_refresh
    raise HttpError(
gcsfs.retry.HttpError: Invalid Credentials, 401
```

When I use the same credentials with `2025.2.0` things work as expected.

This problem was spotted while using Apache Airflow in our CI:

https://github.com/astronomer/astronomer-cosmos/actions/runs/13772013607/job/38566202965?pr=1596

We used this script to generate the credentials that work:
```
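import json
import urllib.parse

# Load the service-account key file downloaded from Google Cloud.
with open("/Users/tati/Downloads/astronomer-dag-authoring-121145ad8a5a.json", "r") as file:
    json_content = json.load(file)

# URL-encode the key so it can be embedded in a connection URI.
url_encoded_content = urllib.parse.quote(json.dumps(json_content))

print(url_encoded_content)

# Print the full connection URI, including the cloud-platform scope.
print(f'google-cloud-platform://?keyfile_dict={url_encoded_content}&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform')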
```

(cherry picked from commit b04717c)
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.9 →
v0.9.10](astral-sh/ruff-pre-commit@v0.9.9...v0.9.10)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit dd8b6c7)
closes: #1585

This PR modifies the `DbtNode` to include the packages,
allowing us to correctly construct the path when reading
the generated SQL files. In dbt projects with dbt_packages,
the `dbt run` command generates SQL files within the respective
dbt_packages folder inside the target/run directory, instead of
the main project folder.

<img width="1667" alt="Screenshot 2025-03-06 at 12 01 51 AM"
src="https://github.com/user-attachments/assets/911e0859-327f-49bf-a081-4da7003d7817"
/>

**With Setup task**

<img width="1687" alt="Screenshot 2025-03-06 at 12 02 58 AM"
src="https://github.com/user-attachments/assets/c3c1f066-b19f-4bdd-9358-779845a32f8b"
/>

**Without Setup task**

<img width="1688" alt="Screenshot 2025-03-06 at 12 03 34 AM"
src="https://github.com/user-attachments/assets/b0bb21bd-e0d1-45a4-90ea-cb4857630318"
/>

(cherry picked from commit b309dac)
Currently, if someone attempts to run `simple_dag_async` without
previously installing `apache-airflow-providers-google`, they face
this hard-to-read error:

```
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/venv/lib/python3.9/site-packages/airflow/models/dagbag.py", line 383, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/dags/simple_dag_async.py", line 21, in <module>
    simple_dag_async = DbtDag(
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/airflow/dag.py", line 26, in __init__
    DbtToAirflowConverter.__init__(self, *args, **specific_kwargs(**kwargs))
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/converter.py", line 328, in __init__
    self.tasks_map = build_airflow_graph(
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/airflow/graph.py", line 591, in build_airflow_graph
    task_or_group = conversion_function(  # type: ignore
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/airflow/graph.py", line 379, in generate_task_or_group
    task = create_airflow_task(task_meta, dag, task_group=model_task_group)
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/core/airflow.py", line 36, in get_airflow_task
    airflow_task = Operator(
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/venv/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 506, in apply_defaults
    result = func(self, **kwargs, default_args=default_args)
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/airflow_async.py", line 75, in __init__
    super().__init__(
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/venv/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 506, in apply_defaults
    result = func(self, **kwargs, default_args=default_args)
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/_asynchronous/base.py", line 50, in __init__
    async_operator_class = self.create_async_operator()
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/_asynchronous/base.py", line 69, in create_async_operator
    async_class_operator = _create_async_operator_class(profile_type, "DbtRun")
  File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/_asynchronous/base.py", line 34, in _create_async_operator_class
    raise ImportError(f"Error in loading class: {class_path}. Unable to find the specified operator class.") from e
ImportError: Error in loading class: cosmos.operators._asynchronous.bigquery.DbtRunAirflowAsyncBigqueryOperator. Unable to find the specified operator class.
```

The goal of this change is to align the error handling with other
parts of Cosmos by raising a clearer error message.
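
One way to picture the clearer error is a loader that catches the import failure and re-raises it with an actionable hint. This is a minimal sketch, not the Cosmos implementation: `load_operator_class` and its `install_hint` parameter are hypothetical names introduced for illustration.

```python
import importlib


def load_operator_class(class_path: str, install_hint: str) -> type:
    """Import `class_path` ("package.module.ClassName") or fail with a helpful message."""
    module_path, class_name = class_path.rsplit(".", 1)
    try:
        module = importlib.import_module(module_path)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as exc:
        # Keep the original exception chained so the root cause stays visible.
        raise ImportError(
            f"Could not load {class_path}. {install_hint}"
        ) from exc


# Hypothetical usage:
# load_operator_class(
#     "cosmos.operators._asynchronous.bigquery.DbtRunAirflowAsyncBigqueryOperator",
#     "Is apache-airflow-providers-google installed?",
# )
```

Chaining with `from exc` preserves the underlying traceback for debugging, while the top-level message tells the user what to install.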

(cherry picked from commit 09bcb55)
Users who generated the `manifest.json` on MS Windows and then attempted
to use Cosmos path selectors, such as
`path:models/edr/run_results`, were unable to do so, because the paths
recorded on Windows differed from the selector:

```
    "model.elementary.model_run_results": {
      "database": "FDH_DEV_DB",
      "schema": "MONITORING",
      "name": "model_run_results",
      "resource_type": "model",
      "package_name": "elementary",
      "path": "edr\\run_results\\model_run_results.sql",
      "original_file_path": "models\\edr\\run_results\\model_run_results.sql",
      "unique_id": "model.elementary.model_run_results",
      "fqn": [
        "elementary",
        "edr",
        "run_results",
        "model_run_results"
      ],
```

As observed in this example, the property `original_file_path` uses the
`\\` character as the path separator, but the selector is checked using
POSIX notation.

Since Cosmos implements path selectors using `path_selection in
str(node.file_path)`, we have to normalize the input for the filter to
work.

This issue only happened when using `LoadMode.DBT_MANIFEST` and not
`LoadMode.DBT_LS`, since dbt normalizes paths internally when handling
selectors as part of that command.
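
The normalization can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than the code in the PR: the function names are hypothetical, and it assumes manifest paths never contain a literal backslash in a POSIX file name.

```python
from pathlib import PureWindowsPath


def normalize_manifest_path(raw_path: str) -> str:
    """Convert a path that may use Windows separators into POSIX notation."""
    # PureWindowsPath accepts both "\\" and "/" as separators, so paths that
    # are already in POSIX notation pass through unchanged.
    return PureWindowsPath(raw_path).as_posix()


def matches_path_selector(path_selection: str, raw_node_path: str) -> bool:
    """Apply the substring check described above to the normalized path."""
    return path_selection in normalize_manifest_path(raw_node_path)


windows_path = "models\\edr\\run_results\\model_run_results.sql"
print(normalize_manifest_path(windows_path))
print(matches_path_selector("models/edr/run_results", windows_path))
```

Because `PureWindowsPath` is a pure path class, this works on any operating system and never touches the file system.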

(cherry picked from commit 9a1c8fe)
The log that prints 'Total filtered nodes' printed the incorrect value
(the total nodes instead of the actual filtered nodes).

(cherry picked from commit 674f15c)
…1602)

Let's say the dbt project has a node whose `file_path` is "gen2/models/parent.sql":
```
parent_node = DbtNode(
    unique_id=f"{DbtResourceType.MODEL.value}.{SAMPLE_PROJ_PATH.stem}.parent",
    resource_type=DbtResourceType.MODEL,
    depends_on=[grandparent_node.unique_id, another_grandparent_node.unique_id],
    file_path=SAMPLE_PROJ_PATH / "gen2/models/parent.sql",
    tags=["has_child", "is_child"],
    config={"materialized": "view", "tags": ["has_child", "is_child"]},
)
```

When using Cosmos 1.9.0 with `LoadMode.DBT_MANIFEST` and trying to use:

```
RenderConfig(select="gen2/models/*")
```

The selector would not return any results.

It would still work with `LoadMode.DBT_LS`.

The goal of this PR is to solve this issue.
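
To make the expected behaviour concrete, here is a minimal sketch of matching a path selector that may end in a star. It is not the Cosmos implementation: `path_matches` is a hypothetical helper, and it uses `fnmatch` semantics, where `*` also matches across directory separators, which may differ from how dbt itself treats such patterns.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def path_matches(selector: str, node_path: PurePosixPath, project_root: PurePosixPath) -> bool:
    """Return True when the node's project-relative path matches the selector."""
    relative = node_path.relative_to(project_root).as_posix()
    pattern = selector.rstrip("/")
    if "*" in pattern:
        # Glob-style selector such as "gen2/models/*".
        return fnmatchcase(relative, pattern)
    # Plain selector: match the exact file or any file below that folder.
    return relative == pattern or relative.startswith(pattern + "/")


root = PurePosixPath("/project")
parent = root / "gen2/models/parent.sql"
print(path_matches("gen2/models/*", parent, root))
print(path_matches("gen2/models", parent, root))
```

Comparing against the project-relative path keeps the check independent of where the project happens to live on disk.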

(cherry picked from commit 0e1f81b)
Makes sure that the following fixes work from an end-to-end perspective,
solving the Astro customer's original issue:
- Fix path selector when manifest.json was created in MS Windows (#1601)
- Fix select behaviour using LoadMode.MANIFEST and a path with star
(#1602)

(cherry picked from commit 3138892)
This PR introduces a new CI job named `Run-Integration-Tests-DBT-Async`
to ensure compatibility of the async example DAG with multiple dbt
versions. It achieves this by adding a third dimension to the
`pyproject.toml` matrix, enabling the CI to run the DAG across a list of
dbt versions.

Additionally, this PR includes a new test file:
`tests/test_async_example_dag.py`. While we already have
`tests/test_example_dags.py`, certain dbt versions have shown parsing
issues with some example DAGs, which can cause CI failures unrelated to
the async DAG. To prevent this, the new file is a modified copy that
exclusively tests `simple_dag_async`, with other DAGs ignored via
`.airflowignore`. This ensures that the CI job focuses on validating the
async DAG without being affected by unrelated parsing errors.

closes: #1489
(cherry picked from commit 372d388)
@netlify

netlify Bot commented Mar 13, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 06bdcec
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/67d335ae4abc0900086dbb3e

corsettigyg and others added 2 commits March 13, 2025 20:17
…nMode.LOCAL` (#1571)

Currently, when `TestBehavior.BUILD` is set, we do not leverage the
`on_warning_callback` that is available for test nodes and source
nodes. I have added the parsing to `DbtBuildLocalOperator` to fix
this. I tested it locally with positive results.

Related: #1569
(cherry picked from commit ddea39c)
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Mar 13, 2025

Deploying astronomer-cosmos with Cloudflare Pages

Latest commit: 06bdcec
Status: ✅  Deploy successful!
Preview URL: https://9219b303.astronomer-cosmos.pages.dev
Branch Preview URL: https://release-1-9-1.astronomer-cosmos.pages.dev


@codecov

codecov Bot commented Mar 13, 2025

Codecov Report

Attention: Patch coverage is 95.38462% with 3 lines in your changes missing coverage. Please review.

Project coverage is 97.42%. Comparing base (bbcc9e3) to head (06bdcec).
Report is 33 commits behind head on main.

Files with missing lines Patch % Lines
cosmos/operators/_asynchronous/bigquery.py 84.21% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1607      +/-   ##
==========================================
+ Coverage   97.30%   97.42%   +0.12%     
==========================================
  Files          80       80              
  Lines        4901     4938      +37     
==========================================
+ Hits         4769     4811      +42     
+ Misses        132      127       -5     


@tatiana
Collaborator

tatiana commented Mar 13, 2025

Thanks a lot, @pankajkoti ! If we could cherry-pick:

If we're happy with this:

tatiana and others added 3 commits March 13, 2025 22:17
A few tests, such as
`test_configure_remote_target_path_no_remote_target`, were taking a long
time when using `hatch run tests.py3.9-2.9:test-cov`.

Time to run this command before these changes: 89.34s
Time to run this command after these changes: 14.50s

Also fix unittest that was failing locally.

(cherry picked from commit d494dcd)
Since Cosmos 1.9.0, users who attempted to use:
```
DbtRunLocalOperator.partial(task_id="foo", project_dir="foo")
```

Started facing the issue:
```
File /usr/local/lib/python3.11/site-packages/airflow/models/baseoperator.py:284, in partial(operator_class, task_id, dag, task_group, start_date, end_date, owner, email, params, resources, trigger_rule, depends_on_past, ignore_first_depends_on_past, wait_for_past_depends_before_skipping, wait_for_downstream, retries, queue, pool, pool_slots, execution_timeout, max_retry_delay, retry_delay, retry_exponential_backoff, priority_weight, weight_rule, sla, map_index_template, max_active_tis_per_dag, max_active_tis_per_dagrun, on_execute_callback, on_failure_callback, on_success_callback, on_retry_callback, on_skipped_callback, run_as_user, executor, executor_config, inlets, outlets, doc, doc_md, doc_json, doc_yaml, doc_rst, task_display_name, logger_name, allow_nested_operators, **kwargs)
281 from airflow.models.dag import DagContext
282 from airflow.utils.task_group import TaskGroupContext
--> 284 validate_mapping_kwargs(operator_class, "partial", kwargs)
286 dag = dag or DagContext.get_current_dag()
287 if dag:

File /usr/local/lib/python3.11/site-packages/airflow/models/mappedoperator.py:123, in validate_mapping_kwargs(op, func, value)
121 names = ", ".join(repr(n) for n in unknown_args)
122 error = f"unexpected keyword arguments {names}"
--> 123 raise TypeError(f"{op.name}.{func}() got {error}")

TypeError: DbtRunLocalOperator.partial() got an unexpected keyword argument 'project_dir'
```

This regression was caused by changes in how Cosmos operators are
subclassed, which were made to allow Cosmos to dynamically choose which
Airflow operator runs at DAG rendering time.

Closes: #1546

To validate it, we introduced a new small dbt project and an example
DAG, and it can be tested by running:
```
airflow dags test example_task_mapping
```

Co-authored-by: Ash Berlin-Taylor <ash@astronomer.io>
(cherry picked from commit c8c148b)
@pankajkoti
Contributor Author

> Thanks a lot, @pankajkoti! If we could cherry-pick:
>
> If we're happy with this:

Thanks @tatiana . I merged #1609

And have cherry-picked here both the mentioned PRs #1600 and #1609

nicor88 and others added 2 commits March 13, 2025 22:33
## Description

### TL;DR
* Pass `container_name` to kwargs
* Use a default value for **aws_conn_id**

### Long version

The current implementation of the ECS integration requires passing
`container_name` as part of `operator_args`, e.g.:
```
operator_args = {
   "container_name": "main",
   ...
}
```
However, this led to errors like this:
```
[2025-03-07, 16:40:34 UTC] {ecs.py:515} INFO - EcsOperator overrides: {'containerOverrides': [{'name': None, 'command': ['dbt', '--no-partial-parse', 'run', '--models', 'my_first_dbt_model'], 'environment': [{'name': 'AIRFLOW_CTX_DAG_OWNER', 'value': '***'}, {'name': 'AIRFLOW_CTX_DAG_ID', 'value': 'example_cosmos'}, {'name': 'AIRFLOW_CTX_TASK_ID', 'value': 'dbt_task_group.my_first_dbt_model.run'}, {'name': 'AIRFLOW_CTX_EXECUTION_DATE', 'value': '2025-03-07T16:40:31.716155+00:00'}, {'name': 'AIRFLOW_CTX_TRY_NUMBER', 'value': '1'}, {'name': 'AIRFLOW_CTX_DAG_RUN_ID', 'value': 'manual__2025-03-07T16:40:31.716155+00:00'}, {'name': 'EXTRA_VAR', 'value': 'extra_value'}]}]}
```
The container name is None, which causes the boto3 call that runs
the container to fail.

The issue was that `container_name` was not passed to
the kwargs, so the container name was never set to the
value configured in Cosmos.
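
The failure mode can be reproduced in isolation with plain classes. This is a simplified sketch: `FakeEcsRunTaskOperator`, `BrokenDbtOperator` and `FixedDbtOperator` are hypothetical stand-ins, not the real Airflow or Cosmos classes, and they only model how a keyword argument gets lost on its way to the parent constructor.

```python
from typing import Any, Optional


class FakeEcsRunTaskOperator:
    """Hypothetical stand-in for the upstream operator that builds the overrides."""

    def __init__(self, container_name: Optional[str] = None, **kwargs: Any) -> None:
        self.container_name = container_name

    def overrides(self) -> dict:
        return {"containerOverrides": [{"name": self.container_name}]}


class BrokenDbtOperator(FakeEcsRunTaskOperator):
    def __init__(self, container_name: str, **kwargs: Any) -> None:
        self.requested_container_name = container_name
        # Bug: container_name is consumed here and never forwarded,
        # so the parent falls back to its default of None.
        super().__init__(**kwargs)


class FixedDbtOperator(FakeEcsRunTaskOperator):
    def __init__(self, container_name: str, **kwargs: Any) -> None:
        # Fix: put the value back into kwargs before calling the parent.
        kwargs["container_name"] = container_name
        super().__init__(**kwargs)


print(BrokenDbtOperator(container_name="main").overrides())
print(FixedDbtOperator(container_name="main").overrides())
```

The broken variant yields a `None` name in the overrides, which mirrors the `'name': None` entry in the log above.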

### Full logs
<pre>
[2025-03-10, 12:49:41 UTC] {ecs.py:512} INFO - Running ECS Task - Task
definition: dbt - on cluster dbt
[2025-03-10, 12:49:41 UTC] {ecs.py:515} INFO - EcsOperator overrides:
{'containerOverrides': [{'name': None, 'command': ['dbt',
'--no-partial-parse', 'run', '--models', 'my_first_dbt_model'],
'environment': [{'name': 'AIRFLOW_CTX_DAG_OWNER', 'value': '***'},
{'name': 'AIRFLOW_CTX_DAG_ID', 'value': 'example_cosmos'}, {'name':
'AIRFLOW_CTX_TASK_ID', 'value':
'dbt_task_group.my_first_dbt_model.run'}, {'name':
'AIRFLOW_CTX_EXECUTION_DATE', 'value':
'2025-03-10T12:49:28.878404+00:00'}, {'name': 'AIRFLOW_CTX_TRY_NUMBER',
'value': '1'}, {'name': 'AIRFLOW_CTX_DAG_RUN_ID', 'value':
'manual__2025-03-10T12:49:28.878404+00:00'}, {'name': 'EXTRA_VAR',
'value': 'extra_value'}]}]}
[2025-03-10, 12:49:41 UTC] {base.py:84} INFO - Retrieving connection
'aws_default'
[2025-03-10, 12:49:44 UTC] {credentials.py:1147} INFO - Found
credentials in environment variables.
[2025-03-10, 12:49:44 UTC] {taskinstance.py:3313} ERROR - Task failed
with exception
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py",
line 768, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py",
line 734, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py",
line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/cosmos/operators/base.py",
line 278, in execute
self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
File
"/home/airflow/.local/lib/python3.12/site-packages/cosmos/operators/aws_ecs.py",
line 98, in build_and_run_cmd
    result = EcsRunTaskOperator.execute(self, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py",
line 424, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
line 526, in execute
    self._start_task()
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
line 626, in _start_task
    response = self.client.run_task(**run_opts)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/client.py",
line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/client.py",
line 980, in _make_api_call
    request_dict = self._convert_to_request_dict(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/client.py",
line 1047, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/validate.py",
line 381, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter overrides.containerOverrides[0].name, value:
None, type: <class 'NoneType'>, valid types: <class 'str'>
</pre>

## Related Issue(s)
I didn't create an issue; I just wanted to propose a fix.

## Breaking Change?

Yes, because users have to use `dbt_container_name` with ECS, but the
integration is currently broken.

## Checklist

- [ ] I have made corresponding changes to the documentation (if
required)
- [ ] I have added tests that prove my fix is effective or that my
feature works

(cherry picked from commit 483ca7c)
@pankajkoti
Contributor Author

pankajkoti commented Mar 13, 2025

Also, cherry-picked PR #1592

cc: @pankajastro @tatiana

@tatiana tatiana added this to the Cosmos 1.9.1 milestone Mar 13, 2025
Comment thread CHANGELOG.rst Outdated
@tatiana
Collaborator

tatiana commented Mar 13, 2025

We've released from this branch and I created a mergeable PR (upgrading version & changelog) #1612

@tatiana tatiana closed this Mar 13, 2025
tatiana added a commit that referenced this pull request Mar 17, 2025
Bug Fixes

* Fix import error in dbt bigquery adapter mock for ``dbt-bigquery<1.8``
for ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1548
* Fix ``operator_args`` override configuration by @ghjklw in #1558
* Fix missing ``install_dbt_deps`` in ``ProjectConfig`` ``__init__``
method by @ghjklw in #1556
* Fix dbt project parsing ``dbt_vars`` behavior passed via
``operator_args`` by @AlexandrKhabarov in #1543
* Avoid reading the connection during DAG parsing of the async BigQuery
operator by @joppevos in #1582
* Fix: Workaround to incorrectly raised ``gcsfs.retry.HttpError``
(Invalid Credentials, 401) by @tatiana in #1598
* Fix the async execution mode read sql files for dbt packages by
@pankajastro in #1588
* Improve BQ async error handling by @tatiana in #1597
* Fix path selector when ``manifest.json`` is created using MS Windows
by @tatiana in #1601
* Fix log that prints 'Total filtered nodes' by @tatiana in #1603
* Fix select behaviour using ``LoadMode.MANIFEST`` and a path with star
by @tatiana in #1602
* Support ``on_warning_callback`` with ``TestBehavior.BUILD`` and
``ExecutionMode.LOCAL`` by @corsettigyg in #1571
* Fix ``DbtRunLocalOperator.partial()`` support by @tatiana @ashb in
#1609
* fix: ``container_name`` is null for ecs integration by @nicor88 in
#1592

Docs

* Improve MWAA getting-started docs by removing unused imports by
@jx2lee in #1562

Others

* Disable ``example_cosmos_dbt_build.py`` DAG in CI by @pankajastro in
#1567
* Upgrade GitHub Actions Ubuntu version by @tatiana in #1561
* Update GitHub bug issue template by @pankajastro in #1586
* Enable DAG ``example_cosmos_dbt_build.py`` in CI by @pankajastro in
#1573
* Run async DAG in DAG without setup/teardown task by @pankajastro in
#1599
* Add test case that fully covers recent select issue by @tatiana in
#1604
* Add CI job to test multiple dbt versions for the async DAG by
@pankajkoti in #1535
* Improve unit tests speed from 89s to 14s by @tatiana in #1600
* Pre-commit updates: #1560, #1583, #1596


Closes: #1550

Mergeable version of
#1607

Co-authored-by: Pankaj Singh
<98807258+pankajastro@users.noreply.github.com>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Development

Successfully merging this pull request may close these issues.

Release Cosmos 1.9.1

9 participants