Release 1.9.1 #1607
Closed
pankajkoti wants to merge 38 commits into
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Breaking changes

* When using ``LoadMode.DBT_LS``, Cosmos will now attempt to use the ``dbtRunner`` as opposed to subprocess to run ``dbt ls``. While this represents significant performance improvements (half the vCPU usage and some memory consumption improvement), this may not work in scenarios where users had multiple Python virtual environments to manage different versions of dbt and its adaptors. In those cases, please, set ``RenderConfig(invocation_mode=InvocationMode.SUBPROCESS)`` to have the same behaviour Cosmos had in previous versions. Additional information `here <https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_ and `here <https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.

Features

* Use ``dbtRunner`` in the DAG Processor when using ``LoadMode.DBT_LS`` if ``dbt-core`` is available by @tatiana in #1484. Additional information `here <https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls>`_.
* Allow users to opt-out of ``dbtRunner`` during DAG parsing with ``InvocationMode.SUBPROCESS`` by @tatiana in #1495. Check out the `documentation <https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#how-to-run-dbt-ls-invocation-mode>`_.
* Add structure to support multiple db for async operator execution by @pankajastro in #1483
* Support overriding the ``profile_config`` per dbt node or folder using config by @tatiana in #1492. More information `here <https://astronomer.github.io/astronomer-cosmos/profiles/#profile-customise-per-node>`_.
* Create and run accurate SQL statements when using ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and @pankajastro in #1474
* Add AWS ECS task run execution mode by @CarlosGitto and @aoelvp94 in
* Add support for running ``DbtSourceOperator`` individually by @victormacaubas in #1510
* Add setup task for async executions by @pankajastro in #1518
* Add teardown task for async executions by @pankajastro in #1529
* Add ``ProjectConfig.install_dbt_deps`` & change operator ``install_deps=True`` as default by @tatiana in #1521
* Extend Virtualenv operator and mock dbt adapters for setup & teardown tasks in ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti, @tatiana and @pankajastro in #1544

Bug Fixes

* Fix select complex intersection of three tag-based graph selectors by @tatiana in #1466
* Fix custom selector behaviour when the model name contains periods by @yakovlevvs and @60098727 in #1499
* Filter dbt and non-dbt kwargs correctly for async operator by @pankajastro in #1526

Enhancement

* Fix OpenLineage deprecation warning by @CorsettiS in #1449
* Move ``DbtRunner`` related functions into ``dbt/runner.py`` module by @tatiana in #1480
* Add ``on_warning_callback`` to ``DbtSourceKubernetesOperator`` and refactor previous operators by @LuigiCerone in #1501
* Gracefully error when users set incompatible ``RenderConfig.dbt_deps`` and ``operator_args`` ``install_deps`` by @tatiana in #1505
* Store compiled SQL as template field for ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1534

Docs

* Improve ``RenderConfig`` arguments documentation by @tatiana in #1514
* Improve callback documentation by @tatiana in #1516
* Improve partial parsing docs by @tatiana in #1520
* Fix typo in selecting & excluding docs by @pankajastro in #1523
* Document ``async_py_requirements`` added in ``ExecutionConfig`` for ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1545

Others

* Ignore dbt package tests when running Cosmos tests by @tatiana in
* Refactor to consolidate async dbt adapter code by @pankajkoti in #1509
* Log elapsed time for sql file(s) upload/download by @pankajastro in
* Remove the fallback operator for async task by @pankajastro in #1538
* GitHub Actions Dependabot: #1487
* Pre-commit updates: #1473, #1493, #1503, #1531

(cherry picked from commit c7de602)
…for `ExecutionMode.AIRFLOW_ASYNC` (#1548)

A user reported that, after testing `astronomer-cosmos==1.9.0a5`, they got the error below:

```
from dbt_common.clients.agate_helper import empty_table
ModuleNotFoundError: No module named 'dbt_common'
```

They are using `dbt-bigquery==1.7.2`.

Upon debugging, I observed that the `dbt_common` module that we rely on in the current mocking interface is only available with dbt-bigquery adapter versions >= 1.8. For previous versions, the same helper seems to be available in `dbt.clients`. I tested this on dbt-bigquery versions 1.5, 1.6, 1.7 and 1.7.2, and the fix in this PR seems to solve the issue.

closes: #1547
(cherry picked from commit 30019c6)
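The version-tolerant import described in this commit can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper `import_first_available` is hypothetical, and the fallback module path `dbt.clients.agate_helper` is an assumption inferred from the statement that the helper lives under `dbt.clients` in older versions.

```python
import importlib
from types import ModuleType
from typing import List, Sequence


def import_first_available(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that can be imported."""
    errors: List[str] = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ModuleNotFoundError(
        "None of the candidate modules could be imported: " + "; ".join(errors)
    )


def load_agate_helper() -> ModuleType:
    # `dbt_common` ships with newer adapters; the second path is an
    # assumption about where older dbt versions keep the same helper.
    return import_first_available(
        ["dbt_common.clients.agate_helper", "dbt.clients.agate_helper"]
    )
```

Trying the newer location first keeps the common case cheap, and the combined error message lists every path that was attempted.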
<!--pre-commit.ci start--> updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.6 → v0.9.7](astral-sh/ruff-pre-commit@v0.9.6...v0.9.7) <!--pre-commit.ci end--> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 4f793f1)
(cherry picked from commit 8ef378d)
closes: #1564

**Fix unit test error**

```
tests/operators/_asynchronous/test_bigquery.py:6: in <module>
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.12-2.9/lib/python3.12/site-packages/airflow/providers/google/cloud/operators/bigquery.py:32: in <module>
    from airflow.providers.common.sql.operators.sql import (  # type: ignore[attr-defined] # for _parse_boolean
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.12-2.9/lib/python3.12/site-packages/airflow/providers/common/sql/operators/sql.py:29: in <module>
    from airflow.providers.common.sql.hooks.sql import DbApiHook, fetch_all_handler, return_single_query_results
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.12-2.9/lib/python3.12/site-packages/airflow/providers/common/sql/hooks/sql.py:37: in <module>
    from methodtools import lru_cache
E   ModuleNotFoundError: No module named 'methodtools'
```

**Disable DAG example_cosmos_dbt_build.py in CI because of error**

```
[2025-02-26 13:13:41,578] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/base.py", line 278, in execute
    self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/local.py", line 708, in build_and_run_cmd
    result = self.run_command(
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/local.py", line 556, in run_command
    self.handle_exception(result)
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/operators/local.py", line 229, in handle_exception_dbt_runner
    return dbt_runner.handle_exception_if_needed(result)
  File "/home/runner/work/astronomer-cosmos/astronomer-cosmos/cosmos/dbt/runner.py", line 113, in handle_exception_if_needed
    raise CosmosDbtRunError(f"dbt invocation completed with errors: {error_message}")
cosmos.exceptions.CosmosDbtRunError: dbt invocation completed with errors: relationships_orders_customer_id__customer_id__ref_customers_: Database Error in test relationships_orders_customer_id__customer_id__ref_customers_ (models/schema.yml)
  relation "public.orders" does not exist
  LINE 13: from "***"."public"."orders"
           ^
  compiled code at target/run/altered_jaffle_shop/models/schema.yml/relationships_orders_customer_id__customer_id__ref_customers_.sql
```

Created a follow-up issue: #1568 to enable DAG example_cosmos_dbt_build.py

(cherry picked from commit 8630cae)
The Ubuntu 20.04 Actions runner image will begin deprecation on 2025-02-01 and will be fully unsupported by 2025-04-01: actions/runner-images#11101 (cherry picked from commit 7df2fde)
`install_dbt_deps` is missing from the `ProjectConfig` `__init__` method, which is inconsistent with the [documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html) and does not make sense. Closes: #1555 (cherry picked from commit 0811e46)
`DbtToAirflowConverter` can pass `dbt_vars` to `DbtGraph` through either `ProjectConfig` or `operator_args`. When `operator_args` was used in `DbtToAirflowConverter`, the `dbt_vars` were missing from the `dbt ls` command, which caused rendering issues when using Cosmos with project-level variables. (cherry picked from commit 7016dd5)
<!--pre-commit.ci start--> updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.7 → v0.9.9](astral-sh/ruff-pre-commit@v0.9.7...v0.9.9) <!--pre-commit.ci end--> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 08b85b6)
Add the missing Execution mode in bug template (cherry picked from commit b15c5c8)
…d Credentials, 401) (#1598)

Workaround to fsspec/gcsfs#664

Since upgrading to `gcsfs==2025.3.0` from `2025.2.0`, we started facing this issue:

```
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/fsspec/asyn.py", line 696, in _exists
    await self._info(path, **kwargs)
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 1024, in _info
    exact = await self._get_object(path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 557, in _get_object
    res = await self._call(
          ^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 477, in _call
    status, headers, info, contents = await self._request(
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/decorator.py", line 224, in fun
    return await caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/retry.py", line 165, in retry_request
    raise e
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/retry.py", line 135, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 461, in _request
    headers=self._get_headers(headers),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/core.py", line 438, in _get_headers
    self.credentials.apply(out)
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/credentials.py", line 212, in apply
    self.maybe_refresh()
  File "/home/runner/.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.11-2.9/lib/python3.11/site-packages/gcsfs/credentials.py", line 203, in maybe_refresh
    raise HttpError(
gcsfs.retry.HttpError: Invalid Credentials, 401
```

When I use the same credentials with `2025.2.0` things work as expected.

This problem was spotted while using Apache Airflow in our CI:
https://github.com/astronomer/astronomer-cosmos/actions/runs/13772013607/job/38566202965?pr=1596

We used this script to generate the credentials that work:

```
import json
import urllib.parse

with open("/Users/tati//Downloads/astronomer-dag-authoring-121145ad8a5a.json", "r") as file:
    json_content = json.load(file)

url_encoded_content = urllib.parse.quote(json.dumps(json_content))
print(url_encoded_content)
print(f'google-cloud-platform://?keyfile_dict={url_encoded_content}&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform')
```

(cherry picked from commit b04717c)
<!--pre-commit.ci start--> updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.9 → v0.9.10](astral-sh/ruff-pre-commit@v0.9.9...v0.9.10) <!--pre-commit.ci end--> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit dd8b6c7)
closes: #1585

This PR modifies the `DbtNode` to include the packages, allowing us to correctly construct the path when reading the generated SQL files.

In dbt projects with `dbt_packages`, the `dbt run` command generates SQL files within the respective `dbt_packages` folder inside the `target/run` directory, instead of the main project folder.

<img width="1667" alt="Screenshot 2025-03-06 at 12 01 51 AM" src="https://github.com/user-attachments/assets/911e0859-327f-49bf-a081-4da7003d7817" />

**With Setup task**

<img width="1687" alt="Screenshot 2025-03-06 at 12 02 58 AM" src="https://github.com/user-attachments/assets/c3c1f066-b19f-4bdd-9358-779845a32f8b" />

**Without Setup task**

<img width="1688" alt="Screenshot 2025-03-06 at 12 03 34 AM" src="https://github.com/user-attachments/assets/b0bb21bd-e0d1-45a4-90ea-cb4857630318" />

(cherry picked from commit b309dac)
Currently, if someone attempts to run `simple_dag_async` without
previously installing `apache-airflow-providers-google`, they will face
this very ugly error:
```
Traceback (most recent call last):
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/venv/lib/python3.9/site-packages/airflow/models/dagbag.py", line 383, in parse
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/dags/simple_dag_async.py", line 21, in <module>
simple_dag_async = DbtDag(
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/airflow/dag.py", line 26, in __init__
DbtToAirflowConverter.__init__(self, *args, **specific_kwargs(**kwargs))
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/converter.py", line 328, in __init__
self.tasks_map = build_airflow_graph(
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/airflow/graph.py", line 591, in build_airflow_graph
task_or_group = conversion_function( # type: ignore
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/airflow/graph.py", line 379, in generate_task_or_group
task = create_airflow_task(task_meta, dag, task_group=model_task_group)
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/core/airflow.py", line 36, in get_airflow_task
airflow_task = Operator(
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/venv/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 506, in apply_defaults
result = func(self, **kwargs, default_args=default_args)
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/airflow_async.py", line 75, in __init__
super().__init__(
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/venv/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 506, in apply_defaults
result = func(self, **kwargs, default_args=default_args)
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/_asynchronous/base.py", line 50, in __init__
async_operator_class = self.create_async_operator()
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/_asynchronous/base.py", line 69, in create_async_operator
async_class_operator = _create_async_operator_class(profile_type, "DbtRun")
File "/Users/tati/Code/cosmos-fresh/astronomer-cosmos/cosmos/operators/_asynchronous/base.py", line 34, in _create_async_operator_class
raise ImportError(f"Error in loading class: {class_path}. Unable to find the specified operator class.") from e
ImportError: Error in loading class: cosmos.operators._asynchronous.bigquery.DbtRunAirflowAsyncBigqueryOperator. Unable to find the specified operator class.
```
The goal with this ticket is to give the same error handling as other
parts of Cosmos by raising a more graceful error message.
(cherry picked from commit 09bcb55)
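One way to raise a more graceful error when an optional provider is missing is sketched below. It is an assumption-laden illustration rather than the Cosmos implementation: `load_operator_class` and its `install_hint` parameter are hypothetical names, and the wording of the message is invented for the example.

```python
import importlib


def load_operator_class(class_path: str, install_hint: str) -> type:
    """Import `package.module.ClassName` and return the class.

    On failure, raise an ImportError that tells the user what to install
    instead of surfacing a bare import failure.
    """
    module_path, _, class_name = class_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        return getattr(module, class_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # ValueError covers a class path without any module part.
        raise ImportError(
            f"Could not load {class_path}. {install_hint}"
        ) from exc
```

A caller could pass a hint such as "Install apache-airflow-providers-google to use the BigQuery async operator.", so the missing dependency is named in the first line of the error.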
Users who generated the `manifest.json` on MS Windows and then attempted
to use Cosmos path selectors, such as
`path:models/edr/run_results`, were unable to do so, because the paths
recorded on Windows were different from the selector:
```
"model.elementary.model_run_results": {
"database": "FDH_DEV_DB",
"schema": "MONITORING",
"name": "model_run_results",
"resource_type": "model",
"package_name": "elementary",
"path": "edr\\run_results\\model_run_results.sql",
"original_file_path": "models\\edr\\run_results\\model_run_results.sql",
"unique_id": "model.elementary.model_run_results",
"fqn": [
"elementary",
"edr",
"run_results",
"model_run_results"
],
```
As observed in this example, the property `original_file_path` used the
`\\` character as the path separator, but the selector checked using
the POSIX notation.
Since Cosmos implements path selectors using
`path_selection in str(node.file_path)`, we have to normalize the input
for the filter to work.
This issue only happened when using `LoadMode.DBT_MANIFEST` and not
`LoadMode.DBT_LS` since dbt normalizes this internally when handling
selectors as part of this command line.
(cherry picked from commit 9a1c8fe)
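The normalization idea can be sketched with the standard library as shown below. This is not the code merged in the PR: `normalize_manifest_path` and `path_selector_matches` are hypothetical helpers that only mirror the substring check quoted in this commit message.

```python
from pathlib import PureWindowsPath


def normalize_manifest_path(raw_path: str) -> str:
    """Convert a path that may use Windows separators to POSIX notation.

    PureWindowsPath accepts both separators, so POSIX-style input is
    returned unchanged.
    """
    return PureWindowsPath(raw_path).as_posix()


def path_selector_matches(path_selection: str, original_file_path: str) -> bool:
    # Same substring check as described above, applied after both sides
    # have been normalized to forward slashes.
    return normalize_manifest_path(path_selection) in normalize_manifest_path(
        original_file_path
    )
```

One caveat of this approach is that a POSIX file name containing a literal backslash would also be rewritten, which seems an acceptable trade-off for dbt project layouts.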
The log that prints 'Total filtered nodes' printed the incorrect value (the total nodes instead of the actual filtered nodes). (cherry picked from commit 674f15c)
…1602)

Let's say the dbt project has a file_path "gen2/models/parent.sql"

```
parent_node = DbtNode(
    unique_id=f"{DbtResourceType.MODEL.value}.{SAMPLE_PROJ_PATH.stem}.parent",
    resource_type=DbtResourceType.MODEL,
    depends_on=[grandparent_node.unique_id, another_grandparent_node.unique_id],
    file_path=SAMPLE_PROJ_PATH / "gen2/models/parent.sql",
    tags=["has_child", "is_child"],
    config={"materialized": "view", "tags": ["has_child", "is_child"]},
)
```

When using Cosmos 1.9.0 with `LoadMode.MANIFEST` and trying to use:

```
RenderConfig(select="gen2/models/*")
```

The selector would not return any results. It would still work with `LoadMode.DBT_LS`.

The goal of this PR is to solve this issue.

(cherry picked from commit 0e1f81b)
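To illustrate what a path selector with a star is expected to do, here is a minimal sketch based on `fnmatch`. It is an assumption about the desired behaviour, not the Cosmos selector implementation: `select_by_path` is a hypothetical function, and it compares paths relative to the project directory.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List


def select_by_path(project_dir: Path, file_paths: List[Path], selector: str) -> List[Path]:
    """Return the files whose project-relative path matches `selector`."""
    selected = []
    for file_path in file_paths:
        relative = file_path.relative_to(project_dir).as_posix()
        if "*" in selector:
            # Glob-style selector, e.g. "gen2/models/*".
            matched = fnmatchcase(relative, selector)
        else:
            # Plain directory or file selector, e.g. "gen2/models".
            prefix = selector.rstrip("/")
            matched = relative == prefix or relative.startswith(prefix + "/")
        if matched:
            selected.append(file_path)
    return selected
```

Note that in `fnmatch` the `*` wildcard also matches `/`, so in this sketch `gen2/models/*` would include files in nested folders as well.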
This PR introduces a new CI job named `Run-Integration-Tests-DBT-Async` to ensure compatibility of the async example DAG with multiple dbt versions. It achieves this by adding a third dimension to the `pyproject.toml` matrix, enabling the CI to run the DAG across a list of dbt versions. Additionally, this PR includes a new test file: `tests/test_async_example_dag.py`. While we already have `tests/test_example_dags.py`, certain dbt versions have shown parsing issues with some example DAGs, which can cause CI failures unrelated to the async DAG. To prevent this, the new file is a modified copy that exclusively tests `simple_async_dag`, with other DAGs ignored via `.airflowignore`. This ensures that the CI job focuses on validating the async DAG without being affected by unrelated parsing errors. closes: #1489 (cherry picked from commit 372d388)
…nMode.LOCAL` (#1571) Currently, when we set `TestBehavior.BUILD`, we do not leverage the `on_warning_callback` method that is available for test nodes and source nodes. I have added the parsing to `DbtBuildLocalOperator` in order to fix it. I tested it locally and got positive results. Related: #1569 (cherry picked from commit ddea39c)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1607 +/- ##
==========================================
+ Coverage 97.30% 97.42% +0.12%
==========================================
Files 80 80
Lines 4901 4938 +37
==========================================
+ Hits 4769 4811 +42
+ Misses 132 127 -5
Collaborator
|
Thanks a lot, @pankajkoti ! If we could cherry-pick: If we're happy with this:
|
A few tests, such as `test_configure_remote_target_path_no_remote_target`, were taking a long time when using `hatch run tests.py3.9-2.9:test-cov`.
Time to run this command before these changes: 89.34s
Time to run this command after these changes: 14.50s
Also fix unittest that was failing locally.
(cherry picked from commit d494dcd)
Since Cosmos 1.9.0, users who attempted to use:
```
DbtRunLocalOperator.partial(task_id="foo", project_dir="foo")
```
started facing the issue:
```
File /usr/local/lib/python3.11/site-packages/airflow/models/baseoperator.py:284, in partial(operator_class, task_id, dag, task_group, start_date, end_date, owner, email, params, resources, trigger_rule, depends_on_past, ignore_first_depends_on_past, wait_for_past_depends_before_skipping, wait_for_downstream, retries, queue, pool, pool_slots, execution_timeout, max_retry_delay, retry_delay, retry_exponential_backoff, priority_weight, weight_rule, sla, map_index_template, max_active_tis_per_dag, max_active_tis_per_dagrun, on_execute_callback, on_failure_callback, on_success_callback, on_retry_callback, on_skipped_callback, run_as_user, executor, executor_config, inlets, outlets, doc, doc_md, doc_json, doc_yaml, doc_rst, task_display_name, logger_name, allow_nested_operators, **kwargs)
281 from airflow.models.dag import DagContext
282 from airflow.utils.task_group import TaskGroupContext
--> 284 validate_mapping_kwargs(operator_class, "partial", kwargs)
286 dag = dag or DagContext.get_current_dag()
287 if dag:
File /usr/local/lib/python3.11/site-packages/airflow/models/mappedoperator.py:123, in validate_mapping_kwargs(op, func, value)
121 names = ", ".join(repr(n) for n in unknown_args)
122 error = f"unexpected keyword arguments {names}"
--> 123 raise TypeError(f"{op.name}.{func}() got {error}")
TypeError: DbtRunLocalOperator.partial() got an unexpected keyword argument 'project_dir'
```
This regression was caused by the changes in how Cosmos operators are subclassed,
which were introduced to allow Cosmos to dynamically choose which Airflow operator
is run at DAG rendering time.
Closes: #1546
To validate it, we introduced a new small dbt project and an example
DAG, and it can be tested by running:
```
airflow dags test example_task_mapping
```
Co-authored-by: Ash Berlin-Taylor <ash@astronomer.io>
(cherry picked from commit c8c148b)
Contributor
Author
Thanks @tatiana. I merged #1609 and have cherry-picked both of the mentioned PRs, #1600 and #1609, here.
## Description
### TL;DR
* pass `container_name` to kwargs
* use a default value for **aws_conn_id**
### Long version
The current implementation of the ECS integration expects
`container_name` to be passed as part of the `operator_args`,
e.g.
```
operator_args = {
"container_name": "main",
...
}
```
However, this led to errors like this:
```
[2025-03-07, 16:40:34 UTC] {ecs.py:515} INFO - EcsOperator overrides: {'containerOverrides': [{'name': None, 'command': ['dbt', '--no-partial-parse', 'run', '--models', 'my_first_dbt_model'], 'environment': [{'name': 'AIRFLOW_CTX_DAG_OWNER', 'value': '***'}, {'name': 'AIRFLOW_CTX_DAG_ID', 'value': 'example_cosmos'}, {'name': 'AIRFLOW_CTX_TASK_ID', 'value': 'dbt_task_group.my_first_dbt_model.run'}, {'name': 'AIRFLOW_CTX_EXECUTION_DATE', 'value': '2025-03-07T16:40:31.716155+00:00'}, {'name': 'AIRFLOW_CTX_TRY_NUMBER', 'value': '1'}, {'name': 'AIRFLOW_CTX_DAG_RUN_ID', 'value': 'manual__2025-03-07T16:40:31.716155+00:00'}, {'name': 'EXTRA_VAR', 'value': 'extra_value'}]}]}
```
The container name is None, which causes a failure when boto3 invokes
the container.
The issue was that `container_name` was not passed to
the kwargs, so it was never set to the
value configured in Cosmos.
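The shape of the fix can be illustrated with a stand-in class hierarchy. Everything below is hypothetical: `FakeEcsRunTaskOperator` merely imitates an operator that builds container overrides, `DbtEcsOperatorSketch` is not the Cosmos class, and the `"aws_default"` default is an assumption taken from the connection name that appears in the logs.

```python
from typing import Any, Dict, List, Optional


class FakeEcsRunTaskOperator:
    """Stand-in for an ECS operator; not the real Airflow class."""

    def __init__(
        self,
        container_name: Optional[str] = None,
        aws_conn_id: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        self.container_name = container_name
        self.aws_conn_id = aws_conn_id

    def build_overrides(self, command: List[str]) -> Dict[str, Any]:
        # If container_name never reached this class, "name" would be None,
        # which is the failure shown in the logs.
        return {"containerOverrides": [{"name": self.container_name, "command": command}]}


class DbtEcsOperatorSketch(FakeEcsRunTaskOperator):
    def __init__(self, container_name: str, aws_conn_id: str = "aws_default", **kwargs: Any) -> None:
        # Forward both values to the parent through kwargs instead of
        # keeping them only on the subclass.
        kwargs.update({"container_name": container_name, "aws_conn_id": aws_conn_id})
        super().__init__(**kwargs)
```

With this pattern, `DbtEcsOperatorSketch(container_name="main").build_overrides(["dbt", "run"])` yields an override whose `name` is `"main"` rather than `None`.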
### Full logs
<pre>
[2025-03-10, 12:49:41 UTC] {ecs.py:512} INFO - Running ECS Task - Task
definition: dbt - on cluster dbt
[2025-03-10, 12:49:41 UTC] {ecs.py:515} INFO - EcsOperator overrides:
{'containerOverrides': [{'name': None, 'command': ['dbt',
'--no-partial-parse', 'run', '--models', 'my_first_dbt_model'],
'environment': [{'name': 'AIRFLOW_CTX_DAG_OWNER', 'value': '***'},
{'name': 'AIRFLOW_CTX_DAG_ID', 'value': 'example_cosmos'}, {'name':
'AIRFLOW_CTX_TASK_ID', 'value':
'dbt_task_group.my_first_dbt_model.run'}, {'name':
'AIRFLOW_CTX_EXECUTION_DATE', 'value':
'2025-03-10T12:49:28.878404+00:00'}, {'name': 'AIRFLOW_CTX_TRY_NUMBER',
'value': '1'}, {'name': 'AIRFLOW_CTX_DAG_RUN_ID', 'value':
'manual__2025-03-10T12:49:28.878404+00:00'}, {'name': 'EXTRA_VAR',
'value': 'extra_value'}]}]}
[2025-03-10, 12:49:41 UTC] {base.py:84} INFO - Retrieving connection
'aws_default'
[2025-03-10, 12:49:44 UTC] {credentials.py:1147} INFO - Found
credentials in environment variables.
[2025-03-10, 12:49:44 UTC] {taskinstance.py:3313} ERROR - Task failed
with exception
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py",
line 768, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py",
line 734, in _execute_callable
return ExecutionCallableRunner(
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py",
line 252, in run
return self.func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/cosmos/operators/base.py",
line 278, in execute
self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
File
"/home/airflow/.local/lib/python3.12/site-packages/cosmos/operators/aws_ecs.py",
line 98, in build_and_run_cmd
result = EcsRunTaskOperator.execute(self, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py",
line 424, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
line 526, in execute
self._start_task()
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
line 626, in _start_task
response = self.client.run_task(**run_opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/client.py",
line 569, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/client.py",
line 980, in _make_api_call
request_dict = self._convert_to_request_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/client.py",
line 1047, in _convert_to_request_dict
request_dict = self._serializer.serialize_to_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/botocore/validate.py",
line 381, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter overrides.containerOverrides[0].name, value:
None, type: <class 'NoneType'>, valid types: <class 'str'>
</pre>
## Related Issue(s)
I didn't create an issue, but I thought I would propose a fix.
## Breaking Change?
Yes, because users have to use `dbt_container_name` with ECS, but it's
currently broken.
## Checklist
- [ ] I have made corresponding changes to the documentation (if
required)
- [ ] I have added tests that prove my fix is effective or that my
feature works
(cherry picked from commit 483ca7c)
Contributor
Author
|
Also, cherry-picked PR #1592 cc: @pankajastro @tatiana |
tatiana
reviewed
Mar 13, 2025
Collaborator
|
We've released from this branch and I created a mergeable PR (upgrading version & changelog) #1612 |
tatiana
added a commit
that referenced
this pull request
Mar 17, 2025
Bug Fixes

* Fix import error in dbt bigquery adapter mock for ``dbt-bigquery<1.8`` for ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajkoti in #1548
* Fix ``operator_args`` override configuration by @ghjklw in #1558
* Fix missing ``install_dbt_deps`` in ``ProjectConfig`` ``__init__`` method by @ghjklw in #1556
* Fix dbt project parsing ``dbt_vars`` behavior passed via ``operator_args`` by @AlexandrKhabarov in #1543
* Avoid reading the connection during DAG parsing of the async BigQuery operator by @joppevos in #1582
* Fix: Workaround to incorrectly raised ``gcsfs.retry.HttpError`` (Invalid Credentials, 401) by @tatiana in #1598
* Fix the async execution mode read sql files for dbt packages by @pankajastro in #1588
* Improve BQ async error handling by @tatiana in #1597
* Fix path selector when ``manifest.json`` is created using MS Windows by @tatiana in #1601
* Fix log that prints 'Total filtered nodes' by @tatiana in #1603
* Fix select behaviour using ``LoadMode.MANIFEST`` and a path with star by @tatiana in #1602
* Support ``on_warning_callback`` with ``TestBehavior.BUILD`` and ``ExecutionMode.LOCAL`` by @corsettigyg in #1571
* Fix ``DbtRunLocalOperator.partial()`` support by @tatiana @ashb in #1609
* fix: ``container_name`` is null for ecs integration by @nicor88 in #1592

Docs

* Improve MWAA getting-started docs by removing unused imports by @jx2lee in #1562

Others

* Disable ``example_cosmos_dbt_build.py`` DAG in CI by @pankajastro in #1567
* Upgrade GitHub Actions Ubuntu version by @tatiana in #1561
* Update GitHub bug issue template by @pankajastro in #1586
* Enable DAG ``example_cosmos_dbt_build.py`` in CI by @pankajastro in #1573
* Run async DAG in DAG without setup/teardown task by @pankajastro in #1599
* Add test case that fully covers recent select issue by @tatiana in #1604
* Add CI job to test multiple dbt versions for the async DAG by @pankajkoti in #1535
* Improve unit tests speed from 89s to 14s by @tatiana in #1600
* Pre-commit updates: #1560, #1583, #1596

Closes: #1550
Mergeable version of #1607

Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>