Support emitting Assets with Airflow 3 #1713
Codecov Report

Coverage diff, main vs. #1713:

| Metric | main | #1713 | +/- |
|---|---|---|---|
| Coverage | 97.58% | 97.60% | +0.02% |
| Files | 83 | 83 | |
| Lines | 5174 | 5180 | +6 |
| Hits | 5049 | 5056 | +7 |
| Misses | 125 | 124 | -1 |
Previously, the following tests were failing:
```
FAILED tests/operators/test_local.py::test_run_operator_dataset_inlets_and_outlets_airflow_210_onwards - ModuleNotFoundError: No module named 'airflow.models.dataset'
```
Details:
```
_______ test_run_operator_dataset_inlets_and_outlets_airflow_210_onwards _______

caplog = <_pytest.logging.LogCaptureFixture object at 0x7f1234faf3a0>

    @pytest.mark.skipif(
        version.parse(airflow_version) < version.parse("2.10"),
        reason="From Airflow 2.10 onwards, we started using DatasetAlias, which changed this behaviour.",
    )
    @pytest.mark.integration
    def test_run_operator_dataset_inlets_and_outlets_airflow_210_onwards(caplog):
>       from airflow.models.dataset import DatasetAliasModel
E       ModuleNotFoundError: No module named 'airflow.models.dataset'

tests/operators/test_local.py:471: ModuleNotFoundError
```
The test `test_run_operator_dataset_url_encoded_names` will be handled in PR #1713.
Closes: #1704
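The failure above comes from a hard-coded import that only exists in Airflow 2.x. One way such a test could pick the right module is sketched below. This is an illustration, not the fix applied in the PR: the helper names are hypothetical, and the Airflow 3 location (`airflow.models.asset.AssetAliasModel`) is an assumption that should be checked against the installed version.

```python
import importlib


def alias_model_location(airflow_version: str) -> tuple[str, str]:
    """Return (module path, class name) of the alias ORM model for a version string."""
    major = int(airflow_version.split(".")[0])
    if major >= 3:
        # Assumption: Airflow 3 exposes the model under airflow.models.asset.
        return ("airflow.models.asset", "AssetAliasModel")
    # Airflow 2.10 path, taken from the failing import in the traceback.
    return ("airflow.models.dataset", "DatasetAliasModel")


def load_alias_model(airflow_version: str):
    """Import the model lazily; this call requires Airflow to be installed."""
    module_path, class_name = alias_model_location(airflow_version)
    return getattr(importlib.import_module(module_path), class_name)
```

A test could then call `load_alias_model(airflow_version)` inside its body instead of importing `DatasetAliasModel` directly.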
Pull Request Overview
This PR adds support for emitting Assets (Datasets) when using Airflow 3 by updating test cases and adapting operator logic to work with the new Airflow APIs. Key changes include renaming tests to reflect Airflow version support, adding new integration tests for Airflow 3 behavior regarding AssetAlias, and modifying dataset handling (now using Assets instead of Datasets) in cosmos/operators/local.py.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/operators/test_local.py | Updated tests to skip or adjust for Airflow 3 behavior and to raise errors as needed. |
| cosmos/operators/local.py | Refactored dataset handling to use Assets/AssetAlias and updated openlineage handling. |
Comments suppressed due to low confidence (2)
tests/operators/test_local.py:563
- It appears the second log check is missing the 'assert' keyword, so the condition is not actually being verified. Please add 'assert' to ensure this log message is tested.
"Outlets: [Asset(name='postgres://0.0.0.0:5432/postgres/public.stg_customers', uri='postgres://0.0.0.0:5432/postgres/public.stg_customers'" in caplog.text
cosmos/operators/local.py:713
- [nitpick] Consider storing the AssetAlias instance in a local variable and reusing it (e.g. for both appending to self.outlets and as a dictionary key) to ensure consistency and avoid potential equality issues.
self.outlets.append(AssetAlias(dataset_alias_name))
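Both suppressed comments describe general Python pitfalls, sketched below in isolation. `FakeAlias` is a stand-in defined here for illustration and is not Airflow's `AssetAlias`; the function names are hypothetical and nothing below is taken from the Cosmos code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeAlias:
    """Stand-in for an alias object; NOT the Airflow AssetAlias class."""

    name: str


def unchecked(log_text: str) -> bool:
    # A bare membership expression is evaluated and discarded: nothing is verified.
    "Outlets: [Asset(" in log_text
    return True


def checked(log_text: str) -> bool:
    # With `assert`, a missing log message fails the test.
    assert "Outlets: [Asset(" in log_text
    return True


def register_alias(outlets: list, events: dict, alias_name: str) -> FakeAlias:
    # Create the alias once and reuse the same instance as outlet and as dict key.
    alias = FakeAlias(alias_name)
    outlets.append(alias)
    events[alias] = []
    return alias
```

Reusing one instance means the outlet and the dictionary key cannot diverge, regardless of how the real class implements equality and hashing.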
pankajkoti left a comment
LGTM. Thanks for adding this support so quickly. I have some in-line questions for my own understanding and a few minor suggestions, if you'd like to consider them.
Happy to merge the PR once CI passes.
This reverts commit 37cd6c5.
Improve asset/dataset event scheduling after implementing #1713
Features

* Airflow 3 support
* Support running ``dbt deps`` incrementally to pre-defined ``dbt_packages`` by @tatiana in #1668 and #1670
* Add ``DuckDB`` profile mapping by @prithvijitguha and @pankajastro in #1553
* Implement DBT exposure selector by ghjklw #1717

Bug Fixes

* Fix ``test_indirect_selection`` flag to be propagated in case of ``TestBehavior.BUILD`` by @corsettigyg in #1663
* Fix ``select`` clause in the case of detached tests by @anyapriya in #1680
* Operator argument fixes by @johnhoran in #1648

Airflow 3 Support

* Support rendering DbtDag in Airflow 3 by @tatiana and @ashb in #1657
* Refactor Rendered Task Instance Fields (RTIF) handling for Airflow 2.x and 3.x by @pankajkoti in #1661
* Run cosmos operator in Airflow 3 by @pankajastro in #1642
* Fix ``python_virtualenv.prepare_env`` top-level import for Airflow 3 by @pankajkoti in #1678
* Fix Variable not found issue in Airflow 3 by @tatiana in #1684
* Disable CosmosPlugin on Airflow 3 setup by @pankajkoti in #1692, #1698
* Use ``schedule`` param in example DAGs instead of the 2.10 deprecated and 3.0 removed ``schedule_interval`` by @pankajkoti in #1701
* Ensure ``virtualenv_dir`` path exists by @pankajkoti in #1724
* Support emitting Assets with Airflow 3 by @tatiana in #1713
* Add docs on Airflow 3 compatibility by @pankajkoti and @tatiana in #1731
* Introduce, test and document asset/dataset breaking change by @tatiana in #1672
* Improve dataset/asset driven scheduling documentation by @tatiana in #1729

Enhancements

* Allow multiple callbacks by @corsettigyg #1693
* Refactor kubernetes warning callback handling by @canbekley in #1681

Documentation

* Add documentation related to ``copy_dbt_packages`` by @tatiana in #1671
* Make wording and command consistent in the contributing doc by @pankajkoti in #1697
* Add MonteCarlo callback example for importing dbt artifacts by @corsettigyg #1695
* Change async feature to be non-experimental by @tatiana in #1732

Others

* Add sample ``dbt_packages`` to validate incremental ``dbt deps`` by @tatiana in #1669
* Add kubernetes execution mode example in Airflow 3 by @pankajastro in #1667
* Check only major version until Airflow 3 stable release by @pankajastro in #1665
* Install Airflow from main branch by @pankajastro in #1660
* Add dev tool for Airflow 3 by @pankajastro and @tatiana in #1627
* Improve Airflow 3 tooling by @pankajastro in #1656
* Skip associating ``openlineage_events_completes`` to ``ti`` in Airflow 3 by @pankajkoti in #1662
* Add .gitignore file for the scripts/airflow3 directory by @pankajkoti in #1658
* Remove ``original_jaffle_shop`` dbt project by @pankajkoti in #1676
* Fix or ignore type check error by @pankajastro in #1687
* Run virtualenv example with Airflow 3 tooling by @pankajastro in #1686
* Enable running setup/teardown tasks with Async execution DAG with Airflow 3 tooling by @pankajastro in #1696
* Enable integration tests for the DuckDB adapter by @pankajastro in #1699
* Add Airflow 3 tests matrix entries in CI by @pankajkoti in #1646
* Use a different way to get tasks count for asserting test_perf_dag by @pankajkoti in #1714
* Reinstall Airflow 3 dependency on ``pydantic>=2.11`` for dbt adapter versions 1.6 & 1.9 by @pankajkoti in #1715
* Fix outdated ``echo`` in Airflow 3 tooling script #1700
* Add files not needed for git tracking to .gitignore by @pankajkoti in #1723
* Use latest minor versions for dbt adapters to get in compatibility fixes by @pankajkoti in #1719
* Fix Airflow 3 tests raising generate_run_id() takes 0 positional arguments by @tatiana in #1725
* Fix dataset tests failing in Airflow 3 by @tatiana in #1716
* Enable example DAGs to run in CI that were disabled in PR #1646 by @pankajkoti in #1726
* Pre-commit updates: #1666, #1653, #1641, #1682, #1720

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Supports emitting Assets (Datasets) when using Cosmos with Airflow 3.
This implementation was tested in two ways:
1. `airflow dags test`

   We observed via the logs that the desired code path was reached.

2. `airflow standalone`, triggering the DAG `example_operators` manually.

   We also created a DAG that consumes the dataset created by `example_operators`, so we could confirm the DAG was being triggered via the generated dataset/alias. We again observed via the logs that the desired code path was reached, and we observed via the Airflow UI the DAG `dataset_triggered_dag` being triggered.

[Screenshots not preserved]
During the process of implementing this feature, we identified a few limitations of the Airflow 3.0.0 assets implementation, which were discussed with @uranusjr:

- […] `AssetAlias` to the `self.outlets` of the Operator
- If an `AssetAlias` is created during task execution, the UI does not display the source DAG + event + triggered DAG
- […] `AssetAlias` during task initialization, these are displayed incorrectly in the UI, and we don't want to clutter the user's UI with aliases that may not link to any actual Assets

Closes: #1635