From c3d6b12f6a0bc4e81f3e75cf9dbd40b960eedf0f Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Wed, 30 Apr 2025 14:14:34 +0100
Subject: [PATCH 1/7] Improve dataset scheduling documentation

---
 docs/configuration/scheduling.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index 67ed1a10cb..1b3df5cf2e 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -159,15 +159,15 @@ References about the root cause of these issues:
 Airflow 2.10.0 and 2.10.1
 _________________________

-If using Cosmos with Airflow 2.10.0 or 2.10.1, the two issues previously described are resolved, since Cosmos uses ``DatasetAlias``
-to support the dynamic creation of datasets during task execution. However, users may face ``sqlalchemy.orm.exc.FlushError``
+If using Cosmos using a version of Airflow higher than 2.10.0, the two issues previously described are resolved, since Cosmos uses ``DatasetAlias``
+to support the dynamic creation of datasets during task execution. However, if users are using 2.10.0-2.10.4, they may face ``sqlalchemy.orm.exc.FlushError``
 errors if they attempt to run Cosmos-powered DAGs using ``airflow dags test`` with these versions.

-We've reported this issue and it will be resolved in future versions of Airflow:
+We've reported this issue and it has been resolved in the latest 2.10 release:

 - https://github.com/apache/airflow/issues/42495

-For users to overcome this limitation in local tests, until the Airflow community solves this, we introduced the configuration
+For users to overcome this limitation in local tests, we introduced the configuration
 ``AIRFLOW__COSMOS__ENABLE_DATASET_ALIAS``, that is ``True`` by default. If users want to run ``dags test` and not see ``sqlalchemy.orm.exc.FlushError``,
 they can set this configuration to ``False``. It can also be set in the ``airflow.cfg`` file:

@@ -175,3 +175,5 @@ they can set this configuration to ``False``. It can also be set in the ``airflo

     [cosmos]
     enable_dataset_alias = False
+
+Starting in Airflow 3, Cosmos users are no longer allowed to set ``AIRFLOW__COSMOS__ENABLE_DATASET_ALIAS`` to ``True``.

From 04fbd5ae16de9f958506fc9547f33198dc02a097 Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Wed, 30 Apr 2025 14:21:23 +0100
Subject: [PATCH 2/7] Improve docs

---
 docs/configuration/scheduling.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index 1b3df5cf2e..b8c1d5c43b 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -160,14 +160,14 @@ Airflow 2.10.0 and 2.10.1
 _________________________

 If using Cosmos using a version of Airflow higher than 2.10.0, the two issues previously described are resolved, since Cosmos uses ``DatasetAlias``
-to support the dynamic creation of datasets during task execution. However, if users are using 2.10.0-2.10.4, they may face ``sqlalchemy.orm.exc.FlushError``
+to support the dynamic creation of datasets during task execution. However, some Airflow 2.10.x users reported ``sqlalchemy.orm.exc.FlushError``
 errors if they attempt to run Cosmos-powered DAGs using ``airflow dags test`` with these versions.

-We've reported this issue and it has been resolved in the latest 2.10 release:
+We've reported this issue and it seems to no longer happen:

 - https://github.com/apache/airflow/issues/42495

-For users to overcome this limitation in local tests, we introduced the configuration
+For users who still face this limitation in local tests, we introduced the configuration
 ``AIRFLOW__COSMOS__ENABLE_DATASET_ALIAS``, that is ``True`` by default. If users want to run ``dags test` and not see ``sqlalchemy.orm.exc.FlushError``,
 they can set this configuration to ``False``. It can also be set in the ``airflow.cfg`` file:

From a1f99a594af0fda596fc95d4923c342318a3cfb6 Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Wed, 30 Apr 2025 15:56:39 +0100
Subject: [PATCH 3/7] Update docs/configuration/scheduling.rst

Co-authored-by: Pankaj Koti
---
 docs/configuration/scheduling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index b8c1d5c43b..cfcbe7115f 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -163,7 +163,7 @@ If using Cosmos using a version of Airflow higher than 2.10.0, the two issues pr
 to support the dynamic creation of datasets during task execution. However, some Airflow 2.10.x users reported ``sqlalchemy.orm.exc.FlushError``
 errors if they attempt to run Cosmos-powered DAGs using ``airflow dags test`` with these versions.

-We've reported this issue and it seems to no longer happen:
+We had reported this issue and it seems to no longer happen:

 - https://github.com/apache/airflow/issues/42495

From 3e95e3a0e931100e5470b8ae8bd3b41b20556570 Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Thu, 1 May 2025 11:27:13 +0100
Subject: [PATCH 4/7] Update scheduling.rst

---
 docs/configuration/scheduling.rst | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index cfcbe7115f..d9fc0e687b 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -132,6 +132,28 @@ From Cosmos 1.7 and Airflow 2.10, it is also possible to trigger DAGs be to be r
 Known Limitations
 .................

+Airflow 3.0 and beyond
+______________________
+
+Airflow Asset (Dataset) URI validation rules changed in Airflow 3.0.0, and OpenLineage URIs (the standard used by Cosmos) are no longer valid in Airflow 3.
+
+Therefore, if using Cosmos with Airflow 3, the Airflow Dataset URIs will be changed to use forward slashes instead of dots to separate the schema and table name.
+
+Example of Airflow 2 Cosmos Dataset URI:
+- postgres://0.0.0.0:5434/postgres.public.orders
+
+Example of Airflow 3 Cosmos Asset URI:
+- postgres://0.0.0.0:5434/postgres/public/orders
+
+
+If you want to use the Airflow 3 URI standard while still using Airflow 2, please set:
+
+```
+export AIRFLOW__COSMOS__USE_DATASET_AIRFLOW3_URI_STANDARD=1
+```
+
+Remember to update any DAGs that are scheduled using this dataset.
+
 Airflow 2.9 and below
 _____________________

@@ -159,15 +181,15 @@ References about the root cause of these issues:
 Airflow 2.10.0 and 2.10.1
 _________________________

-If using Cosmos using a version of Airflow higher than 2.10.0, the two issues previously described are resolved, since Cosmos uses ``DatasetAlias``
-to support the dynamic creation of datasets during task execution. However, some Airflow 2.10.x users reported ``sqlalchemy.orm.exc.FlushError``
+If using Cosmos with Airflow 2.10.0 or 2.10.1, the two issues previously described are resolved, since Cosmos uses ``DatasetAlias``
+to support the dynamic creation of datasets during task execution. However, users may face ``sqlalchemy.orm.exc.FlushError``
 errors if they attempt to run Cosmos-powered DAGs using ``airflow dags test`` with these versions.

-We had reported this issue and it seems to no longer happen:
+We've reported this issue and it will be resolved in future versions of Airflow:

 - https://github.com/apache/airflow/issues/42495

-For users who still face this limitation in local tests, we introduced the configuration
+For users to overcome this limitation in local tests, until the Airflow community solves this, we introduced the configuration
 ``AIRFLOW__COSMOS__ENABLE_DATASET_ALIAS``, that is ``True`` by default. If users want to run ``dags test` and not see ``sqlalchemy.orm.exc.FlushError``,
 they can set this configuration to ``False``. It can also be set in the ``airflow.cfg`` file:

From e6aa9c6a461ba4537da6c9024273632ca5e2c6e0 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Thu, 1 May 2025 10:28:23 +0000
Subject: [PATCH 5/7] =?UTF-8?q?=F0=9F=8E=A8=20[pre-commit.ci]=20Auto=20for?=
 =?UTF-8?q?mat=20from=20pre-commit.com=20hooks?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/configuration/scheduling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index d9fc0e687b..da6ad9122b 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -189,7 +189,7 @@ We've reported this issue and it will be resolved in future versions of Airflow:

 - https://github.com/apache/airflow/issues/42495

-For users to overcome this limitation in local tests, until the Airflow community solves this, we introduced the configuration
+For users to overcome this limitation in local tests, until the Airflow community solves this, we introduced the configuration
 ``AIRFLOW__COSMOS__ENABLE_DATASET_ALIAS``, that is ``True`` by default. If users want to run ``dags test` and not see ``sqlalchemy.orm.exc.FlushError``,
 they can set this configuration to ``False``. It can also be set in the ``airflow.cfg`` file:

From 86b803ba969a4a8bd041956de355ee8217012878 Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Thu, 1 May 2025 11:32:01 +0100
Subject: [PATCH 6/7] Add docs for AIRFLOW__COSMOS__USE_DATASET_AIRFLOW3_URI_STANDARD

---
 docs/configuration/cosmos-conf.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/configuration/cosmos-conf.rst b/docs/configuration/cosmos-conf.rst
index b055813f83..a3dcff0e02 100644
--- a/docs/configuration/cosmos-conf.rst
+++ b/docs/configuration/cosmos-conf.rst
@@ -178,6 +178,12 @@ This page lists all available Airflow configurations that affect ``astronomer-co
     - Default: ``True``
     - Environment Variable: ``AIRFLOW__COSMOS__ENABLE_TEARDOWN_ASYNC_TASK``

+`use_dataset_airflow3_uri_standard`_:
+    (Introduced in Cosmos 1.10.0): Changes Cosmos Dataset (Asset) URIs to be Airflow 3 compliant. Since this would be a breaking change, it is False by default in Cosmos 1.x.
+    - Default: ``False``
+    - Environment Variable: ``AIRFLOW__COSMOS__USE_DATASET_AIRFLOW3_URI_STANDARD``
+
+
 [openlineage]
 ~~~~~~~~~~~~~

From 700d60c7149c6a075514c3b63224fa501cfba1ff Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Thu, 1 May 2025 11:41:44 +0100
Subject: [PATCH 7/7] Add breaking change to AF3 compatibility

---
 docs/airflow3_compatibility/index.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/docs/airflow3_compatibility/index.rst b/docs/airflow3_compatibility/index.rst
index f0e5d0e0af..5668e92939 100644
--- a/docs/airflow3_compatibility/index.rst
+++ b/docs/airflow3_compatibility/index.rst
@@ -4,6 +4,32 @@ Airflow 3 Compatibility (First Iteration)
 The Cosmos 1.10.0 release marks the **first iteration** of adding compatibility for `Apache Airflow® 3 `_
 This is an important milestone as we work towards ensuring that Cosmos seamlessly integrates with the latest advancements in the Airflow ecosystem.

+Breaking changes
+----------------
+
+Airflow Asset (Dataset) URI validation rules changed in Airflow 3.0.0, and OpenLineage URIs (the standard used by Cosmos) are no longer valid in Airflow 3.
+
+Therefore, if using Cosmos with Airflow 3, the Airflow Dataset URIs will be changed to use forward slashes instead of dots to separate the schema and table name.
+
+Example of Airflow 2 Cosmos Dataset URI:
+
+- postgres://0.0.0.0:5434/postgres.public.orders
+
+Example of Airflow 3 Cosmos Asset URI:
+
+- postgres://0.0.0.0:5434/postgres/public/orders
+
+
+If you want to use the Airflow 3 URI standard while still using Airflow 2, please set:
+
+```
+export AIRFLOW__COSMOS__USE_DATASET_AIRFLOW3_URI_STANDARD=1
+```
+
+.. warning::
+    Remember to update any DAGs that are triggered using Cosmos-generated datasets or aliases to the new URI format.
+
+
 What Works
 ----------
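
Review note: the URI change documented in patches 4/7 and 7/7 can be illustrated with a small sketch. The helper name `to_airflow3_uri` is hypothetical and is not part of Cosmos or Airflow; it only assumes, based on the two example URIs in the patch, that the dots separating database, schema and table in the URI path become slashes while the host and port are left untouched. It may help when updating the schedules of downstream DAGs by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def to_airflow3_uri(uri: str) -> str:
    """Hypothetical helper (not a Cosmos API): rewrite an Airflow 2 style
    Cosmos dataset URI into the slash-separated form shown in the patch."""
    parts = urlsplit(uri)
    # Only the path is rewritten, so dots in the host (e.g. 0.0.0.0) survive.
    new_path = parts.path.replace(".", "/")
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


# Example URI taken from the patch text.
old_uri = "postgres://0.0.0.0:5434/postgres.public.orders"
print(to_airflow3_uri(old_uri))
```

This is a plain string transformation for illustration only; how Cosmos builds the URI internally is not shown in this patch series.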