Allow to push URIs as XCOM of cosmos tasks #2275
Pull request overview
This pull request adds an opt-in feature to emit dataset URIs as XCom values from Cosmos tasks. This addresses a regression experienced when upgrading from Airflow 2.9 to 2.11, where the URI field that was previously available in XCom was no longer accessible due to the Dataset to DatasetAlias migration.
Changes:
- Added `enable_uri_xcom` configuration setting (default: `False`) to control URI emission to XCom
- Modified `_handle_datasets` method to push outlet URIs to XCom when the feature is enabled
- Added comprehensive test coverage for the new feature across different scenarios
- Added documentation explaining how to enable and use the feature
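The opt-in behaviour listed above can be sketched roughly as follows. This is not the code from the PR: `FakeTaskInstance`, `FakeOutlet`, `push_outlet_uris` and the `"uri"` XCom key are hypothetical stand-ins so the snippet runs without Airflow installed. It only illustrates the gating idea, including the walrus operator the author mentions using to avoid nested ifs.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow task instance; records pushed XComs."""

    pushed: dict = field(default_factory=dict)

    def xcom_push(self, key: str, value: Any) -> None:
        self.pushed[key] = value


@dataclass
class FakeOutlet:
    """Hypothetical stand-in for a dataset/asset outlet that carries a URI."""

    uri: str


def push_outlet_uris(context: dict, outlets: list, enabled: bool = False) -> None:
    """Push outlet URIs to XCom only when the feature is enabled and URIs exist.

    `enabled` plays the role of the `enable_uri_xcom` setting (default False).
    The XCom key "uri" is an assumption made for this sketch.
    """
    # The walrus operator binds and tests the list in one condition,
    # which avoids a nested `if`.
    if enabled and (uris := [o.uri for o in outlets if getattr(o, "uri", None)]):
        context["ti"].xcom_push(key="uri", value=uris)


if __name__ == "__main__":
    ti = FakeTaskInstance()
    push_outlet_uris({"ti": ti}, [FakeOutlet("scheme://host/catalog.schema.table")], enabled=True)
    print(ti.pushed)
```

Keeping the default at `False` means existing deployments store nothing extra in the metadata database unless they opt in.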
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cosmos/settings.py | Adds new enable_uri_xcom boolean configuration setting (default: False) |
| cosmos/operators/local.py | Implements XCom push logic in _handle_datasets to emit outlet URIs when enabled |
| tests/operators/test_local.py | Adds 4 comprehensive tests covering enabled/disabled states and edge cases |
| docs/configuration/scheduling.rst | Documents the new feature with configuration examples and usage patterns |
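As a rough illustration of how the setting might be switched on, the sketch below builds the environment variable name using Airflow's standard `AIRFLOW__{SECTION}__{KEY}` convention. It assumes `enable_uri_xcom` is read from the `[cosmos]` configuration section like other Cosmos settings; the helper `airflow_env_var` is hypothetical, and the documented configuration in `docs/configuration/scheduling.rst` remains the authoritative source.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the env var name Airflow maps to the given [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Assumption: the setting lives in the [cosmos] section.
# Set this before Airflow (and therefore Cosmos) loads its configuration.
os.environ[airflow_env_var("cosmos", "enable_uri_xcom")] = "True"
```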
Hi @corsettigyg, thanks a lot for adding this. I'll take a look at this on Friday.
@tatiana no worries 🍷 take your time! It is a very simple change overall.
tatiana
left a comment
@corsettigyg I'm happy for us to ship this change in Cosmos 1.13.0, which we're aiming to have released on Thursday. Please address the Copilot feedback and get the checks passing; once that is done, I'll approve and we can go ahead and merge it.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2275   +/-   ##
=======================================
  Coverage   98.02%   98.02%
=======================================
  Files         100      100
  Lines        6420     6424    +4
=======================================
+ Hits         6293     6297    +4
  Misses        127      127
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@tatiana done 🚀
tatiana
left a comment
Looks great, thanks a lot for making this a backwards-compatible feature, @corsettigyg! We can consider exposing these XComs as the default in Cosmos 2.0. Before this, though, it would be great to understand the long-term impact of storing this data in the Airflow metadata database - do you have any suggestions for us to monitor this?
Performance-wise, I'd expect low to no impact considering how simple the XCom is. We will have it enabled at Getyourguide, so I can keep track of it here. From a broader perspective, the only concern I can think of is that some companies might not want to expose this information globally due to data privacy.
Features
* Support cross-referencing models across dbt projects using dbt-loom by @pankajkoti in #2271
* Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by @YourRoyalLinus in #2261
* Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with ``KubernetesPodOperator`` by @tatiana in #2207
* Add support for StarRocks profile mapping by @kurkim0661 in #2256
* Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275
* Support defining custom callbacks alongside the ``WATCHER_KUBERNETES`` callback by @johnhoran in #2307

Enhancements
* Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in #2077
* Leverage Airflow ``::group::`` to group logs associated with DAG parsing by @tatiana in #2235
* Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in #2245
* Restore plain text output when using ``ExecutionMode.WATCHER`` by @tiovader in #2241

Bug Fixes
* Fix running empty models or ephemeral nodes in ``ExecutionMode.WATCHER`` by @tatiana in #2279
* Improve watcher producer task priority in scheduling and the UI by @tatiana in #2237
* Fix typos and formatting issues in documentation by @pankajkoti in #2259
* Allow watcher producer retries without erroring by @tatiana in #2283
* Fix ``TestBehavior.AFTER_ALL`` is missing project_name information when loading project using manifest file by @tuantran0910 in #2242
* Fix duplicate log lines in watcher subprocess execution and format timestamps by @pankajkoti in #2301

Docs
* Add Watcher Kubernetes documentation by @tatiana in #2303
* Document newly added telemetry metrics in the privacy notice by @pankajkoti in #2249
* Add compatibility policy document by @pankajastro in #2251
* Improve watcher documentation related to dbt threads by @tatiana in #2273
* Fix link in watcher execution mode documentation by @jedcunningham in #2277
* Update Apache Airflow minimum compatibility policy by @tatiana in #2285
* Clarify Cosmos runtime support until "End of Basic Support" by @jedcunningham in #2286
* Update watcher docs by @tatiana in #2298
* Update watcher kubernetes documentation by @tatiana in #2306

Others
* Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in #2177
* Add dbt Core 1.11 to the test matrix by @tatiana in #2230
* Add integration tests using InvocationMode.SUBPROCESS and validate output by @tatiana in #2287
* Fix main branch failing tests by @tatiana in #2296
* Update pre-commit hooks to the latest versions by @jedcunningham in #2289
* Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290
* Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and #2284
* Add Scarf metrics to understand Cosmos feature usage patterns
  - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in #2240
  - Add DAG run telemetry metrics for load mode, invocation, and render_config parameters by @pankajkoti in #2223
  - Collect profile metrics for DAG runs by @pankajastro in #2228
  - Compress telemetry metadata to reduce serialized DAG size by @pankajkoti in #2252
  - Skip storing telemetry metadata when emission is disabled by @pankajkoti in #2278
  - Hide telemetry metadata parameters from the Airflow trigger UI by @pankajkoti in #2247

closes: astronomer/oss-integrations-private#317

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Description
At Getyourguide we were using Airflow 2.9, and during the bump to Airflow 2.11 we noticed that some of our auxiliary tasks that read the XCom and parse the `uri` field started failing. On a quick investigation, this PR explains the change that took effect from Airflow 2.10 onwards, with the move from Dataset to DatasetAlias.

We have been using the `uri` produced by the XCom field to parse the full table path, since it contains the catalog, schema and table name. This greatly helps us manage operations like `VACUUM` and `OPTIMIZE` on those tables after the dbt tasks have finished.

The idea of this PR is to re-enable the emission of the `uri` field to the task XCom so users have this pre-formatted information available in case they want to consume it downstream.

I have tested it in my local Airflow and this implementation worked fine. I'm open to feedback from the community, though!
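To make the downstream use case concrete, here is a minimal sketch of how a task might turn such a URI into a table path for maintenance statements. The URI layout is an assumption: the snippet accepts a path of either `catalog.schema.table` or `catalog/schema/table`, and the host name and the `TableRef`, `parse_table_uri` and `maintenance_statements` names are hypothetical. The exact URI format depends on the Cosmos and Airflow versions in use.

```python
from typing import List, NamedTuple
from urllib.parse import urlsplit


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_uri(uri: str) -> TableRef:
    """Extract catalog, schema and table from the path of a dataset URI.

    Assumes the path holds exactly three components separated by "." or "/".
    """
    path = urlsplit(uri).path.strip("/")
    parts = path.replace("/", ".").split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected catalog, schema and table in URI path, got: {path!r}")
    return TableRef(*parts)


def maintenance_statements(uri: str) -> List[str]:
    """Build OPTIMIZE and VACUUM statements for the table named in the URI."""
    ref = parse_table_uri(uri)
    full_name = f"{ref.catalog}.{ref.schema}.{ref.table}"
    return [f"OPTIMIZE {full_name}", f"VACUUM {full_name}"]


if __name__ == "__main__":
    # Hypothetical URI; in a DAG this value would be pulled from the task's XCom.
    for statement in maintenance_statements("databricks://example-host:443/main.analytics.orders"):
        print(statement)
```

A stricter parser might be preferable if identifiers can themselves contain dots or slashes, which this sketch does not handle.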
Breaking Change?
Should not be, since the default is `False`. The only remark is that I am using the walrus operator to avoid nested ifs, and it is only supported from Python 3.8 onwards, although I believe Cosmos no longer uses Python 3.7 or earlier.
Checklist