Collect profile metrics for dagrun#2228
Conversation
Store and emit 9 new Cosmos configuration metrics on DAG runs: - used_automatic_load_mode: Whether LoadMode.AUTOMATIC was used - actual_load_mode: The resolved load method (e.g., dbt_ls, dbt_manifest) - invocation_mode: How dbt is invoked (subprocess, dbt_runner) - install_deps: Whether dependency installation is enabled - uses_node_converter: Whether custom node converters are used - test_behavior: Test rendering behavior (after_each, none, etc.) - source_behavior: Source rendering behavior (all, none, etc.) - total_dbt_models: Total number of dbt models in the project - selected_dbt_models: Number of models selected after filtering Implementation: - Added _store_cosmos_telemetry_metadata_on_dag() in converter to store metadata on DAG object - Added get_cosmos_telemetry_metadata() helper in dag_run_listener to extract metadata - Updated on_dag_run_success and on_dag_run_failed hooks to include metadata in telemetry - Added comprehensive tests for both success and failure scenarios - Fixed existing test that was affected by additional debug logging
The metadata was stored as a custom attribute (_cosmos_telemetry_metadata) which is not preserved during Airflow DAG serialization. When the dag_run_listener receives the DAG, it gets a SerializedDAG where custom attributes are lost, resulting in an empty metadata dictionary.
Solution: Store metadata in dag.params which is serialized by Airflow and accessible in the listener. Using key __cosmos_telemetry_metadata__ to avoid conflicts with user-defined params.
Changes:
- Store metadata in dag.params[__cosmos_telemetry_metadata__] in converter
- Retrieve from dag.params.get(__cosmos_telemetry_metadata__, {}) in listener
- Updated docstrings to reflect the new storage mechanism
Covers exception handling in _store_cosmos_telemetry_metadata_on_dag to ensure graceful degradation when metrics computation fails. Tests verify that 8 out of 9 exception handlers work correctly for actual_load_mode, invocation_mode, install_deps, uses_node_converter, test_behavior, source_behavior, total_dbt_models, and selected_dbt_models.
✅ Deploy Preview for astronomer-cosmos canceled.
|
57841a3 to
f9262d5
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds profile metrics collection for DAG runs to enable better telemetry about how Cosmos is configured with different dbt profile strategies. The changes extract profile-related configuration into reusable helper functions and ensure these metrics are stored in DAG metadata for consumption by DAG run listeners.
- Refactored profile metrics extraction into a shared
_get_profile_config_attributehelper function - Updated
DbtToAirflowConverterto store profile metrics (strategy, mapping class, database) in DAG telemetry metadata - Consolidated test profile configurations to use a shared
sample_profile_configfixture, reducing code duplication
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cosmos/listeners/task_instance_listener.py | Refactored profile metrics extraction into _get_profile_config_attribute helper function that can be reused across modules |
| cosmos/converter.py | Added profile_config parameter to telemetry metadata storage method and imported the shared helper function to collect profile metrics |
| tests/test_converter.py | Introduced shared sample_profile_config at module level and replaced inline ProfileConfig definitions and MagicMock instances; added assertions for new profile metrics |
| tests/listeners/test_dag_run_listener.py | Added test assertions to verify profile_strategy, profile_mapping_class, and database metrics are included in telemetry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2228 +/- ##
=======================================
Coverage 97.97% 97.97%
=======================================
Files 97 97
Lines 6266 6273 +7
=======================================
+ Hits 6139 6146 +7
Misses 127 127 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add PRIVACY NOTICE documentation for DAG run telemetry metrics (load mode, invocation mode, dbt deps, node converters, test/source behavior, model counts) and dbt docs plugin metrics (storage type, docs configuration, custom connections, custom project name). These metrics were added in PR #2223, PR #2228, and PR #2240, but were not reflected in the privacy documentation. closes: #2248
pankajkoti
left a comment
There was a problem hiding this comment.
LGTM.
Happy to merge it if the emission has been tested with a DAG run.
Features * Support cross-referencing models across dbt projects using dbt-loom by @pankajkoti in #2271 * Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by @YourRoyalLinus in #2261 * Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with ``KubernetesPodOperator`` by @tatiana in #2207 * Add support for StarRocks profile mapping by @kurkim0661 in #2256 * Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275 * Support defining custom callbacks alongside the ``WATCHER_KUBERNETES`` callback by @johnhoran in #2307 Enhancements * Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in #2077 * Leverage Airflow ``::group::`` to group logs associated with DAG parsing by @tatiana in #2235 * Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in #2245 * Restore plain text output when using ``ExecutionMode.WATCHER`` by @tiovader in #2241 Bug Fixes * Fix running empty models or ephemeral nodes in ``ExecutionMode.WATCHER`` by @tatiana in #2279 * Improve watcher producer task priority in scheduling and the UI by @tatiana in #2237 * Fix typos and formatting issues in documentation by @pankajkoti in #2259 * Allow watcher producer retries without erroring by @tatiana in #2283 * Fix ``TestBehavior.AFTER_ALL`` is missing project_name information when loading project using manifest file by @tuantran0910 in #2242 * Fix duplicate log lines in watcher subprocess execution and format timestamps by @pankajkoti in #2301 Docs * Add Watcher Kubernetes documentation by @tatiana in #2303 * Document newly added telemetry metrics in the privacy notice by @pankajkoti in #2249 * Add compatibility policy document by @pankajastro in #2251 * Improve watcher documentation related to dbt threads by @tatiana in #2273 * Fix link in watcher execution mode documentation by @jedcunningham in #2277 * Update Apache Airflow minimum compatibility policy by @tatiana in #2285 * Clarify Cosmos runtime support until "End of Basic Support" by @jedcunningham in #2286 * Update watcher docs by @tatiana in #2298 * Update watcher kubernetes documentation by @tatiana in #2306 Others * Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in #2177 * Add dbt Core 1.11 to the test matrix by @tatiana in #2230 * Add integration tests using InvocationMode.SUBPROCESS and validate output by @tatiana in #2287 * Fix main branch failing tests by @tatiana in #2296 * Update pre-commit hooks to the latest versions by @jedcunningham in #2289 * Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290 * Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and #2284 * Add Scarf metrics to understand Cosmos feature usage patterns - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in #2240 - Add DAG run telemetry metrics for load mode, invocation, and render_config parameters by @pankajkoti in #2223 - Collect profile metrics for DAG runs by @pankajastro in #2228 - Compress telemetry metadata to reduce serialized DAG size by @pankajkoti in #2252 - Skip storing telemetry metadata when emission is disabled by @pankajkoti in #2278 - Hide telemetry metadata parameters from the Airflow trigger UI by @pankajkoti in #2247 closes: astronomer/oss-integrations-private#317 --------- Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
closes: #2216
This PR adds profile metrics collection for DAG runs to enable better telemetry about how Cosmos is configured with different dbt profile strategies. The changes extract profile-related configuration into reusable helper functions and ensure these metrics are stored in DAG metadata for consumption by DAG run listeners.
_get_profile_config_attributehelper functionDbtToAirflowConverterto store profile metrics (strategy, mapping class, database) in DAG telemetry metadata