Add DAG run telemetry metrics for load mode, invocation, and other render_config parameters #2223

Merged
pankajkoti merged 18 commits into main from telemetry-dagrun-metrics on Jan 5, 2026

Contributor

@pankajkoti pankajkoti commented Dec 23, 2025

Adds 9 new telemetry metrics to DAG run events to better understand how Cosmos is configured and used:

  • used_automatic_load_mode - whether the user specified LoadMode.AUTOMATIC
  • actual_load_mode - the resolved load method (dbt_ls, dbt_manifest, custom, etc.)
  • invocation_mode - subprocess or dbt_runner
  • install_deps - whether dbt deps installation is enabled
  • uses_node_converter - whether custom node converters are used
  • test_behavior - how dbt tests are handled (after_each, after_all, none, etc.)
  • source_behavior - how dbt sources are rendered
  • total_dbt_models - total number of dbt models in project
  • selected_dbt_models - number of models selected for rendering

These metrics are collected during DAG conversion and stored in dag.params to survive Airflow's serialisation process, then emitted by the DAG run listener on success/failure events.

closes: #2109


netlify Bot commented Dec 23, 2025

Deploy Preview for astronomer-cosmos canceled.

Name Link
🔨 Latest commit f9262d5
🔍 Latest deploy log https://app.netlify.com/projects/astronomer-cosmos/deploys/695bdff4d3d1e2000887e485

Store and emit 9 new Cosmos configuration metrics on DAG runs:
- used_automatic_load_mode: Whether LoadMode.AUTOMATIC was used
- actual_load_mode: The resolved load method (e.g., dbt_ls, dbt_manifest)
- invocation_mode: How dbt is invoked (subprocess, dbt_runner)
- install_deps: Whether dependency installation is enabled
- uses_node_converter: Whether custom node converters are used
- test_behavior: Test rendering behavior (after_each, none, etc.)
- source_behavior: Source rendering behavior (all, none, etc.)
- total_dbt_models: Total number of dbt models in the project
- selected_dbt_models: Number of models selected after filtering

Implementation:
- Added _store_cosmos_telemetry_metadata_on_dag() in converter to store metadata on DAG object
- Added get_cosmos_telemetry_metadata() helper in dag_run_listener to extract metadata
- Updated on_dag_run_success and on_dag_run_failed hooks to include metadata in telemetry
- Added comprehensive tests for both success and failure scenarios
- Fixed existing test that was affected by additional debug logging

The metadata was stored as a custom attribute (_cosmos_telemetry_metadata) which is not preserved during Airflow DAG serialization. When the dag_run_listener receives the DAG, it gets a SerializedDAG where custom attributes are lost, resulting in an empty metadata dictionary.

Solution: Store metadata in dag.params, which is serialized by Airflow and accessible in the listener. The key __cosmos_telemetry_metadata__ is used to avoid conflicts with user-defined params.

Changes:

- Store metadata in dag.params["__cosmos_telemetry_metadata__"] in converter

- Retrieve from dag.params.get("__cosmos_telemetry_metadata__", {}) in listener

- Updated docstrings to reflect the new storage mechanism
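
As a rough illustration of the storage mechanism described above, the sketch below uses a plain dict as a stand-in for dag.params. The helper names `store_telemetry_metadata` and `get_telemetry_metadata` and the sample values are hypothetical and are not taken from the Cosmos code base; only the reserved key comes from the commit message.

```python
from typing import Any

# Reserved key named in the commit message above.
TELEMETRY_PARAMS_KEY = "__cosmos_telemetry_metadata__"


def store_telemetry_metadata(params: dict[str, Any], metadata: dict[str, Any]) -> None:
    """Converter side (hypothetical helper): keep the metrics under the reserved key."""
    params[TELEMETRY_PARAMS_KEY] = dict(metadata)


def get_telemetry_metadata(params: dict[str, Any]) -> dict[str, Any]:
    """Listener side (hypothetical helper): fall back to an empty dict if the key is absent."""
    return params.get(TELEMETRY_PARAMS_KEY, {})


# A plain dict stands in for dag.params; the values are illustrative only.
params: dict[str, Any] = {"user_param": 1}
store_telemetry_metadata(params, {"invocation_mode": "subprocess", "install_deps": True})
print(get_telemetry_metadata(params))
print(get_telemetry_metadata({"user_param": 1}))
```

Because the metadata lives under a single dunder-style key, user-defined params such as `user_param` are left untouched, and a DAG without the key simply yields an empty dictionary.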
@pankajkoti pankajkoti changed the title from "Telemetry dagrun metrics" to "Add DAG run telemetry metrics for load mode, invocation, and other render_config parameters" Dec 24, 2025

codecov Bot commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.99%. Comparing base (0fa0163) to head (f9262d5).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2223   +/-   ##
=======================================
  Coverage   97.98%   97.99%           
=======================================
  Files          95       95           
  Lines        6197     6222   +25     
=======================================
+ Hits         6072     6097   +25     
  Misses        125      125           


Covers exception handling in _store_cosmos_telemetry_metadata_on_dag
to ensure graceful degradation when metrics computation fails. Tests
verify that 8 out of 9 exception handlers work correctly for actual_load_mode,
invocation_mode, install_deps, uses_node_converter, test_behavior,
source_behavior, total_dbt_models, and selected_dbt_models.
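
One way to picture the graceful degradation described above is to compute each metric independently so that one failure does not discard the others. This is only a sketch under assumptions: `collect_metrics` and `failing_model_count` are hypothetical names, and whether the real method skips a failed metric or records a default value is not stated in this conversation.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def collect_metrics(collectors: dict[str, Callable[[], Any]]) -> dict[str, Any]:
    """Run each collector on its own; skip any metric whose computation raises."""
    metadata: dict[str, Any] = {}
    for name, compute in collectors.items():
        try:
            metadata[name] = compute()
        except Exception as exc:  # telemetry must never break DAG conversion
            logger.debug("Could not compute telemetry metric %s: %s", name, exc)
    return metadata


def failing_model_count() -> int:
    # Hypothetical collector that simulates a failure while counting models.
    raise RuntimeError("dbt graph not loaded")


metrics = collect_metrics(
    {
        "invocation_mode": lambda: "subprocess",
        "total_dbt_models": failing_model_count,
    }
)
print(metrics)
```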
Comment thread cosmos/converter.py Outdated
Collaborator

@tatiana tatiana left a comment


Thanks for working on this, @pankajkoti. I'm glad the DAG params are working!

I left some feedback inline.

Could you confirm which versions of Airflow you tested this against, and whether the events reached Scarf?

Comment thread tests/listeners/test_dag_run_listener.py
Collaborator

@tatiana tatiana left a comment


Thanks a lot, @pankajkoti. I left some minor feedback inline, but since you already tested and confirmed this works with Airflow 2 and 3, I'm approving it. Feel free to merge once the feedback is addressed.

The operator_args parameter in _store_cosmos_telemetry_metadata_on_dag was never used in the method body. All telemetry metadata is collected from render_config and project_config which are already passed as separate parameters.

Changes:
- Remove operator_args parameter from method signature and docstring
- Update method call site to remove the argument
- Remove unused mock_operator_args from test
@pankajkoti pankajkoti force-pushed the telemetry-dagrun-metrics branch from 57841a3 to f9262d5 January 5, 2026 15:59
@pankajkoti pankajkoti merged commit 1085a3c into main Jan 5, 2026
90 checks passed
@pankajkoti pankajkoti deleted the telemetry-dagrun-metrics branch January 5, 2026 16:31
pankajkoti added a commit that referenced this pull request Jan 6, 2026
Add documentation for DAG run telemetry metrics (load mode, invocation mode, dbt deps, node converters, test/source behavior, model counts) and dbt docs plugin metrics (storage type, docs configuration, custom connections, custom project name).

These metrics were added in PR #2223 and PR #2240 but were not reflected in the privacy documentation.
pankajkoti added a commit that referenced this pull request Jan 6, 2026
Add PRIVACY NOTICE documentation for DAG run telemetry metrics (load
mode, invocation mode, dbt deps, node converters, test/source behavior,
model counts) and dbt docs plugin metrics (storage type, docs
configuration, custom connections, custom project name).

These metrics were added in PR #2223, PR #2228, and PR #2240, but were
not reflected in the privacy documentation.

closes: #2248
pankajkoti added a commit that referenced this pull request Jan 7, 2026
Implement gzip compression + base64 encoding for telemetry metadata
stored in dag.params. This reduces the size of serialized DAGs in
Airflow's database.

Changes:
- Add _compress_telemetry_metadata() and
_decompress_telemetry_metadata() to cosmos/telemetry.py
- Update converter to compress metadata before storing in dag.params
- Update dag_run_listener to decompress metadata when reading
- Catch specific exceptions during decompression (binascii.Error,
gzip.BadGzipFile, json.JSONDecodeError, EOFError)
- Add size comparison logging
- Update tests to verify compression
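
The compression round trip described in this commit can be sketched with the standard library alone. The function names below omit the leading underscore and are not copied from cosmos/telemetry.py; returning an empty dict on failure is an assumption, while the four caught exception types are the ones listed above.

```python
import base64
import binascii
import gzip
import json
from typing import Any


def compress_metadata(metadata: dict[str, Any]) -> str:
    """Serialize to JSON, gzip the bytes, then base64-encode into an ASCII string."""
    raw = json.dumps(metadata).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_metadata(encoded: str) -> dict[str, Any]:
    """Reverse the steps above; return an empty dict if the value is unreadable."""
    try:
        raw = gzip.decompress(base64.b64decode(encoded))
        return json.loads(raw)
    except (binascii.Error, gzip.BadGzipFile, json.JSONDecodeError, EOFError):
        # Assumed fallback: treat a corrupted or user-edited value as "no metadata".
        return {}


# Illustrative values only.
original = {"actual_load_mode": "dbt_ls", "total_dbt_models": 120}
encoded = compress_metadata(original)
print(decompress_metadata(encoded) == original)
```

The base64 step keeps the stored value a plain string, which is convenient for a JSON-serialized params field. For very small dictionaries the encoded form may not be shorter than the raw JSON, since gzip adds a fixed header and base64 expands the bytes by about a third.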

related: #2223 
closes: astronomer/oss-integrations-private#300

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
tatiana pushed a commit that referenced this pull request Jan 23, 2026
The _store_dag_telemetry_metadata method in DbtToAirflowConverter was
performing unnecessary processing even when telemetry was disabled. This
included compressing metadata and storing it in DAG params, which adds
overhead without benefit when telemetry collection is turned off.

The PR adds an early return check in `_store_dag_telemetry_metadata` to
skip all processing when `should_emit()` returns False. This ensures
telemetry-related operations only occur when telemetry is actually
enabled.
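
A minimal sketch of the early-return guard described above, assuming a callable that reports whether telemetry is enabled. The function signature, the boolean return value, and the dict used in place of dag.params are all hypothetical; only the `should_emit()` check and the params key appear in this conversation.

```python
from typing import Any, Callable


def store_dag_telemetry_metadata(
    params: dict[str, Any],
    metadata: dict[str, Any],
    should_emit: Callable[[], bool],
) -> bool:
    """Store metadata only when telemetry is enabled; report whether it was stored."""
    if not should_emit():
        # Early return: skip compression and storage when telemetry is off.
        return False
    params["__cosmos_telemetry_metadata__"] = metadata
    return True


disabled_params: dict[str, Any] = {}
enabled_params: dict[str, Any] = {}
print(store_dag_telemetry_metadata(disabled_params, {"install_deps": True}, lambda: False))
print(store_dag_telemetry_metadata(enabled_params, {"install_deps": True}, lambda: True))
```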

related: #2223
related: #2109
@pankajastro pankajastro mentioned this pull request Jan 29, 2026
tatiana added a commit that referenced this pull request Jan 30, 2026
Features

* Support cross-referencing models across dbt projects using dbt-loom by
@pankajkoti in #2271
* Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by
@YourRoyalLinus in #2261
* Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with
``KubernetesPodOperator`` by @tatiana in #2207
* Add support for StarRocks profile mapping by @kurkim0661 in #2256
* Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275
* Support defining custom callbacks alongside the ``WATCHER_KUBERNETES``
callback by @johnhoran in #2307

Enhancements

* Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in
#2077
* Leverage Airflow ``::group::`` to group logs associated with DAG
parsing by @tatiana in #2235
* Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in
#2245
* Restore plain text output when using ``ExecutionMode.WATCHER`` by
@tiovader in #2241

Bug Fixes

* Fix running empty models or ephemeral nodes in
``ExecutionMode.WATCHER`` by @tatiana in #2279
* Improve watcher producer task priority in scheduling and the UI by
@tatiana in #2237
* Fix typos and formatting issues in documentation by @pankajkoti in
#2259
* Allow watcher producer retries without erroring by @tatiana in #2283
* Fix ``TestBehavior.AFTER_ALL`` is missing project_name information
when loading project using manifest file by @tuantran0910 in #2242
* Fix duplicate log lines in watcher subprocess execution and format
timestamps by @pankajkoti in #2301

Docs

* Add Watcher Kubernetes documentation by @tatiana in #2303
* Document newly added telemetry metrics in the privacy notice by
@pankajkoti in #2249
* Add compatibility policy document by @pankajastro in #2251
* Improve watcher documentation related to dbt threads by @tatiana in
#2273
* Fix link in watcher execution mode documentation by @jedcunningham in
#2277
* Update Apache Airflow minimum compatibility policy by @tatiana in
#2285
* Clarify Cosmos runtime support until "End of Basic Support" by
@jedcunningham in #2286
* Update watcher docs by @tatiana in #2298
* Update watcher kubernetes documentation by @tatiana in #2306

Others

* Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in
#2177
* Add dbt Core 1.11 to the test matrix by @tatiana in #2230
* Add integration tests using InvocationMode.SUBPROCESS and validate
output by @tatiana in #2287
* Fix main branch failing tests by @tatiana in #2296
* Update pre-commit hooks to the latest versions by @jedcunningham in
#2289
* Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290
* Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and
#2284
* Add Scarf metrics to understand Cosmos feature usage patterns
  - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in
    #2240
  - Add DAG run telemetry metrics for load mode, invocation, and
    render_config parameters by @pankajkoti in #2223
  - Collect profile metrics for DAG runs by @pankajastro in #2228
  - Compress telemetry metadata to reduce serialized DAG size by
    @pankajkoti in #2252
  - Skip storing telemetry metadata when emission is disabled by
    @pankajkoti in #2278
  - Hide telemetry metadata parameters from the Airflow trigger UI by
    @pankajkoti in #2247

closes: astronomer/oss-integrations-private#317

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
@tatiana tatiana added this to the Cosmos 1.13.0 milestone Feb 19, 2026
pankajkoti added a commit that referenced this pull request Mar 18, 2026
…2466)

When something changes in the DAG or the dbt project, the value we set
for `__cosmos_telemetry_metadata__` also changes, and Airflow raises a
ParamValidationError because the value was updated for a Param with the
`const` attribute. This error is fatal and prevents DAG and task runs
from progressing, so we are removing the `const` attribute from the
param.

With the `const` attribute removed, the param that was intended to be
internal now appears in the DAG trigger form. We have added help text
asking users not to edit the value, since Airflow does not allow us to
set read-only params from the Cosmos code.
<img width="992" height="520" alt="Screenshot 2026-03-16 at 7 17 14 PM"
src="https://github.com/user-attachments/assets/ff4b7e5a-d67e-4a66-8811-141474302b64"
/>


If users do edit the value, a fail-safe handler is already in place for
the case where decompressing the param fails:
https://github.com/astronomer/astronomer-cosmos/blob/61ff9d17015ba7d9d0eee9578682fb8a582d88ba/cosmos/listeners/dag_run_listener.py#L78

closes: #2421 
related: #2223 
related: #2247

Development

Successfully merging this pull request may close these issues.

Track render config via Scarf

3 participants