Add telemetry tracking for dbt docs plugin usage #2240
- Track dbt docs access via Scarf telemetry
- Capture metrics: storage_type, is_configured, uses_custom_conn
- Add _get_storage_type helper method to detect storage backend
- Add comprehensive tests for telemetry emission
- Fixes #2111
Move telemetry emission from dbt_docs_view to dbt_docs_index endpoint because Airflow 3 navigation menu links directly to dbt_docs_index.html, bypassing the wrapper view. Update tests to match actual user access path.
- Adjust plugin tests to expect 404 when docs are not configured
- Add missing index.html file in local storage test
- Fix telemetry tests to check for log level presence instead of startswith to accommodate Airflow 3.1 logging format changes
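The switch from a prefix check to a presence check can be illustrated with a minimal sketch. The two format strings below are made up for illustration and are not the actual Airflow 3.0 or 3.1 log formats; `format_record` and `has_level` are hypothetical helpers, not part of Cosmos.

```python
import logging


def format_record(fmt: str, level: int, message: str) -> str:
    """Render a single log line with the given format string."""
    record = logging.LogRecord("cosmos.demo", level, "demo.py", 0, message, None, None)
    return logging.Formatter(fmt).format(record)


def has_level(line: str, level_name: str) -> bool:
    """Presence check: passes wherever the level name sits in the line."""
    return level_name in line


# Assumed layouts: one puts the level first, the other puts a timestamp first.
level_first = format_record("%(levelname)s %(name)s:%(message)s", logging.WARNING, "emit failed")
timestamp_first = format_record("%(asctime)s [%(levelname)s] %(message)s", logging.WARNING, "emit failed")

# A startswith assertion only holds for the first layout.
print(level_first.startswith("WARNING"), timestamp_first.startswith("WARNING"))
# The presence check holds for both.
print(has_level(level_first, "WARNING"), has_level(timestamp_first, "WARNING"))
```

The trade-off is that a substring check is looser: it would also pass if the level name appeared only inside the message text, so it suits tests where the message itself is controlled.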
Codecov Report
✅ All modified and coverable lines are covered by tests.
| Coverage Diff | main | #2240 | +/- |
|---|---|---|---|
| Coverage | 97.98% | 97.99% | |
| Files | 95 | 96 | +1 |
| Lines | 6197 | 6220 | +23 |
| Hits | 6072 | 6095 | +23 |
| Misses | 125 | 125 | |
Remove duplicate `_get_storage_type` implementations from both Airflow 2 and Airflow 3 plugins and consolidate into a single `get_storage_type_from_path()` utility function in `cosmos/plugin/storage.py`. This eliminates code duplication and makes the function reusable across plugins. Updated corresponding tests to use the new utility function directly.
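As a rough illustration of what such a shared helper might look like, here is a minimal sketch. Only the function name and the set of labels (s3, gcs, azure, http, local, not_configured) come from this PR; the scheme table, the signature, and the fallback for unknown schemes are assumptions, and the actual code in `cosmos/plugin/storage.py` may differ.

```python
from __future__ import annotations

# Hypothetical scheme-to-label table; the real mapping may cover other schemes.
_SCHEME_TO_STORAGE = {
    "s3": "s3",
    "gs": "gcs",
    "wasb": "azure",
    "abfs": "azure",
    "http": "http",
    "https": "http",
}


def get_storage_type_from_path(path: str | None) -> str:
    """Return a coarse storage label for a dbt docs path (illustrative only)."""
    if not path:
        # No docs directory configured at all.
        return "not_configured"
    scheme, separator, _ = path.partition("://")
    if not separator:
        # No URL scheme, so treat the value as a filesystem path.
        return "local"
    # Assumption: unrecognised schemes fall back to "local".
    return _SCHEME_TO_STORAGE.get(scheme.lower(), "local")


print(get_storage_type_from_path("s3://bucket/dbt-docs"))
print(get_storage_type_from_path("/usr/local/airflow/dbt-docs"))
```

Keeping the helper as a plain function of the path, with no plugin state, is what lets both the Airflow 2 and Airflow 3 plugins and their tests call it directly.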
Pull request overview
This PR adds telemetry tracking for the dbt docs plugin to understand how users access and configure the documentation viewer across both Airflow 2 and 3.
- Adds telemetry emission when users access dbt docs, tracking storage type, configuration status, and custom settings
- Introduces a new utility function to detect storage backend type from file paths
- Updates test assertions to use more flexible string matching for log level verification
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cosmos/plugin/storage.py | New utility module for detecting storage backend type from file paths |
| cosmos/plugin/airflow2.py | Adds telemetry emission when dbt docs are accessed in Airflow 2 |
| cosmos/plugin/airflow3.py | Adds telemetry emission when dbt docs are accessed in Airflow 3 |
| tests/plugin/test_plugin_af2.py | Adds tests for telemetry emission in Airflow 2 plugin |
| tests/plugin/test_plugin_af3.py | Adds tests for telemetry emission in Airflow 3 plugin and storage type detection |
| tests/test_telemetry.py | Updates test assertions to use substring matching instead of prefix matching for log levels |
tatiana left a comment:
I liked the approach, @pankajkoti ! It will be exciting to see how people are using this feature.
Could you please create a separate PR updating our privacy policy to cover both the change in this PR (#2240) and the changes introduced in #2228?
Add documentation for DAG run telemetry metrics (load mode, invocation mode, dbt deps, node converters, test/source behavior, model counts) and dbt docs plugin metrics (storage type, docs configuration, custom connections, custom project name). These metrics were added in PR #2223 and PR #2240 but were not reflected in the privacy documentation.
Add PRIVACY NOTICE documentation for DAG run telemetry metrics (load mode, invocation mode, dbt deps, node converters, test/source behavior, model counts) and dbt docs plugin metrics (storage type, docs configuration, custom connections, custom project name). These metrics were added in PR #2223, PR #2228, and PR #2240, but were not reflected in the privacy documentation. closes: #2248
Features
* Support cross-referencing models across dbt projects using dbt-loom by @pankajkoti in #2271
* Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by @YourRoyalLinus in #2261
* Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with ``KubernetesPodOperator`` by @tatiana in #2207
* Add support for StarRocks profile mapping by @kurkim0661 in #2256
* Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275
* Support defining custom callbacks alongside the ``WATCHER_KUBERNETES`` callback by @johnhoran in #2307

Enhancements
* Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in #2077
* Leverage Airflow ``::group::`` to group logs associated with DAG parsing by @tatiana in #2235
* Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in #2245
* Restore plain text output when using ``ExecutionMode.WATCHER`` by @tiovader in #2241

Bug Fixes
* Fix running empty models or ephemeral nodes in ``ExecutionMode.WATCHER`` by @tatiana in #2279
* Improve watcher producer task priority in scheduling and the UI by @tatiana in #2237
* Fix typos and formatting issues in documentation by @pankajkoti in #2259
* Allow watcher producer retries without erroring by @tatiana in #2283
* Fix ``TestBehavior.AFTER_ALL`` is missing project_name information when loading project using manifest file by @tuantran0910 in #2242
* Fix duplicate log lines in watcher subprocess execution and format timestamps by @pankajkoti in #2301

Docs
* Add Watcher Kubernetes documentation by @tatiana in #2303
* Document newly added telemetry metrics in the privacy notice by @pankajkoti in #2249
* Add compatibility policy document by @pankajastro in #2251
* Improve watcher documentation related to dbt threads by @tatiana in #2273
* Fix link in watcher execution mode documentation by @jedcunningham in #2277
* Update Apache Airflow minimum compatibility policy by @tatiana in #2285
* Clarify Cosmos runtime support until "End of Basic Support" by @jedcunningham in #2286
* Update watcher docs by @tatiana in #2298
* Update watcher kubernetes documentation by @tatiana in #2306

Others
* Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in #2177
* Add dbt Core 1.11 to the test matrix by @tatiana in #2230
* Add integration tests using InvocationMode.SUBPROCESS and validate output by @tatiana in #2287
* Fix main branch failing tests by @tatiana in #2296
* Update pre-commit hooks to the latest versions by @jedcunningham in #2289
* Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290
* Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and #2284
* Add Scarf metrics to understand Cosmos feature usage patterns
  - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in #2240
  - Add DAG run telemetry metrics for load mode, invocation, and render_config parameters by @pankajkoti in #2223
  - Collect profile metrics for DAG runs by @pankajastro in #2228
  - Compress telemetry metadata to reduce serialized DAG size by @pankajkoti in #2252
  - Skip storing telemetry metadata when emission is disabled by @pankajkoti in #2278
  - Hide telemetry metadata parameters from the Airflow trigger UI by @pankajkoti in #2247

closes: astronomer/oss-integrations-private#317
---------
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Adds telemetry emission to track dbt docs plugin usage via Scarf, capturing how users access and configure the dbt documentation viewer.
Metrics Tracked
- storage_type: Backend storage type (s3, gcs, azure, http, local, or not_configured)
- dbt_docs_configured: Whether the docs directory is configured
- uses_custom_conn: Whether a custom connection ID is used
- has_custom_name: Whether a custom project name is set (Airflow 3 only)

I have tested that the events are getting emitted to Scarf for both Airflow 2 and Airflow 3 plugins.
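To make the shape of these metrics concrete, the following is a minimal sketch of how such a payload could be assembled. The metric keys come from the list above; the function `build_docs_metrics`, its parameters, and the rule that a `None` value means "not customised" are hypothetical and not taken from the Cosmos code.

```python
from __future__ import annotations

from typing import Any


def build_docs_metrics(
    storage_type: str,
    conn_id: str | None = None,
    project_name: str | None = None,
    include_custom_name: bool = False,
) -> dict[str, Any]:
    """Assemble the dbt docs telemetry payload (illustrative only)."""
    metrics = {
        "storage_type": storage_type,
        # Assumption: docs count as configured whenever a storage type was detected.
        "dbt_docs_configured": storage_type != "not_configured",
        # Assumption: any explicit connection ID counts as a custom connection.
        "uses_custom_conn": conn_id is not None,
    }
    if include_custom_name:
        # Reported for Airflow 3 only, per the list above.
        metrics["has_custom_name"] = project_name is not None
    return metrics


print(build_docs_metrics("s3", conn_id="aws_docs"))
print(build_docs_metrics("not_configured", include_custom_name=True))
```

Every value is a boolean or a short categorical label, so the payload records how the feature is configured without carrying paths, connection IDs, or project names.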
closes: #2111