Skip to content

Simplify logging#1108

Merged
tatiana merged 7 commits into
astronomer:mainfrom
dwreeves:simplify-logging
Aug 15, 2024
Merged

Simplify logging#1108
tatiana merged 7 commits into
astronomer:mainfrom
dwreeves:simplify-logging

Conversation

@dwreeves
Copy link
Copy Markdown
Collaborator

@dwreeves dwreeves commented Jul 21, 2024

Description

This PR addresses #906 and fixes issues in the Cosmos logging once and for all.*

* Actually, there is another issue with logs being polluting with warnings about "source" nodes in 1.5.0, but that is a separate matter!

I have a long explanation of how the logging module in Python works, and the sort of idioms it expects of end users of the module, here: #906 (comment)

The choices I made, explained:

  • Although I don't know that I entirely agree with adding (astronomer-cosmos) to all the logs, clearly at least one user, and possibly many more, want it, and I don't believe we should remove it. The objective of this PR was therefore to preserve the feature while future-proofing against future issues.
    • Why I can't say I'm a fan of it: It seems that adding (astronomer-cosmos) to logs is a symptom of other problems with the Cosmos library, specifically how it impacts performance when users do not set it up effectively. And the prefix was added as a way to assist people in diagnosing these issues. I think ultimately we want to move away from this. Other components of the Airflow ecosystem do not feel compelled to do things like this. Also, the module path is something that can be handled in the log_format if users really want it.
    • How I future-proofed: As per the long post I link above, basically the issue is that there should not be tons of StreamHandlers being created. The proper and typical use of the logging module, with few exceptions, is to allow for logs to propagate upwards to a root logger. The reason the Cosmos logs presented issues for so long was because it deviated a lot from this.
  • I think default behavior being the "least astonishing" means making no modifications to the base logging behavior whatsoever. This is also less likely to morph into future issues if any further changes are made to the custom logging.
    • One thing I never mentioned: I found it odd that by default Cosmos did not "work out of the box" and that, despite using Astronomer's own Airflow platform (!), I had to set a config option that made Cosmos logging not be a nightmare (i.e. set propagate_logs = false). Previous logs referenced the Celery Executor as having issues, even though this is one of 2 of the most popular production ways to run Airflow. Something like this should just work out of the box for a majority of users!
  • For task execution, Cosmos should make use of the more Airflow-idiomatic LoggingMixin class whenever appropriate. This can also be used in scheduler / webserver related logging contexts but I think it is less out-of-place there to use globally scoped loggers.
    • These will not use the get_logger() implementation. That is intentional and probably desirable. These logs do not need to be "enriched" because they are isolated in the task execution logs.

Oh also, I fixed an issue in the project.entry_points in the pyproject.toml while I was at it.

Breaking Change?

  • Removes propagate_logging conf option, although removing this will not break users' builds. There is now a rich_logging conf option instead, which by default is disabled.

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jul 21, 2024
@netlify
Copy link
Copy Markdown

netlify Bot commented Jul 21, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 3f8ddff
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66be0e25e57ed30008e9b6be

@dosubot dosubot Bot added area:logging Related to logging, like log levels, log formats, error logging, etc size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jul 21, 2024
dwreeves and others added 3 commits August 4, 2024 17:23
…#1109 (astronomer#1132)

Making an update to astronomer#1109, which introduced module-level imports of
optional dependencies. This is inappropriate as it will break if the
user does not have them installed, and indeed the user really does not
need them installed if they are not relying on them directly.

This PR lazy-loads the imports so that it does not impact users who do
not need them.

In the upath library, `az:`, `adl:`, `abfs:` and `abfss:` are also all valid schemes, 
albeit Airflow only references the latter 3 in the code: https://github.com/apache/airflow/blob/e3824eaaba7eada9a807f7a2f9f89d977a210e15/airflow/providers/microsoft/azure/fs/adls.py#L29, so `adl:`, `abfs:` and `abfss:` also have been added
to the list of schemes supported.
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 5, 2024

Codecov Report

Attention: Patch coverage is 89.58333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.36%. Comparing base (f7354e5) to head (3f8ddff).
Report is 1 commits behind head on main.

Files Patch % Lines
cosmos/operators/local.py 77.27% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1108      +/-   ##
==========================================
- Coverage   96.37%   96.36%   -0.01%     
==========================================
  Files          64       64              
  Lines        3452     3443       -9     
==========================================
- Hits         3327     3318       -9     
  Misses        125      125              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dwreeves
Copy link
Copy Markdown
Collaborator Author

dwreeves commented Aug 14, 2024

So what's going on with the status of this PR is: I swapped the logger.info() for self.log.info() (same for error, warning etc.) inside the operators.

It's failing due to coverage for upload_to_cloud_storage(), which was previously not covered, but due to the fact I change how the logging works, it is now being marked as a coverage miss.

@tatiana I am wondering if we can make an exception for the coverage on that? I don't really want to have to write a unit test for code I don't even use right now 😭

The PR otherwise works, is documented, and has unit tests. So I think it should be fine and I removed the "Draft" from it.

@dwreeves dwreeves changed the title [Draft] Simplify logging Simplify logging Aug 14, 2024
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Aug 15, 2024
@tatiana tatiana merged commit 89f5999 into astronomer:main Aug 15, 2024
@tatiana tatiana mentioned this pull request Aug 16, 2024
@pankajkoti pankajkoti mentioned this pull request Aug 16, 2024
pankajkoti added a commit that referenced this pull request Aug 20, 2024
New Features

* Add support for loading manifest from cloud stores using Airflow
Object Storage by @pankajkoti in #1109
* Cache ``package-lock.yml`` file by @pankajastro in #1086
* Support persisting the ``LoadMode.VIRTUALENV`` directory by @tatiana
in #1079
* Add support to store and fetch ``dbt ls`` cache in remote stores by
@pankajkoti in #1147
* Add default source nodes rendering by @arojasb3 in #1107
* Add Teradata ``ProfileMapping`` by @sc250072 in #1077

Enhancements

* Add ``DatabricksOauthProfileMapping`` profile by @CorsettiS in #1091
* Use ``dbt ls`` as the default parser when ``profile_config`` is
provided by @pankajastro in #1101
* Add task owner to dbt operators by @wornjs in #1082
* Extend Cosmos custom selector to support + when using paths and tags
by @mvictoria in #1150
* Simplify logging by @dwreeves in #1108

Bug fixes

* Fix Teradata ``ProfileMapping`` target invalid issue by @sc250072 in
#1088
* Fix empty tag in case of custom parser by @pankajastro in #1100
* Fix ``dbt deps`` of ``LoadMode.DBT_LS`` should use
``ProjectConfig.dbt_vars`` by @tatiana in #1114
* Fix import handling by lazy loading hooks introduced in PR #1109 by
@dwreeves in #1132
* Fix Airflow 2.10 regression and add Airflow 2.10 in test matrix by
@pankajastro in #1162

Docs

* Fix typo in azure-container-instance docs by @pankajastro in #1106
* Use Airflow trademark as it has been registered by @pankajastro in
#1105

Others

* Run some example DAGs in Kubernetes execution mode in CI by
@pankajastro in #1127
* Install requirements.txt by default during dev env spin up by
@@CorsettiS in #1099
* Remove ``DbtGraph.current_version`` dead code by @tatiana in #1111
* Disable test for Airflow-2.5 and Python-3.11 combination in CI by
@pankajastro in #1124
* Pre-commit hook updates in #1074, #1113, #1125, #1144, #1154,  #1167

---------

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
pankajastro added a commit that referenced this pull request May 20, 2026
## Summary

`cosmos/provider_info.py` currently claims that the `propagate_logs`
config option was added in `1.3.0a1` and deprecated in `1.6.0a1`.
Both references pre-date the changes they describe by weeks; switching
them to the GA versions (`1.3.0` and `1.6.0`) makes the metadata
accurate and matches the version style every other Airflow provider
uses in `provider.yaml`.

## Evidence from the release history

| Claim in `provider_info.py` | What actually happened |
|---|---|
| `"version_added": "1.3.0a1"` | `1.3.0a1` tagged 2023-10-27. The commit
that introduced `propagate_logs` (`d063b5ed`, PR #648) merged on
2023-11-09 — **two weeks later**. First alpha containing it: `1.3.0a2`.
First GA: `1.3.0` (2024-01-04). |
| `"version_deprecated": "1.6.0a1"` | `1.6.0a1` tagged 2024-07-05. The
"Simplify logging" change (`89f5999e`, PR #1108) that made the option
redundant merged 2024-08-15 — **six weeks later**. First alpha with the
deprecation: `1.6.0a7`. First GA: `1.6.0` (both on 2024-08-20). |

The existing `deprecation_reason` string already refers to `Cosmos
1.6.0` (no alpha suffix), so the file's own narrative already assumes
the GA boundary — this PR just makes the structured fields agree with
that.

## Diff

```diff
-                        "version_added": "1.3.0a1",
-                        "version_deprecated": "1.6.0a1",
+                        "version_added": "1.3.0",
+                        "version_deprecated": "1.6.0",
```

## Test plan

- [x] `hatch -e tests.py3.11-2.10-1.9 run pre-commit run --files
cosmos/provider_info.py` — all hooks Passed (including `mypy-python`)
- [x] Diff is exactly the two string literals; no other changes
- [ ] CI green

## Breaking change?

No. The `version_added` / `version_deprecated` fields are documentation
metadata surfaced through Airflow's `airflow config list` and the
provider listing — they don't affect runtime behavior. The option
itself remains deprecated either way.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:logging Related to logging, like log levels, log formats, error logging, etc lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants