Skip to content

[Bug] Accidentally attaching adapters to root logger leads to increased logging costs#1047

Merged
tatiana merged 4 commits into
astronomer:mainfrom
glebkrapivin:logging-duplicates
Jun 18, 2024
Merged

[Bug] Accidentally attaching adapters to root logger leads to increased logging costs#1047
tatiana merged 4 commits into
astronomer:mainfrom
glebkrapivin:logging-duplicates

Conversation

@glebkrapivin
Copy link
Copy Markdown
Contributor

@glebkrapivin glebkrapivin commented Jun 17, 2024

Closes #1048

Description

After having upgraded to astronomer-cosmos==1.4.3 we noticed an unusual 2x increase of the cost of the logs in our cloud provider. Quick investigation showed that the logs from cosmos logger were duplicated multiple times.
Example:

[2024-06-17 12:18:48,259] {{manager.py:165}} INFO - Launched DagFileProcessorManager with pid: 656
2024-06-17T12:18:48.260107549Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656
2024-06-17T12:18:48.260225215Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656
2024-06-17T12:18:48.260345799Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656

After reproducing the problem locally I've found the highest version that seemed to not have the problem, which was 1.3.2. However, the problem repeated itself when cosmos was imported in the dag. Afterwards, I looked at the changed files and saw that airflow plugin was added in the 1.4.0 version. It explains why the issue reproduces in all the version, but is hidden at first in 1.3.2

[project.entry-points."airflow.plugins"]
cosmos = "cosmos.plugin:CosmosPlugin"

Following this discovery i looked through the source code and saw that in some of the lines the logger was initialized though function get_logger without the name argument. If you call logging.getLogger(None), you get a root logger. Hence, cosmos adapter got attached to root logger multiple times in these places.

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(None) # Now it is a root logger
    ....
    logger.addHandler(handler) # now we have added astronomer logger to root logger, which is not intented

What was done

  • I made the name argument in get_logger required, otherwise it would be easy to forget to add it when a new file with logger call is created.
  • Added LOGGER_NAME_TEMPLATE so that there are no collisions with the adapters attaching to wrong loggers in the future. The name is not used in the actual format of the log message, so it does not matter much.

The previous tests did not catch this corner case, but when the argument is required there is no need to check it

Related Issue(s)

No

Breaking Change?

No

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jun 17, 2024
@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 17, 2024

Deploy Preview for sunny-pastelito-5ecb04 ready!

Name Link
🔨 Latest commit 3311c80
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66716af95eec5f00089ec388
😎 Deploy Preview https://deploy-preview-1047--sunny-pastelito-5ecb04.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@dosubot dosubot Bot added the area:logging Related to logging, like log levels, log formats, error logging, etc label Jun 17, 2024
@glebkrapivin
Copy link
Copy Markdown
Contributor Author

@tatiana

I've provided the solution as well. Would appreciate if we could prioritize this small bugfix as this is connected to costs. Moreover, other guys might have that as well.

@tatiana
Copy link
Copy Markdown
Collaborator

tatiana commented Jun 17, 2024

Hi @glebkrapivin , thanks for raising the issue, identifying the problem, and proposing a solution. This is a great work!
Would there be a way for us to add some tests to make sure we avoid this same problem in the future? Perhaps if you could add a unit test on get_logger without a name and check that it raises an exception, it would be better than no tests.

@tatiana tatiana added this to the Cosmos 1.5.0 milestone Jun 17, 2024
Comment thread cosmos/log.py
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.81%. Comparing base (e4f9ece) to head (3311c80).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1047   +/-   ##
=======================================
  Coverage   95.81%   95.81%           
=======================================
  Files          62       62           
  Lines        3009     3010    +1     
=======================================
+ Hits         2883     2884    +1     
  Misses        126      126           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jun 18, 2024
@tatiana tatiana merged commit 9420404 into astronomer:main Jun 18, 2024
@dwreeves dwreeves mentioned this pull request Jun 20, 2024
@pankajkoti pankajkoti mentioned this pull request Jun 27, 2024
tatiana pushed a commit that referenced this pull request Jun 27, 2024
New Features

* Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow
Variable by @tatiana in #1014
* Support to cache profiles created via ``ProfileMapping`` by
@pankajastro in #1046
* Support for running dbt tasks in AWS EKS in #944 by @VolkerSchiewe
* Add Clickhouse profile mapping by @roadan and @pankajastro in #353 and
#1016
* Add node config to TaskInstance Context by @linchun3 in #1044

Bug fixes

* Support partial parsing when cache is disabled by @tatiana in #1070
* Fix disk permission error in restricted env by @pankajastro in #1051
* Add CSP header to iframe contents by @dwreeves in #1055
* Stop attaching log adaptors to root logger to reduce logging costs by
@glebkrapivin in #1047

Enhancements

* Support ``static_index.html`` docs by @dwreeves in #999
* Support deep linking dbt docs via Airflow UI by @dwreeves in #1038
* Add ability to specify host/port for Snowflake connection by @whummer
in #1063

Docs

* Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in #1069

Others

* Update documentation for DbtDocs generator by @arjunanan6 in #1043
* Use uv in CI by @dwreeves in #1013
* Cache hatch folder in the CI by @tatiana in #1056
* Change example DAGs to use ``example_conn`` as opposed to
``airflow_db`` by @tatiana in #1054
* Mark plugin integration tests as integration by @tatiana in #1057
* Ensure compliance with linting rule D300 by using triple quotes for
docstrings by @pankajastro in #1049
* Pre-commit hook updates in #1039, #1050, #1064
* Remove duplicates in changelog by @jedcunningham in #1068
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
…ed logging costs (astronomer#1047)

Closes astronomer#1048 

After having upgraded to `astronomer-cosmos==1.4.3` we noticed an
unusual 2x increase of the cost of the logs in our cloud provider. Quick
investigation showed that the logs from cosmos logger were duplicated
multiple times.
**Example:**
```bash 
[2024-06-17 12:18:48,259] {{manager.py:165}} INFO - Launched DagFileProcessorManager with pid: 656
2024-06-17T12:18:48.260107549Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656
2024-06-17T12:18:48.260225215Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656
2024-06-17T12:18:48.260345799Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656
```

After reproducing the problem locally I've found the highest version
that seemed to not have the problem, which was `1.3.2`. However, the
problem repeated itself when `cosmos` was imported in the dag.
Afterwards, I looked at the changed files and saw that airflow plugin
was added in the 1.4.0 version. It explains why the issue reproduces in
all the version, but is hidden at first in 1.3.2
```toml
[project.entry-points."airflow.plugins"]
cosmos = "cosmos.plugin:CosmosPlugin"
```

Following this discovery i looked through the source code and saw that
in some of the lines the logger was initialized though function
`get_logger` without the name argument. If you call
`logging.getLogger(None)`, you get a root logger. Hence, cosmos adapter
got attached to root logger multiple times in these places.

```python 
def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(None) # Now it is a root logger
    ....
    logger.addHandler(handler) # now we have added astronomer logger to root logger, which is not intented
```

## What was done

- I made the `name` argument in `get_logger` required, otherwise it
would be easy to forget to add it when a new file with logger call is
created.
- Added `LOGGER_NAME_TEMPLATE` so that there are no collisions with the
adapters attaching to wrong loggers in the future. The name is not used
in the actual format of the log message, so it does not matter much.

The previous tests did not catch this corner case, but when the argument
is required, there is no need to check it
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
New Features

* Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow
Variable by @tatiana in astronomer#1014
* Support to cache profiles created via ``ProfileMapping`` by
@pankajastro in astronomer#1046
* Support for running dbt tasks in AWS EKS in astronomer#944 by @VolkerSchiewe
* Add Clickhouse profile mapping by @roadan and @pankajastro in astronomer#353 and
astronomer#1016
* Add node config to TaskInstance Context by @linchun3 in astronomer#1044

Bug fixes

* Support partial parsing when cache is disabled by @tatiana in astronomer#1070
* Fix disk permission error in restricted env by @pankajastro in astronomer#1051
* Add CSP header to iframe contents by @dwreeves in astronomer#1055
* Stop attaching log adaptors to root logger to reduce logging costs by
@glebkrapivin in astronomer#1047

Enhancements

* Support ``static_index.html`` docs by @dwreeves in astronomer#999
* Support deep linking dbt docs via Airflow UI by @dwreeves in astronomer#1038
* Add ability to specify host/port for Snowflake connection by @whummer
in astronomer#1063

Docs

* Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in astronomer#1069

Others

* Update documentation for DbtDocs generator by @arjunanan6 in astronomer#1043
* Use uv in CI by @dwreeves in astronomer#1013
* Cache hatch folder in the CI by @tatiana in astronomer#1056
* Change example DAGs to use ``example_conn`` as opposed to
``airflow_db`` by @tatiana in astronomer#1054
* Mark plugin integration tests as integration by @tatiana in astronomer#1057
* Ensure compliance with linting rule D300 by using triple quotes for
docstrings by @pankajastro in astronomer#1049
* Pre-commit hook updates in astronomer#1039, astronomer#1050, astronomer#1064
* Remove duplicates in changelog by @jedcunningham in astronomer#1068
tatiana pushed a commit that referenced this pull request Jul 17, 2024
New Features

* Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow
Variable by @tatiana in #1014
* Support to cache profiles created via ``ProfileMapping`` by
@pankajastro in #1046
* Support for running dbt tasks in AWS EKS in #944 by @VolkerSchiewe
* Add Clickhouse profile mapping by @roadan and @pankajastro in #353 and
#1016
* Add node config to TaskInstance Context by @linchun3 in #1044

Bug fixes

* Support partial parsing when cache is disabled by @tatiana in #1070
* Fix disk permission error in restricted env by @pankajastro in #1051
* Add CSP header to iframe contents by @dwreeves in #1055
* Stop attaching log adaptors to root logger to reduce logging costs by
@glebkrapivin in #1047

Enhancements

* Support ``static_index.html`` docs by @dwreeves in #999
* Support deep linking dbt docs via Airflow UI by @dwreeves in #1038
* Add ability to specify host/port for Snowflake connection by @whummer
in #1063

Docs

* Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in #1069

Others

* Update documentation for DbtDocs generator by @arjunanan6 in #1043
* Use uv in CI by @dwreeves in #1013
* Cache hatch folder in the CI by @tatiana in #1056
* Change example DAGs to use ``example_conn`` as opposed to
``airflow_db`` by @tatiana in #1054
* Mark plugin integration tests as integration by @tatiana in #1057
* Ensure compliance with linting rule D300 by using triple quotes for
docstrings by @pankajastro in #1049
* Pre-commit hook updates in #1039, #1050, #1064
* Remove duplicates in changelog by @jedcunningham in #1068

(cherry picked from commit 18d2c90)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:logging Related to logging, like log levels, log formats, error logging, etc lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Accidentally attaching adapters to root logger leads to increased logging costs

2 participants