[Bug] Accidentally attaching adapters to root logger leads to increased logging costs#1047
Conversation
✅ Deploy Preview for sunny-pastelito-5ecb04 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
|
I've provided the solution as well. Would appreciate if we could prioritize this small bugfix as this is connected to costs. Moreover, other guys might have that as well. |
|
Hi @glebkrapivin , thanks for raising the issue, identifying the problem, and proposing a solution. This is a great work! |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1047 +/- ##
=======================================
Coverage 95.81% 95.81%
=======================================
Files 62 62
Lines 3009 3010 +1
=======================================
+ Hits 2883 2884 +1
Misses 126 126 ☔ View full report in Codecov by Sentry. |
New Features * Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow Variable by @tatiana in #1014 * Support to cache profiles created via ``ProfileMapping`` by @pankajastro in #1046 * Support for running dbt tasks in AWS EKS in #944 by @VolkerSchiewe * Add Clickhouse profile mapping by @roadan and @pankajastro in #353 and #1016 * Add node config to TaskInstance Context by @linchun3 in #1044 Bug fixes * Support partial parsing when cache is disabled by @tatiana in #1070 * Fix disk permission error in restricted env by @pankajastro in #1051 * Add CSP header to iframe contents by @dwreeves in #1055 * Stop attaching log adaptors to root logger to reduce logging costs by @glebkrapivin in #1047 Enhancements * Support ``static_index.html`` docs by @dwreeves in #999 * Support deep linking dbt docs via Airflow UI by @dwreeves in #1038 * Add ability to specify host/port for Snowflake connection by @whummer in #1063 Docs * Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in #1069 Others * Update documentation for DbtDocs generator by @arjunanan6 in #1043 * Use uv in CI by @dwreeves in #1013 * Cache hatch folder in the CI by @tatiana in #1056 * Change example DAGs to use ``example_conn`` as opposed to ``airflow_db`` by @tatiana in #1054 * Mark plugin integration tests as integration by @tatiana in #1057 * Ensure compliance with linting rule D300 by using triple quotes for docstrings by @pankajastro in #1049 * Pre-commit hook updates in #1039, #1050, #1064 * Remove duplicates in changelog by @jedcunningham in #1068
…ed logging costs (astronomer#1047) Closes astronomer#1048 After having upgraded to `astronomer-cosmos==1.4.3` we noticed an unusual 2x increase of the cost of the logs in our cloud provider. Quick investigation showed that the logs from cosmos logger were duplicated multiple times. **Example:** ```bash [2024-06-17 12:18:48,259] {{manager.py:165}} INFO - Launched DagFileProcessorManager with pid: 656 2024-06-17T12:18:48.260107549Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656 2024-06-17T12:18:48.260225215Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656 2024-06-17T12:18:48.260345799Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656 ``` After reproducing the problem locally I've found the highest version that seemed to not have the problem, which was `1.3.2`. However, the problem repeated itself when `cosmos` was imported in the dag. Afterwards, I looked at the changed files and saw that airflow plugin was added in the 1.4.0 version. It explains why the issue reproduces in all the version, but is hidden at first in 1.3.2 ```toml [project.entry-points."airflow.plugins"] cosmos = "cosmos.plugin:CosmosPlugin" ``` Following this discovery i looked through the source code and saw that in some of the lines the logger was initialized though function `get_logger` without the name argument. If you call `logging.getLogger(None)`, you get a root logger. Hence, cosmos adapter got attached to root logger multiple times in these places. ```python def get_logger(name: str) -> logging.Logger: logger = logging.getLogger(None) # Now it is a root logger .... logger.addHandler(handler) # now we have added astronomer logger to root logger, which is not intented ``` ## What was done - I made the `name` argument in `get_logger` required, otherwise it would be easy to forget to add it when a new file with logger call is created. - Added `LOGGER_NAME_TEMPLATE` so that there are no collisions with the adapters attaching to wrong loggers in the future. The name is not used in the actual format of the log message, so it does not matter much. The previous tests did not catch this corner case, but when the argument is required, there is no need to check it
New Features * Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow Variable by @tatiana in astronomer#1014 * Support to cache profiles created via ``ProfileMapping`` by @pankajastro in astronomer#1046 * Support for running dbt tasks in AWS EKS in astronomer#944 by @VolkerSchiewe * Add Clickhouse profile mapping by @roadan and @pankajastro in astronomer#353 and astronomer#1016 * Add node config to TaskInstance Context by @linchun3 in astronomer#1044 Bug fixes * Support partial parsing when cache is disabled by @tatiana in astronomer#1070 * Fix disk permission error in restricted env by @pankajastro in astronomer#1051 * Add CSP header to iframe contents by @dwreeves in astronomer#1055 * Stop attaching log adaptors to root logger to reduce logging costs by @glebkrapivin in astronomer#1047 Enhancements * Support ``static_index.html`` docs by @dwreeves in astronomer#999 * Support deep linking dbt docs via Airflow UI by @dwreeves in astronomer#1038 * Add ability to specify host/port for Snowflake connection by @whummer in astronomer#1063 Docs * Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in astronomer#1069 Others * Update documentation for DbtDocs generator by @arjunanan6 in astronomer#1043 * Use uv in CI by @dwreeves in astronomer#1013 * Cache hatch folder in the CI by @tatiana in astronomer#1056 * Change example DAGs to use ``example_conn`` as opposed to ``airflow_db`` by @tatiana in astronomer#1054 * Mark plugin integration tests as integration by @tatiana in astronomer#1057 * Ensure compliance with linting rule D300 by using triple quotes for docstrings by @pankajastro in astronomer#1049 * Pre-commit hook updates in astronomer#1039, astronomer#1050, astronomer#1064 * Remove duplicates in changelog by @jedcunningham in astronomer#1068
New Features * Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow Variable by @tatiana in #1014 * Support to cache profiles created via ``ProfileMapping`` by @pankajastro in #1046 * Support for running dbt tasks in AWS EKS in #944 by @VolkerSchiewe * Add Clickhouse profile mapping by @roadan and @pankajastro in #353 and #1016 * Add node config to TaskInstance Context by @linchun3 in #1044 Bug fixes * Support partial parsing when cache is disabled by @tatiana in #1070 * Fix disk permission error in restricted env by @pankajastro in #1051 * Add CSP header to iframe contents by @dwreeves in #1055 * Stop attaching log adaptors to root logger to reduce logging costs by @glebkrapivin in #1047 Enhancements * Support ``static_index.html`` docs by @dwreeves in #999 * Support deep linking dbt docs via Airflow UI by @dwreeves in #1038 * Add ability to specify host/port for Snowflake connection by @whummer in #1063 Docs * Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in #1069 Others * Update documentation for DbtDocs generator by @arjunanan6 in #1043 * Use uv in CI by @dwreeves in #1013 * Cache hatch folder in the CI by @tatiana in #1056 * Change example DAGs to use ``example_conn`` as opposed to ``airflow_db`` by @tatiana in #1054 * Mark plugin integration tests as integration by @tatiana in #1057 * Ensure compliance with linting rule D300 by using triple quotes for docstrings by @pankajastro in #1049 * Pre-commit hook updates in #1039, #1050, #1064 * Remove duplicates in changelog by @jedcunningham in #1068 (cherry picked from commit 18d2c90)
Closes #1048
Description
After having upgraded to
astronomer-cosmos==1.4.3we noticed an unusual 2x increase of the cost of the logs in our cloud provider. Quick investigation showed that the logs from cosmos logger were duplicated multiple times.Example:
[2024-06-17 12:18:48,259] {{manager.py:165}} INFO - Launched DagFileProcessorManager with pid: 656 2024-06-17T12:18:48.260107549Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656 2024-06-17T12:18:48.260225215Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656 2024-06-17T12:18:48.260345799Z [2024-06-17 12:18:48,259] {manager.py:165} INFO - (astronomer-cosmos) - Launched DagFileProcessorManager with pid: 656After reproducing the problem locally I've found the highest version that seemed to not have the problem, which was
1.3.2. However, the problem repeated itself whencosmoswas imported in the dag. Afterwards, I looked at the changed files and saw that airflow plugin was added in the 1.4.0 version. It explains why the issue reproduces in all the version, but is hidden at first in 1.3.2Following this discovery i looked through the source code and saw that in some of the lines the logger was initialized though function
get_loggerwithout the name argument. If you calllogging.getLogger(None), you get a root logger. Hence, cosmos adapter got attached to root logger multiple times in these places.What was done
nameargument inget_loggerrequired, otherwise it would be easy to forget to add it when a new file with logger call is created.LOGGER_NAME_TEMPLATEso that there are no collisions with the adapters attaching to wrong loggers in the future. The name is not used in the actual format of the log message, so it does not matter much.The previous tests did not catch this corner case, but when the argument is required there is no need to check it
Related Issue(s)
No
Breaking Change?
No
Checklist