Unify operator logging on self.log#2681
Conversation
PR #1108 established the convention that operator code uses self.log (Airflow's LoggingMixin) so messages land in the per-task-instance log shown in the Airflow UI, while library code uses cosmos.log.get_logger for branding and scoped log levels. PR #1079 deviated from that convention in three call sites because pytest's caplog could not capture self.log at the time; #1157 was filed to track cleaning up the workaround. Current pytest/Airflow does capture self.log -- the neighbouring assertions in tests/operators/test_virtualenv.py already match self.log calls in the same file. Convert the three holdover sites to self.log so that: - aws_ecs.py and gcp_cloud_run_job.py: the ECS/Cloud Run task result is emitted into the same task log as the "Running command" line above it instead of disappearing to worker stdout. - virtualenv.py: the "Waiting for virtualenv lock" message routes to the task log alongside the other lock messages in the same method. With the conversion the module-level `logger = get_logger(__name__)` and its import become dead in all three files; drop them. test_run_command_with_virtualenv_dir, which asserts caplog.text.count("Waiting for virtualenv lock to be released") == 2, still passes after the change, confirming caplog captures self.log here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aligns remaining operator logging call sites with the established convention of using Airflow task logging via self.log, so messages appear in task instance logs rather than module-level Cosmos logs.
Changes:
- Removed unused
get_loggerimports and module-level loggers from affected operators. - Replaced three remaining module-level
logger.info(...)calls withself.log.info(...).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
cosmos/operators/virtualenv.py |
Routes virtualenv lock wait logging through the operator task logger. |
cosmos/operators/gcp_cloud_run_job.py |
Logs Cloud Run execution results through the operator task logger. |
cosmos/operators/aws_ecs.py |
Logs ECS task execution results through the operator task logger. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2681 +/- ##
==========================================
+ Coverage 98.03% 98.07% +0.03%
==========================================
Files 105 105
Lines 7843 7837 -6
==========================================
- Hits 7689 7686 -3
+ Misses 154 151 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pankajkoti
left a comment
There was a problem hiding this comment.
Do we feel confident enough about the changes in aws_ecs.py and gcp_cloud_run_job.py? I guess we don't have dedicated tests for those, right? Happy to merge if you feel confident/were able to test it.
|
Confident, yes. No log-asserting tests like the virtualenv change has, but |
The Logging section told contributors to "get loggers via `cosmos.log.get_logger`" unconditionally, which would lead someone working in an operator to add a module-level `get_logger` — the exact inconsistency #1157 set out to fix. Scope that guidance to library/module-level code and state the operator rule: inside operators and hooks (anything with `LoggingMixin`), log via `self.log` so messages land in the per-task-instance log shown in the Airflow UI. Examples updated to show both cases. This locks in the outcome of the logging cleanup (#2670, #2680, #2681) so it doesn't silently regress.
PR #1108 established the convention that operator code uses self.log (Airflow's LoggingMixin) so messages land in the per-task-instance log shown in the Airflow UI, while library code uses cosmos.log.get_logger for branding and scoped log levels. PR #1079 deviated from that convention in three call sites because pytest's caplog could not capture self.log at the time; #1157 was filed to track cleaning up the workaround.
Current pytest/Airflow does capture self.log -- the neighbouring assertions in tests/operators/test_virtualenv.py already match self.log calls in the same file. Convert the three holdover sites to self.log so that:
With the conversion the module-level
logger = get_logger(__name__)and its import become dead in all three files; drop them.test_run_command_with_virtualenv_dir, which asserts caplog.text.count("Waiting for virtualenv lock to be released") == 2, still passes after the change, confirming caplog captures self.log here.
closes #1157