-
Notifications
You must be signed in to change notification settings - Fork 17.2k
KPO async logging callback #60778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KPO async logging callback #60778
Changes from all commits
b6a4ef4
22744c0
c3110ad
593a2f6
413c2ec
326ba80
e955453
82cd175
488d75e
26028d5
69d98fc
4f7efc8
66a2f63
36cbfc8
33107bd
2a1656b
32afc4f
8d4a6c5
1771384
f78339c
690b396
f3cf176
a7b4a06
3faeb21
c9e2d29
9c4f672
462e8e7
1b45bb5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,6 +29,7 @@ | |
| from datetime import timedelta | ||
| from typing import TYPE_CHECKING, Literal, cast | ||
|
|
||
| import kubernetes_asyncio.client as async_k8s | ||
| import pendulum | ||
| from kubernetes import client, watch | ||
| from kubernetes.client.rest import ApiException | ||
|
|
@@ -680,6 +681,8 @@ def fetch_requested_container_logs( | |
| follow_logs=False, | ||
| container_name_log_prefix_enabled: bool = True, | ||
| log_formatter: Callable[[str, str], str] | None = None, | ||
| since_time: DateTime | None = None, | ||
| post_termination_timeout: int = 120, | ||
| ) -> list[PodLoggingStatus]: | ||
| """ | ||
| Follow the logs of containers in the specified pod and publish it to airflow logging. | ||
|
|
@@ -702,6 +705,8 @@ def fetch_requested_container_logs( | |
| follow=follow_logs, | ||
| container_name_log_prefix_enabled=container_name_log_prefix_enabled, | ||
| log_formatter=log_formatter, | ||
| since_time=since_time, | ||
| post_termination_timeout=post_termination_timeout, | ||
| ) | ||
| pod_logging_statuses.append(status) | ||
| return pod_logging_statuses | ||
|
|
@@ -1110,31 +1115,57 @@ async def fetch_container_logs_before_current_sec( | |
| since_seconds=(math.ceil((now - since_time).total_seconds()) if since_time else None), | ||
| ) | ||
| message_to_log = None | ||
| try: | ||
| now_seconds = now.replace(microsecond=0) | ||
| for line in logs: | ||
| line_timestamp, message = parse_log_line(line) | ||
| # Skip log lines from the current second to prevent duplicate entries on the next read. | ||
| # The API only allows specifying 'since_seconds', not an exact timestamp. | ||
| if line_timestamp and line_timestamp.replace(microsecond=0) == now_seconds: | ||
| break | ||
| if line_timestamp: # detect new log line | ||
| if message_to_log is None: # first line in the log | ||
| message_to_log = message | ||
| else: # previous log line is complete | ||
| if message_to_log is not None: | ||
| if is_log_group_marker(message_to_log): | ||
| print(message_to_log) | ||
| else: | ||
| self.log.info("[%s] %s", container_name, message_to_log) | ||
| message_to_log = message | ||
| elif message_to_log: # continuation of the previous log line | ||
| message_to_log = f"{message_to_log}\n{message}" | ||
| finally: | ||
| # log the last line and update the last_captured_timestamp | ||
| if message_to_log is not None: | ||
| if is_log_group_marker(message_to_log): | ||
| print(message_to_log) | ||
| else: | ||
| self.log.info("[%s] %s", container_name, message_to_log) | ||
| async with self._hook.get_conn() as connection: | ||
| v1_api = async_k8s.CoreV1Api(connection) | ||
| try: | ||
| now_seconds = now.replace(microsecond=0) | ||
| for line in logs: | ||
| line_timestamp, message = parse_log_line(line) | ||
| # Skip log lines from the current second to prevent duplicate entries on the next read. | ||
| # The API only allows specifying 'since_seconds', not an exact timestamp. | ||
| if line_timestamp and line_timestamp.replace(microsecond=0) == now_seconds: | ||
| break | ||
| if line_timestamp: # detect new log line | ||
| if message_to_log is None: # first line in the log | ||
| message_to_log = message | ||
| else: # previous log line is complete | ||
| if message_to_log is not None: | ||
| if is_log_group_marker(message_to_log): | ||
| print(message_to_log) | ||
| else: | ||
| for callback in self._callbacks: | ||
| cb = callback.progress_callback( | ||
| line=message_to_log, | ||
| client=v1_api, | ||
| mode=ExecutionMode.ASYNC, | ||
| container_name=container_name, | ||
| timestamp=line_timestamp, | ||
| pod=pod, | ||
| ) | ||
| if asyncio.iscoroutine(cb): | ||
| await cb | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we really be executing arbitrary user code inside the triggerer? Even though this is async, a long-running or blocking progress callback (for example calling an external API without a timeout) can still starve the triggerer’s event loop. That at least blocks the trigger executing it, and potentially other triggers handled by the same triggerer process. This feels like a fairly big design trade-off just to support progress logging, and I’m not sure it’s worth it.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The specific use case I have in mind is in support of the watcher pattern that is being implement in astronomer cosmos for running DBT. astronomer/astronomer-cosmos#2207 With this you have a pod that's actually running DBT and an airflow task that is parsing kubernetes logs to extract DBT events and create xcom variables that are consumed by sensors. I think the parsing of the logs and setting xcom variables should be lightweight enough that it can be run from the triggerer. Implementing this from the triggerer was part of the original design of the defferred mode for KPO, but the implementation ran into issues so it was stripped back out. I did wonder if I should check the callbacks and only pass the ones that have actually implemented progress_callbacks to make everything a little lighter.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The key phrase here is “should be lightweight”. Can we really guarantee that? Reading the code, it doesn’t seem like there are any restrictions on what a callback could execute, so we are effectively trusting users not to do any heavy lifting in them. One possible compromise would be to enforce a small global timeout for callbacks (e.g. a few seconds at most). However, even with that, this still sets a precedent for executing arbitrary user code in the triggerer. I agree the motivation here is solid and I can see the value of the feature, but this crosses into a fundamental triggerer design decision, which I’m not comfortable approving or disapproving unilaterally. @jscheffl I know you requested my review — I’d be interested in your thoughts on whether this is a precedent we’re happy to set for triggerers.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There aren't any restrictions on what can be run in the triggerer either though. You can add a custom operator and run any arbitrary code you want there. The larger concern I have is that somebody might write a callback using the non deferred mode and then switch to running in deferred mode, and then you have the triggerer calling synchronous code it wasn't designed for. I don't really have an answer to that, beyond noting that the progress_callback was broken from first implementation until very recently, so I guess nobody has been really using it.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have never worked with progress callbacks... do we call them on EACH log line? That smells like a massive overhead. So once people start using this I see a massive performance problem coming or you can not scale.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah its called per line. I guess for the use case I have in mind, the log output isn't particularly verbose, but I can imagine situations where it would be an issue. |
||
|
|
||
| self.log.info("[%s] %s", container_name, message_to_log) | ||
| message_to_log = message | ||
| elif message_to_log: # continuation of the previous log line | ||
| message_to_log = f"{message_to_log}\n{message}" | ||
| finally: | ||
| # log the last line and update the last_captured_timestamp | ||
| if message_to_log is not None: | ||
| if is_log_group_marker(message_to_log): | ||
| print(message_to_log) | ||
| else: | ||
| for callback in self._callbacks: | ||
| cb = callback.progress_callback( | ||
| line=message_to_log, | ||
| client=v1_api, | ||
| mode=ExecutionMode.ASYNC, | ||
| container_name=container_name, | ||
| timestamp=line_timestamp, | ||
| pod=pod, | ||
| ) | ||
| if asyncio.iscoroutine(cb): | ||
| await cb | ||
|
|
||
| self.log.info("[%s] %s", container_name, message_to_log) | ||
| return now # Return the current time as the last log time to ensure logs from the current second are read in the next fetch. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you confident that
def serialiable_callbackwill not be called from within a running event loop? Since this is a helper we cannot guarantee that it will not be invoked in a context without a pre-existing event loop and this could result inaRuntimeError. I believe it would be better to not special-case ASYNC mode and let the wrapper return the function as is regardless ofExecutionModewithout invokingasyncio.run(). Unless you have a strong reason to do it this way.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the idea was to define a helper function that could be used in a callback like
and that the same callback could be used in the triggerer or in the operator, which is invoked when the triggerer hands back to the operator if there are remaining logs in the pod that haven't been processed. Ideally then the callback should be written in async format, as it would be blocking in the callback if it wasn't, though that obviously depends on what the callback is doing.
Invocations of these callbacks are in the operator/triggerer, I don't know of anywhere else they are used.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the invocation of
@serializable_callbackis not being guarded and you are relying on the user correctly inferring the intent of the function? I agree with your motivation for introducing this helper but we cannot guarantee that it will not be called within an event loop in the operator. Is there any reason why the operator cannot do something like this:asyncio.run(callback(...))If that is not feasible, I believe at the very minimum:
RuntimeErrorshould be documented very clearly (docstring and comment)Below is a suggested implementation if you are still intent on keeping 2 separate modes:
This will immediately inform the user of the reason for the
RuntimeErrorand mitigate against further unsafe usage.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I really understand the objection here. The invocations of this would be from the kubernetes pod operator, and sure I can't stop somebody from calling it from somewhere else, but I mean you could say that about anything...
Anyway to be honest I'd be just as happy to drop this helper from the PR.