on_failure_callback not being triggered on -9 error #40557
Apache Airflow version: Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one? 2.8.1
What happened?
I have a DAG in which a task is failing with a -9 error code.
I've defined an on_failure_callback function in the DAG's default_args, but it is not being triggered. It is only triggered if I call a function that manually throws an exception. So the SMTP settings are fine and work; the on_failure_callback just isn't triggered when the -9 error happens. Here is part of my DAG as an example.
The task update_xxx1_table_task is the one getting the -9 mentioned above, and no email is sent. If I define something like
This works fine and sends the email.
What you think should happen instead? No response
How to reproduce: Define a DAG like my example above, add a task which causes a -9 (memory issue) error, and observe that the error email is not sent.
Operating System: AWS
Versions of Apache Airflow Providers: No response
Deployment: Amazon (AWS) MWAA
Deployment details: No response
Anything else? Let me know if there are specific logs I can check to provide more information. I've checked all the Airflow logs I know of and could not find any error or anything else related to the email sending or to the on_failure_callback.
Are you willing to submit PR?
Code of Conduct
Replies: 2 comments 3 replies
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
That's correct behaviour. -9 means the process was killed by SIGKILL (described, for example, here: https://komodor.com/learn/what-is-sigkill-signal-9-fast-termination-of-linux-containers/), and SIGKILL cannot be handled: it forces the process to die immediately, without giving it any chance to react. This is POSIX-standard, system-level behaviour, not even Python behaviour.
SIGKILL is a sign that something TERRIBLE happened: whatever sends SIGKILL is deliberately requesting the process to DIE instantly, without giving it a chance to do anything. You need to find the root cause of the SIGKILL. Something is forcefully killing your process, but only you can find out what it is. Often it is caused by resource problems (for example, Out Of Memory). But only you can check and find out what is killing your process instantly instead of letting it shut down gracefully. Converted it to a discussion in case more is needed.