Make Airflow error messages more specific, clear and actionable #43171
Comments
I can recommend this guide from Google about writing good error messages: https://developers.google.com/tech-writing/error-messages. The rest of the courses in that book are also really good btw.
Actionable errors are good, but they have to be done very carefully: if an error gives misleading advice, it will send users down the wrong rabbit hole. For example, this log in …
I am a big fan of "always tell the user what action on their side the error implies." Agreed that things can be misleading. Regarding the case you mentioned: I cannot find it now (I think I discussed it in the past), but for such complicated errors with multiple possible root causes, we should explain what's going on and link to an Airflow FAQ page that explains the possible reasons. That way, when someone hits the error and we later find other causes or more detailed explanations of what could be wrong and how to remediate it, we can update the docs and add information that will be useful for the many past Airflow versions people are still running.
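The "explain, then link to a FAQ page" approach could look roughly like the following minimal sketch, using the "Celery command failed on host" message from this issue. The `KNOWN_ERRORS` table, the `Remediation` and `explain` names, and the example.com URL are hypothetical placeholders, not part of Airflow. Unknown messages pass through unchanged, so the helper never guesses at advice it doesn't have.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Remediation:
    """Hypothetical pairing of a short explanation with a link to a FAQ entry."""
    explanation: str
    faq_url: str


# Hypothetical table: regex for a known error message -> remediation.
# The FAQ URL is a placeholder, not a real Airflow docs page.
KNOWN_ERRORS = {
    re.compile(r"Celery command failed on host"): Remediation(
        explanation=(
            "The command sent to a Celery worker did not complete successfully; "
            "this can have several root causes."
        ),
        faq_url="https://example.com/airflow-faq#celery-command-failed",
    ),
}


def explain(message: str) -> str:
    """Append an explanation and FAQ link when the message matches a known error."""
    for pattern, remedy in KNOWN_ERRORS.items():
        if pattern.search(message):
            return f"{message}\n  Why: {remedy.explanation}\n  More: {remedy.faq_url}"
    # Unknown errors are returned unchanged rather than guessed at.
    return message


print(explain("Celery command failed on host worker-1"))
```

Because the link points at a page rather than embedding all the advice in the message, the FAQ entry can grow as new causes are found without changing the code in already-released versions.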
Absolutely :)
I have a suggestion for issues with multiple possible root causes: we could print an Airflow error code along with the error message, e.g. …
Since error codes are shareable and easily searchable, they would also be useful for team collaboration (e.g. instead of me saying "I'm looking into the error …
❤️ this. This is what many other tools already do. Being able to classify and list all the different types of errors the software can generate, together with explanations of their causes and remediations (or even just listing them), is a sign of highly mature software.
I really like it. We could finally find a use for AirflowException: so far it has mainly served as a base class for a number of exceptions. If we add a mandatory "error id" to AirflowException, make AirflowException abstract, and add handling so that the error ID is displayed in the logs, possibly emitted as a metric (counting the errors), and recorded as an event in the OTEL trace when the error happens, it could be a really great mechanism that "forces" classification of all the errors we have in Airflow.
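A minimal sketch of the "mandatory error ID" idea in plain Python, assuming IDs are short strings such as `AERR-0001`. `CodedAirflowError`, `CeleryCommandFailed`, `report`, `ERROR_COUNTS`, and the ID format are all hypothetical. The sketch deliberately does not subclass the real `AirflowException`, and the in-process `Counter` only stands in for a real metric or OTEL span event.

```python
import logging
from collections import Counter

log = logging.getLogger("error_id_sketch")

# Hypothetical stand-in for a metric: number of errors seen per error ID.
ERROR_COUNTS = Counter()


class CodedAirflowError(Exception):
    """Hypothetical base class; NOT the real airflow.exceptions.AirflowException.

    Every subclass must declare its own unique, non-empty class-level error_id.
    """

    error_id = ""
    _registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Enforce classification when the subclass is defined, not when it is raised.
        own_id = cls.__dict__.get("error_id", "")
        if not own_id:
            raise TypeError(f"{cls.__name__} must define its own non-empty error_id")
        if own_id in CodedAirflowError._registry:
            raise TypeError(f"duplicate error_id {own_id!r}")
        CodedAirflowError._registry[own_id] = cls

    def __init__(self, *args):
        # "Abstract" guard: the base class itself has no ID and must not be raised.
        if not self.error_id:
            raise TypeError("raise a subclass with an error_id, not the base class")
        super().__init__(*args)

    def __str__(self):
        # Prefix the ID so it appears in every log line and is easy to search for.
        return f"[{self.error_id}] {super().__str__()}"


class CeleryCommandFailed(CodedAirflowError):
    error_id = "AERR-0001"  # hypothetical code and numbering scheme


def report(exc):
    """Log the error (with its ID) and bump the per-ID counter."""
    ERROR_COUNTS[exc.error_id] += 1
    log.error("%s", exc)


try:
    raise CeleryCommandFailed("Celery command failed on host worker-1")
except CodedAirflowError as exc:
    report(exc)  # logs the message prefixed with "[AERR-0001]"
```

The check lives in `__init_subclass__` so that a missing or duplicate ID fails at import time, which is what "forces" every new error type to be classified before it can ship.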
@potiuk @omkar-foss I really like how this discussion is shaping up. Have we established any guidelines or SOPs for how to designate the error codes? If there's a thread where this discussion is ongoing, I'd be happy to contribute (both via discussions and PRs).
Nice to hear from you @kunaljubce. I'm working on a doc that describes a list of all Airflow-related exceptions, starting with the … We can then update that list as required based on further discussion.
Hi, I'm still working on this; I got caught up with other things. I will share the list in the next couple of days or so.
Description
According to user feedback in the Airflow Debugging Survey 2024, around 41.7% of respondents don't consider error messages actionable. The overall feedback also suggests that users find some error messages vague and confusing.
Use case/motivation
Goals for this issue are the following:
An error message such as "Celery command failed on host" can be transformed or displayed with something like "Please check your DAG processor timeout variable for this", so the user has a starting point for debugging.
Related issues
Parent Issue: #40975