Add basic Airflow error guide #44616

Open
wants to merge 4 commits into
base: main
from

Conversation

omkar-foss
Collaborator

related: #43171

This PR introduces a very basic guide that maps Airflow error codes to some common errors. It can serve as a starting point that we can all add to, improving both the coverage of errors and their mapping to possible causes and resolutions.

@omkar-foss omkar-foss self-assigned this Dec 3, 2024
@omkar-foss
Collaborator Author

The core Airflow exceptions from airflow/exceptions.py are listed below. I've started with AirflowException for now. Error logs are also a good source of the common errors that users face, and we should map those too - I've mapped a few as Error Log in the markdown table.

AirflowException
AirflowBadRequest
AirflowNotFoundException
DagNotFound
DagCodeNotFound
DagRunNotFound
AirflowConfigException
AirflowSensorTimeout
AirflowRescheduleException
InvalidStatsNameException
AirflowTaskTimeout
AirflowTaskTerminated
AirflowWebServerTimeout
AirflowSkipException
AirflowFailException
AirflowOptionalProviderFeatureException
AirflowInternalRuntimeError
XComNotFound
UnmappableOperator
XComForMappingNotPushed
UnmappableXComTypePushed
UnmappableXComLengthPushed
AirflowDagCycleException
AirflowDagDuplicatedIdException
AirflowClusterPolicyViolation
AirflowClusterPolicySkipDag
AirflowClusterPolicyError
AirflowTimetableInvalid
AirflowFileParseException
FileSyntaxError
ConnectionNotUnique
TaskDeferred
TaskDeferralError
PodMutationHookException
PodReconciliationError
RemovedInAirflow3Warning
AirflowProviderDeprecationWarning
DeserializingResultError
UnknownExecutorException
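
For anyone who wants to regenerate a list like the one above rather than maintain it by hand, here is a minimal sketch using the standard library's `inspect` module. The helper name `list_exception_classes` and the stand-in module `demo_exceptions` are made up for illustration and are not part of Airflow. Note that the helper returns names in alphabetical order, not source-file order.

```python
import inspect
import types


def list_exception_classes(module: types.ModuleType) -> list[str]:
    """Return the names of exception classes defined directly in `module`.

    Classes that are merely imported into the module are skipped by
    comparing each class's __module__ with the module's own name.
    """
    names = []
    # inspect.getmembers returns (name, value) pairs sorted by name.
    for name, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, BaseException) and obj.__module__ == module.__name__:
            names.append(name)
    return names


# Stand-in module so the sketch runs without Airflow installed (hypothetical names).
demo = types.ModuleType("demo_exceptions")
exec(
    "class DemoError(Exception): ...\n"
    "class DemoNotFound(DemoError): ...\n"
    "class Helper: ...\n",
    demo.__dict__,
)

print(list_exception_classes(demo))
```

With Airflow installed, passing `airflow.exceptions` to the same helper should give a comparable list, though I have not checked that it matches the list above exactly.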

@potiuk
Member

potiuk commented Dec 6, 2024

One comment here @omkar-foss. This is quite a change in how we treat errors, so it would be great to announce the intention to implement those error numbers and messages on the devlist. While there was a survey and a few people agreed that this is a good idea, "What did not happen on devlist, did not happen" - so please start a discussion on the devlist, with the intention to run a lazy consensus (or a vote, in case there are any doubts).

@omkar-foss
Collaborator Author

Done, sent on devlist ✅

Apologies for the delayed response! I'll continue adding error mappings to this PR while we await responses on devlist and finalize items etc.

@omkar-foss omkar-foss force-pushed the airflow-error-guide-43171 branch 2 times, most recently from 7c250e8 to 67520d8 Compare December 30, 2024 08:48
@omkar-foss omkar-foss force-pushed the airflow-error-guide-43171 branch from 015ae51 to 86fc8b1 Compare December 30, 2024 09:10
@omkar-foss omkar-foss marked this pull request as ready for review December 30, 2024 09:10
@omkar-foss
Collaborator Author

omkar-foss commented Dec 30, 2024

In accordance with @ashb's feedback on this Slack thread to include errors relevant to end users, I've updated the Airflow Error Codes list in this PR with the top 100 user-facing errors, along with their descriptions and newly assigned (tentative) error codes.

I've created this top 100 errors list by referring to Airflow-related questions on StackOverflow, suggestions from ChatGPT, and a few questions asked on the #user-troubleshooting Slack channel.

There's a lot of scope for improving this list, so it would be great if you could check it out and drop a comment on this PR as necessary. Thank you :)

Markdown-rendered view here: https://github.com/apache/airflow/blob/86fc8b10bd248e41aba2d80de76bac04280e2c03/dev/AIRFLOW_ERROR_GUIDE.md

@potiuk
Member

potiuk commented Dec 31, 2024

As discussed in Slack - the value of that list and the page is going to be WAY better if there is an action that the user can take for each of those errors. Users often are not looking for a description of what is going on; they are looking for solutions. In a number of cases we can at the very least guide them where to look for such solutions and which part of the documentation they should read (i.e. link to the relevant documentation). In some other cases we can suspect that this is a deployment issue and tell the users to look there. In many other cases we can even point them to actual configuration parameters that could be changed, or to typical resolutions and areas they should look at. In many other cases you can add some examples of what could be done.

The ruff rules, for one, are a very good way of approaching it, like https://docs.astral.sh/ruff/rules/#legend - many of those rules explain what happens, and a number of them provide a proposed solution or example fixes. While it's a bit "easier" with ruff, as the rules are simpler than potential Airflow errors, I see no reason why we should not be able to at least guide people to the solutions. That might significantly decrease the number of issues people open in our repo, and even if not - it will make it easier for all contributors, committers and the triage team to respond to such issues and direct the users to those pages, providing a first line of support for our users.

Not all of those have to be there in the PR to get it merged, but IMHO we should design it in a way that makes this possible - and "crowdsource" filling in that information (via an issue where we will have):

  • ERR1
  • ERR2

And let people from the community contribute the possible solutions and things to look at there.

Possibly a table like that is a bit too "small" to hold that information.
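
To make the suggestion concrete, here is a rough sketch of what a richer per-error entry could look like compared with a single table row. Everything in it is an assumption for illustration only: the `ErrorGuideEntry` fields, the `render_entry` helper, the example text for ERR1 and ERR2, and the placeholder URL are not taken from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ErrorGuideEntry:
    """One error in the guide (hypothetical structure, not the PR's format)."""

    code: str                      # e.g. "ERR1", the placeholder used above
    description: str               # what happened
    what_to_try: list[str] = field(default_factory=list)  # actions for the user
    docs_url: str | None = None    # link to the relevant documentation, if any


def render_entry(entry: ErrorGuideEntry) -> str:
    """Render an entry as plain text, flagging entries that still need help."""
    lines = [f"{entry.code}: {entry.description}"]
    if entry.what_to_try:
        lines.append("What to try:")
        lines.extend(f"  - {step}" for step in entry.what_to_try)
    else:
        # Entries without actions are the ones to crowdsource.
        lines.append("What to try: not documented yet, contributions welcome")
    if entry.docs_url:
        lines.append(f"See: {entry.docs_url}")
    return "\n".join(lines)


# Made-up example entries; the text and URL are placeholders.
err1 = ErrorGuideEntry(
    code="ERR1",
    description="Example error description",
    what_to_try=["Check the relevant logs", "Review the related configuration"],
    docs_url="https://example.com/errors/ERR1",
)
err2 = ErrorGuideEntry(code="ERR2", description="Another example error")

print(render_entry(err1))
print(render_entry(err2))
```

An entry like `err2`, with no actions yet, is exactly the kind of gap that an issue listing ERR1, ERR2, ... could ask the community to fill in.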
