Improve Airflow's debugging story #40975
Comments
Let me add to it what I wrote about OTEL in https://lists.apache.org/thread/b2bvn8sbxfncg9qpvry9w142944mnlj6 - this might be a great tool to help with things. I am not sure if I want to take sole ownership of that one - maybe there will be someone else who would like to take a look and explore things as well - but I am happy to be deeply involved in that one. |
I'd like to be involved in this effort in some capacity. At least: brainstorming, qa, and documentation. |
Happy to help out with some of the logging and error handling implementation. The debug snapshot idea sounds very useful @potiuk. It may give a canonical view of the user's environment. I suppose Jaeger provides a similar tool called Anonymizer, which generates a shareable JSON of a trace - probably the same one that you were referring to in your mail. We can build our own debug snapshot util, or think about using this tool with Jaeger, since it supports the existing OTEL metrics and traces. |
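To make the "debug snapshot" idea above a bit more concrete, here is a minimal sketch of what a home-grown util could collect. Everything in it is an assumption for illustration: `build_snapshot`, `redact_env` and `SENSITIVE_MARKERS` are hypothetical names, not existing Airflow APIs, and the redaction rule is deliberately crude. It only uses the Python standard library and says nothing about how Jaeger's Anonymizer works.

```python
import json
import platform
import sys
from importlib import metadata

# Hypothetical rule: any variable whose name contains one of these
# substrings is treated as sensitive and has its value masked.
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY", "CONN")


def redact_env(env):
    """Return a copy of env with sensitive-looking values replaced by '***'."""
    return {
        name: "***" if any(m in name.upper() for m in SENSITIVE_MARKERS) else value
        for name, value in env.items()
    }


def build_snapshot(env, package_prefix="apache-airflow"):
    """Collect interpreter, platform, package and AIRFLOW* env details."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith(package_prefix):
            packages[name] = dist.version
    airflow_env = {k: v for k, v in env.items() if k.startswith("AIRFLOW")}
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
        "env": redact_env(airflow_env),
    }


if __name__ == "__main__":
    import os

    # Print a JSON document that a user could attach to a bug report.
    print(json.dumps(build_snapshot(os.environ), indent=2))
```

Passing the environment in as an argument (instead of reading `os.environ` inside the function) keeps the sketch easy to test and makes it obvious what ends up in the shareable output.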
@Dev-iL Could I assign this GitHub issue to you? You can then lead the "scoping" part of this epic by talking to Jarek and others on Slack, the mailing list and other venues, and come back with a concrete proposal. Would you like to do that? |
@kaxil Honestly? It sounds a bit scary going from contributing minor patches to being responsible for an important feature in an upcoming release. I prefer to actively observe and learn, at least once, how something like this is done and take on a similar responsibility after I know how much time/work it requires. |
Absolutely, that's completely fine
@omkar-foss Do you want to take a stab at leading it? |
@kaxil I would love to take the lead on this, but right now I suppose I'm still a rookie in the ways of the Airflow community. So for this one, I'll prefer to assist all of you in every way possible, while trying to get a better grasp of the processes, codebase etc. Hope that's okay, thanks for considering me though 😇 |
@kaxil Any idea if there's a predefined user research template that has been used for prior releases? If not, I'd like to propose the following for conducting the survey:
Please let me know your thoughts on this, thanks. |
@omkar-foss The main question is who the target audience of the research is, where possible answers are: maintainers, contributors, power users, general public, etc. Based on @kaxil's instructions, I'd say mostly power-users and above. If that is the case, I'm assuming most will be willing to participate in a survey, even if it has questions on topics people might not have an opinion on. If on the other hand, we're looking to get more participants, I think a literal survey is not the way, since people might open it, see how long it is, and just give up. That, of course, would be a terrible waste, because there are likely many use-cases that will not be represented. For the above reason, I was thinking something like a feature voting platform (example1, example2) could be suitable - that way, if someone has a pain-point related to how a particular system works, they can look for existing posts or briefly explain what they have in mind (possibly with a template like a bug report) and allow others to vote or add to these suggestions. This also takes care of much of the aggregation work of the results. |
Hey @Dev-iL, I agree with your reasoning above. I checked out the sample Feature Upvote board that you've shared above and it surely feels simpler (and quicker) to submit compared to a regular survey form. I suppose we'll need an initial list of features on the upvote board for the participants to vote, would be great to hear if you've any thoughts around it. Not sure how much help I can be on this, but I'm here so feel free to tag me if you need any assistance! :) |
I'd say mostly power-users - yes, but the tooling and debuggability should also be targeted at "new" users. I think power-users mostly know their ways - they can do remote debugging, they know how to connect their IDEs to the code, they are even able to use pdb, py-spy and other tools while remote shelling into container instances etc. But the goal here is to shorten the path from "I wrote some DAG and it does not work" to "how do I most effectively find, inspect and understand what's going on there" - for a user who just wrote their first few DAGs. I think the assumption should be that the person has some Python experience, has an IDE (PyCharm/VSCode) and is willing to follow some instructions on setting things up first - while ideally this should be a one-time setup that they can easily re-use (and teach others how to do).
I think yes - a survey is a good idea if well prepared, and those power-users might indeed be willing to share their experiences - we can even leverage the upcoming Airflow Summit and do some prizes / recognition and generally make a bit more fuss about it - so if we could prepare it still in August and maybe run the survey during the Summit as well, we could likely make it much more efficient. |
@potiuk @omkar-foss In the interest of moving ahead with this, I've made a google doc so we can start hashing out this survey collaboratively. Currently, it's publicly open for commenting - please send me your google account via slack so I could add you to the editors. If there are any privacy or other concerns, I don't mind moving the document to another platform. |
@Dev-iL Drop a mail to [email protected] too (Public archive: https://lists.apache.org/[email protected]). I am sure a lot of developers and users might want to add things to it, as well as in Airflow's Slack channel |
It's been a few days, and the document hasn't seen any activity (outside of my own placeholder ideas), nor did anyone approach me for editing rights. If this trend continues, we won't have the survey ready on time. @kaxil I just saw your comment on the mailing list. My plan was to first iterate on the survey's structure in docs, move to form once satisfied, then circulate it for responses. |
Doc looks good to me. Just one question/suggestion - will all questions be optional, or some mandatory, some optional? My suggestion would be to keep as many questions optional especially free text type questions (Q 2.4, 3.4, 4.4, 4.5). Reason being not all people will have feedback suiting each question. |
@omkar-foss don't suggest - decide. I, too, think questions should be mostly optional. As for the contents of the survey - I don't believe it's ready. It currently has questions asking about general sentiments on things, and I don't know how actionable it will be unless users answer the free text questions en masse. I'll give you an example: suppose user satisfaction with the airflow documentation comes out as "medium" overall - what do you do about this? OTOH, suppose we had a multi-select question that mentioned airflow features introduced in the last few 2.x releases, asking if users find the examples provided for them sufficient - now that would be something actionable. See what I mean? It needs the eyes of someone who knows airflow and its power user community better than I do, to know the right questions to ask, potentially about specific components, plugins, use-cases, etc, so that feedback is insightful and useful. |
I think none of the maintainers know "power users" well. Almost by definition, we are neither running nor maintaining Airflow, and we do not have teams of people working together on DAGs. We are pretty much blindfolded when it comes to their needs and can at most guess what is troublesome for them or what can help them. We mostly know how to debug Airflow itself, not how to debug Airflow DAGs. There are huge and significant differences in workflows, tooling and integration with IDEs. Same as with documentation - we are very POOR documentation writers, because a) we think about internals and not externals, b) we have a lot of knowledge and assumptions that readers might not have and we might fail to explain it to them, c) we tend to focus on HOW things are done, not WHAT our users might want to learn from it. That's why we NEED power users themselves, and ideally people who work in teams, to have an opportunity to lead and decide on those questions and the questionnaire. We might definitely advise on decision making but we should not "lead" such a process. |
Yes, we're on the same page. We're now in the phase of collecting feedback on finalizing the survey draft on Airflow Slack, hoping for quicker response and finding users who use Airflow along with their teams. Starting with Would be great if we all can continue this conversation from this issue to Airflow Slack (on |
I also got a chance to review the doc and make some suggestions to it. |
I also looked at it - and actually I have a comment a bit contrary to those early comments of @amoghrajesh who insisted on "choice" answers. Since we are not really sure about the debugging usage in a number of places, I find the rating questions (Often/Rare/Satisfied etc.) tell us very little - especially as we also have no baseline to compare against. I think this survey will be answered by a small number of people (not a few 100s but a few 10s maybe), so statistical aggregation of the data for such a small sample will be very misleading and useless - we will anyhow get mostly answers from people who are frustrated by their experiences, this is almost a given, so any stats based on the ranked answers will a) be super biased, b) tell us very little. I think the biggest value of this survey is to get some concrete examples, stories, and ways of debugging Airflow that are unknown to us, and the "free form" answer is absolutely the most important insight we can get from it - we can learn for example that someone uses x.y.z tool in this specific way, and that they miss this or that feature there - but we will never be able to ask the right question for it - especially one that has a "rated" answer. So I think pretty much all the questions there should be of the type:
Or
And I think the choice should be in most cases binary. Otherwise I'd find very little value in finding out that 15 of 20 people find that information is often misleading without any additional explanation. So I think all the questions that have 5 choices of satisfaction should be reduced to 2 choices ("not my problem"/"my problem") and the second should be accompanied by an obligatory explanation why. Yes, it will make the survey longer to fill in, and yes, it will decrease the number of responses we get, but I feel this will be way more useful for us. |
fyi, following are the docs that have actionable next steps based on the questions (and options) in the survey:
|
The survey form is ready: https://s.apache.org/airflow-debugging-survey2024 , thanks to @Dev-iL , @omkar-foss & @amoghrajesh |
Thanks to @Dev-iL -- we have a QR code that links to the survey |
This will allow him to interact with the GitHub project for sig-debugging: apache#40975
Hi all! As per discussion, we'll be tracking all issues related to Airflow Debugging Story (based on debugging survey responses) on this project: https://github.com/orgs/apache/projects/421 |
Also see the #40802 (comment) discussion. I believe that with OTEL and traces (even including a limited set of logs in the traces) we are closer to addressing a big gap in debugging of Airflow, where we can give our users a tool to provide us with far more diagnostic information, allowing us to analyse, diagnose, and fix many problems much more efficiently. |
Summary
As we prepare for the release of Airflow 3.0, one of the key areas that need significant enhancement is the debugging experience.
Current Challenges
dag.test and task.test do a good job already but we should see if we can do even better.
airflow dags parse does a job at it, worth checking if it is sufficient or not.
Whoever takes on this task should conduct user research on the mailing list, Slack, Meetup or Airflow Summit to identify other common debugging problems that can be fixed.
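As a rough illustration of the kind of check the DAG parsing point above is about, the sketch below imports every Python file in a folder and records the traceback of each file that fails to import. This is an assumption-laden stand-in, not how Airflow itself parses DAGs: `collect_import_errors` is a hypothetical helper, it uses only the standard library, and it treats DAG files as plain Python modules (so their top-level code is executed).

```python
import importlib.util
import traceback
from pathlib import Path


def collect_import_errors(folder):
    """Import each .py file under folder; map failing paths to tracebacks."""
    errors = {}
    for path in sorted(Path(folder).rglob("*.py")):
        # Use a private module name so real modules are not shadowed.
        module_name = f"_dag_check_{path.stem}"
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        try:
            # Runs the file's top-level code, like a normal import would.
            spec.loader.exec_module(module)
        except Exception:
            # Syntax errors and missing imports both end up here.
            errors[str(path)] = traceback.format_exc()
    return errors


if __name__ == "__main__":
    import sys

    # Usage: python check_dags.py /path/to/dags
    for file_path, tb in collect_import_errors(sys.argv[1]).items():
        print(f"--- {file_path}\n{tb}")
```

Returning the full traceback per file, instead of only the exception message, is a deliberate choice here: for a new user the line number inside their own DAG file is usually the most useful piece of information.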