Disable GitSync + Persistence combo in the Helm Chart #28822
dstandish
left a comment
Seems fine, though since it is technically breaking, it probably requires a major release? Unless you want to think of it as a bugfix...
@jedcunningham
We had a lot of discussions about this with @jedcunningham in related issues, and some time ago I wrote an article describing in detail why git-sync and networked persistence do not play well together (and do not serve any useful use case). I think we have accumulated enough issues from our users to confirm that they REALLY did not want to use persistence and git-sync together, and we had a lot of problems that we could attribute to the lack of atomicity that is amplified by git-sync + network persistence. The last straw for me was this discussion: https://apache-airflow.slack.com/archives/C027H098M1C/p1673333515991549, where the users wanted to use dags.persistence and dags.gitSync to combine development and production workflows in one. That would not work, but the user was unaware of it because they did not know how git-sync works under the hood: it would overwrite the manually updated/copied DAGs. All in all, I think allowing this combo is actively harmful, and we should help our users make the right decision (and help ourselves by not having similar discussions or getting similar issues created) by disabling the combo.

I personally think it is a bugfix (because it does not work anyway :) )
Git Sync and Persistence for DAGs make very little sense together, and the combination largely misleads our users about what it does.

Git Sync provides atomicity of DAG folder synchronisation by checking out a complete copy of the DAGs folder and swapping the symbolic link pointing to it. It does not play well with networked persistence.

It makes it very easy for users unaware of how git-sync and persistence work under the hood to walk into several traps:

* git-sync on persistent remote volumes such as EFS generates a LOT of extra traffic because of the way git-sync works (it creates a second working folder for DAGs and replaces the symbolic link to it, which effectively forces a full sync of the whole DAG folder for all involved instances with every commit)
* because that sync gets distributed over multiple clients of the persistent volume, it loses the atomicity property of git-sync, and when there are bursts of synchronisation between multiple nodes, it is very likely to trigger inconsistent DAG parsing
* the problem is amplified when the network volumes are distributed among multiple nodes and there are networking limits (for example, no provisioned IOPS in EFS). The amount of traffic generated at sync might cause even more inconsistencies, solvable only by paying for extra IOPS (which would not normally be needed)
* users might be tricked into trying to use gitSync and also update DAGs using persistence (basically combining the development-friendly DAG distribution over persistent volumes with production-ready git-sync) without being aware that git-sync will override the manually synced DAGs when swapping the symbolic links

On top of that, the current status is that it does not work. There are several issues where volumes are missing when the combo is used in certain situations, and disabling it is better than fixing those.

Closes: apache#27545
Closes: apache#27476
Closes: apache#27080
Related: apache#27124
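To make the "complete copy plus symbolic link swap" idea concrete, here is a minimal Python sketch of that publishing pattern. It is not git-sync's actual code: the `publish` helper, the `rev-…` directory naming, and the `repo` link name are assumptions made for illustration only. It also shows why a file copied by hand into the linked folder is no longer visible after the next swap.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Mapping


def publish(root: Path, revision: str, files: Mapping[str, str], link_name: str = "repo") -> Path:
    """Write a complete copy of the DAGs into a new directory, then swap the symlink.

    Hypothetical helper: it imitates the checkout-and-swap pattern, nothing more.
    """
    worktree = root / f"rev-{revision}"
    worktree.mkdir()
    for name, body in files.items():
        (worktree / name).write_text(body)

    # Create the new link under a temporary name, then rename it over the old
    # link. The rename is atomic on a local POSIX filesystem, so a reader sees
    # either the old tree or the new tree, never a mixture of both.
    tmp_link = root / f".{link_name}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(worktree.name, tmp_link)
    os.replace(tmp_link, root / link_name)
    return worktree


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        publish(root, "aaa", {"dag_a.py": "# v1"})
        # A DAG copied in by hand lands in the currently linked worktree.
        (root / "repo" / "manual_dag.py").write_text("# copied manually")
        publish(root, "bbb", {"dag_a.py": "# v2"})
        # The link now points at the new tree, which never contained the manual file.
        return sorted(p.name for p in (root / "repo").iterdir())


if __name__ == "__main__":
    print(demo())
```

On a local filesystem the swap is a single rename. The traps listed above arise when the same pattern runs on a shared network volume, where every client then has to pick up a whole new directory tree after each commit.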
I think I'd rather warn instead of block (but I'm also a bit worn down on thinking about the combo in our chart, so I certainly won't stand in the way here).

I think this is one of those cases where we really balloon the matrix of tests (and the complexity of the Helm chart) for very little reason. It seems that we already have a number of tests (now failing) where both flags are set, yet we still do not have the combo right.
After seeing the number of failing tests (28), though, I have second thoughts, because apparently this combo was actually quite thoroughly tested before. I wonder (for my own understanding here) why the (currently failing) tests were added. What was the reasoning behind having a number of … ? I think the answer to that should determine whether it makes sense to remove it completely or maybe just warn the users and try to fix the cases we know about for now.
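The two options discussed here, blocking the combo or only warning about it, can be sketched in Python as a check over the chart values. This is an illustration only: the real chart would express such a check in Helm templates or a values schema, and `check_dag_source`, its `strict` flag, and the `enabled` sub-keys under `dags.gitSync` and `dags.persistence` are assumptions of this sketch.

```python
import warnings
from typing import Any, Mapping


def check_dag_source(values: Mapping[str, Any], strict: bool = True) -> None:
    """Reject (strict=True) or warn about (strict=False) git-sync plus persistence.

    Hypothetical helper; the key names below are assumed, not taken from the chart.
    """
    dags = values.get("dags", {})
    git_sync = dags.get("gitSync", {}).get("enabled", False)
    persistence = dags.get("persistence", {}).get("enabled", False)
    if not (git_sync and persistence):
        return

    message = "dags.gitSync.enabled and dags.persistence.enabled should not both be true"
    if strict:
        # "Block": fail the render or install outright.
        raise ValueError(message)
    # "Warn": let the install proceed but tell the user about the problem.
    warnings.warn(message, UserWarning)


if __name__ == "__main__":
    combo = {"dags": {"gitSync": {"enabled": True}, "persistence": {"enabled": True}}}
    check_dag_source(combo, strict=False)  # emits a UserWarning
```

Blocking turns a confusing runtime failure into an early, explicit error. Warning keeps existing installations upgradable, at the cost of leaving the known-broken cases reachable.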
And yes, I am just afraid that we might break things for some users. So far I understood that #27545 and #27476 effectively mean that it was really difficult to get a prod-ready setting, but after looking closely, it has been "freshly" broken by #22913. Also, #22913 is triggered by the recently added standalone DAG File processor. So my claim that this is not "breaking" is not as bold as it was. Let me think about it for a while.
Yes, it did work at one point and the tests were intentionally added (but even then this combo really only made sense with a single scheduler).

Still thinking on it :).
Hussein Awala wrote on Sun, 19 Feb 2023 at 03:10:
> And yes I am just afraid that we might break things for some users

@potiuk I think a user who uses the combo and wants to upgrade to the new chart version could still use it by enabling git-sync, disabling DAGs persistence, and adding the DAGs volume manually (https://github.com/gmsantos/airflow/blob/main/chart/values.yaml#L246-L250). What do you think?
Hi, I was about to create a bug report related to this, but then I saw this PR, so it is probably better to include it here.
The issue appears when activating both DAGs persistence and git-sync. When you have a DAG inside a package that imports some other module from the package, like:
from mypackage.lib.mylib import lib_method
the import will fail at DAG execution time if both git-sync and DAGs persistence are enabled. The DAG is still displayed in the Web UI, and Airflow lists it as correctly imported (i.e. it is displayed in the output of …).
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 339, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/airflow/dags/repo/dags/mypackage/mydags/mydag.py", line 6, in <module>
    from mypackage.lib.mylib import lib_method
ModuleNotFoundError: No module named 'mypackage'
If you disable DAGs persistence, then everything works as expected. I have created a repository that reproduces the issue: https://github.com/JMLizano/explore-airflow-chart-issue
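The report does not identify the cause of the failure. One generic mechanism that produces this exact error is a mismatch between the directory on `sys.path` and the directory that actually contains the package. The sketch below demonstrates only that mechanism, on temporary directories that stand in for `/opt/airflow/dags` and `/opt/airflow/dags/repo/dags`. Whether this is what happens with the chart is an assumption, not something the thread confirms, and `can_import` is a hypothetical helper.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def can_import(search_root: Path, module: str) -> bool:
    """Return True if `module` imports with `search_root` temporarily added to sys.path."""
    top_level = module.split(".")[0]
    sys.path.insert(0, str(search_root))
    try:
        importlib.invalidate_caches()
        importlib.import_module(module)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(str(search_root))
        # Forget anything loaded here so the next call starts from a clean state.
        for name in list(sys.modules):
            if name == top_level or name.startswith(top_level + "."):
                del sys.modules[name]


def demo() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        mount = Path(tmp) / "dags"          # stands in for /opt/airflow/dags
        checkout = mount / "repo" / "dags"  # stands in for /opt/airflow/dags/repo/dags
        lib = checkout / "mypackage" / "lib"
        lib.mkdir(parents=True)
        (checkout / "mypackage" / "__init__.py").write_text("")
        (lib / "__init__.py").write_text("")
        (lib / "mylib.py").write_text("def lib_method():\n    return 'ok'\n")
        return (
            can_import(checkout, "mypackage.lib.mylib"),  # package root is on the path
            can_import(mount, "mypackage.lib.mylib"),     # package sits two levels deeper
        )


if __name__ == "__main__":
    print(demo())
```

If two components resolved the DAGs folder differently, one could import the package while the other could not, which would be consistent with a DAG that is listed as imported but fails at execution time. Checking this against the linked reproduction repository would be needed before treating it as the explanation.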
Closing in favour of #32181