Windows support for Airflow #10388

Open
shachibista opened this issue Aug 18, 2020 · 24 comments
Labels
kind:feature Feature Requests

Comments

@shachibista

Description

Currently, the Airflow project uses PEP 3143-style daemons to launch tasks (as implemented in https://pypi.org/project/python-daemon/). However, this targets Unix daemons. As a result, running Airflow on Windows requires multiple levels of abstraction, each with its own problems. Would it be possible to use something like daemoniker (https://daemoniker.readthedocs.io/en/latest/) to launch tasks? What are the challenges and issues?

In machine learning workflows with large datasets, it is a huge time-saver if the pipeline tasks can run on the GPU. WSL 1 does not support GPU passthrough, and Docker through WSL 2 supports GPU passthrough only with the Insiders build. It also has networking issues when connected to a VPN (microsoft/WSL#5068).

Use case / motivation

Running Airflow natively on Windows, without WSL 1/2 or Docker. This is helpful in cases where the company ecosystem is Windows-based.

Possible implementation

The daemon module is only used to daemonize the scheduler and webserver. Here's sample code that runs the scheduler (airflow origin/v1-10-stable) using daemoniker; comments are welcome:

# airflow/bin/cli.py
from daemoniker import Daemonizer

...

if args.daemon:
    with Daemonizer() as (is_setup, daemonizer):
        if is_setup:
            pid, stdout, stderr, log_file = setup_locations("scheduler",
                                                    args.pid,
                                                    args.stdout,
                                                    args.stderr,
                                                    args.log_file)
        
        _is_parent = daemonizer(
            pid,
            stdout_goto=stdout,
            stderr_goto=stderr
        )

    job.run()
@shachibista shachibista added the kind:feature Feature Requests label Aug 18, 2020
@boring-cyborg

boring-cyborg bot commented Aug 18, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@mik-laj
Member

mik-laj commented Aug 18, 2020

Have you encountered other problems with running Airflow on Windows? Windows support is highly anticipated by our users, but no one has dealt with this topic intensively yet. Personally, I use macOS, but I support the idea of adding support for Windows.

@shachibista
Author

Yes. Following the installation manual on the homepage, pip install apache-airflow installs the airflow command, but it is not a Windows executable and Windows does not recognize the #! ..../python3.exe shebang.

@mik-laj
Member

mik-laj commented Aug 19, 2020

@shachibista Have you tried installing the development version from source? I think this change should fix this problem.
https://github.com/apache/airflow/pull/7808/files#r396126977

@potiuk
Member

potiuk commented Aug 19, 2020

I think it would be great if someone could invest in Windows support. I believe there are a few things: not only the daemon model, but also the Local Executor uses fork mechanisms which won't be available on Windows. There might also be some problems if you want to use the Celery Executor on Windows: https://www.distributedpython.com/2018/08/21/celery-4-windows/ There are a few POSIX-compliant packages used as well, which might not work on Windows. And automated testing might be a problem since we are using Docker. It looks like quite a big effort to invest.
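
To make the fork limitation concrete, here is a minimal sketch using only the Python standard library. It is not Airflow's executor code; `run_task` and `run_in_child` are hypothetical names chosen for illustration. It shows the "spawn" start method, which is available on Windows as well as on POSIX systems, as one possible alternative to fork-based child processes.

```python
import multiprocessing as mp
import os


def run_task(task_id: str) -> None:
    # Hypothetical stand-in for real task execution. It must be a top-level
    # function: a spawned child starts a fresh interpreter and imports the
    # target by name instead of inheriting the parent's memory as fork does.
    print(f"{task_id} running in pid {os.getpid()}")


def run_in_child(task_id: str, timeout: float = 30.0) -> int:
    # "spawn" exists on Windows, macOS, and Linux; "fork" is POSIX-only.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=run_task, args=(task_id,))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Do not wait forever on a stuck child.
        proc.terminate()
        proc.join()
    return proc.exitcode


if __name__ == "__main__":
    # The guard matters with "spawn": the child re-imports this module and
    # must not start children of its own while doing so.
    print("os.fork available:", hasattr(os, "fork"))
    print("child exit code:", run_in_child("example_task"))
```

Because a spawned child does not share the parent's state, anything it needs (open log files, configuration, connections) has to be passed explicitly or reopened, which is part of why porting fork-based code is more than a one-line change.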

@shachibista
Author

@mik-laj No, I haven't tried installing the development version from source. Is there a simple way to do it within windows?

@potiuk
Member

potiuk commented Aug 22, 2020

I am afraid not. We know Airflow works in WSL2, but we also know it does not work on Windows. Unless you can convince someone to make it work on Windows, I am afraid it's not going to happen.

@mik-laj
Member

mik-laj commented Aug 22, 2020

You can install the application from local sources by cloning the repository and then running the pip install -e . command.

@shachibista
Author

@mik-laj Yes, the development version fixes the issue with the airflow command, at least. But, I cannot start the scheduler due to the aforementioned issues.

@potiuk Are you sure there are no fork-like mechanisms for windows? I would really like to get this working at least using Local/SequentialExecutor.

@potiuk
Member

potiuk commented Aug 25, 2020

> @mik-laj Yes, the development version fixes the issue with the airflow command, at least. But, I cannot start the scheduler due to the aforementioned issues.
>
> @potiuk Are you sure there are no fork-like mechanisms for windows? I would really like to get this working at least using Local/SequentialExecutor.

There are different mechanisms - here is the whole discussion about it: https://docs.python.org/3/library/subprocess.html#popen-constructor - but they work differently, and Airflow relies on some of the properties of Popen and on passing open file handles (for example, to open log files). I think there are also a number of other dependencies and possibly hard-coded UNIX path separators ("/") across the code. Windows is also not POSIX-compliant, and I think there are many places where we rely on tools or binaries that are part of the POSIX standard.

I am not saying it's impossible, I just think it's quite an effort, and unless you make all the tests pass on Windows we can't even start thinking about it. You can start by forking Airflow and trying to make the tests work on Windows. GitHub Actions supports Windows runners, so this should be easy to enable.

We rely heavily on Bash scripts for executing the tests and building Docker images, and all our tests are run in an Ubuntu Docker image. If you want to run them on Windows, however, it has to be done differently and likely without Docker images: simply creating a virtualenv and installing everything.

Maybe you can find others who have time and would like to take a look at that together with you? Simply start a discussion on our devlist and ask for help. I am afraid that, at this stage, the fact that it works in WSL2 for Windows users is quite enough for the community.

I know there were some changes implemented by @evgenyshulman from DataBand to make Airflow work in a very limited way on Windows, so maybe, rather than running a full set of tests on Windows, just getting very simple support for Local Executor is possible? Still, starting from a GitHub Actions step that installs Airflow on Windows is a good start. We cannot accept code that is not tested, so being able to test it automatically is a prerequisite.

Happy to review any changes if you come up with tests running on Windows :).

@casra-developers
Contributor

In our company we now have a setup where we use an Ubuntu server to host Airflow (Web-Server, Dask-Scheduler) and a Windows Server as Dask-Worker. We need the tasks to run on Windows since they have some dependencies that cannot easily be ported to other platforms. Since the Dask-Worker also needs to have Airflow installed, we had to clone the repository and add some extensions to deal with all the POSIX-only Python functions that are not available on Windows. We ended up adding a platform check in certain files and "mimicking" POSIX behavior where necessary.
This approach works really well in the limited manner we need it to work, but it would be great if such a custom solution could be replaced by something more official. We would be willing to share our insights, if the devs are interested in pursuing this.
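
The thread does not say which functions were shimmed, so the following is only a sketch of what a "platform check plus mimicked POSIX behavior" can look like. The helper names `process_group_id` and `hard_kill_signal` are hypothetical and are not taken from the casra-developers changes or from Airflow.

```python
import os
import signal
import sys
from typing import Optional

IS_WINDOWS = sys.platform == "win32"


def process_group_id(pid: int) -> Optional[int]:
    """Return the process group id, or None where the concept is unavailable."""
    # os.getpgid is POSIX-only; it does not exist in the Windows build of Python.
    if IS_WINDOWS or not hasattr(os, "getpgid"):
        return None
    return os.getpgid(pid)


def hard_kill_signal() -> signal.Signals:
    """Pick a signal that forcibly stops a process on the current platform."""
    # signal.SIGKILL is not defined on Windows. signal.SIGTERM is, and
    # os.kill(pid, signal.SIGTERM) there terminates the process outright.
    return getattr(signal, "SIGKILL", signal.SIGTERM)


if __name__ == "__main__":
    print("process group:", process_group_id(os.getpid()))
    print("kill signal:", hard_kill_signal().name)
```

Keeping such checks in one small compatibility module, rather than scattering `sys.platform` tests through the code base, makes the Windows-specific behavior easier to review and test.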

@potiuk
Member

potiuk commented May 26, 2021

Absolutely! I think that might be a great thing to add to Airflow. Maybe you would like to open a PR about this (cc: me) with your changes and we can discuss how to approach it.

@casra-developers
Contributor

Great to hear! We will need a bit of time since we only cloned the repository and have not forked it yet. I will check with my team mates and create a PR as soon as I have the time. Thanks.

@potiuk
Member

potiuk commented May 26, 2021

Looking forward to it. Today we've merged official MSSQL support so seems we are getting friendlier for Microsoft :)

@casra-developers
Contributor

We have a go. I will create a fork and CC you @potiuk in the PR. There are probably a lot of things we need to do, since the only goal was to implement enough functionality for Dask to run properly.

@potiuk
Member

potiuk commented May 26, 2021

We can do it in stages as well. Happy to introduce some parts and see if this needs/can be replicated elsewhere.

@potiuk
Member

potiuk commented May 26, 2021

We probably also want to add some tests to our CI that run on Windows. GitHub supports Windows runners as well, so I am happy to work on incrementally adding more tests and running them in our CI.

@casra-developers
Contributor

Glad to hear it. We work mainly on Azure-DevOps so we are not very familiar with testing and CI tools on GitHub, but happy to learn. I have created the fork and started with implementing the changes. How would you go about step-wise integration?

@potiuk
Member

potiuk commented May 26, 2021

Just split the changes needed. Maybe start with a small part of a few lines; I could then add the Windows CI tests on top of it, and we could add other PRs afterwards. Generally, the smaller the PR, the better :)

@casra-developers
Contributor

I've added the PR. After playing around a bit I now know that while this works fine for a Dask-Worker, it does not work if you want to run the Web-Server or a Scheduler on Windows, because of the process handling. There probably needs to be one more layer of abstraction to handle the execution of processes in a platform-agnostic way.
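
As a rough idea of what such an abstraction layer could look like, here is a minimal sketch built on the standard `subprocess` module. It is an assumption for illustration only, not the approach taken in the PR; `launch_detached` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Sequence


def launch_detached(cmd: Sequence[str]) -> subprocess.Popen:
    """Start cmd in its own session (POSIX) or process group (Windows)."""
    kwargs = {}
    if sys.platform == "win32":
        # CREATE_NEW_PROCESS_GROUP only exists in the Windows build of
        # subprocess, so it is referenced only inside this branch.
        kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
    else:
        # POSIX equivalent: run the child in a new session via setsid().
        kwargs["start_new_session"] = True
    return subprocess.Popen(
        list(cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        **kwargs,
    )


if __name__ == "__main__":
    child = launch_detached([sys.executable, "-c", "print('hello')"])
    print("exit code:", child.wait(timeout=30))
```

Callers then depend only on `launch_detached` and never touch the platform-specific keyword arguments, which is the property a cross-platform scheduler or web server launcher would need.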

@pforai

pforai commented Jul 11, 2022

Any updates on this?

@potiuk
Member

potiuk commented Jul 11, 2022

If there is no update here, then there is no update. Everything here happens only if someone does it. Airflow is created by > 2100 users, most of them like you @pforai: users who contribute stuff if they need it.

Maybe you would like to take the lead on it and move it forward? We need users like you, who apparently have both the need and the capability (and in this case use Windows), and who would like to move things forward and improve compatibility.

@Dev-iL
Contributor

Dev-iL commented May 26, 2024

One incompatibility with Windows has to do with sqlite-db path absoluteness detection. Specifically, airflow currently requires absolute paths to start with / (i.e. sqlite:////...) otherwise they're determined to be invalid. On Windows, absolute paths start with a drive letter and not /.

Places where I saw checks like this:

  • airflow.settings.configure_orm
  • airflow.www.app
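
To illustrate the incompatibility, here is a minimal sketch of an absoluteness check that understands drive letters. It is not the code in `airflow.settings` or `airflow.www.app`; `sqlite_path_is_absolute` and its `windows` parameter are hypothetical, and in-memory URIs such as `sqlite://` are deliberately not handled.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

SQLITE_PREFIX = "sqlite:///"


def sqlite_path_is_absolute(conn_uri: str, windows: bool = os.name == "nt") -> bool:
    """Check whether the file path in a sqlite:/// URI is absolute.

    The windows flag selects the path rules to apply; it defaults to the
    current platform and can be overridden for testing.
    """
    if not conn_uri.startswith(SQLITE_PREFIX):
        raise ValueError(f"not a file-based sqlite URI: {conn_uri!r}")
    path = conn_uri[len(SQLITE_PREFIX):]
    # PureWindowsPath requires both a drive and a root (e.g. C:/ or C:\),
    # while PurePosixPath requires a leading slash.
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path).is_absolute()


if __name__ == "__main__":
    print(sqlite_path_is_absolute("sqlite:////tmp/airflow.db", windows=False))
    print(sqlite_path_is_absolute("sqlite:///C:/airflow/airflow.db", windows=True))
    print(sqlite_path_is_absolute("sqlite:///airflow.db", windows=True))
```

Using the pure path classes keeps the check free of filesystem access, so the Windows rules can be exercised in tests that run on Linux.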

@potiuk
Member

potiuk commented May 26, 2024

Yes. There are many more incompatibilities, like forking and the POSIX-only libraries used. But if someone would like to take on the task and implement and test all of those, they are absolutely welcome to.
