diff --git a/scripts/airflow3/README.md b/scripts/airflow3/README.md
index dfe8f174ba..c194d9d2c5 100644
--- a/scripts/airflow3/README.md
+++ b/scripts/airflow3/README.md
@@ -1,16 +1,18 @@
 # Run Airflow3 Locally
 
-This guide walks you through setting up Apache Airflow 3 locally using Hatch and a Postgres container. You'll run a Postgres container to use it as a backend database for Airflow and set up the necessary environment to run Airflow.
+This guide will walk you through the process of setting up Apache Airflow 3 locally using pip. You can choose either SQLite or Postgres as the database backend for Airflow. In this setup, we'll be using a Postgres profile to run Cosmos DAGs.
 
-## 1. Setup Postgres Container (Optional)
+## 1. Setup Postgres Container
 
-We'll use PostgreSQL as the backend. The following command will pull the official Postgres image, create a container named postgres, and expose the necessary ports.
+By default, SQLite will be used as Airflow metadata database unless you update the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN environment variable to point to PostgreSQL. We spawn Postgres container to use Postgres profile in Cosmos DAGs. The following command will pull the official Postgres image , create a container named postgres, and expose the required ports.
+
+### 1.1 Pull Postgres Image
 
 ```commandline
 docker run --name postgres -p 5432:5432 -p 5433:5433 -e POSTGRES_PASSWORD=postgres postgres
 ```
 
-## 2. Access the PostgreSQL Console and Create the Database
+### 1.2 Access the PostgreSQL Console and Create the Database
 
 Now that the PostgreSQL container is running, you can connect to it via the command line using psql
 
@@ -18,7 +20,7 @@ Now that the PostgreSQL container is running, you can connect to it via the comm
 psql --u postgres
 ```
 
-### Create the Database for Airflow
+### 1.3 Create the Database for Airflow
 
 Once you're inside the psql interactive terminal, you can create a new database that Airflow will use.
 
@@ -26,11 +28,11 @@ Once you're inside the psql interactive terminal, you can create a new database
 CREATE DATABASE airflow_db;
 ```
 
-## 3. Setup Virtual Environment for Airflow3
+## 2. Setup Virtual Environment for Airflow3
 
-With your Postgres container running and your database set up, you need to configure the virtual environment for Airflow3.
+You need to configure the virtual environment for Airflow3.
 
-### Export ENV
+### 2.1 Export ENV
 
 This will export the AIRFLOW related env like AIRFLOW_HOME etc
 
@@ -38,13 +40,13 @@ This will export the AIRFLOW related env like AIRFLOW_HOME etc
 source scripts/airflow3/env.sh
 ```
 
-### Install Dependency
+## 3. Install Dependency
 
 ```commandline
 sh scripts/airflow3/setup.sh
 ```
 
-## 5. Run Airflow in Standalone Mode
+## 4. Run Airflow in Standalone Mode
 
 Activate the virtual env created in previous step and run airflow
 
@@ -60,7 +62,7 @@ This command will:
 - Initialize the Airflow database.
 - Start Airflow webserver, scheduler and trigger.
 
-### Run Airflow Tests
+## 5. Run Airflow Tests
 
 Once Airflow is running, you can also run tests.
 
@@ -72,8 +74,40 @@ source "$(pwd)/scripts/airflow3/venv-af3/bin/activate"
 sh scripts/airflow3/tests.sh
 ```
 
-## 4. Access the Airflow Web Interface
+## 6. Access the Airflow Web Interface
 
 After running the standalone command, you can access the Airflow web interface to monitor the status of your DAGs, tasks, and more.
 
 - The web interface should be available at http://localhost:8080
+
+
+## 7. Install Airflow from the Main Branch
+
+If you want to install Airflow from the main branch, follow the steps from sections 1, 2, and 3 above. Then, proceed with the following steps:
+
+### 7.1 Set ENV AIRFLOW_REPO_DIR
+
+Set ENV `AIRFLOW_REPO_DIR` in scripts/airflow3/env.sh pointing to the path where your Airflow repository is cloned.
+
+### 7.2 Activate the Virtual Environment
+
+```commandline
+source scripts/airflow3/env.sh
+
+source "$(pwd)/scripts/airflow3/venv-af3/bin/activate"
+```
+
+### 7.3 Install Airflow from the Main Branch
+
+```commandline
+sh scripts/airflow3/install_from_main.sh
+```
+
+### 7.4 Run Airflow standalone
+
+Finally, run Airflow in standalone mode again:
+
+
+```commandline
+airflow standalone
+```
diff --git a/scripts/airflow3/env.sh b/scripts/airflow3/env.sh
index 0631049c82..32ca41b120 100644
--- a/scripts/airflow3/env.sh
+++ b/scripts/airflow3/env.sh
@@ -17,3 +17,4 @@ export AIRFLOW__CORE__LOAD_EXAMPLES=false
 export AIRFLOW__CORE__DAGBAG_IMPORT_ERROR_TRACEBACK_DEPTH=10
 export AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=300
 # export AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG
+export AIRFLOW_REPO_DIR="$PWD/../airflow"
diff --git a/scripts/airflow3/install_from_main.sh b/scripts/airflow3/install_from_main.sh
new file mode 100644
index 0000000000..51f6cacd15
--- /dev/null
+++ b/scripts/airflow3/install_from_main.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+
+set -v
+set -x
+set -e
+
+: "${AIRFLOW_REPO_DIR:?Environment variable AIRFLOW_REPO_DIR is not set}"
+echo "AIRFLOW_REPO_DIR is set to '$AIRFLOW_REPO_DIR'"
+
+COSMOS_ROOT="$PWD"
+
+cd "$AIRFLOW_REPO_DIR"
+git checkout main && git pull
+
+pip uninstall -y apache-airflow-core
+pip uninstall -y apache-airflow-task-sdk
+pip uninstall -y apache-airflow-providers-fab
+pip uninstall -y apache-airflow
+pip uninstall -y apache-airflow-providers-git
+
+rm -rf dist
+
+pip install uv
+
+pip install -e "$AIRFLOW_REPO_DIR/dev/breeze" --force
+
+breeze release-management prepare-provider-distributions \
+  --distributions-list celery,common.io,common.compat,fab,standard,openlineage,git \
+  --distribution-format wheel
+
+breeze release-management prepare-airflow-distributions --distribution-format wheel
+
+cd task-sdk
+uv build --package apache-airflow-task-sdk --wheel
+
+cd ..
+
+pip install dist/*
+
+cd "$COSMOS_ROOT"
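
Note on the `AIRFLOW_REPO_DIR` guard: `install_from_main.sh` relies on the `${AIRFLOW_REPO_DIR:?...}` parameter expansion, which aborts when the variable is unset or empty but does not check that the path exists. The sketch below shows one possible way to verify the value before running the script. The helper name `check_airflow_repo_dir` and the directory check are assumptions for illustration and are not part of this change.

```shell
#!/bin/sh
# Sketch only: check_airflow_repo_dir is a hypothetical helper, not part of this PR.
# The function body is a subshell "( ... )", so a failed ${VAR:?} check ends only
# that subshell and the caller can inspect the exit status.
check_airflow_repo_dir() (
  # Same guard as install_from_main.sh: fail with a message if unset or empty.
  : "${AIRFLOW_REPO_DIR:?Environment variable AIRFLOW_REPO_DIR is not set}"

  # Extra check (an assumption, not in the PR): the path must be an existing directory.
  if [ ! -d "$AIRFLOW_REPO_DIR" ]; then
    echo "AIRFLOW_REPO_DIR '$AIRFLOW_REPO_DIR' is not a directory" >&2
    exit 1
  fi

  echo "AIRFLOW_REPO_DIR is set to '$AIRFLOW_REPO_DIR'"
)
```

Running `check_airflow_repo_dir || exit 1` after sourcing `scripts/airflow3/env.sh` would surface a wrong path before any `pip uninstall` step runs.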