Merged
58 changes: 46 additions & 12 deletions scripts/airflow3/README.md
@@ -1,50 +1,52 @@
# Run Airflow3 Locally

This guide walks you through setting up Apache Airflow 3 locally using Hatch and a Postgres container. You'll run a Postgres container to use it as a backend database for Airflow and set up the necessary environment to run Airflow.
This guide walks you through setting up Apache Airflow 3 locally using pip. You can choose either SQLite or Postgres as the database backend for Airflow. In this setup, we use a Postgres profile to run Cosmos DAGs.

## 1. Setup Postgres Container (Optional)
## 1. Setup Postgres Container

We'll use PostgreSQL as the backend. The following command will pull the official Postgres image, create a container named postgres, and expose the necessary ports.
By default, SQLite is used as the Airflow metadata database unless you update the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN environment variable to point to PostgreSQL. We start a Postgres container so that the Cosmos DAGs can use a Postgres profile. The following command pulls the official Postgres image, creates a container named postgres, and exposes the required ports.

### 1.1 Pull Postgres Image

```commandline
docker run --name postgres -p 5432:5432 -p 5433:5433 -e POSTGRES_PASSWORD=postgres postgres
```

## 2. Access the PostgreSQL Console and Create the Database
### 1.2 Access the PostgreSQL Console and Create the Database

Now that the PostgreSQL container is running, you can connect to it from the command line using psql.

```commandline
psql -h localhost -U postgres
```

### Create the Database for Airflow
### 1.3 Create the Database for Airflow

Once you're inside the psql interactive terminal, you can create a new database that Airflow will use.

```commandline
CREATE DATABASE airflow_db;
```
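To use this database as the Airflow metadata backend instead of the default SQLite, the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN variable can point at it. The following is a minimal sketch, not part of the repository scripts. It assumes the user, password, and port from the `docker run` command above and the `airflow_db` database created in this step. The `PG_*` variable names are hypothetical, and the `psycopg2` driver in the URL is an assumption that must match whatever driver is installed in your virtual environment.

```shell
# Assumed values, taken from the docker run command and CREATE DATABASE step above.
PG_USER="postgres"
PG_PASSWORD="postgres"
PG_HOST="localhost"
PG_PORT="5432"
PG_DB="airflow_db"

# SQLAlchemy connection URL. The psycopg2 driver is an assumption.
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}"

echo "$AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"
```

Export the variable in the same shell session that later runs Airflow, after sourcing `scripts/airflow3/env.sh`, so that it is not overwritten if that script sets the same variable.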

## 3. Setup Virtual Environment for Airflow3
## 2. Setup Virtual Environment for Airflow3

With your Postgres container running and your database set up, you need to configure the virtual environment for Airflow3.
You need to configure the virtual environment for Airflow3.

### Export ENV
### 2.1 Export ENV

This exports the Airflow-related environment variables, such as AIRFLOW_HOME.

```commandline
source scripts/airflow3/env.sh
```

### Install Dependency
## 3. Install Dependency

```commandline
sh scripts/airflow3/setup.sh
```

## 5. Run Airflow in Standalone Mode
## 4. Run Airflow in Standalone Mode

Activate the virtual environment created in the previous step and run Airflow.

@@ -60,7 +62,7 @@ This command will:
- Initialize the Airflow database.
- Start the Airflow webserver, scheduler, and triggerer.

### Run Airflow Tests
## 5. Run Airflow Tests

Once Airflow is running, you can also run tests.

@@ -72,8 +74,40 @@ source "$(pwd)/scripts/airflow3/venv-af3/bin/activate"
sh scripts/airflow3/tests.sh
```

## 4. Access the Airflow Web Interface
## 6. Access the Airflow Web Interface

After running the standalone command, you can access the Airflow web interface to monitor the status of your DAGs, tasks, and more.

- The web interface should be available at http://localhost:8080


## 7. Install Airflow from the Main Branch

If you want to install Airflow from the main branch, follow the steps from sections 1, 2, and 3 above. Then, proceed with the following steps:

### 7.1 Set ENV AIRFLOW_REPO_DIR

Set the `AIRFLOW_REPO_DIR` environment variable in `scripts/airflow3/env.sh` to the path where your Airflow repository is cloned.
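Before running the install script, it can help to confirm that the variable is set and points at an existing directory. The sketch below shows one way to do that. `check_airflow_repo_dir` is a hypothetical helper, not part of the repository scripts; it checks only that the path exists, not that it is a valid Airflow checkout.

```shell
# Hypothetical helper: verify AIRFLOW_REPO_DIR before installing from main.
check_airflow_repo_dir() {
    # Fall back to an empty string so the check also works under `set -u`.
    repo_dir="${AIRFLOW_REPO_DIR:-}"

    if [ -z "$repo_dir" ]; then
        echo "AIRFLOW_REPO_DIR is not set" >&2
        return 1
    fi

    if [ ! -d "$repo_dir" ]; then
        echo "AIRFLOW_REPO_DIR does not exist: $repo_dir" >&2
        return 1
    fi

    echo "AIRFLOW_REPO_DIR is set to '$repo_dir'"
}
```

After sourcing `scripts/airflow3/env.sh`, calling `check_airflow_repo_dir` returns a non-zero status if the path is missing, so the problem surfaces before any packages are uninstalled.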

### 7.2 Activate the Virtual Environment

```commandline
source scripts/airflow3/env.sh

source "$(pwd)/scripts/airflow3/venv-af3/bin/activate"
```

### 7.3 Install Airflow from the Main Branch

```commandline
sh scripts/airflow3/install_from_main.sh
```

### 7.4 Run Airflow standalone

Finally, run Airflow in standalone mode again:


```commandline
airflow standalone
```
1 change: 1 addition & 0 deletions scripts/airflow3/env.sh
@@ -17,3 +17,4 @@ export AIRFLOW__CORE__LOAD_EXAMPLES=false
export AIRFLOW__CORE__DAGBAG_IMPORT_ERROR_TRACEBACK_DEPTH=10
export AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=300
# export AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG
export AIRFLOW_REPO_DIR="$PWD/../airflow"
40 changes: 40 additions & 0 deletions scripts/airflow3/install_from_main.sh
@@ -0,0 +1,40 @@
#!/bin/bash

set -v
set -x
set -e

: "${AIRFLOW_REPO_DIR:?Environment variable AIRFLOW_REPO_DIR is not set}"
echo "AIRFLOW_REPO_DIR is set to '$AIRFLOW_REPO_DIR'"

COSMOS_ROOT="$PWD"

cd "$AIRFLOW_REPO_DIR"
git checkout main && git pull

pip uninstall -y apache-airflow-core
pip uninstall -y apache-airflow-task-sdk
pip uninstall -y apache-airflow-providers-fab
pip uninstall -y apache-airflow
pip uninstall -y apache-airflow-providers-git

rm -rf dist

pip install uv

pip install -e "$AIRFLOW_REPO_DIR/dev/breeze" --force

breeze release-management prepare-provider-distributions \
--distributions-list celery,common.io,common.compat,fab,standard,openlineage,git \
--distribution-format wheel

breeze release-management prepare-airflow-distributions --distribution-format wheel

cd task-sdk
uv build --package apache-airflow-task-sdk --wheel

cd ..

pip install dist/*

cd "$COSMOS_ROOT"