Commit 958a59e

Merge pull request #35 from BasPH/feature/fix-ch12
Feature/fix ch12
2 parents a3b5b79 + a9c7d0c

5 files changed: +44 -52 lines changed

chapters/chapter12/README.md

Lines changed: 24 additions & 14 deletions
@@ -1,28 +1,38 @@
 # Chapter 12
 
-Code accompanying Chapter 12 of the book 'Data Pipelines with Apache Airflow'.
+Code accompanying Chapter 12 of the book [Data Pipelines with Apache Airflow](https://www.manning.com/books/data-pipelines-with-apache-airflow).
 
 ## Contents
 
-This code example contains the following:
+This folder contains DAGs from Chapter 12. Topics covered are monitoring, logging, horizontal scaling, etc. An
+accompanying Docker Compose setup was built for demonstration purposes. This includes:
 
-- /DAGs:
-  - dag_failure_callback.py - Trigger a callback function when a DAG run fails.
-  - dag_puller_dag.py - Fetches latest code from a Git master branch every 5 minutes (Git must be configured).
-  - task_failure_callback.py - Trigger a callback function when a task fails.
-  - task_failure_email.py - Send an email when a task fails (SMTP must be configured).
-  - task_sla.py - Send an SLA miss notification in case of a missed SLA.
-- /docker: An example (base) Dockerfile for use in a CI/CD pipeline.
-- /monitoring_docker_compose: A docker-compose setup for monitoring Airflow with StatsD, Prometheus, and Grafana.
+- Airflow (webserver, scheduler, and Celery workers)
+- PostgreSQL database for the Airflow metastore
+- Redis for the Celery queue
+- Flower, a Celery monitoring tool
+- Prometheus, for scraping and storing metrics
+- Grafana, for visualizing metrics
+- Redis and StatsD exporters to expose metrics
+
+Given the number of services, this can become a bit resource-heavy on your machine.
+
+Unfortunately, not everything can be scripted/pre-initialized, especially in Grafana. Therefore, you must add
+Prometheus as a datasource and create a dashboard yourself.
 
 ## Usage
 
-To get started with the code examples, start Airflow in docker using the following command:
+To get started with the code examples, start Airflow with Docker Compose using the following command:
 
-docker-compose up -d --build
+```bash
+docker-compose up -d
+```
 
-Wait for a few seconds and you should be able to access the examples at http://localhost:8080/.
+The webserver initializes a few things, so wait for a few seconds, and you should be able to access the
+Airflow webserver at http://localhost:8080.
 
 To stop running the examples, run the following command:
 
-docker-compose down
+```bash
+docker-compose down -v
+```
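The README above notes that Grafana cannot be fully pre-initialized and that the Prometheus datasource must be added by hand. If you would rather script that half of the step anyway, the Grafana 7.x image pinned below can load datasources from a provisioning file; a minimal sketch, with a hypothetical file name that is not part of this commit:

```yaml
# files/grafana_datasources.yml -- hypothetical, not in this commit
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # "prometheus" resolves via the Compose network to the service defined below
    url: http://prometheus:9090
    isDefault: true
```

Mounted into the grafana service at /etc/grafana/provisioning/datasources/, this is picked up at startup; the dashboard itself still has to be built by hand.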
Lines changed: 20 additions & 20 deletions
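Most of the changes below are one mechanical substitution: the hard-coded apache/airflow:1.10.11-python3.7 image is replaced everywhere by a shared YAML anchor, so future version bumps touch a single line. A minimal illustration of the anchor/alias mechanism, stripped down from the file below:

```yaml
# "&airflow_image" names the value once; "*airflow_image" reuses it.
# Compose ignores top-level "x-" keys, which makes them safe anchor holders.
x-airflow-image: &airflow_image apache/airflow:1.10.12-python3.8

services:
  webserver:
    image: *airflow_image  # resolves to apache/airflow:1.10.12-python3.8
```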
@@ -4,7 +4,7 @@ version: '3.7'
 x-environment: &airflow_environment
   - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/1
   - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@postgres:5432/airflow
-  - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
+  - AIRFLOW__CORE__EXECUTOR=LocalExecutor
   - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
   - AIRFLOW__CORE__LOAD_EXAMPLES=False
   - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@postgres:5432/airflow
@@ -14,7 +14,9 @@ x-environment: &airflow_environment
   - AIRFLOW__SCHEDULER__STATSD_ON=True
   - AIRFLOW__SCHEDULER__STATSD_PORT=9125
   - AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True
-  - AIRFLOW__WEBSERVER__RBAC=True
+  - AIRFLOW__WEBSERVER__RBAC=False
+
+x-airflow-image: &airflow_image apache/airflow:1.10.12-python3.8
 # ====================================== /AIRFLOW ENVIRONMENT VARIABLES ======================================
 
 services:
@@ -27,53 +29,51 @@
     ports:
       - "5432:5432"
 
-  initdb_adduser:
-    image: apache/airflow:1.10.11-python3.7
+  init:
+    image: *airflow_image
     depends_on:
       - postgres
     environment: *airflow_environment
     entrypoint: /bin/bash
-    # The webserver initializes permissions, so sleep for that to (approximately) be finished
-    # No disaster if the webserver isn't finished by then, but create_user will start spitting out errors until the permissions exist
-    command: -c 'airflow initdb && sleep 5 && airflow create_user --role Admin --username airflow --password airflow -e [email protected] -f airflow -l airflow'
+    command: -c 'airflow upgradedb && sleep 5 && airflow create_user --username admin --password admin --firstname John --lastname Smith --role Admin --email [email protected]'
 
   webserver:
-    image: apache/airflow:1.10.11-python3.7
+    image: *airflow_image
     restart: always
     depends_on:
       - postgres
-    volumes:
-      - logs:/opt/airflow/logs
     ports:
       - "8080:8080"
+    volumes:
+      - logs:/opt/airflow/logs
     environment: *airflow_environment
     command: webserver
 
   scheduler:
-    image: apache/airflow:1.10.11-python3.7
+    image: *airflow_image
     restart: always
     depends_on:
      - postgres
     volumes:
-      - ./hello_airflow.py:/opt/airflow/dags/hello_airflow.py
       - logs:/opt/airflow/logs
+      - ./dags:/opt/airflow/dags
     environment: *airflow_environment
     command: scheduler
 
   # docker-compose -f docker-compose-celeryexecutor.yml up --scale worker=3 -d
   worker:
-    image: apache/airflow:1.10.11-python3.7
+    image: *airflow_image
     restart: always
     depends_on:
       - scheduler
     volumes:
-      - ./hello_airflow.py:/opt/airflow/dags/hello_airflow.py
       - logs:/opt/airflow/logs
+      - ./dags:/opt/airflow/dags
     environment: *airflow_environment
     command: worker
 
   flower:
-    image: apache/airflow:1.10.11-python3.7
+    image: *airflow_image
     restart: always
     depends_on:
       - worker
@@ -83,20 +83,20 @@
     command: flower
 
   statsd_exporter:
-    image: prom/statsd-exporter
+    image: prom/statsd-exporter:v0.18.0
     restart: always
     volumes:
-      - ./statsd_mapping.yml:/tmp/statsd_mapping.yml
+      - ./files/statsd_mapping.yml:/tmp/statsd_mapping.yml
     ports:
       - "9102:9102"
       - "9125:9125/udp"
     command: --statsd.mapping-config=/tmp/statsd_mapping.yml
 
   prometheus:
-    image: prom/prometheus
+    image: prom/prometheus:v2.22.0
     restart: always
     volumes:
-      - ./prometheus.yml:/etc/prometheus/prometheus.yml
+      - ./files/prometheus.yml:/etc/prometheus/prometheus.yml
     ports:
       - "9090:9090"
     command:
@@ -109,7 +109,7 @@
       - --web.console.templates=/usr/share/prometheus/consoles
 
   grafana:
-    image: grafana/grafana
+    image: grafana/grafana:7.2.1
     restart: always
     ports:
       - "3000:3000"
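The statsd_exporter service above translates Airflow's StatsD metrics for Prometheus using ./files/statsd_mapping.yml, which this commit references but does not show. For orientation: a statsd-exporter mapping file rewrites dotted StatsD names into labeled Prometheus metrics. A minimal sketch, with metric names taken from Airflow 1.10's StatsD output (the actual file may differ):

```yaml
# Illustrative only -- the real files/statsd_mapping.yml is not part of this diff
mappings:
  # airflow.dagrun.duration.success.<dag_id> becomes one metric with a dag_id label
  - match: "airflow.dagrun.duration.success.*"
    name: "airflow_dagrun_duration_success"
    labels:
      dag_id: "$1"
  # Plain counters can simply be renamed
  - match: "airflow.ti_failures"
    name: "airflow_task_instance_failures"
```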

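Likewise, ./files/prometheus.yml is mounted but not shown in this diff. Its essential content is a scrape job pointed at the statsd_exporter's HTTP port (9102, published above); a minimal sketch under that assumption:

```yaml
# Illustrative only -- the real files/prometheus.yml is not part of this diff
scrape_configs:
  - job_name: airflow
    scrape_interval: 30s
    static_configs:
      # statsd_exporter serves the translated metrics on :9102
      - targets: ["statsd_exporter:9102"]
```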
chapters/chapter12/monitoring_docker_compose/hello_airflow.py

Lines changed: 0 additions & 18 deletions
This file was deleted.
