An Airflow-based dashboard for the LA Metro ETL pipeline!
Perform the following steps from your terminal.
- Clone the LA Metro Councilmatic repository and follow the instructions in its README to build and run the application.
- Clone this repository and create a local `.env` file:

  ```bash
  cp .env.example .env
  ```

  Fill in the absolute location of your GPG keyring, usually the absolute path for `~/.gnupg`.
- Build and run the dashboard:

  ```bash
  docker-compose up
  ```

- Finally, to visit the dashboard app, go to http://localhost:8080/admin/. The Councilmatic app runs on http://localhost:8001/.
See the Airflow documentation for more on navigating the UI and development.
Dashboard DAGs are based on one of two applications:
The conversation on how to ensure DAGs are running against the current version of these applications is captured in this issue.
tl;dr - Application dependencies are packaged as Docker images and pushed to GitHub Container Registry. When a task starts, it pulls the corresponding image, runs a custom script to decrypt the bundled secrets and append dashboard-specific connection strings, then executes its command in a container.
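The startup flow above can be sketched as a shell function. This is a hypothetical illustration only: the image name, registry path, and the assumption that `scripts/concat_settings.sh` is invoked this way inside the container are ours, not the dashboard's actual configuration.

```shell
# Hypothetical sketch of what happens when a task starts: pull the
# application image, build settings (decrypt secrets or fall back),
# then run the task command in a container.
run_task() {
  local image="$1"; shift
  docker pull "$image"
  # Decrypt bundled secrets / assemble settings, then run the command.
  docker run --rm "$image" /bin/bash -c "./scripts/concat_settings.sh && $*"
}
```

For example, `run_task ghcr.io/example/scrapers:main python manage.py update_index` (image and command are placeholders) would pull the `main` tag and run the command against freshly assembled settings.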
The dashboard runs DAGs from application images stored in GitHub Container Registry:
Both images are configured to build automatically from their corresponding GitHub repositories. Commits to `main` (i.e., staging deployments) build a `main` tag. Pushes to `deploy` (i.e., production deployments) build a `deploy` tag.
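The branch-to-tag mapping can be summed up in a few lines of shell (the function name is ours, for illustration only):

```shell
# Map a Git branch to the image tag it builds, per the convention above.
image_tag_for_branch() {
  case "$1" in
    main)   echo "main" ;;    # staging deployments
    deploy) echo "deploy" ;;  # production deployments
    *)      echo "no image is built for branch: $1" >&2; return 1 ;;
  esac
}
```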
When DAGs are run, our custom Docker operator tries to decrypt the secrets bundled in the application image using your local GPG keyring. This does not seem to work for GPG keys with a passphrase, i.e., your personal GPG key. If decryption fails, the dashboard will fall back to using the example settings files. (See `scripts/concat_settings.sh`.)
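The decrypt-or-fall-back behavior can be sketched as follows. The real logic lives in `scripts/concat_settings.sh`; the function name, file names, and the injectable decrypt command here are assumptions for illustration:

```shell
# Hypothetical sketch: try to decrypt the bundled secrets into a settings
# file; if decryption fails, fall back to the example settings file.
build_settings() {
  local decrypt_cmd="$1" encrypted="$2" example="$3" out="$4"
  if $decrypt_cmd "$encrypted" > "$out" 2>/dev/null; then
    echo "using decrypted settings"
  else
    cp "$example" "$out"
    echo "decryption failed; falling back to example settings"
  fi
}
```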
Scrapes will run with the default settings file. Note that running the bill scrape without the encrypted token will not capture private bills; however, it should provide enough signal to test whether scrapes are working, unless you are specifically trying to test private bill logic.
Metro processing requires AWS credentials and a SmartLogic API token, i.e., Metro DAGs will fail locally without decrypted secrets.
If you need to test the Metro ETL pipeline, we suggest manually deploying your branch to staging and running the DAGs there, as the server has the appropriate keys to decrypt Metro application secrets.
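If you do want a quick local sanity check before triggering a Metro DAG, something like the following can flag missing credentials. This assumes the credentials are exposed as environment variables, and the variable names here are placeholders, not the dashboard's actual settings names:

```shell
# Hypothetical check: report any required Metro secrets that are not set
# in the environment. Variable names are assumptions for illustration.
check_metro_secrets() {
  local missing=0 var
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY SMART_LOGIC_API_KEY; do
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var"
      missing=1
    fi
  done
  return $missing
}
```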
If you must work locally, you can follow steps 1-5 in our instructions for moving keys between servers to export the private key, then log out of the server and `scp` it down to your computer:

```bash
# pubkey.txt is a misnomer from the linked documentation – this is a text file
# containing the *private* key you exported using gpg --export-secret-key
scp [email protected]:/home/ubuntu/pubkey.txt .
gpg --import pubkey.txt
```
Don't forget to remove `pubkey.txt` from the server and from your local machine after you've imported the keys successfully.
Now you can run Metro DAGs locally using decrypted secrets.
Admins can add users on the 'List Users' page located in the 'Security' dropdown on the top Airflow toolbar. Each user is assigned a role, which has associated permissions. The Airflow documentation has a thorough explanation of permissions. The role 'Metro Admin' grants users access only to the Dashboard view.