This document outlines an hourly process using SQLMesh to harvest data from external REST APIs, transform it, and store it in MySQL for consumption by Drupal views.
The justfile in this repository has the most useful commands (run `just prereqs` and restart your codespace/devcontainer before diving in, to make sure all CLI utilities are in place):
```shell
$ just -l -u
Available recipes:
    default                # Choose a task to run
    prereqs                # Install project tools
    minikube               # Setup minikube
    mysql-svc              # Forward mysql from k8s cluster
    dev                    # SQLMesh ui for local dev
    test                   # Build and test container
    dump-consultations     # Dump the sqlmesh database to logs/consultations.sql.gz (run test to create/populate db first)
    awslogin               # use aws sso login profiles
    setup-eks              # Create an eks cluster for testing
    schedule-with-eks      # Deploy scheduled task to eks with secrets
```
To get started, run `just dev` to create a minikube cluster, forward the mysql service, and expose the SQLMesh UI.
To dump the sqlmesh database for validation/testing:

```shell
just clean               # Tears down the kubernetes namespace, including the local db
just test                # Builds and pushes a local docker image into minikube, then runs the job once (and schedules hourly runs)
just dump-consultations  # Exports the current database from minikube
# grab the output from logs/consultations.sql.gz
```
To run the packaged container in a production environment, it needs `HARVEST_PORTALS` and `MYSQL_DUCKDB_PATH` configured (refer to the DuckDB MySQL extension documentation). The remaining env vars in example.env just simplify local development. The example below also adjusts the output database/table (note that the database in the `MYSQL_DUCKDB_PATH` connection string and `SQLMESH__VARIABLES__OUTPUT_DB` should match).
```shell
# .env example
HARVEST_PORTALS='{"engagementhq":["https://ehq-site1.example.domain","https://ehq-site2.example.domain"],"citizenspace":["https://cs-site3.example.domain","https://cs-site4.example.domain"]}'
MYSQL_PWD='...'
MYSQL_DUCKDB_PATH='host=localhost user=root database=sqlmesh'
SQLMESH__VARIABLES__OUTPUT_DB="sqlmesh"
SQLMESH__VARIABLES__OUTPUT_TABLE="consultations"
```
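The constraint that the database in `MYSQL_DUCKDB_PATH` must match `SQLMESH__VARIABLES__OUTPUT_DB` can be checked before launch. As a minimal sketch, `check_env` below is a hypothetical helper (not part of the project) that validates the env vars described above:

```python
import json


def check_env(env: dict) -> dict:
    """Validate harvest-related env vars (illustrative helper, not project code)."""
    # HARVEST_PORTALS is a JSON object mapping portal type -> list of base URLs
    portals = json.loads(env["HARVEST_PORTALS"])
    for portal_type, urls in portals.items():
        bad = [u for u in urls if not u.startswith("https://")]
        if bad:
            raise ValueError(f"{portal_type}: malformed portal URLs {bad}")
    # MYSQL_DUCKDB_PATH is a space-separated key=value connection string
    conn = dict(kv.split("=", 1) for kv in env["MYSQL_DUCKDB_PATH"].split())
    if conn.get("database") != env["SQLMESH__VARIABLES__OUTPUT_DB"]:
        raise ValueError(
            "database in MYSQL_DUCKDB_PATH must match SQLMESH__VARIABLES__OUTPUT_DB"
        )
    return portals
```

A check like this would also catch the easy-to-miss `https:/` (single slash) typo in a portal URL.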
The justfile in this repository includes a default configuration that can be used with `just setup-eks` and then `just schedule-with-eks`, which will create an AWS EKS Auto cluster, then schedule a job and a database admin container in the `harvest-consultations` namespace. Note that secrets will also be pulled from local env vars and saved in the cluster (as KMS-encrypted sealed secrets, if set up as above). Reviewing the justfile and the eks manifest directory should be enough to configure for specific use cases (e.g. using an existing VPC, customising security groups / NAT gateways).
For further runtime customisation, see the environment overrides section of the SQLMesh configuration guide and this project's config.yaml.
The current release is v0.3.1, which has a published container image built for both linux/amd64 and linux/arm64 architectures from the `ghcr.io/astral-sh/uv:python3.13-bookworm-slim` base image.
- **Hourly Data Harvesting**: SQLMesh connects to and harvests data from external REST APIs
  - SQLMesh Python Models
  - Configurable API endpoints and authentication
  - Runs every hour via a Kubernetes CronJob
- **Data Transformation**: SQLMesh processes the harvested data
  - SQLMesh SQL Models
  - Data cleaning and standardization
  - Value translation in SQL views
  - Data clone from the `duckdb` state engine to `mysql` target tables using the DuckDB MySQL extension
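The clone step relies on DuckDB's MySQL extension, which can attach a MySQL database and write tables into it. A minimal illustration of the mechanism follows; the connection string matches the example .env above, and `consultations_view` is a hypothetical source relation (the project's actual statements live in its SQLMesh models):

```
INSTALL mysql;
LOAD mysql;
-- Attach the MySQL target using the same connection string as MYSQL_DUCKDB_PATH
ATTACH 'host=localhost user=root database=sqlmesh' AS mysql_db (TYPE mysql);
-- Copy a computed DuckDB relation into the MySQL target table
CREATE TABLE mysql_db.consultations AS SELECT * FROM consultations_view;
```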
- **Content Management**:
  - Read-only imports of external content
  - Full management of Drupal-authored content
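The harvesting step above can be sketched as a plain fetch loop. The endpoint path, pagination scheme, and function name below are illustrative assumptions, not the project's actual API (EngagementHQ and Citizen Space portals each expose their own endpoints and authentication):

```python
import json
import urllib.request


def fetch_consultations(base_url: str, opener=urllib.request.urlopen):
    """Fetch all pages of consultations from one portal (illustrative endpoint)."""
    results, page = [], 1
    while True:
        # Hypothetical paginated endpoint; real portal APIs differ.
        with opener(f"{base_url}/api/consultations?page={page}") as resp:
            batch = json.loads(resp.read())
        if not batch:  # an empty page signals the end
            return results
        results.extend(batch)
        page += 1
```

In a SQLMesh Python model, a loop like this would run inside the model's `execute` function and return the rows for the target table; injecting `opener` also keeps the fetch logic testable without network access.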
For detailed implementation guidance, refer to: