SQLMesh Data Pipeline for Drupal Integration

Overview

This document outlines an hourly process using SQLMesh to harvest data from external REST APIs, transform it, and store it in MySQL for consumption by Drupal views.

Open in GitHub Codespaces

Developing locally

The justfile in this repository contains the most useful commands:

$ just -l -u
Available recipes:
    default         # Choose a task to run
    prereqs         # Install project tools
    minikube        # Setup minikube
    mysql-svc       # Forward mysql from service defined in env
    dev             # SQLMesh ui for local dev
    skaffold *args  # skaffold configured with env and minikube
    mysqldump *args # mysqldump configured with same env as SQLMesh
    mysql *args     # mysql configured with same env as SQLMesh
    everestctl      # Install percona everest cli
    everest         # Percona Everest webui to manage databases

To get started, run just everest and use the web UI to create a database. Configure the database details in the .env file (see example.env for reference). Once configured, run just local-dev to forward the MySQL port and expose the SQLMesh UI.
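
A typical first-run sequence might look like the following sketch (the exact variables to set are defined in example.env):

just everest          # create a database via the Percona Everest web UI
cp example.env .env   # copy the template, then fill in the database details
just local-dev        # forward the MySQL port and expose the SQLMesh UI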

To dump the sqlmesh database for validation/testing:

just mysqldump sqlmesh | gzip > sqlmesh.sql.gz
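
To load that dump elsewhere for inspection, the same env-configured recipes can be used in reverse. A minimal sketch, assuming the mysql recipe passes its arguments straight through to the mysql client:

gunzip -c sqlmesh.sql.gz | just mysql sqlmesh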

Testing container with skaffold

Configure secrets, then run skaffold dev (which expects the secrets to already exist in the cluster).
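
One way to create such a secret is from the same .env file used for local development. This is a hypothetical sketch; check the Kubernetes manifests for the secret name and keys skaffold actually expects:

kubectl create secret generic harvest-consultations-env --from-env-file=.env   # hypothetical secret name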

Container publish workflow

Process Design

  1. Hourly Data Harvesting: SQLMesh connects to and harvests data from external REST APIs (see the run sketch after this list)

  2. Data Transformation: SQLMesh processes the harvested data

    • SQLMesh SQL Models
    • Data cleaning and standardization
    • Value translation based on mapping configuration
    • Data cloned from the DuckDB state engine to MySQL target tables
  3. Content Management:

    • Read-only imports of external content
    • Full management of Drupal-authored content
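
The hourly cadence is driven by SQLMesh's own scheduler: each model declares a cron schedule and sqlmesh run executes whichever models are due. A minimal sketch of the commands involved (how this repository actually invokes them, e.g. from a Kubernetes CronJob, may differ):

sqlmesh plan --auto-apply   # apply any pending model changes without prompting
sqlmesh run                 # execute models whose cron schedule (e.g. '@hourly') is due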

Notes on Development

For detailed implementation guidance, refer to: