This repository contains the code to operate the duckdb-dev-dashboard:
https://duckdblabs.github.io/duckdb-dev-dashboard
The tech stack:
- back-end: ducklake, with a Postgres catalog and storage on Cloudflare R2
- front-end: evidence (https://docs.evidence.dev)
- hosting: GitHub Pages
- updates: periodic, via a GitHub Actions cron job
Setup steps:
- create a bucket at Cloudflare R2, to serve as data store for the ducklake. Get the following vars:
  - `DUCKLAKE_STORAGE_S3_KEY_ID`
  - `DUCKLAKE_STORAGE_S3_SECRET`
  - `DUCKLAKE_STORAGE_S3_ENDPOINT` (the path of the bucket, e.g. `r2://my-bucket/`)
  - `DUCKLAKE_STORAGE_R2_ACCOUNT_ID`
- create a Postgres server (e.g. at https://neon.com/) to serve as catalog for the ducklake. Get the following vars (mentioned in the connection string):
  - `DUCKLAKE_CATALOG_PG_HOST`
  - `DUCKLAKE_CATALOG_PG_USER`
  - `DUCKLAKE_CATALOG_PG_PASSWORD`
- for convenience and local testing, add the vars mentioned above to a `.env` file (gitignored) and run `make secrets` to create persistent secrets to connect to the ducklake (a sketch of this step follows below). Note that secrets are stored in `~/.duckdb/stored_secrets`.
- note that the front-end is hosted on GitHub Pages: https://docs.evidence.dev/deployment/self-host/github-pages
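What `make secrets` presumably does is create persistent DuckDB secrets from those vars; a minimal sketch of that step (the Makefile target is authoritative; the name `r2_secret` and the omission of Postgres PORT/DATABASE parameters are assumptions here):

```python
import os

import duckdb

con = duckdb.connect()  # persistent secrets end up in ~/.duckdb/stored_secrets
con.execute("INSTALL ducklake;")
con.execute("LOAD ducklake;")

# Secret for the Postgres catalog.
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET pg_secret (
        TYPE postgres,
        HOST '{os.environ["DUCKLAKE_CATALOG_PG_HOST"]}',
        USER '{os.environ["DUCKLAKE_CATALOG_PG_USER"]}',
        PASSWORD '{os.environ["DUCKLAKE_CATALOG_PG_PASSWORD"]}'
    );
""")

# Secret for the R2 data store.
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET r2_secret (
        TYPE r2,
        KEY_ID '{os.environ["DUCKLAKE_STORAGE_S3_KEY_ID"]}',
        SECRET '{os.environ["DUCKLAKE_STORAGE_S3_SECRET"]}',
        ACCOUNT_ID '{os.environ["DUCKLAKE_STORAGE_R2_ACCOUNT_ID"]}'
    );
""")

# Secret tying the Postgres catalog to the R2 data path; this is what
# ATTACH 'ducklake:ducklake_secret' resolves later.
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET ducklake_secret (
        TYPE ducklake,
        METADATA_PATH '',
        DATA_PATH '{os.environ["DUCKLAKE_STORAGE_S3_ENDPOINT"]}',
        METADATA_PARAMETERS MAP {{'TYPE': 'postgres', 'SECRET': 'pg_secret'}}
    );
""")
```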
- to connect to the ducklake with the credentials created above (a Python equivalent is sketched after this list):

```sql
ATTACH 'ducklake:ducklake_secret' AS my_ducklake (READ_ONLY);
USE my_ducklake;
```

The ducklake can be used as a regular database, e.g. `PRAGMA show_tables;` (no result if the ducklake is still empty).
- to connect to the catalog database (e.g. to see the metadata):

```sql
ATTACH '' AS postgres_db (TYPE postgres, SECRET pg_secret, READ_ONLY);
USE postgres_db;
SELECT * FROM ducklake_metadata;
```

- to directly query the data store (with the credentials created above), e.g.:

```sql
SELECT * FROM glob('r2://my-r2-bucket/**/*');
```
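The same connections work from Python via the `duckdb` package (a minimal sketch; it assumes the stored secrets from `make secrets` and that the ducklake extension is available):

```python
import duckdb

con = duckdb.connect()  # persistent secrets are picked up automatically
con.execute("ATTACH 'ducklake:ducklake_secret' AS my_ducklake (READ_ONLY);")
con.execute("USE my_ducklake;")
print(con.execute("PRAGMA show_tables;").fetchall())  # [] while the ducklake is empty
```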
Creating a dashboard requires the following steps:
- create a data feed (a python script); it will run periodically and should create and update the data in the ducklake required for the dashboard
- define a 'source' in evidence: a subdirectory under `./evidence/sources`
- define a 'page' in evidence: a markdown file under `./evidence/pages`
Data feeds are scripts that periodically store data in the ducklake:
- all data feeds are python packages under `./feeds/` and will be run by `run_feeds.py` (via `make run_feeds`)
- to add a data feed, add a python script (single-file package) in a directory under `./feeds/` and update `run_feeds.py`
- data feeds should create the data table on first run
- the general layout of a data feed can be as follows:

```python
from utils.ducklake import DuckLakeConnection

data = my_func_to_fetch_data_from_somewhere()

with DuckLakeConnection() as con:
    con.execute("<<< SQL statements to create tables, add records, etc. >>>")
```
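`utils.ducklake.DuckLakeConnection` is the repo's connection helper; the following is a hypothetical sketch of what such a context manager can look like (the actual implementation in `utils/ducklake.py` may differ):

```python
import duckdb

class DuckLakeConnection:
    """Context manager yielding a duckdb connection with the ducklake attached.

    Hypothetical sketch; relies on the persistent secrets created by `make secrets`.
    """

    def __init__(self, read_only: bool = False):
        self.read_only = read_only

    def __enter__(self) -> duckdb.DuckDBPyConnection:
        self.con = duckdb.connect()
        mode = " (READ_ONLY)" if self.read_only else ""
        self.con.execute(f"ATTACH 'ducklake:ducklake_secret' AS my_ducklake{mode};")
        self.con.execute("USE my_ducklake;")
        return self.con

    def __exit__(self, exc_type, exc, tb):
        self.con.close()
        return False  # do not swallow exceptions
```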
The evidence front-end (see ./evidence/README.md) cannot serve directly from the ducklake, so `.duckdb` files are created as an in-between step. This is not ideal, since evidence itself also copies the data to convert it into parquet. Therefore (for now) there are two build steps:
- `make generate_sources`: converts data in the ducklake into a persistent `.duckdb` file
- `make build`: converts `.duckdb` into `.parquet` and builds the front-end
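One way the `make generate_sources` step can be implemented is a one-shot copy from the attached ducklake into a local database file (a sketch; the output path is a made-up example, and the real script may copy table-by-table instead):

```python
import duckdb

con = duckdb.connect()
con.execute("ATTACH 'ducklake:ducklake_secret' AS my_ducklake (READ_ONLY);")
# Write a fresh local .duckdb file for evidence to read from (example path).
con.execute("ATTACH 'evidence/sources/ci_metrics/ci_metrics.duckdb' AS local_db;")
con.execute("COPY FROM DATABASE my_ducklake TO local_db;")
con.close()
```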
Steps to define a new source:
- initial step to create a new source, see: https://docs.evidence.dev/core-concepts/data-sources/duckdb/
- run `make dev` to spawn the front-end
- following the steps in the link above will create a subdirectory under `./evidence/sources`
- add the source to `evidence/sources/sources.json`, to specify which tables from the ducklake are needed
- run `make generate_sources`; this should create the `.duckdb` file (which is gitignored, but needed for local testing)
- add one or more `.sql` files to select the data relevant for the dashboard
Define dashboard pages in `./evidence/pages`:
- to use the `.sql` files created in the previous step, see https://docs.evidence.dev/core-concepts/queries/
- for bars, charts, and other components, see https://docs.evidence.dev/core-concepts/components/
- locally test with `make dev`
An example dashboard, the CI stats page, consists of:
- data feed: `feeds/collect_ci_metrics.py` (#todo update)
- source: `evidence/sources/ci_metrics/`
- page: `evidence/pages/ci-stats.md`
The data feed fetches and stores completed CI runs from:
- https://api.github.com/repos/duckdb/duckdb/actions/workflows
- https://api.github.com/repos/duckdb/duckdb/actions/runs
- https://api.github.com/repos/duckdb/duckdb/actions/runs/{RUN_ID}/jobs
Note that only consecutive 'completed' runs are stored. After an initial run, the script will add new completed runs ('append only').
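A condensed sketch of that append-only logic (the table name, column set, and use of `requests` are assumptions, and the bookkeeping that keeps stored runs consecutive is glossed over):

```python
import requests
from utils.ducklake import DuckLakeConnection

RUNS_URL = "https://api.github.com/repos/duckdb/duckdb/actions/runs"

def fetch_completed_runs() -> list[dict]:
    # Filtering on status means only finished runs come back.
    resp = requests.get(RUNS_URL, params={"status": "completed", "per_page": 100})
    resp.raise_for_status()
    return resp.json()["workflow_runs"]

with DuckLakeConnection() as con:
    # Feeds create their table on first run.
    con.execute("""
        CREATE TABLE IF NOT EXISTS ci_runs (
            id BIGINT, name VARCHAR, conclusion VARCHAR, run_started_at TIMESTAMP
        );
    """)
    last_id = con.execute("SELECT coalesce(max(id), 0) FROM ci_runs").fetchone()[0]
    # Append only runs that are not stored yet.
    new_runs = [r for r in fetch_completed_runs() if r["id"] > last_id]
    if new_runs:
        con.executemany(
            "INSERT INTO ci_runs VALUES (?, ?, ?, ?)",
            [(r["id"], r["name"], r["conclusion"], r["run_started_at"])
             for r in new_runs],
        )
```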