This repository contains the code to operate the duckdb-dev-dashboard:
https://duckdblabs.github.io/duckdb-dev-dashboard
The tech stack:
- back-end: ducklake, with a Postgres catalog and storage on Cloudflare R2
- front-end: evidence (https://docs.evidence.dev)
- hosting: GitHub Pages
- updates: periodic, via a GitHub Actions cron job
Setup steps:
- create a bucket at Cloudflare R2, to serve as data store for the ducklake. Get the following vars:
  - `DUCKLAKE_STORAGE_S3_KEY_ID`
  - `DUCKLAKE_STORAGE_S3_SECRET`
  - `DUCKLAKE_STORAGE_S3_ENDPOINT` (the path of the bucket, e.g. `r2://my-bucket/`)
  - `DUCKLAKE_STORAGE_R2_ACCOUNT_ID`
- create a Postgres server (e.g. at https://neon.com/) to serve as catalog for the ducklake. Get the following vars (mentioned in the connection string):
  - `DUCKLAKE_CATALOG_PG_HOST`
  - `DUCKLAKE_CATALOG_PG_USER`
  - `DUCKLAKE_CATALOG_PG_PASSWORD`
- for convenience and local testing, add the vars mentioned above to a `.env` file (gitignored) and run `make secrets` to create persistent secrets to connect to the ducklake (a sketch of this step follows below). Note that secrets are stored in `~/.duckdb/stored_secrets`.
- note that the front-end is hosted on GitHub Pages: https://docs.evidence.dev/deployment/self-host/github-pages
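What `make secrets` presumably does is create persistent DuckDB secrets from those vars; a minimal sketch of that step (the Makefile target is authoritative; the name `r2_secret` and the omission of Postgres PORT/DATABASE parameters are assumptions here):

```python
import os

import duckdb

con = duckdb.connect()  # persistent secrets end up in ~/.duckdb/stored_secrets
con.execute("INSTALL ducklake;")
con.execute("LOAD ducklake;")

# Secret for the Postgres catalog.
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET pg_secret (
        TYPE postgres,
        HOST '{os.environ["DUCKLAKE_CATALOG_PG_HOST"]}',
        USER '{os.environ["DUCKLAKE_CATALOG_PG_USER"]}',
        PASSWORD '{os.environ["DUCKLAKE_CATALOG_PG_PASSWORD"]}'
    );
""")

# Secret for the R2 data store.
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET r2_secret (
        TYPE r2,
        KEY_ID '{os.environ["DUCKLAKE_STORAGE_S3_KEY_ID"]}',
        SECRET '{os.environ["DUCKLAKE_STORAGE_S3_SECRET"]}',
        ACCOUNT_ID '{os.environ["DUCKLAKE_STORAGE_R2_ACCOUNT_ID"]}'
    );
""")

# Secret tying the Postgres catalog to the R2 data path; this is what
# ATTACH 'ducklake:ducklake_secret' resolves later.
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET ducklake_secret (
        TYPE ducklake,
        METADATA_PATH '',
        DATA_PATH '{os.environ["DUCKLAKE_STORAGE_S3_ENDPOINT"]}',
        METADATA_PARAMETERS MAP {{'TYPE': 'postgres', 'SECRET': 'pg_secret'}}
    );
""")
```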
- to connect to the ducklake with the credentials created above (a Python equivalent is sketched after this list):

```sql
ATTACH 'ducklake:ducklake_secret' AS my_ducklake (READ_ONLY);
USE my_ducklake;
```

The ducklake can be used as a regular database, e.g. `PRAGMA show_tables;` (no result if the ducklake is still empty).
- to connect to the catalog database (e.g. to see the metadata):

```sql
ATTACH '' AS postgres_db (TYPE postgres, SECRET pg_secret, READ_ONLY);
USE postgres_db;
SELECT * FROM ducklake_metadata;
```

- to directly query the data store (with the credentials created above), e.g.:

```sql
SELECT * FROM glob('r2://my-r2-bucket/**/*');
```
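The same connections work from Python via the `duckdb` package (a minimal sketch; it assumes the stored secrets from `make secrets` and that the ducklake extension is available):

```python
import duckdb

con = duckdb.connect()  # persistent secrets are picked up automatically
con.execute("ATTACH 'ducklake:ducklake_secret' AS my_ducklake (READ_ONLY);")
con.execute("USE my_ducklake;")
print(con.execute("PRAGMA show_tables;").fetchall())  # [] while the ducklake is empty
```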
Creating a dashboard requires the following steps:
- create a data feed (a python script); it will run periodically and should create and update the data in the ducklake required for the dashboard
- define a 'source' in evidence: a subdirectory under `./evidence/sources`
- define a 'page' in evidence: a markdown file under `./evidence/pages`
Data feeds are scripts that periodically store data in the ducklake:
- all data feeds are python packages under `./feeds/` and will be run by `run_feeds.py` (via `make run_feeds`)
- to add a data feed, add a python script (single-file package) in a directory under `./feeds/` and update `run_feeds.py`
- data feeds should create the data table on first run
- the general layout of a data feed can be as follows:

```python
from utils.ducklake import DuckLakeConnection

data = my_func_to_fetch_data_from_somewhere()

with DuckLakeConnection() as con:
    con.execute("<<< SQL statements to create tables, add records, etc. >>>")
```
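`utils.ducklake.DuckLakeConnection` is the repo's connection helper; the following is a hypothetical sketch of what such a context manager can look like (the actual implementation in `utils/ducklake.py` may differ):

```python
import duckdb

class DuckLakeConnection:
    """Context manager yielding a duckdb connection with the ducklake attached.

    Hypothetical sketch; relies on the persistent secrets created by `make secrets`.
    """

    def __init__(self, read_only: bool = False):
        self.read_only = read_only

    def __enter__(self) -> duckdb.DuckDBPyConnection:
        self.con = duckdb.connect()
        mode = " (READ_ONLY)" if self.read_only else ""
        self.con.execute(f"ATTACH 'ducklake:ducklake_secret' AS my_ducklake{mode};")
        self.con.execute("USE my_ducklake;")
        return self.con

    def __exit__(self, exc_type, exc, tb):
        self.con.close()
        return False  # do not swallow exceptions
```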
The evidence front-end (see ./evidence/README.md) cannot serve directly from the ducklake, so `.duckdb` files are created as an in-between step. This is not ideal, since evidence itself also copies the data to convert it into parquet. Therefore (for now) there are two build steps:
- `make generate_sources`: converts data in the ducklake into a persistent `.duckdb` file
- `make build`: converts `.duckdb` into `.parquet` and builds the front-end
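One way the `make generate_sources` step can be implemented is a one-shot copy from the attached ducklake into a local database file (a sketch; the output path is a made-up example, and the real script may copy table-by-table instead):

```python
import duckdb

con = duckdb.connect()
con.execute("ATTACH 'ducklake:ducklake_secret' AS my_ducklake (READ_ONLY);")
# Write a fresh local .duckdb file for evidence to read from (example path).
con.execute("ATTACH 'evidence/sources/ci_metrics/ci_metrics.duckdb' AS local_db;")
con.execute("COPY FROM DATABASE my_ducklake TO local_db;")
con.close()
```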
Steps to define a new source:
- initial step to create a new source, see: https://docs.evidence.dev/core-concepts/data-sources/duckdb/
- run `make dev` to spawn the front-end
- following the steps in the link above will create a subdirectory under `./evidence/sources`
- add the source to `evidence/sources/sources.json`, to specify which tables from the ducklake are needed
- run `make generate_sources`; this should create the `.duckdb` file (which is gitignored, but needed for local testing)
- add one or more `.sql` files to select the data relevant for the dashboard
Define dashboard pages in `./evidence/pages`:
- to use the `.sql` files created in the previous step, see https://docs.evidence.dev/core-concepts/queries/
- for bars, charts, and other components, see https://docs.evidence.dev/core-concepts/components/
- locally test with `make dev`
An example dashboard, the CI stats page, consists of:
- data feed: `feeds/collect_ci_metrics.py` (#todo update)
- source: `evidence/sources/ci_metrics/`
- page: `evidence/pages/ci-stats.md`
The data feed fetches and stores completed CI runs from:
- https://api.github.com/repos/duckdb/duckdb/actions/workflows
- https://api.github.com/repos/duckdb/duckdb/actions/runs
- https://api.github.com/repos/duckdb/duckdb/actions/runs/{RUN_ID}/jobs
Note that only consecutive 'completed' runs are stored. After an initial run, the script will add new completed runs ('append only').
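A condensed sketch of that append-only logic (the table name, column set, and use of `requests` are assumptions, and the bookkeeping that keeps stored runs consecutive is glossed over):

```python
import requests
from utils.ducklake import DuckLakeConnection

RUNS_URL = "https://api.github.com/repos/duckdb/duckdb/actions/runs"

def fetch_completed_runs() -> list[dict]:
    # Filtering on status means only finished runs come back.
    resp = requests.get(RUNS_URL, params={"status": "completed", "per_page": 100})
    resp.raise_for_status()
    return resp.json()["workflow_runs"]

with DuckLakeConnection() as con:
    # Feeds create their table on first run.
    con.execute("""
        CREATE TABLE IF NOT EXISTS ci_runs (
            id BIGINT, name VARCHAR, conclusion VARCHAR, run_started_at TIMESTAMP
        );
    """)
    last_id = con.execute("SELECT coalesce(max(id), 0) FROM ci_runs").fetchone()[0]
    # Append only runs that are not stored yet.
    new_runs = [r for r in fetch_completed_runs() if r["id"] > last_id]
    if new_runs:
        con.executemany(
            "INSERT INTO ci_runs VALUES (?, ?, ?, ?)",
            [(r["id"], r["name"], r["conclusion"], r["run_started_at"])
             for r in new_runs],
        )
```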