Merged
42 commits
03e22bb
type: description
ludamad Feb 2, 2026
c888ad8
feat: add namespace billing dashboard to rkapp
ludamad Feb 2, 2026
a232477
feat: fetch real GCP billing data from BigQuery for namespace dashboard
ludamad Feb 2, 2026
9395244
feat: use GCP Cloud Billing Export for real namespace costs
ludamad Feb 2, 2026
fcbf424
feat: use GKE resource consumption metering for real namespace billing
ludamad Feb 2, 2026
2f36a57
feat: add network/storage billing, auto-fetch, and minimal dashboard …
ludamad Feb 7, 2026
0938afb
docs: add comprehensive CI cost and metrics tracking plan
ludamad Feb 7, 2026
d985c87
feat: add SQLite metrics foundation and CI instrumentation
ludamad Feb 7, 2026
57d9fff
feat: add full CI cost tracking, AWS/GCP spend, and 6 metrics dashboards
ludamad Feb 8, 2026
4556573
feat: add CI cost attribution, resource details, and fix PR number ex…
ludamad Feb 8, 2026
63a1ee5
fix: test event field mismatch, PR data gaps, and missing SECTIONS
ludamad Feb 8, 2026
68f5e69
fix: resolve PR numbers from branch names via GitHub PR cache
ludamad Feb 8, 2026
62f23e7
feat: tag EC2 CI instances with GITHUB_ACTOR
ludamad Feb 10, 2026
5b0cc55
fix: top N namespaces computed per bucket, display union set
ludamad Feb 10, 2026
3ceb036
fix: cap chart legend to N labels, group remaining union into "other"
ludamad Feb 10, 2026
05193ae
fix: "other" bar includes all costs not in the named top N
ludamad Feb 10, 2026
ee4d77e
fix: chart shows all per-bucket top-N union members, "other" is the rest
ludamad Feb 10, 2026
464644e
fix: per-bucket top N with "other" always last
ludamad Feb 10, 2026
29511f4
fix: pin "other" to bottom of stack and end of tooltip
ludamad Feb 10, 2026
8893c03
feat: sync date range and granularity across dashboard pages
ludamad Feb 10, 2026
e5fd952
fix: use GKE usage table for all resources, not just network/storage
ludamad Feb 10, 2026
2b9cd9b
refactor: replace file-based billing cache with in-memory BigQuery fetch
ludamad Feb 11, 2026
73201c4
feat: tag EC2 CI instances with CICommand for cost attribution
ludamad Feb 11, 2026
f89e7ea
feat: add merge queue failure rate tracking with SQLite backfill
ludamad Feb 11, 2026
cb6f67b
feat: add merge queue backfill JSON seed and SQLite loader
ludamad Feb 11, 2026
1524e86
feat: add CI health dashboard with review fixes
ludamad Feb 11, 2026
9b6b704
feat: add CI Insights single-page dashboard
ludamad Feb 11, 2026
a2e450e
fix: clean up CI Insights dashboard
ludamad Feb 11, 2026
1ab7543
feat: add pipeline filter to CI Insights dashboard
ludamad Feb 11, 2026
1c6332e
feat: add billing exploration CLI and extend cache to 365 days
ludamad Feb 11, 2026
8951254
feat: restructure billing into package, persist CI data, improve attr…
ludamad Feb 12, 2026
fb5f3e6
refactor: extract ci-metrics into separate server
ludamad Feb 12, 2026
6b8308d
fix: update deploy scripts to include ci-metrics in Docker image
ludamad Feb 12, 2026
26d9f16
fix: address PR review comments
ludamad Feb 12, 2026
a8a85e2
feat: add test timings dashboard
ludamad Feb 12, 2026
6b266c9
feat: query GCP SKU pricing from BigQuery pricing export
ludamad Feb 12, 2026
632108b
fix: replace thin proxy with proper reverse proxy
ludamad Feb 12, 2026
61df604
fix: use relative paths for ci-metrics links in root menu
ludamad Feb 12, 2026
28db912
fix: proxy strips Accept-Encoding to avoid double-compression
ludamad Feb 12, 2026
1f6a2b8
fix: strip Content-Encoding from proxied responses to prevent ERR_CON…
ludamad Feb 12, 2026
8ff330b
fix: rename db=db.get_db() to conn=db.get_db() to avoid shadowing mod…
ludamad Feb 13, 2026
fb43e29
fix: kill stale ci-metrics process on port before restarting
ludamad Feb 13, 2026
.github/workflows/ci3.yml: 1 addition, 0 deletions

```diff
@@ -88,6 +88,7 @@ jobs:
 PR_COMMITS: ${{ github.event.pull_request.commits }}
 PR_NUMBER: ${{ github.event.pull_request.number }}
 GITHUB_REF_NAME: ${{ github.ref_name }}
+GITHUB_ACTOR: ${{ github.actor }}
 # NOTE: $CI_MODE is set in the Determine CI Mode step.
 run: ./.github/ci3.sh $CI_MODE
 
```
ci3/aws_request_instance_type: 15 additions, 0 deletions

```diff
@@ -86,6 +86,21 @@ if [ -z "${iid:-}" -o "${iid:-}" == "None" ]; then
 echo $iid > $iid_path
 fi
 
+tags="Key=Name,Value=$name Key=Group,Value=build-instance"
+[ -n "${GITHUB_ACTOR:-}" ] && tags+=" Key=GithubActor,Value=$GITHUB_ACTOR"
+[ -n "${CI_MODE:-}" ] && tags+=" Key=CICommand,Value=$CI_MODE"
+[ -n "${CI_DASHBOARD:-}" ] && tags+=" Key=Dashboard,Value=$CI_DASHBOARD"
+if [ "${UNSAFE_AWS_KEEP_ALIVE:-0}" -eq 1 ]; then
+echo_stderr "You have set UNSAFE_AWS_KEEP_ALIVE=1, so the instance will not be terminated after 1.5 hours by the reaper script. Make sure you shut the machine down when done."
+tags+=" Key=Keep-Alive,Value=true"
+fi
+aws ec2 create-tags --resources $iid --tags $tags
+
+# Record the instance type so callers can pass it downstream (e.g. into Docker).
+echo $instance_type > $state_dir/instance_type
+# Record whether this is spot or on-demand.
+[ -f "$sir_path" ] && echo spot > $state_dir/spot || echo ondemand > $state_dir/spot
+
 while [ -z "${ip:-}" ]; do
 sleep 1
 ip=$(aws ec2 describe-instances \
```
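The new tagging code builds one space-separated string of Key=...,Value=... pairs and passes it unquoted to `aws ec2 create-tags`, so word splitting turns each pair into its own `--tags` argument. The sketch below isolates that logic in a hypothetical `build_ci_tags` function that only prints the string, so the optional tags can be checked without touching AWS. It mirrors the diff but is not part of the PR, and the example values (`alice`, `merge-queue`, `ci-build-1`) are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the tag-building lines added to
# ci3/aws_request_instance_type, but prints the result instead of
# calling `aws ec2 create-tags`.
set -euo pipefail

build_ci_tags() {
  local name=$1
  local tags="Key=Name,Value=$name Key=Group,Value=build-instance"
  # Optional tags are appended only when the variable is set and non-empty.
  [ -n "${GITHUB_ACTOR:-}" ] && tags+=" Key=GithubActor,Value=$GITHUB_ACTOR"
  [ -n "${CI_MODE:-}" ] && tags+=" Key=CICommand,Value=$CI_MODE"
  [ -n "${CI_DASHBOARD:-}" ] && tags+=" Key=Dashboard,Value=$CI_DASHBOARD"
  # The real script also prints a warning via echo_stderr here; omitted.
  if [ "${UNSAFE_AWS_KEEP_ALIVE:-0}" -eq 1 ]; then
    tags+=" Key=Keep-Alive,Value=true"
  fi
  echo "$tags"
}

# Example (illustrative values):
#   GITHUB_ACTOR=alice CI_MODE=merge-queue build_ci_tags ci-build-1
```

Because the string is split on whitespace, a tag value containing a space would be broken into separate arguments. GitHub usernames cannot contain spaces, so GITHUB_ACTOR is safe, and the same holds as long as CI_MODE and CI_DASHBOARD stay single words; a bash array (`tags+=(...)` expanded as `"${tags[@]}"`) would remove that constraint.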
ci3/bootstrap_ec2: 5 additions, 10 deletions

```diff
@@ -89,6 +89,8 @@ if [[ -f "$state_dir/sir" ]]; then
 sir=$(cat $state_dir/sir)
 fi
 iid=$(cat $state_dir/iid)
+export EC2_INSTANCE_TYPE=$(cat $state_dir/instance_type 2>/dev/null || echo "unknown")
+export EC2_SPOT=$(cat $state_dir/spot 2>/dev/null || echo "unknown")
 
 # If AWS credentials are not set, try to load them from ~/.aws/build_instance_credentials.
 if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
@@ -192,16 +194,6 @@ container_script=$(
 log_ci_run FAILED \$ci_log_id
 merge_train_failure_slack_notify \$ci_log_id
 release_canary_slack_notify \$ci_log_id
-ci_failed_data=\$(jq -n \\
-  --arg status "failed" \\
-  --arg log_id "\$ci_log_id" \\
-  --arg ref_name "\${TARGET_BRANCH:-\$REF_NAME}" \\
-  --arg commit_hash "\$COMMIT_HASH" \\
-  --arg commit_author "\$COMMIT_AUTHOR" \\
-  --arg commit_msg "\$COMMIT_MSG" \\
-  --argjson exit_code "\$code" \\
-  '{status: \$status, log_id: \$log_id, ref_name: \$ref_name, commit_hash: \$commit_hash, commit_author: \$commit_author, commit_msg: \$commit_msg, exit_code: \$exit_code, timestamp: now | todate}')
-redis_publish "ci:run:failed" "\$ci_failed_data"
 ;;
 esac
 exit \$code
@@ -331,6 +323,9 @@ function run {
 -e AWS_TOKEN=\$aws_token \
 -e NAMESPACE=${NAMESPACE:-} \
 -e NETWORK=${NETWORK:-} \
+-e GITHUB_ACTOR=${GITHUB_ACTOR:-} \
+-e EC2_INSTANCE_TYPE=${EC2_INSTANCE_TYPE:-unknown} \
+-e EC2_SPOT=${EC2_SPOT:-unknown} \
 --pids-limit=65536 \
 --shm-size=2g \
 aztecprotocol/devbox:3.0 bash -c $(printf '%q' "$container_script")
```
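The two new state files form a small handshake: aws_request_instance_type writes instance_type and spot into $state_dir, and bootstrap_ec2 reads them back, falling back to "unknown" when a file is missing, before passing them into the container as EC2_INSTANCE_TYPE and EC2_SPOT. A minimal sketch of both sides follows. The function names `record_instance_state` and `read_state` are hypothetical, and the sketch assumes (as the variable names suggest) that the sir file exists only when the instance came from a spot instance request.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the state-file handshake added in this PR:
# the writer mirrors ci3/aws_request_instance_type, the reader mirrors
# ci3/bootstrap_ec2. Function names are illustrative, not from the PR.
set -euo pipefail

# Writer: record the instance type and whether the instance is spot or on-demand.
# Assumes a file at $sir_path exists only when a spot instance request was made.
record_instance_state() {
  local state_dir=$1 instance_type=$2 sir_path=$3
  echo "$instance_type" > "$state_dir/instance_type"
  if [ -f "$sir_path" ]; then
    echo spot > "$state_dir/spot"
  else
    echo ondemand > "$state_dir/spot"
  fi
}

# Reader: print the file's contents, or "unknown" if it cannot be read.
read_state() {
  cat "$1" 2>/dev/null || echo "unknown"
}
```

The sketch uses if/else rather than the diff's `[ -f ... ] && echo spot > ... || echo ondemand > ...`; the two behave the same unless the first redirect fails, in which case the `&&`/`||` form would also try to write "ondemand". An existing but empty file makes `cat` succeed with no output, so the reader returns an empty string in that case; the `${EC2_SPOT:-unknown}` default on the docker run line covers it.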
ci3/ci-metrics/Dockerfile (new file): 11 additions, 0 deletions

```diff
@@ -0,0 +1,11 @@
+FROM python:3.12
+
+RUN apt update && apt install -y jq redis-tools && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+COPY requirements.txt requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt gunicorn
+RUN git config --global --add safe.directory /aztec-packages
+COPY . .
+EXPOSE 8081
+CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:8081", "app:app"]
```