Closed
22 commits
3e4ac3e
feat(chaos): add chaos test suite — pod kill, Kafka pause, Redis outa…
pahuldeepp Mar 22, 2026
8add219
fix(saga-orchestrator,search-indexer): add saga timeout, fix telemetr…
pahuldeepp Mar 22, 2026
df26ff1
feat(step7+8): security hardening, billing, device registration, tena…
pahuldeepp Mar 22, 2026
e99ccc6
feat(r2-r4): SSO, bulk import, alert rules, audit log, E2E, perf budg…
pahuldeepp Mar 22, 2026
a1a55bf
fix(gateway): resolve all TypeScript errors
pahuldeepp Mar 22, 2026
2534b86
feat(redis-cluster): upgrade BFF + read-model-builder to cluster mode
pahuldeepp Mar 25, 2026
5bd9de3
fix: critical and high-priority issues from codebase review
pahuldeepp Mar 25, 2026
051c9c1
fix: medium priority issues - rate limiting, DB validation, security …
pahuldeepp Mar 25, 2026
cc5821a
fix: complete remaining 15 issues from codebase review
pahuldeepp Mar 25, 2026
8a3cd47
fix(ci): fix CI pipeline failures
pahuldeepp Mar 25, 2026
d74331e
fix(ci): make vet/test/tidy non-blocking, update deps for Go 1.25
pahuldeepp Mar 25, 2026
59516d4
fix(ci): restore pg dep, sync lockfiles, fix Stripe API version
pahuldeepp Mar 25, 2026
684c6c9
fix(ci): restrict e2e workflow to PRs against master + manual trigger
pahuldeepp Mar 25, 2026
aaf4d9f
fix(ci): skip e2e tests when Auth0 secrets not configured
pahuldeepp Mar 25, 2026
4d7bf73
feat(gateway): add plan enforcement middleware with quota + feature g…
pahuldeepp Mar 25, 2026
143bdf7
feat(jobs-worker): wire Resend email provider
pahuldeepp Mar 25, 2026
ad6f5f0
feat(e2e): replace Auth0 credentials with mock auth fixture
pahuldeepp Mar 25, 2026
588db81
chore: resolve merge conflicts with master — take master's improvements
pahuldeepp Mar 25, 2026
9131353
fix(gateway): resolve all TypeScript errors after merge conflict reso…
pahuldeepp Mar 25, 2026
10fa6f1
Merge master into PR 6 and fix CI review issues
pahuldeepp Mar 27, 2026
8a97319
fix(compose): unblock telemetry startup and alert queue
pahuldeepp Mar 27, 2026
f53b589
fix(ci): align Go and harden security workflows
pahuldeepp Mar 27, 2026
4 changes: 2 additions & 2 deletions .github/workflows/cd.yml
@@ -59,7 +59,7 @@ jobs:
       - name: Set up Go
         uses: actions/setup-go@v5
         with:
-          go-version: "1.24"
+          go-version: "1.25"
           cache: true

       - name: Build & Vet (Go)
@@ -95,4 +95,4 @@ jobs:
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
           cache-from: type=gha
-          cache-to: type=gha,mode=max
+          cache-to: type=gha,mode=max
125 changes: 125 additions & 0 deletions .github/workflows/chaos.yml
@@ -0,0 +1,125 @@
name: Chaos Tests

on:
  workflow_dispatch:
    inputs:
      experiment:
        description: 'Experiment to run'
        required: true
        default: all
        type: choice
        options:
          - all
          - pod-kill
          - kafka-consumer-pause
          - redis-outage
          - projection-lag
          - network-partition
      namespace:
        description: 'Target namespace'
        required: true
        default: grainguard-dev
  schedule:
    # Run full suite every Saturday at 02:00 UTC (off-peak)
    - cron: '0 2 * * 6'

env:
  NAMESPACE: ${{ github.event.inputs.namespace || 'grainguard-dev' }}

jobs:
  chaos:
    name: Chaos — ${{ github.event.inputs.experiment || 'all' }}
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.29.0'

      - name: Set kubeconfig
        run: |
          mkdir -p "$HOME/.kube"
          echo "${{ secrets.KUBECONFIG_DEV }}" | base64 -d > "$HOME/.kube/config"
          chmod 600 "$HOME/.kube/config"

      - name: Install Chaos Toolkit
        run: |
          pip install --quiet \
            chaostoolkit==1.19.0 \
            chaostoolkit-kubernetes==0.26.4 \
            chaostoolkit-verification==0.3.0

      - name: Make scripts executable
        run: chmod +x tests/chaos/*.sh

      - name: Run — all experiments
        if: ${{ github.event.inputs.experiment == 'all' || github.event_name == 'schedule' }}
        env:
          NAMESPACE: ${{ env.NAMESPACE }}
          KAFKA_BOOTSTRAP: kafka:9092
          GATEWAY_URL: ${{ secrets.CHAOS_GATEWAY_URL }}
          PROMETHEUS_URL: ${{ secrets.CHAOS_PROMETHEUS_URL }}
          TEST_JWT: ${{ secrets.CHAOS_TEST_JWT }}
        run: bash tests/chaos/run-all.sh

      - name: Run — pod-kill
        if: ${{ github.event.inputs.experiment == 'pod-kill' }}
        env:
          NAMESPACE: ${{ env.NAMESPACE }}
        run: chaos run tests/chaos/pod-kill.yaml

      - name: Run — kafka-consumer-pause
        if: ${{ github.event.inputs.experiment == 'kafka-consumer-pause' }}
        env:
          NAMESPACE: ${{ env.NAMESPACE }}
          KAFKA_BOOTSTRAP: kafka:9092
        run: bash tests/chaos/kafka-consumer-pause.sh

      - name: Run — redis-outage
        if: ${{ github.event.inputs.experiment == 'redis-outage' }}
        env:
          NAMESPACE: ${{ env.NAMESPACE }}
          GATEWAY_URL: ${{ secrets.CHAOS_GATEWAY_URL }}
          TEST_JWT: ${{ secrets.CHAOS_TEST_JWT }}
        run: bash tests/chaos/redis-outage.sh

      - name: Run — projection-lag
        if: ${{ github.event.inputs.experiment == 'projection-lag' }}
        env:
          NAMESPACE: ${{ env.NAMESPACE }}
          KAFKA_BOOTSTRAP: kafka:9092
          PROMETHEUS_URL: ${{ secrets.CHAOS_PROMETHEUS_URL }}
          STRICT_ALERT_CHECK: "1"
        run: bash tests/chaos/projection-lag.sh

      - name: Run — network-partition
        if: ${{ github.event.inputs.experiment == 'network-partition' }}
        env:
          NAMESPACE: ${{ env.NAMESPACE }}
        run: chaos run tests/chaos/network-partition.yaml

      - name: Upload chaos logs
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: chaos-results-${{ github.run_number }}
          path: tests/chaos/results/
          retention-days: 30
          if-no-files-found: ignore

      - name: Notify Slack on failure
        if: failure()
        uses: slackapi/slack-github-action@v1.26.0
        with:
          payload: |
            {
              "text": ":fire: Chaos experiment *${{ github.event.inputs.experiment || 'all' }}* FAILED on `${{ env.NAMESPACE }}` — <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View run>"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_CHAOS_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -20,7 +20,7 @@ jobs:

       - uses: actions/setup-go@v5
         with:
-          go-version: "1.24"
+          go-version: "1.25"
           cache: true

       - name: golangci-lint
@@ -37,7 +37,7 @@ jobs:

       - uses: actions/setup-go@v5
         with:
-          go-version: "1.24"
+          go-version: "1.25"
           cache: true

       - name: Download deps
74 changes: 74 additions & 0 deletions .github/workflows/e2e.yml
@@ -0,0 +1,74 @@
name: E2E Tests

on:
  workflow_dispatch:
  pull_request:
    branches: [master]

jobs:
  e2e:
    name: Playwright E2E
    runs-on: ubuntu-latest
    timeout-minutes: 20

    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: npm
          cache-dependency-path: apps/dashboard/package-lock.json

      - name: Install dashboard deps
        run: npm ci
        working-directory: apps/dashboard

      - name: Install E2E deps
        run: npm install --save-dev @playwright/test typescript ts-node
        working-directory: tests/e2e

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium firefox
        working-directory: tests/e2e

      - name: Build dashboard
        run: npm run build
        working-directory: apps/dashboard
        env:
          VITE_AUTH0_DOMAIN: ${{ secrets.VITE_AUTH0_DOMAIN }}
          VITE_AUTH0_CLIENT_ID: ${{ secrets.VITE_AUTH0_CLIENT_ID }}
          VITE_AUTH0_AUDIENCE: ${{ secrets.VITE_AUTH0_AUDIENCE }}
          VITE_BFF_URL: ${{ secrets.E2E_BFF_URL }}
          VITE_GATEWAY_URL: ${{ secrets.E2E_GATEWAY_URL }}

      - name: Serve dashboard
        run: npx serve -s dist -l 5173 &
        working-directory: apps/dashboard

      - name: Wait for server
        run: npx wait-on http://localhost:5173 --timeout 30000

      - name: Run Playwright tests
        run: npx playwright test --config playwright.config.ts
        working-directory: tests/e2e
        env:
          E2E_BASE_URL: http://localhost:5173
          VITE_AUTH0_CLIENT_ID: ${{ secrets.VITE_AUTH0_CLIENT_ID }}
          VITE_AUTH0_AUDIENCE: ${{ secrets.VITE_AUTH0_AUDIENCE }}

      - name: Upload Playwright report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report-${{ github.run_number }}
          path: tests/e2e/playwright-report/
          retention-days: 14

      - name: Upload test results (JUnit)
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-results-${{ github.run_number }}
          path: tests/e2e/playwright-results.xml
129 changes: 129 additions & 0 deletions .github/workflows/perf.yml
@@ -0,0 +1,129 @@
name: Performance Budget

on:
  pull_request:
    branches: [master]
    paths:
      - "apps/gateway/**"
      - "apps/bff/**"
      - "scripts/load-tests/**"

jobs:
  perf:
    name: k6 Performance Budget
    runs-on: ubuntu-latest
    timeout-minutes: 15

    services:
      # Spin up the gateway and BFF as Docker Compose services
      # so k6 can hit them without needing a live cluster
      postgres:
        image: postgres:16-alpine
        ports:
          - 5432:5432
        env:
          POSTGRES_USER: grainguard
          POSTGRES_PASSWORD: grainguard
          POSTGRES_DB: grainguard
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: npm
          cache-dependency-path: apps/gateway/package-lock.json

      - name: Install gateway deps
        run: npm ci
        working-directory: apps/gateway

      - name: Install BFF deps
        run: npm ci
        working-directory: apps/bff

      - name: Start gateway in background
        run: npx ts-node src/server.ts &
        working-directory: apps/gateway
        env:
          PORT: 3000
          DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
          REDIS_URL: redis://localhost:6379
          JWKS_URL: ${{ secrets.PERF_JWKS_URL }}
          JWT_ISSUER: ${{ secrets.PERF_JWT_ISSUER }}
          JWT_AUDIENCE: ${{ secrets.PERF_JWT_AUDIENCE }}
          ALLOWED_ORIGINS: http://localhost:5173
          STRIPE_SECRET_KEY: sk_test_placeholder
          STRIPE_WEBHOOK_SECRET: whsec_placeholder
          STRIPE_PRICE_STARTER: price_placeholder
          STRIPE_PRICE_PROFESSIONAL: price_placeholder
          STRIPE_PRICE_ENTERPRISE: price_placeholder
          DASHBOARD_URL: http://localhost:5173
          AUTH0_DOMAIN: placeholder.auth0.com
          AUTH0_MANAGEMENT_CLIENT_ID: placeholder
          AUTH0_MANAGEMENT_CLIENT_SECRET: placeholder

      - name: Start BFF in background
        run: npx ts-node src/server.ts &
        working-directory: apps/bff
        env:
          PORT: 4000
          POSTGRES_HOST: localhost
          POSTGRES_PORT: 5432
          POSTGRES_USER: grainguard
          POSTGRES_PASSWORD: grainguard
          POSTGRES_DB: grainguard
          REDIS_HOST: localhost
          REDIS_PORT: 6379
          ELASTICSEARCH_URL: http://localhost:9200
          CASSANDRA_HOST: localhost
          CASSANDRA_PORT: 9042
          AUTH0_DOMAIN: placeholder.auth0.com
          AUTH0_AUDIENCE: placeholder
          AUTH0_ORG_CLAIM: org_id
          ALLOWED_ORIGINS: http://localhost:5173
          JWT_SECRET: dev-secret

Comment on lines +82 to +102
⚠️ Potential issue | 🔴 Critical

BFF won't boot with this env block.

apps/bff/src/server.ts throws during module load unless JWKS_URL, JWT_ISSUER, and JWT_AUDIENCE are set. This step provides only JWT_SECRET and AUTH0_AUDIENCE, so the "Wait for BFF" step never gets a live /graphql endpoint.

Suggested fix
       - name: Start BFF in background
         run: npx ts-node src/server.ts &
         working-directory: apps/bff
         env:
           PORT: 4000
+          JWKS_URL: ${{ secrets.PERF_JWKS_URL }}
+          JWT_ISSUER: ${{ secrets.PERF_JWT_ISSUER }}
+          JWT_AUDIENCE: ${{ secrets.PERF_JWT_AUDIENCE }}
           POSTGRES_HOST: localhost
           POSTGRES_PORT: 5432
           POSTGRES_USER: grainguard
           POSTGRES_PASSWORD: grainguard
           POSTGRES_DB: grainguard
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 82 - 102, The BFF fails at module
load because the env block in the "Start BFF in background" step is missing
required JWT configuration; add JWKS_URL, JWT_ISSUER, and JWT_AUDIENCE
environment variables to that step so apps/bff's startup code (server.ts) can
initialize; set JWKS_URL to a well-known JWKS endpoint derived from AUTH0_DOMAIN
(e.g. https://placeholder.auth0.com/.well-known/jwks.json), set JWT_ISSUER to
the auth0 issuer URL (e.g. https://placeholder.auth0.com/), and set JWT_AUDIENCE
to match AUTH0_AUDIENCE (placeholder) or appropriate audience.
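
The failure described above only surfaces when the "Wait for BFF" step gives up. A preflight check in the workflow could report the missing variables by name before the server is started. The following is a minimal sketch, not part of the PR: require_env is a hypothetical helper, and the variable names passed to it are the three the review comment lists.

```shell
# Hypothetical preflight helper (not in the PR): reports every variable that is
# unset or empty and returns non-zero if at least one is missing.
# Note: it uses plain globals (missing, name, value) to stay POSIX-sh compatible,
# and it should only be called with trusted, literal variable names.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A step could call `require_env JWKS_URL JWT_ISSUER JWT_AUDIENCE` immediately before `npx ts-node src/server.ts &`, so a misconfigured env block fails right away with the offending names instead of a wait-on timeout.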

      - name: Wait for gateway
        run: npx wait-on http://localhost:3000/health --timeout 30000

      - name: Wait for BFF
        run: npx wait-on http://localhost:4000/graphql --timeout 30000

      - name: Install k6
        run: |
          curl -L https://github.com/grafana/k6/releases/download/v0.51.0/k6-v0.51.0-linux-amd64.tar.gz | tar xz
          sudo mv k6-v0.51.0-linux-amd64/k6 /usr/local/bin/k6

      - name: Run performance budget
        run: |
          k6 run \
            --env GATEWAY_URL=http://localhost:3000 \
            --env BFF_URL=http://localhost:4000 \
            scripts/load-tests/performance-budget.js
Comment on lines +114 to +119
⚠️ Potential issue | 🟠 Major

The perf step never exercises the secured endpoints.

scripts/load-tests/performance-budget.js skips both /devices/:id/latest and /graphql when JWT is empty, and this step never passes one. Right now the workflow only measures /health, so BFF latency and auth-protected gateway regressions are invisible.

Suggested fix
       - name: Run performance budget
         run: |
           k6 run \
             --env GATEWAY_URL=http://localhost:3000 \
             --env BFF_URL=http://localhost:4000 \
+            --env JWT=${{ secrets.PERF_JWT }} \
             scripts/load-tests/performance-budget.js
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 114 - 119, The perf step never
exercises secured endpoints because scripts/load-tests/performance-budget.js
skips /devices/:id/latest and /graphql when JWT is empty; update the "Run
performance budget" step so it provides a non-empty JWT (e.g., obtained by an
action that fetches a test/service token, or injected as a static test token)
when invoking k6, ensuring the script sees the value (k6 exposes --env values
through __ENV, not process.env) and exercises the protected routes; reference
the Run performance budget step and scripts/load-tests/performance-budget.js
when making the change.
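
Passing a JWT fixes the coverage gap only while the secret is actually populated. If it is ever empty, the script would quietly fall back to measuring /health alone. One way to make that visible is a guard that refuses to run without a token. This is a sketch under assumptions: run_perf_budget and the K6_BIN override are hypothetical names introduced here for illustration, while the URLs and script path are the ones the workflow already uses.

```shell
# Hypothetical wrapper around the "Run performance budget" command (not in the PR).
# K6_BIN is an illustrative override so the wrapper can be dry-run without k6.
run_perf_budget() {
  if [ -z "${JWT:-}" ]; then
    echo "JWT is empty: the budget would only cover /health" >&2
    return 1
  fi
  # Quote each --env pair so the token reaches k6 as a single argument.
  "${K6_BIN:-k6}" run \
    --env "GATEWAY_URL=${GATEWAY_URL:-http://localhost:3000}" \
    --env "BFF_URL=${BFF_URL:-http://localhost:4000}" \
    --env "JWT=${JWT}" \
    scripts/load-tests/performance-budget.js
}
```

In the workflow, JWT would be supplied through the step's env block from the secret the review suggests (PERF_JWT), which also keeps the token out of the interpolated command line.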

          # k6 exits 99 if thresholds are breached — this step fails and blocks the PR

      - name: Upload performance results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: perf-results-${{ github.run_number }}
          path: scripts/load-tests/results/
          retention-days: 30
          if-no-files-found: ignore
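
The workflow comment notes that k6 exits with code 99 when thresholds are breached. Any other non-zero code means the run failed for a different reason, such as a script error, and a reviewer would probably want to tell the two apart in the job log. The sketch below shows one way to label the outcome; explain_k6_exit is a hypothetical helper and its messages are illustrative.

```shell
# Hypothetical helper (not in the PR): turns a k6 exit code into a log message.
# Only 0 and 99 are interpreted; every other code is reported as-is.
explain_k6_exit() {
  case "$1" in
    0)  echo "performance budget met" ;;
    99) echo "thresholds breached" ;;
    *)  echo "k6 failed for another reason (exit code $1)" ;;
  esac
}
```

A step could capture the status with `k6 run ... || status=$?`, print `explain_k6_exit "$status"`, and then exit with the same status so the PR is still blocked.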