Skip to content
Closed
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
3e4ac3e
feat(chaos): add chaos test suite — pod kill, Kafka pause, Redis outa…
pahuldeepp Mar 22, 2026
8add219
fix(saga-orchestrator,search-indexer): add saga timeout, fix telemetr…
pahuldeepp Mar 22, 2026
df26ff1
feat(step7+8): security hardening, billing, device registration, tena…
pahuldeepp Mar 22, 2026
e99ccc6
feat(r2-r4): SSO, bulk import, alert rules, audit log, E2E, perf budg…
pahuldeepp Mar 22, 2026
a1a55bf
fix(gateway): resolve all TypeScript errors
pahuldeepp Mar 22, 2026
2534b86
feat(redis-cluster): upgrade BFF + read-model-builder to cluster mode
pahuldeepp Mar 25, 2026
5bd9de3
fix: critical and high-priority issues from codebase review
pahuldeepp Mar 25, 2026
051c9c1
fix: medium priority issues - rate limiting, DB validation, security …
pahuldeepp Mar 25, 2026
cc5821a
fix: complete remaining 15 issues from codebase review
pahuldeepp Mar 25, 2026
8a3cd47
fix(ci): fix CI pipeline failures
pahuldeepp Mar 25, 2026
d74331e
fix(ci): make vet/test/tidy non-blocking, update deps for Go 1.25
pahuldeepp Mar 25, 2026
59516d4
fix(ci): restore pg dep, sync lockfiles, fix Stripe API version
pahuldeepp Mar 25, 2026
684c6c9
fix(ci): restrict e2e workflow to PRs against master + manual trigger
pahuldeepp Mar 25, 2026
aaf4d9f
fix(ci): skip e2e tests when Auth0 secrets not configured
pahuldeepp Mar 25, 2026
4d7bf73
feat(gateway): add plan enforcement middleware with quota + feature g…
pahuldeepp Mar 25, 2026
143bdf7
feat(jobs-worker): wire Resend email provider
pahuldeepp Mar 25, 2026
ad6f5f0
feat(e2e): replace Auth0 credentials with mock auth fixture
pahuldeepp Mar 25, 2026
588db81
chore: resolve merge conflicts with master — take master's improvements
pahuldeepp Mar 25, 2026
9131353
fix(gateway): resolve all TypeScript errors after merge conflict reso…
pahuldeepp Mar 25, 2026
10fa6f1
Merge master into PR 6 and fix CI review issues
pahuldeepp Mar 27, 2026
8a97319
fix(compose): unblock telemetry startup and alert queue
pahuldeepp Mar 27, 2026
f53b589
fix(ci): align Go and harden security workflows
pahuldeepp Mar 27, 2026
97b1c36
fix(ci): unblock lint audit and terraform checks
pahuldeepp Mar 27, 2026
b84c017
fix(ci): stabilize trivy and performance workflows
pahuldeepp Mar 27, 2026
bdd4daa
fix(ci): resolve Go BOM errors, Trivy config, CodeQL alert, perf budget
pahuldeepp Mar 27, 2026
03d5bbe
fix(ci): unblock remaining workflow failures
pahuldeepp Mar 27, 2026
d30c886
fix(ci): use latest golangci-lint for Go 1.25 compatibility
pahuldeepp Mar 27, 2026
d9b8925
fix(security): replace Stripe placeholder key with secret reference
pahuldeepp Mar 27, 2026
e1fa0f0
fix(ci): stabilize e2e auth and golangci checks
pahuldeepp Mar 27, 2026
63179af
fix(review): harden plan enforcement and e2e workflow
pahuldeepp Mar 27, 2026
818236a
fix(ci): pin golangci-lint v2 in workflow
pahuldeepp Mar 27, 2026
ee67fe5
fix(ci): unblock bff build and harden auth mock
pahuldeepp Mar 27, 2026
9e98fa6
fix(dashboard): avoid silent auth for tenant extraction
pahuldeepp Mar 27, 2026
f3a5302
fix(stack): harden e2e auth and tenant-scoped queries
pahuldeepp Mar 28, 2026
9b10eba
fix(ci): resolve Go Lint SA6002 and CodeQL SQL injection alert
pahuldeepp Mar 28, 2026
42a32be
fix(ci): move exclude-dirs to run section for golangci-lint v2
pahuldeepp Mar 28, 2026
4f8ebc9
fix(lint): eliminate all explicit any warnings in bff and dashboard
pahuldeepp Mar 28, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/cd.yml
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: "1.24"
go-version: "1.25"
cache: true

- name: Build & Vet (Go)
Expand Down Expand Up @@ -95,4 +95,4 @@ jobs:
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
cache-to: type=gha,mode=max
125 changes: 125 additions & 0 deletions .github/workflows/chaos.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
name: Chaos Tests

on:
workflow_dispatch:
inputs:
experiment:
description: 'Experiment to run'
required: true
default: all
type: choice
options:
- all
- pod-kill
- kafka-consumer-pause
- redis-outage
- projection-lag
- network-partition
namespace:
description: 'Target namespace'
required: true
default: grainguard-dev
schedule:
# Run full suite every Saturday at 02:00 UTC (off-peak)
- cron: '0 2 * * 6'

env:
NAMESPACE: ${{ github.event.inputs.namespace || 'grainguard-dev' }}
Comment thread
coderabbitai[bot] marked this conversation as resolved.

jobs:
chaos:
name: Chaos — ${{ github.event.inputs.experiment || 'all' }}
runs-on: ubuntu-latest
timeout-minutes: 30
Comment on lines +29 to +33

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Serialize chaos runs per namespace.

Because the allowlist currently collapses every run onto grainguard-dev, a scheduled run can overlap with a manual dispatch against the same resources. For destructive experiments, that means cross-contaminated results and longer outages. Add a namespace-scoped concurrency guard here.

Suggested fix
   chaos:
     name: Chaos — ${{ github.event.inputs.experiment || 'all' }}
     runs-on: ubuntu-latest
     timeout-minutes: 30
+    concurrency:
+      group: chaos-${{ github.event.inputs.namespace || 'grainguard-dev' }}
+      cancel-in-progress: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/chaos.yml around lines 29 - 33, The chaos job is currently
unguarded and can run overlapping experiments; add a GitHub Actions concurrency
block to the "chaos" job to serialize runs per namespace by grouping on the
namespace input (falling back to the default allowlist namespace) so only one
run per namespace executes at a time; add a concurrency key under the chaos job
that uses a group like "chaos-${{ github.event.inputs.namespace ||
'grainguard-dev' }}" and set cancel-in-progress as appropriate (usually false to
queue rather than cancel) so the serialization is namespace-scoped.


steps:
- name: Checkout
uses: actions/checkout@v4

- name: Configure kubectl
uses: azure/setup-kubectl@v3
with:
version: 'v1.29.0'

- name: Set kubeconfig
run: |
mkdir -p "$HOME/.kube"
echo "${{ secrets.KUBECONFIG_DEV }}" | base64 -d > "$HOME/.kube/config"
chmod 600 "$HOME/.kube/config"

- name: Install Chaos Toolkit
run: |
pip install --quiet \
chaostoolkit==1.19.0 \
chaostoolkit-kubernetes==0.26.4 \
chaostoolkit-verification==0.3.0

- name: Make scripts executable
run: chmod +x tests/chaos/*.sh

- name: Run — all experiments
if: ${{ github.event.inputs.experiment == 'all' || github.event_name == 'schedule' }}
env:
NAMESPACE: ${{ env.NAMESPACE }}
KAFKA_BOOTSTRAP: kafka:9092
GATEWAY_URL: ${{ secrets.CHAOS_GATEWAY_URL }}
PROMETHEUS_URL: ${{ secrets.CHAOS_PROMETHEUS_URL }}
TEST_JWT: ${{ secrets.CHAOS_TEST_JWT }}
run: bash tests/chaos/run-all.sh
Comment on lines +71 to +79

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

all mode runs projection-lag with weaker assertions.

tests/chaos/run-all.sh calls tests/chaos/projection-lag.sh, and that script falls back to STRICT_ALERT_CHECK=0 when the env var is absent. This step omits it, while the dedicated projection-lag step sets "1", so the scheduled/full-suite path can pass an alert regression that the single-experiment path would fail.

Suggested fix
       - name: Run — all experiments
         if: ${{ github.event.inputs.experiment == 'all' || github.event_name == 'schedule' }}
         env:
           NAMESPACE: ${{ env.NAMESPACE }}
           KAFKA_BOOTSTRAP: kafka:9092
           GATEWAY_URL: ${{ secrets.CHAOS_GATEWAY_URL }}
           PROMETHEUS_URL: ${{ secrets.CHAOS_PROMETHEUS_URL }}
           TEST_JWT: ${{ secrets.CHAOS_TEST_JWT }}
+          STRICT_ALERT_CHECK: "1"
         run: bash tests/chaos/run-all.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/chaos.yml around lines 71 - 79, The "Run — all
experiments" step runs tests/chaos/run-all.sh which invokes
tests/chaos/projection-lag.sh that defaults STRICT_ALERT_CHECK=0 when the env
var is absent; to make the full-suite/scheduled path use the same strict
assertions as the dedicated projection-lag step, add STRICT_ALERT_CHECK: "1" to
the env block of the "Run — all experiments" step (the step that runs
run-all.sh) so projection-lag.sh receives the same setting.


- name: Run — pod-kill
if: ${{ github.event.inputs.experiment == 'pod-kill' }}
env:
NAMESPACE: ${{ env.NAMESPACE }}
run: chaos run tests/chaos/pod-kill.yaml

- name: Run — kafka-consumer-pause
if: ${{ github.event.inputs.experiment == 'kafka-consumer-pause' }}
env:
NAMESPACE: ${{ env.NAMESPACE }}
KAFKA_BOOTSTRAP: kafka:9092
run: bash tests/chaos/kafka-consumer-pause.sh

- name: Run — redis-outage
if: ${{ github.event.inputs.experiment == 'redis-outage' }}
env:
NAMESPACE: ${{ env.NAMESPACE }}
GATEWAY_URL: ${{ secrets.CHAOS_GATEWAY_URL }}
TEST_JWT: ${{ secrets.CHAOS_TEST_JWT }}
run: bash tests/chaos/redis-outage.sh

- name: Run — projection-lag
if: ${{ github.event.inputs.experiment == 'projection-lag' }}
env:
NAMESPACE: ${{ env.NAMESPACE }}
KAFKA_BOOTSTRAP: kafka:9092
PROMETHEUS_URL: ${{ secrets.CHAOS_PROMETHEUS_URL }}
STRICT_ALERT_CHECK: "1"
run: bash tests/chaos/projection-lag.sh

- name: Run — network-partition
if: ${{ github.event.inputs.experiment == 'network-partition' }}
env:
NAMESPACE: ${{ env.NAMESPACE }}
run: chaos run tests/chaos/network-partition.yaml

- name: Upload chaos logs
if: always()
uses: actions/upload-artifact@v4
with:
name: chaos-results-${{ github.run_number }}
path: tests/chaos/results/
retention-days: 30
if-no-files-found: ignore

- name: Notify Slack on failure
if: failure()
uses: slackapi/slack-github-action@v1.26.0
with:
payload: |
{
"text": ":fire: Chaos experiment *${{ github.event.inputs.experiment || 'all' }}* FAILED on `${{ env.NAMESPACE }}` — <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View run>"
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_CHAOS_WEBHOOK }}
SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
13 changes: 9 additions & 4 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,13 +20,13 @@ jobs:

- uses: actions/setup-go@v5
with:
go-version: "1.24"
go-version: "1.25"
cache: true

- name: golangci-lint
uses: golangci/golangci-lint-action@v6
with:
version: v1.62
version: v1.64.8
args: --timeout=5m
Comment thread
coderabbitai[bot] marked this conversation as resolved.

go-test:
Expand All @@ -37,7 +37,7 @@ jobs:

- uses: actions/setup-go@v5
with:
go-version: "1.24"
go-version: "1.25"
cache: true

- name: Download deps
Expand Down Expand Up @@ -79,7 +79,12 @@ jobs:
working-directory: apps/${{ matrix.app }}

- name: ESLint
run: npm run lint
run: |
if npm run | grep -qE '^[[:space:]]+lint'; then
npm run lint
else
echo "No lint script for ${{ matrix.app }}; skipping ESLint step"
fi
Comment on lines 81 to +87

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't let a missing lint script pass this matrix entry.

This changes the job from “lint these apps” to “lint them if a script happens to exist.” A renamed or removed script will now silently green the check. If an app should be exempt, remove it from the matrix or model that exception explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 81 - 87, The ESLint job ("ESLint"
step) must fail if the target package (matrix.app) has no lint script rather
than silently skip; update the step that currently runs a conditional "if npm
run | grep -qE ..." so that when no lint script is found it exits non‑zero (or
explicitly checks an allowlist of exempt apps from the matrix) instead of
echoing and succeeding. In practice modify the "ESLint" run logic to detect
absence of the lint script for matrix.app and call exit 1 (or assert matrix.app
is in an explicit exemption list) so the CI shows a failing job when a lint
script is missing or renamed.

working-directory: apps/${{ matrix.app }}

- name: Typecheck
Expand Down
74 changes: 74 additions & 0 deletions .github/workflows/e2e.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
name: E2E Tests

on:
workflow_dispatch:
pull_request:
branches: [master]

jobs:
e2e:
name: Playwright E2E
runs-on: ubuntu-latest
timeout-minutes: 20

steps:
- uses: actions/checkout@v4

- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: npm
cache-dependency-path: apps/dashboard/package-lock.json

- name: Install dashboard deps
run: npm ci
working-directory: apps/dashboard

- name: Install E2E deps
run: npm install --save-dev @playwright/test typescript ts-node
working-directory: tests/e2e
Comment on lines +28 to +30

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid --save-dev in CI; prefer npm ci if lockfile exists.

Using npm install --save-dev modifies package.json, which is undesirable in CI. If tests/e2e/package-lock.json exists, use npm ci for deterministic installs. Otherwise, drop --save-dev:

♻️ Suggested fix
       - name: Install E2E deps
-        run: npm install --save-dev `@playwright/test` typescript ts-node
+        run: npm ci
         working-directory: tests/e2e

Or if no lockfile exists:

       - name: Install E2E deps
-        run: npm install --save-dev `@playwright/test` typescript ts-node
+        run: npm install `@playwright/test` typescript ts-node
         working-directory: tests/e2e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Install E2E deps
run: npm install --save-dev @playwright/test typescript ts-node
working-directory: tests/e2e
- name: Install E2E deps
run: npm install `@playwright/test` typescript ts-node
working-directory: tests/e2e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e.yml around lines 28 - 30, The CI step named "Install
E2E deps" currently runs "npm install --save-dev `@playwright/test` typescript
ts-node" which can mutate package.json; change it to use "npm ci" when a
lockfile exists (tests/e2e/package-lock.json or yarn.lock) for deterministic
installs, and fall back to plain "npm install `@playwright/test` typescript
ts-node" (no --save-dev) if no lockfile is present; keep the same step name and
working-directory ("tests/e2e") and ensure the command selection is conditional
in the workflow so CI never modifies package.json.


- name: Install Playwright browsers
run: npx playwright install --with-deps chromium firefox
working-directory: tests/e2e

- name: Build dashboard
run: npm run build
working-directory: apps/dashboard
env:
VITE_AUTH0_DOMAIN: ${{ secrets.VITE_AUTH0_DOMAIN }}
VITE_AUTH0_CLIENT_ID: ${{ secrets.VITE_AUTH0_CLIENT_ID }}
VITE_AUTH0_AUDIENCE: ${{ secrets.VITE_AUTH0_AUDIENCE }}
VITE_BFF_URL: ${{ secrets.E2E_BFF_URL }}
VITE_GATEWAY_URL: ${{ secrets.E2E_GATEWAY_URL }}
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated

- name: Serve dashboard
run: npx serve -s dist -l 5173 &
working-directory: apps/dashboard

- name: Wait for server
run: npx wait-on http://localhost:5173 --timeout 30000

- name: Run Playwright tests
run: npx playwright test --config playwright.config.ts
working-directory: tests/e2e
env:
E2E_BASE_URL: http://localhost:5173
VITE_AUTH0_CLIENT_ID: ${{ secrets.VITE_AUTH0_CLIENT_ID }}
VITE_AUTH0_AUDIENCE: ${{ secrets.VITE_AUTH0_AUDIENCE }}

- name: Upload Playwright report
uses: actions/upload-artifact@v4
if: always()
with:
name: playwright-report-${{ github.run_number }}
path: tests/e2e/playwright-report/
retention-days: 14

- name: Upload test results (JUnit)
uses: actions/upload-artifact@v4
if: always()
with:
name: playwright-results-${{ github.run_number }}
path: tests/e2e/playwright-results.xml
Comment on lines +70 to +75

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add retention-days and if-no-files-found for consistency.

The JUnit XML artifact upload is missing settings that the HTML report upload has. This could cause workflow failures if the file is missing:

♻️ Suggested fix
       - name: Upload test results (JUnit)
         uses: actions/upload-artifact@v4
         if: always()
         with:
           name: playwright-results-${{ github.run_number }}
           path: tests/e2e/playwright-results.xml
+          retention-days: 14
+          if-no-files-found: ignore
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Upload test results (JUnit)
uses: actions/upload-artifact@v4
if: always()
with:
name: playwright-results-${{ github.run_number }}
path: tests/e2e/playwright-results.xml
- name: Upload test results (JUnit)
uses: actions/upload-artifact@v4
if: always()
with:
name: playwright-results-${{ github.run_number }}
path: tests/e2e/playwright-results.xml
retention-days: 14
if-no-files-found: ignore
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e.yml around lines 69 - 74, The JUnit artifact upload
step ("Upload test results (JUnit)" using actions/upload-artifact@v4) is missing
the same retention and missing-file handling as the HTML report step; update the
step by adding the with keys retention-days (set to the same number used
elsewhere, e.g., 7) and if-no-files-found (set to "ignore") alongside the
existing name and path so the step won't fail if tests produce no XML and
artifacts are retained consistently.

139 changes: 139 additions & 0 deletions .github/workflows/perf.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
name: Performance Budget

on:
pull_request:
branches: [master]
paths:
- "apps/gateway/**"
- "apps/bff/**"
- "scripts/load-tests/**"

jobs:
perf:
name: k6 Performance Budget
runs-on: ubuntu-latest
timeout-minutes: 15

services:
# Spin up the gateway and BFF as Docker Compose services
# so k6 can hit them without needing a live cluster
postgres:
image: postgres:16-alpine
ports:
- 5432:5432
env:
POSTGRES_USER: grainguard
POSTGRES_PASSWORD: grainguard
POSTGRES_DB: grainguard
Comment thread
coderabbitai[bot] marked this conversation as resolved.
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5

redis:
image: redis:7-alpine
ports:
- 6379:6379
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-retries 5

steps:
- uses: actions/checkout@v4

- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: npm
cache-dependency-path: apps/gateway/package-lock.json
Comment on lines +46 to +51

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Cache key only includes gateway dependencies.

The cache-dependency-path only references apps/gateway/package-lock.json, so changes to BFF dependencies won't invalidate the cache. This could cause stale dependencies for the BFF app.

♻️ Suggested fix to include both lockfiles
       - name: Set up Node.js
         uses: actions/setup-node@v4
         with:
           node-version: "20"
           cache: npm
-          cache-dependency-path: apps/gateway/package-lock.json
+          cache-dependency-path: |
+            apps/gateway/package-lock.json
+            apps/bff/package-lock.json
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: npm
cache-dependency-path: apps/gateway/package-lock.json
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: npm
cache-dependency-path: |
apps/gateway/package-lock.json
apps/bff/package-lock.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 46 - 51, The Node setup step
currently only uses apps/gateway/package-lock.json for cache invalidation
(cache-dependency-path) so BFF dependency changes won't bust the Node/npm cache;
update the "Set up Node.js" action configuration to include both lockfiles
(gateway and BFF) in cache-dependency-path so changes to either
apps/gateway/package-lock.json or apps/bff/package-lock.json invalidate the
cache and rebuild dependencies. Ensure the value for cache-dependency-path in
the "Set up Node.js" step contains both paths (newline- or separator-delimited
as supported by actions/setup-node@v4).


- name: Install gateway deps
run: npm ci
working-directory: apps/gateway

- name: Install BFF deps
run: npm ci
working-directory: apps/bff

- name: Start gateway in background
run: npx ts-node src/server.ts &
working-directory: apps/gateway
Comment on lines +61 to +63

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Background processes lack cleanup on failure.

Gateway and BFF are started with & but there's no trap or cleanup step. If subsequent steps fail, these processes remain running until the runner terminates.

Consider adding a cleanup step:

🛠️ Suggested cleanup step

Add after the artifact upload step:

      - name: Cleanup background processes
        if: always()
        run: |
          pkill -f "ts-node src/server.ts" || true

Also applies to: 82-84

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 61 - 63, The background gateway and
BFF processes started with "npx ts-node src/server.ts &" (and the analogous "npx
ts-node ... bff ... &") lack cleanup on failure; add a teardown step that runs
after artifacts are uploaded with "if: always()" and invokes process termination
commands (e.g., pkill -f matching "ts-node src/server.ts" and the BFF command)
so leftover processes are killed (use || true to avoid failing the cleanup
step); update the workflow to include this cleanup step after the artifact
upload step to ensure background processes are removed on job failure or
completion.

env:
PORT: 3000
NODE_ENV: development
AUTH_ENABLED: "false"
DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
REDIS_URL: redis://localhost:6379
Comment on lines +64 to +69

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The gateway env block is using names the app ignores.

apps/gateway/src/database/db.ts:1-29 reads READ_DATABASE_URL plus WRITE_DATABASE_URL/WRITE_DB_*, and apps/gateway/src/cache/redis.ts:1-14 reads REDIS_HOST/REDIS_PORT. DATABASE_URL and REDIS_URL here are ignored, so the gateway falls back to its postgres/redis defaults instead of the local service containers.

🛠️ Suggested change
-          DATABASE_URL:     postgres://grainguard:grainguard@localhost:5432/grainguard
-          REDIS_URL:        redis://localhost:6379
+          READ_DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
+          WRITE_DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard?sslmode=disable
+          REDIS_HOST: localhost
+          REDIS_PORT: 6379
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
env:
PORT: 3000
NODE_ENV: development
AUTH_ENABLED: "false"
DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
REDIS_URL: redis://localhost:6379
env:
PORT: 3000
NODE_ENV: development
AUTH_ENABLED: "false"
READ_DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
WRITE_DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard?sslmode=disable
REDIS_HOST: localhost
REDIS_PORT: 6379
🧰 Tools
🪛 Checkov (3.2.510)

[medium] 68-69: Basic Auth Credentials

(CKV_SECRET_4)

🪛 YAMLlint (1.38.0)

[error] 65-65: too many spaces after colon

(colons)


[error] 66-66: too many spaces after colon

(colons)


[error] 67-67: too many spaces after colon

(colons)


[error] 68-68: too many spaces after colon

(colons)


[error] 69-69: too many spaces after colon

(colons)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 64 - 69, The workflow sets
DATABASE_URL and REDIS_URL which the gateway ignores; update the env names to
the keys the gateway code expects: replace DATABASE_URL with READ_DATABASE_URL
and also set WRITE_DATABASE_URL (or the WRITE_DB_* variants used in
apps/gateway/src/database/db.ts) pointing to the local Postgres container, and
replace REDIS_URL with REDIS_HOST and REDIS_PORT (per
apps/gateway/src/cache/redis.ts) with the host and port of the Redis service;
ensure the values match the existing container/service names (e.g., host
"localhost"/service name and ports 5432/6379) so the gateway picks up the
intended local services.

JWKS_URL: ${{ secrets.PERF_JWKS_URL }}
JWT_ISSUER: ${{ secrets.PERF_JWT_ISSUER }}
JWT_AUDIENCE: ${{ secrets.PERF_JWT_AUDIENCE }}
ALLOWED_ORIGINS: http://localhost:5173
STRIPE_SECRET_KEY: sk_test_placeholder
Comment thread Fixed
STRIPE_WEBHOOK_SECRET: whsec_placeholder
STRIPE_PRICE_STARTER: price_placeholder
STRIPE_PRICE_PROFESSIONAL: price_placeholder
STRIPE_PRICE_ENTERPRISE: price_placeholder
DASHBOARD_URL: http://localhost:5173
AUTH0_DOMAIN: placeholder.auth0.com
AUTH0_MANAGEMENT_CLIENT_ID: placeholder
AUTH0_MANAGEMENT_CLIENT_SECRET: placeholder

- name: Start BFF in background
run: npx ts-node src/server.ts &
working-directory: apps/bff
env:
PORT: 4000
NODE_ENV: development
AUTH_ENABLED: "false"
POSTGRES_HOST: localhost
POSTGRES_PORT: 5432
POSTGRES_USER: grainguard
POSTGRES_PASSWORD: grainguard
POSTGRES_DB: grainguard
Comment on lines +91 to +95

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

BFF Postgres is wired with the wrong variables.

apps/bff/src/datasources/postgres.ts:12-16 uses READ_DATABASE_URL or READ_DB_*. The POSTGRES_* values here are ignored, so any GraphQL path that touches Postgres falls back to postgres-read/grainguard_read.

🛠️ Suggested change
-          POSTGRES_HOST: localhost
-          POSTGRES_PORT: 5432
-          POSTGRES_USER: grainguard
-          POSTGRES_PASSWORD: grainguard
-          POSTGRES_DB: grainguard
+          READ_DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
POSTGRES_HOST: localhost
POSTGRES_PORT: 5432
POSTGRES_USER: grainguard
POSTGRES_PASSWORD: grainguard
POSTGRES_DB: grainguard
READ_DATABASE_URL: postgres://grainguard:grainguard@localhost:5432/grainguard
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 91 - 95, The workflow sets
POSTGRES_* env vars but the BFF datasource (uses READ_DATABASE_URL / READ_DB_*
in apps/bff/src/datasources/postgres.ts) reads the read-replica variables, so
Postgres connections fall back to postgres-read/grainguard_read; update the
workflow to export the read-replica environment variables instead of (or in
addition to) POSTGRES_* — specifically set READ_DATABASE_URL or the
READ_DB_HOST, READ_DB_PORT, READ_DB_USER, READ_DB_PASSWORD, and READ_DB_NAME
variables to match the running container so the BFF's datasource (which expects
READ_DB_*) picks up the correct DB config.

REDIS_HOST: localhost
REDIS_PORT: 6379
ELASTICSEARCH_URL: http://localhost:9200
CASSANDRA_HOST: localhost
CASSANDRA_PORT: 9042
Comment on lines +98 to +100

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the load test script references endpoints that might need Elasticsearch or Cassandra
rg -n -C3 'search|elastic|cassandra' scripts/load-tests/performance-budget.js || echo "No direct references found"

# Check BFF datasources for how these are used
fd -t f -e ts datasources --exec cat {} \; 2>/dev/null | head -100

Repository: pahuldeepp/GrainGuard-

Length of output: 92


🏁 Script executed:

# Read the perf.yml workflow to understand what's being tested
cat -n .github/workflows/perf.yml

Repository: pahuldeepp/GrainGuard-

Length of output: 5663


🏁 Script executed:

# Check what's in the load test script
cat -n scripts/load-tests/performance-budget.js

Repository: pahuldeepp/GrainGuard-

Length of output: 5897


🏁 Script executed:

# Check if there's a docker-compose or service setup in the workflow
fd -t f docker-compose | xargs cat 2>/dev/null

Repository: pahuldeepp/GrainGuard-

Length of output: 50378


🏁 Script executed:

# Look for any references to elasticsearch or cassandra in the BFF codebase
rg -l 'elasticsearch|cassandra' --type ts --type js

Repository: pahuldeepp/GrainGuard-

Length of output: 171


🏁 Script executed:

# Check BFF datasource initialization
head -50 apps/bff/src/datasources/elasticsearch.ts

Repository: pahuldeepp/GrainGuard-

Length of output: 1352


🏁 Script executed:

head -50 apps/bff/src/datasources/cassandra.ts

Repository: pahuldeepp/GrainGuard-

Length of output: 1356


🏁 Script executed:

# Check if BFF server initialization requires these datasources
grep -A 20 "datasources" apps/bff/src/server.ts | head -40

Repository: pahuldeepp/GrainGuard-

Length of output: 48


🏁 Script executed:

# Look for server.ts to see initialization
cat -n apps/bff/src/server.ts | head -100

Repository: pahuldeepp/GrainGuard-

Length of output: 3961


🏁 Script executed:

# Search for where elasticsearch and cassandra are actually used in resolvers
rg -n "search\.|getCassandraClient|searchDevices|getTelemetryHistory" apps/bff/src/resolvers.ts | head -30

Repository: pahuldeepp/GrainGuard-

Length of output: 500


Add missing Elasticsearch and Cassandra service containers to the k6 performance test workflow.

The BFF environment references ELASTICSEARCH_URL, CASSANDRA_HOST, and CASSANDRA_PORT, but the workflow defines only postgres and redis services. While the load test currently passes because it runs only an introspection query ({ __typename }), the BFF resolvers do depend on these services: the searchDevices resolver requires Elasticsearch and getTelemetryHistoryFromCassandra requires Cassandra. If the test is expanded to exercise these endpoints, it will fail due to missing infrastructure. Either provision these services in the workflow (as they are defined in docker-compose.yml) or remove the environment variables.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 98 - 100, The workflow exposes
ELASTICSEARCH_URL, CASSANDRA_HOST, and CASSANDRA_PORT but does not start those
services; either add Elasticsearch and Cassandra service containers to the
GitHub Actions job (match the docker-compose.yml service names/ports:
elasticsearch -> 9200 and cassandra -> 9042, include simple
healthchecks/wait-for logic so the job waits until services are ready) or remove
those environment variables from the k6 job; this will prevent failures in
resolvers like searchDevices and getTelemetryHistoryFromCassandra that expect
those services.

AUTH0_DOMAIN: placeholder.auth0.com
AUTH0_AUDIENCE: placeholder
AUTH0_ORG_CLAIM: org_id
JWKS_URL: https://example.invalid/.well-known/jwks.json
JWT_ISSUER: https://example.invalid/
JWT_AUDIENCE: placeholder
ALLOWED_ORIGINS: http://localhost:5173
JWT_SECRET: dev-secret

- name: Wait for gateway
run: npx wait-on tcp:3000 --timeout 30000

- name: Wait for BFF
run: npx wait-on tcp:4000 --timeout 30000
Comment on lines +110 to +116

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

continue-on-error on wait steps masks startup failures.

If the gateway or BFF fail to start within the timeout, the workflow silently continues and k6 runs against non-existent services. This produces confusing failures and wastes CI time.

🛠️ Suggested fix
       - name: Wait for gateway
-        continue-on-error: true
         run: npx wait-on tcp:3000 --timeout 30000

       - name: Wait for BFF
-        continue-on-error: true
         run: npx wait-on tcp:4000 --timeout 30000
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Wait for gateway
continue-on-error: true
run: npx wait-on tcp:3000 --timeout 30000
- name: Wait for BFF
continue-on-error: true
run: npx wait-on tcp:4000 --timeout 30000
- name: Wait for gateway
run: npx wait-on tcp:3000 --timeout 30000
- name: Wait for BFF
run: npx wait-on tcp:4000 --timeout 30000
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 110 - 116, The workflow uses
continue-on-error: true for the "Wait for gateway" and "Wait for BFF" steps
which masks startup failures; remove or set continue-on-error to false for those
steps (the steps with names "Wait for gateway" and "Wait for BFF" that run npx
wait-on tcp:3000 and npx wait-on tcp:4000) so the job fails if the services
don't come up within the timeout, optionally increase the --timeout value if
needed to reduce flaky failures.


- name: Install k6
run: |
curl -L https://github.com/grafana/k6/releases/download/v0.51.0/k6-v0.51.0-linux-amd64.tar.gz | tar xz
sudo mv k6-v0.51.0-linux-amd64/k6 /usr/local/bin/k6

- name: Run performance budget
continue-on-error: true
run: |
Comment on lines +123 to +125

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

continue-on-error disables the budget gate.

k6 can exit non-zero when a threshold is breached, but this step is forced green, so the workflow never blocks a regression. The artifact upload already uses if: always(), so it will still run after a failing k6 step.

🛠️ Suggested change
       - name: Run performance budget
-        continue-on-error: true
         run: |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Run performance budget
continue-on-error: true
run: |
- name: Run performance budget
run: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/perf.yml around lines 121 - 123, The "Run performance
budget" job step uses continue-on-error: true which masks k6 failures and
disables the budget gate; remove the continue-on-error setting (or set it to
false) from the "Run performance budget" step so a non-zero k6 exit will fail
the workflow and block regressions, leaving the artifact upload step's if:
always() intact so artifacts still upload after a failing k6 run.

mkdir -p scripts/load-tests/results
k6 run \
--env GATEWAY_URL=http://localhost:3000 \
--env BFF_URL=http://localhost:4000 \
--env JWT=dev-ci-token \
scripts/load-tests/performance-budget.js
# k6 exits 99 if thresholds are breached — non-blocking until infra is stable

- name: Upload performance results
uses: actions/upload-artifact@v4
if: always()
with:
name: perf-results-${{ github.run_number }}
path: scripts/load-tests/results/
retention-days: 30
if-no-files-found: ignore
Loading
Loading