feat(chart): real-worker smoke test gates Helm upgrades on actual run completion#878
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughAdds a worker smoke-test capability: Helm values + Job changes to run an optional worker-smoke phase, a new internal POST /api/internal/smoke/dispatch endpoint (auth, idempotent run enqueue, pg_notify), tests, and job-side polling of agent_transcript_snapshot for terminal completion. ChangesWorker Smoke Test End-to-End
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@charts/lobu/templates/smoke-test-job.yaml`:
- Around line 34-50: The smoke-test Job currently only pulls DATABASE_URL from
.Values.secrets.stringData or the deployment secret (via include
"lobu.secretName") and can accidentally skip Phase 3 when requiredSchema or
DATABASE_URL is missing; update the env/envFrom logic to also read DATABASE_URL
from .Values.database.existingSecret (and/or prefer that secret when present)
and make the Job fail-closed when workerSmoke.enabled/$workerSmokeEnabled is
true and DATABASE_URL is not available (remove or guard the early `exit 0` path
used around requiredSchema checks so the Job errors out instead of exiting
successfully); touch the template bits referencing $secretName, include
"lobu.secretName", .Values.secrets.create,
.Values.secrets.stringData["DATABASE_URL"], .Values.database.existingSecret, and
the requiredSchema/Phase 3 exit logic so the smoke gate cannot be bypassed when
workerSmoke is enabled.
In `@charts/lobu/values.yaml`:
- Around line 291-297: The values.yaml exposes conversationIdPrefix which the
server (packages/server/src/gateway/routes/internal/smoke.ts) hard-validates to
start with "smoke-", causing 400s if operators change it; remove the
conversationIdPrefix knob from charts/lobu/values.yaml (or render it only when
equal to the default "smoke-") so Helm cannot set a non-default value, and/or
add a template-time check that fails the release if conversationIdPrefix !=
"smoke-" to ensure workers always send a prefix accepted by the smoke route.
In `@packages/server/src/gateway/routes/internal/smoke.ts`:
- Around line 100-110: The code calls .trim() on body fields that may be
non-strings which can throw; before trimming the fields from the parsed body
(from c.req.json()), validate that body.agentId, body.organizationId,
body.conversationId, and body.messageText are strings (and not null/undefined)
and reject with c.json({ error: "Invalid JSON body" }, 400) or a more specific
400 when they are not strings; then safely call .trim() (or provide defaults
like "smoke-test ping" for messageText) and assign to agentId, organizationId,
conversationId, messageText—update the SmokeDispatchBody handling so non-string
values do not reach .trim().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a0d241b9-ce0e-4ceb-843f-5314dfcd52cf
📒 Files selected for processing (5)
charts/lobu/templates/smoke-test-job.yamlcharts/lobu/values.yamlpackages/server/src/gateway/__tests__/smoke-dispatch.test.tspackages/server/src/gateway/routes/internal/smoke.tspackages/server/src/index.ts
| let body: SmokeDispatchBody; | ||
| try { | ||
| body = (await c.req.json()) as SmokeDispatchBody; | ||
| } catch { | ||
| return c.json({ error: "Invalid JSON body" }, 400); | ||
| } | ||
|
|
||
| const agentId = body.agentId?.trim(); | ||
| const organizationId = body.organizationId?.trim(); | ||
| const conversationId = body.conversationId?.trim(); | ||
| const messageText = body.messageText?.trim() || "smoke-test ping"; |
There was a problem hiding this comment.
Reject non-string fields before calling .trim().
await c.req.json() can return null or objects with non-string values, and { "agentId": 1 } currently throws here before your 400-path runs. That turns a bad request into a 500.
💡 Proposed fix
- let body: SmokeDispatchBody;
+ let body: unknown;
try {
- body = (await c.req.json()) as SmokeDispatchBody;
+ body = await c.req.json();
} catch {
return c.json({ error: "Invalid JSON body" }, 400);
}
- const agentId = body.agentId?.trim();
- const organizationId = body.organizationId?.trim();
- const conversationId = body.conversationId?.trim();
- const messageText = body.messageText?.trim() || "smoke-test ping";
+ if (!body || typeof body !== "object" || Array.isArray(body)) {
+ return c.json({ error: "Request body must be a JSON object" }, 400);
+ }
+
+ const parsed = body as SmokeDispatchBody;
+ const agentId =
+ typeof parsed.agentId === "string" ? parsed.agentId.trim() : undefined;
+ const organizationId =
+ typeof parsed.organizationId === "string"
+ ? parsed.organizationId.trim()
+ : undefined;
+ const conversationId =
+ typeof parsed.conversationId === "string"
+ ? parsed.conversationId.trim()
+ : undefined;
+ const messageText =
+ typeof parsed.messageText === "string" && parsed.messageText.trim().length > 0
+ ? parsed.messageText.trim()
+ : "smoke-test ping";📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let body: SmokeDispatchBody; | |
| try { | |
| body = (await c.req.json()) as SmokeDispatchBody; | |
| } catch { | |
| return c.json({ error: "Invalid JSON body" }, 400); | |
| } | |
| const agentId = body.agentId?.trim(); | |
| const organizationId = body.organizationId?.trim(); | |
| const conversationId = body.conversationId?.trim(); | |
| const messageText = body.messageText?.trim() || "smoke-test ping"; | |
| let body: unknown; | |
| try { | |
| body = await c.req.json(); | |
| } catch { | |
| return c.json({ error: "Invalid JSON body" }, 400); | |
| } | |
| if (!body || typeof body !== "object" || Array.isArray(body)) { | |
| return c.json({ error: "Request body must be a JSON object" }, 400); | |
| } | |
| const parsed = body as SmokeDispatchBody; | |
| const agentId = | |
| typeof parsed.agentId === "string" ? parsed.agentId.trim() : undefined; | |
| const organizationId = | |
| typeof parsed.organizationId === "string" | |
| ? parsed.organizationId.trim() | |
| : undefined; | |
| const conversationId = | |
| typeof parsed.conversationId === "string" | |
| ? parsed.conversationId.trim() | |
| : undefined; | |
| const messageText = | |
| typeof parsed.messageText === "string" && parsed.messageText.trim().length > 0 | |
| ? parsed.messageText.trim() | |
| : "smoke-test ping"; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/gateway/routes/internal/smoke.ts` around lines 100 - 110,
The code calls .trim() on body fields that may be non-strings which can throw;
before trimming the fields from the parsed body (from c.req.json()), validate
that body.agentId, body.organizationId, body.conversationId, and
body.messageText are strings (and not null/undefined) and reject with c.json({
error: "Invalid JSON body" }, 400) or a more specific 400 when they are not
strings; then safely call .trim() (or provide defaults like "smoke-test ping"
for messageText) and assign to agentId, organizationId, conversationId,
messageText—update the SmokeDispatchBody handling so non-string values do not
reach .trim().
e652ba0 to
884638e
Compare
|
Codex round 1 flagged two real issues — both fixed: 1. Ingress-bypass defense. The route is mounted on the main app which is exposed by the public Ingress. Bearer auth alone isn't enough: a leaked token could be replayed from outside. Fix: reject any request that carries 2. Server-side namespace pinning. The first draft accepted caller-supplied Chart wiring updated to match: smoke Job no longer passes the identifiers in the body; the in-cluster app pod resolves them from its own envFrom-loaded Secret. Total tests: 18 pass (was 13). Helm template render still clean both modes. Latest commit: |
… completion Adds a Phase-3 worker-smoke gate to the post-upgrade smoke Job: the Job POSTs to a new internal endpoint that inserts a synthetic chat_message run, then polls `agent_transcript_snapshot` for that run's terminal status. If the snapshot doesn't materialise with `terminal_status='completed'` inside the configured window, the deploy fails and Helm rolls back to the previous chart. The current smoke-test-job only curls /api/health — every recent regression (Phase 5 env flip, runs denormalize/revert, JobEventSchema, action_input JSONB shape) passed that check while workers couldn't process a single chat message. Driving an end-to-end run through the same runs-queue + worker-spawn pipeline that real traffic uses catches that class. Backend: POST /api/internal/smoke/dispatch (auth: bearer SMOKE_TEST_TOKEN, constant-time compare). Inserts a chat_message run with a synthetic platform="smoke" payload. Refuses to dispatch unless agentId, organizationId, and conversationId are explicit AND the conversationId starts with "smoke-". Idempotent on conversationId so repeated calls don't flood the queue. Chart: new releaseGates.smokeTest.workerSmoke block — disabled by default. Operators preprovision a synthetic agent matching `agentId` and add SMOKE_TEST_TOKEN to the deployment Secret before flipping it on. Job picks up the token via envFrom on the existing Secret, polls the snapshot row over the existing DATABASE_URL.
884638e to
52058ab
Compare
|
Round 2 fix: added Host-header allowlist for stronger ingress-bypass defense. Layer 1 (already in round 1): Layer 2 (new): Host header must match Both layers fire before the bearer check. The chart wires Total tests: 23 pass (was 18). Latest commit: |
Summary
The current
charts/lobu/templates/smoke-test-job.yamlonly curls/api/health. Every recent regression — Phase-5 env flip, runs denormalize/revert, JobEventSchema, today'saction_inputJSONB-shape bug — passed that check cleanly while workers couldn't process a single chat message in prod.This PR adds a Phase-3 worker-smoke gate that drives an end-to-end run through the runs-queue + worker-spawn pipeline:
POST /api/internal/smoke/dispatch(auth: shared bearerSMOKE_TEST_TOKEN, constant-time compare). Inserts a syntheticchat_messagerun intopublic.runswithplatform="smoke"and an explicitsmoke-org/smoke-test/smoke-<release>-<revision>namespace. Refuses to dispatch unless the caller passes all three identifiers AND the conversationId carries thesmoke-prefix.MessageConsumer(running in the app pod) claims the row, spawns a worker subprocess, the worker runs end-to-end, and on terminal cleanup writes a row intoagent_transcript_snapshot.DATABASE_URL. Ifterminal_status='completed'materialises withinworkerSmoke.timeoutSeconds, exit 0; otherwise fail the deploy and Helm rolls back.This makes the "gateway boots fine but workers are broken" regression class un-shippable.
What's intentionally not here
workerSmoke.enableddefaults to FALSE. Per the issue notes, do not flip it to true in the helmrelease until the siblingfix/runs-action-input-jsonb-shapePR has landed and prod has been verified to actually process chats. Otherwise this gate would intentionally break every deploy until that fix lands.values.yaml. The chart can't preprovision a row in someone else's DB.lobu-ai/owletto, so the chart change must mirror — done in lobu-ai/owletto#chore/worker-smoke-test-gate.Files
packages/server/src/gateway/routes/internal/smoke.ts— new internal endpointpackages/server/src/index.ts— mount before the mcpAuth middlewarepackages/server/src/gateway/__tests__/smoke-dispatch.test.ts— auth + validation + insert + idempotencycharts/lobu/values.yaml— newreleaseGates.smokeTest.workerSmokeblockcharts/lobu/templates/smoke-test-job.yaml— phase-3 dispatch + poll loop, gated byworkerSmoke.enabledTest plan
make typecheckcleanmake build-packagescleanbun test packages/server/src/gateway/__tests__/smoke-dispatch.test.ts— 13 tests pass (auth contract, missing-field rejection, prefix enforcement, run insert, idempotency)helm template charts/lobu(defaults) — renders, no worker-smoke env vars emittedhelm template charts/lobu --set releaseGates.smokeTest.workerSmoke.enabled=true— renders, envFrom + WORKER_SMOKE_* env present, conv-id and dispatch loop presentworkerSmoke.enabled=truein the staging helmrelease, verify a known-good chart upgrade passes the gate and a deliberately-broken one (e.g. revert the action_input fix) fails it.Summary by CodeRabbit
New Features
Documentation
Tests