Skip to content

feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins#919

Merged
buremba merged 4 commits into
mainfrom
feat/guardrails-runtime
May 19, 2026
Merged

feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins#919
buremba merged 4 commits into
mainfrom
feat/guardrails-runtime

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented May 19, 2026

Summary

PR A of the 3-PR guardrails rollout — wires the runtime end-to-end and ships the first two real guardrails. The guardrail primitive (@lobu/core Guardrail<stage> / GuardrailRegistry / runGuardrails) and the config/storage plumbing (agents.guardrails jsonb, AgentSettings.guardrails, lobu apply) were already in place. This PR replaces the three TODO call sites with real runGuardrails() invocations and ships the first built-ins. Sibling PRs cover schema extensions (PR B) and owletto UI (PR C).

  • Wired call sites:
    • MessageConsumer.handleMessage (input) — dispatch is short-circuited on trip, Message rejected: <reason> pushed to thread_response.
    • ChatResponseBridge.handleDelta (output, per delta) — in-flight stream disposed on trip, Message blocked by guardrail: <reason> posted, partial buffer not written to history.
    • McpProxy.handleProxyRequest (pre-tool) — JSON-RPC isError with generic Tool call blocked by policy.; specific reason intentionally not leaked.
  • Built-ins (packages/server/src/gateway/guardrails/builtins.ts):
    • secret-scan (output) — OpenAI/GitHub PAT/AWS/JWT regex.
    • forbidden-tools (pre-tool) — deny-list [delete_repo, delete_branch, drop_table].
  • Registered at boot in CoreServices.initialize() via registerBuiltinGuardrails(...); exposed via getGuardrailRegistry().
  • Every trip writes one semantic_type='guardrail-trip' event with {guardrail, stage, reason, agent_id, user_id, conversation_id} metadata.
  • AGENTS.md #### Guardrails updated to reflect what ships.

Codex review rounds

Round 1 (commit b90dbdc0)

Five concerns from the high-effort codex review, all fixed:

  1. Split-chunk regression in chat-response-bridge.ts — per-delta regex scan missed secrets straddling chunk boundaries ("sk-an" then "t-..."). Fixed via a per-stream rolling tail (OUTPUT_GUARDRAIL_TAIL_CHARS = 256); the scan operates on (previous tail + current delta) so the regex still matches on the second chunk. Tail is dropped on completion, full-replacement, and trip so a blocked stream's bytes never leak into the next stream on the same key.
  2. Audit silently dropped trips on missing organizationId — security log gap. Bridge falls back to instance.connection.organizationId; MessageConsumer and McpProxy fall back to agentSettingsStore.getMetadata(agentId). Audit logs at error when both fail.
  3. Fake e2e in guardrails-runtime.test.ts — prior test only called runGuardrails(...) directly. Rewritten to instantiate real ChatResponseBridge, MessageConsumer (subclassed TestableMessageConsumer), McpProxy and drive their public methods.
  4. Flaky setTimeout(50) — replaced with flushPendingGuardrailAudits(), a production helper that drains in-flight audit writes. Wired call sites still use void recordGuardrailTrip(...).
  5. (platformMetadata as any).field — new connections/platform-metadata.ts declares PlatformMetadata + readPlatformMetadata + platformMetadataString.

Round 2 (commit 2eb575f7)

Two remaining items, both fixed:

  1. handleDelta ordering with fullReplacement — the rolling tail was updated/scanned BEFORE the replacement was handled, so the scan ran on (previous-delta-tail + new-replacement-text). Joining two unrelated text bodies can synthesize a regex match that exists in neither piece on its own — a false-positive trip on the first delta of an unrelated reply. Fixed by reordering: detect isFullReplacement first, dispose the prior stream + clear the tail, then scan. A regression test asserts the scenario: prior delta ending "...sk-a" followed by a fullReplacement delta starting "bcdefghij..." (each safe alone, naive concat would match openai-key) must NOT trip. Verified to fail against the pre-fix code before the reorder.

  2. Residual payload.platformMetadata as Record<...> casts — three call sites in chat-response-bridge.ts (handleError, handleStatusUpdate, handleEphemeral) plus one data?.platformMetadata?.traceparent as string | undefined in message-consumer.ts. Replaced with readPlatformMetadata(...) / platformMetadataString(...). resolveTarget and buildCurrentMessageFromMetadata signatures now take PlatformMetadata directly. No production-code platformMetadata as Record / platformMetadata as any casts remain.

Round 3 simplification pass (commit 8e9508ca)

Post-review cleanup — no behavior change, 7/7 + 887/887 tests still green.

  1. Collapsed the rolling-tail helpers in chat-response-bridge.ts. updateGuardrailTail(key, delta) and clearGuardrailTail(key) became one scanWindowWithTail(key, delta) plus inlined this.guardrailTails.delete(key) at the three call sites. The previous updateGuardrailTail computed nextTail = combined.slice(-512) AND separately computed scanSlice = (previous + delta).slice(-512) — the same window twice with slightly different conditionals. One slice now serves both roles. clearGuardrailTail added no meaning over the Map call it wrapped.
  2. Stripped review-justifying comments. Per-method docs on resolveAgentId/resolveOrganizationId, the file-level audit.ts block, the Fire-and-forget — Tests use flushPendingGuardrailAudits to drain repeats in bridge/proxy/message-consumer, the Fail open — same rationale as the pre-tool path in McpProxy comments. Kept the terse invariant on the isFullReplacement ordering (the regression test exercises exactly that order).
  3. Left intentionally as-is: audit.ts 3-concept shape (recordGuardrailTrip + pendingAudits Set + flushPendingGuardrailAudits). The user prompt's suggestion (return promise from recordGuardrailTrip, callers void) is already what it does. Removing the tracking would force the test to either change the production call sites to expose the promise or accept a sleep(50) — both are worse than the current private-tracker shape.

Reproducer

$ cd packages/server && LOBU_TEST_BACKEND=pglite bunx vitest run src/__tests__/guardrails-runtime.test.ts

✅ PGlite ready
🗄️  Running migrations...
✅ Test database ready.

[guardrail-runner] Guardrail tripped {"guardrail":"secret-scan","stage":"output","reason":"... openai-key"}
[guardrail-runner] Guardrail tripped {"guardrail":"secret-scan","stage":"output","reason":"... openai-key"}    ← split-chunk
[guardrail-runner] Guardrail tripped {"guardrail":"secret-scan","stage":"output","reason":"... aws-access-key"} ← connection org-id fallback
[orchestrator] Input guardrail tripped — message dropped {"guardrail":"test-input-tripper"}
[orchestrator] Input guardrail tripped — message dropped {"guardrail":"test-input-tripper"} ← metadata orgId fallback
[mcp-proxy] Pre-tool guardrail tripped — returning generic policy block to worker {"guardrail":"forbidden-tools"}

 ✓ src/__tests__/guardrails-runtime.test.ts (7 tests) 601ms
 Test Files  1 passed (1)      Tests  7 passed (7)

Tests:

  • ChatResponseBridge — full-secret-in-one-delta, split-chunk-across-two-deltas, fullReplacement-clears-tail (no false positive), connection.organizationId fallback
  • MessageConsumer — trip blocks worker enqueue + emits thread_response, metadata-orgId fallback
  • McpProxydelete_repo returns Tool call blocked by policy. (no reason leak), audit row records reason for operators

Test plan

  • make build-packages clean
  • bun run typecheck clean
  • bun test src/gateway/__tests__/ — 887/887 tests pass (no regressions)
  • LOBU_TEST_BACKEND=pglite bunx vitest run src/__tests__/guardrails-runtime.test.ts — 7/7 pass
  • Regression test (isFullReplacement clears the rolling tail before scanning) verified to FAIL against pre-fix chat-response-bridge.ts before the reorder
  • Core guardrail tests — 28/28 pass
  • AGENTS.md #### Guardrails describes the shipped behavior

Summary by CodeRabbit

  • New Features

    • Runtime guardrails enabled: secret-scanning for outputs, forbidden-tool blocking for tool calls, and input-stage checks. Blocked outputs post a user-facing notice; blocked tool calls return a generic "Tool call blocked by policy" response; rejected inputs enqueue a thread rejection.
  • Documentation

    • Updated guardrail configuration and per-stage semantics documented.

Summary by CodeRabbit

  • New Features

    • Runtime guardrails enabled: secret-scanning for outputs, forbidden-tool blocking for tool calls, and input-stage checks; blocked outputs post a user notice, blocked tool calls return a generic "Tool call blocked by policy", and rejected inputs enqueue a thread rejection.
  • Documentation

    • Guardrail configuration and per-stage semantics updated and clarified.
  • Tests

    • End-to-end guardrails test suite added with database-backed audit verification.
  • Other

    • Guardrail trips are recorded as append-only audit events; infra errors fail open.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

Warning

Rate limit exceeded

@buremba has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 814ea101-62ba-4360-8605-43990aea233b

📥 Commits

Reviewing files that changed from the base of the PR and between 8e9508c and b42f368.

📒 Files selected for processing (11)
  • AGENTS.md
  • packages/server/src/__tests__/guardrails-runtime.test.ts
  • packages/server/src/gateway/auth/mcp/proxy.ts
  • packages/server/src/gateway/connections/chat-response-bridge.ts
  • packages/server/src/gateway/connections/platform-metadata.ts
  • packages/server/src/gateway/guardrails/audit.ts
  • packages/server/src/gateway/guardrails/builtins.ts
  • packages/server/src/gateway/orchestration/index.ts
  • packages/server/src/gateway/orchestration/message-consumer.ts
  • packages/server/src/gateway/services/core-services.ts
  • packages/server/src/lobu/gateway.ts
📝 Walkthrough

Walkthrough

This PR implements a three-stage guardrails enforcement system for the Lobu gateway: input-stage blocking (message consumer), output-stage redaction (chat stream), and pre-tool-stage denial (MCP proxy). The system includes audit trails, two built-in guardrails (secret-scan and forbidden-tools), core bootstrap wiring, and comprehensive E2E tests.

Changes

Guardrails Enforcement System

Layer / File(s) Summary
Guardrail trip auditing foundation
packages/server/src/gateway/guardrails/audit.ts
RecordGuardrailTripParams interface, pendingAudits tracking, flushPendingGuardrailAudits(), and recordGuardrailTrip() write append-only guardrail-trip events with deterministic originId; failures are caught and logged and the API returns a Promise.
Built-in guardrails: secret-scan and forbidden-tools
packages/server/src/gateway/guardrails/builtins.ts
secretScanGuardrail (output stage) detects credential-shaped text via prioritized regexes; forbiddenToolsGuardrail (pre-tool stage) denies destructive tool names; registerBuiltinGuardrails() registers both on a provided GuardrailRegistry.
Core services guardrail registry initialization
packages/server/src/gateway/services/core-services.ts
CoreServices constructs a GuardrailRegistry in initialize(), registers built-ins, exposes getGuardrailRegistry(), and passes the registry into McpProxy on construction.
Orchestrator integration of guardrails
packages/server/src/gateway/orchestration/index.ts
Orchestrator.injectCoreServices() signature extended with optional guardrailRegistry and agentSettingsStore; when provided, wires them into MessageConsumer via setGuardrails().
Gateway bootstrap and bridge wiring
packages/server/src/lobu/gateway.ts
initLobuGateway() passes getGuardrailRegistry() and getAgentSettingsStore() into injectCoreServices() and configures ChatResponseBridge with guardrails before attaching it to the unified consumer.
Platform metadata helpers
packages/server/src/gateway/connections/platform-metadata.ts
Adds PlatformMetadata type, readPlatformMetadata() and platformMetadataString() helpers for safe metadata extraction used across bridge/handlers.
Output-stage guardrails in chat stream
packages/server/src/gateway/connections/chat-response-bridge.ts
ChatResponseBridge.setGuardrails() injects registry and settings; handleDelta() runs per-delta output guardrails, marks blocked streams, disposes in-flight strategy streams, posts a block notice to the resolved response thread/channel, and handleCompletion() suppresses buffered output when blocked.
Input-stage guardrails in message handler
packages/server/src/gateway/orchestration/message-consumer.ts
MessageConsumer.setGuardrails() injects dependencies; handleMessage() runs input-stage guardrails, records trips, enqueues a thread_response rejection payload, and skips worker dispatch on trip; guardrail infra errors fail open.
Pre-tool guardrails in MCP proxy
packages/server/src/gateway/auth/mcp/proxy.ts
McpProxy constructor accepts agentSettingsStore and guardrailRegistry; during JSON-RPC tools/call, pre-tool guardrails are run before approval evaluation, trips are recorded, and a generic JSON-RPC error is returned to block the tool call; infra errors fail open.
End-to-end testing and documentation
AGENTS.md, packages/server/src/__tests__/guardrails-runtime.test.ts
Adds E2E vitest file exercising output/input/pre-tool stages against a Postgres test DB and updates AGENTS.md to document per-agent lobu.toml configuration, wired call sites, stage-specific trip semantics, and the guardrail-trip event shape.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant McpProxy
  participant GuardrailRegistry
  participant AuditModule
  participant PostgresDB
  Client->>McpProxy: tools/call request (toolName, args)
  McpProxy->>GuardrailRegistry: runGuardrails("pre-tool", ctx)
  GuardrailRegistry-->>McpProxy: tripped / pass
  McpProxy->>AuditModule: recordGuardrailTrip(params) (if tripped)
  AuditModule->>PostgresDB: insertEvent(semanticType="guardrail-trip", metadata)
  PostgresDB-->>AuditModule: insert result
  McpProxy-->>Client: JSON-RPC error "Tool call blocked by policy." (if tripped) or continue flow
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

skip-size-check

Poem

🐰 I hop through code and sniff the trails,
Input gates, output nets, and pre-tool rails,
Secrets shushed, tools kept tame, audits take a note,
Streams that trip get gently smote,
Hooray — the rabbit signs the log with a joyful quote.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins' accurately reflects the main changes: implementing guardrails runtime across three call sites and shipping two built-in guardrails. It is specific, concise, and clearly summarizes the primary deliverables.
Description check ✅ Passed The description comprehensively covers objectives (wiring three call sites, built-ins, audit events), addresses test plan items (build, typecheck, unit tests, e2e test reproducer), documents known issues and fixes across three review rounds, and updates AGENTS.md. All required template sections are complete.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/guardrails-runtime

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/server/src/lobu/gateway.ts (1)

149-150: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guardrails are wired after queue consumption starts, creating a boot-time bypass window.

orchestrator.start() starts MessageConsumer first, then guardrails are injected later. Any queued message processed in that window can skip input-stage guardrails.

Suggested reorder
-    logger.info('[Lobu] Starting embedded orchestrator');
-    orchestrator = new Orchestrator(gatewayConfig.orchestration);
-    await orchestrator.start();
-    logger.info('[Lobu] Embedded orchestrator started');
+    logger.info('[Lobu] Creating embedded orchestrator');
+    orchestrator = new Orchestrator(gatewayConfig.orchestration);
@@
     coreServices = gateway.getCoreServices();
     await orchestrator.injectCoreServices(
       coreServices.getSecretStore(),
       coreServices.getProviderCatalogService(),
       coreServices.getGrantStore() ?? undefined,
       coreServices.getPolicyStore() ?? undefined,
       coreServices.getGuardrailRegistry() ?? undefined,
       coreServices.getAgentSettingsStore() ?? undefined
     );
+    await orchestrator.start();
+    logger.info('[Lobu] Embedded orchestrator started');

As per coding guidelines: Guardrail wired call sites: MessageConsumer.handleMessage (input), ChatResponseBridge.handleDelta (output), and McpProxy.handleProxyRequest (pre-tool).

Also applies to: 175-182

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/lobu/gateway.ts` around lines 149 - 150, The orchestrator
currently calls orchestrator.start() which begins the MessageConsumer before
guardrails are attached, allowing a boot-time bypass; move the guardrail wiring
so it runs before starting the orchestrator/consumer. Specifically, ensure calls
that inject/input-guardrails (the wiring code that sets up
MessageConsumer.handleMessage), output-guardrails for
ChatResponseBridge.handleDelta, and pre-tool guards for
McpProxy.handleProxyRequest are executed prior to invoking orchestrator.start()
(and similarly before any start/launch calls currently around the other block
referenced), so consumers cannot process messages before guardrails are in
place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/server/src/__tests__/guardrails-runtime.test.ts`:
- Line 18: Replace the timing-sensitive setTimeout/sleep usage in the
guardrails-runtime.test.ts tests with a deterministic polling wait (e.g., a
waitFor helper) that repeatedly queries for the expected audit row until it
appears or a timeout elapses; specifically, find the test(s) that call
setTimeout(50) or sleep (and the occurrences around the audit insert assertions)
and change them to call waitFor(async () => { const row = await
<your-audit-query>(); expect(row).toBeDefined(); }); so the test actively polls
the DB for the inserted row instead of sleeping (apply the same replacement for
the other occurrences noted near the end of the file).
- Around line 79-80: Replace the hard-coded token-like fixture string (the
literal containing "sk-abcdefghij0123456789AB", e.g. "here is your secret
sk-abcdefghij0123456789AB please") with a runtime-built value to avoid
checked-in credential-shaped literals; create a small test helper (e.g.,
buildFakeToken or generatePlaceholderToken) or concatenate parts (e.g., "sk-" +
"REDACTED" or join short random segments) and use that helper wherever the
literal appears so the test payload still looks like a token but no long
credential-shaped string is stored in the repo.

In `@packages/server/src/gateway/guardrails/audit.ts`:
- Line 41: Replace the non-deterministic Date.now() in the originId generation
with a retry-stable identifier: use an existing stable field on params (e.g.,
params.requestId or params.runId if available) as the suffix for originId, or if
none exists compute a deterministic hash from the stable inputs (e.g., hash of a
deterministic serialization of params.stage, params.guardrail, params.agentId
and any run-specific id) so originId =
`guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${stableId}`;
update the code at the originId assignment (the originId variable using params)
and ensure the hash/serialization routine is deterministic (sorted keys) and
uses only stable params so retries/replays produce the same originId.

In `@packages/server/src/gateway/guardrails/builtins.ts`:
- Line 31: The "openai-key" guardrail regex is too restrictive (only
alphanumerics) and can miss valid API keys like "sk-proj-…" or keys containing
hyphens/underscores; update the re value for the object with name "openai-key"
in builtins.ts to a broader pattern that still anchors on the "sk-" prefix but
allows additional hyphens and underscores and a reasonable variable length for
the remainder (i.e., permit characters like [A-Za-z0-9_-] and relax the
minimum/maximum count) so the guardrail catches project-scoped and variant key
formats.

---

Outside diff comments:
In `@packages/server/src/lobu/gateway.ts`:
- Around line 149-150: The orchestrator currently calls orchestrator.start()
which begins the MessageConsumer before guardrails are attached, allowing a
boot-time bypass; move the guardrail wiring so it runs before starting the
orchestrator/consumer. Specifically, ensure calls that inject/input-guardrails
(the wiring code that sets up MessageConsumer.handleMessage), output-guardrails
for ChatResponseBridge.handleDelta, and pre-tool guards for
McpProxy.handleProxyRequest are executed prior to invoking orchestrator.start()
(and similarly before any start/launch calls currently around the other block
referenced), so consumers cannot process messages before guardrails are in
place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1ba66f5d-4bef-4773-94e3-2e60c484daef

📥 Commits

Reviewing files that changed from the base of the PR and between f8f087b and 7501ce5.

📒 Files selected for processing (10)
  • AGENTS.md
  • packages/server/src/__tests__/guardrails-runtime.test.ts
  • packages/server/src/gateway/auth/mcp/proxy.ts
  • packages/server/src/gateway/connections/chat-response-bridge.ts
  • packages/server/src/gateway/guardrails/audit.ts
  • packages/server/src/gateway/guardrails/builtins.ts
  • packages/server/src/gateway/orchestration/index.ts
  • packages/server/src/gateway/orchestration/message-consumer.ts
  • packages/server/src/gateway/services/core-services.ts
  • packages/server/src/lobu/gateway.ts

Comment thread packages/server/src/__tests__/guardrails-runtime.test.ts Outdated
Comment on lines +79 to +80
text: 'here is your secret sk-abcdefghij0123456789AB please',
platform: 'api',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid checked-in credential-shaped literals in test fixtures.

These literals can trip secret scanners/push protection and create noisy security alerts. Build them at runtime instead of embedding full token-like strings.

Suggested diff
@@
-    const outcome = await runGuardrails(registry, 'output', enabled, {
+    const openAiLike = `sk-${'a'.repeat(24)}`;
+    const outcome = await runGuardrails(registry, 'output', enabled, {
       agentId,
       userId: 'user-1',
-      text: 'here is your secret sk-abcdefghij0123456789AB please',
+      text: `here is your secret ${openAiLike} please`,
       platform: 'api',
       conversationId: 'conv-1',
     });
@@
-    const outcome = await secretScanGuardrail.run({
+    const githubPatLike = `ghp_${'a'.repeat(36)}`;
+    const outcome = await secretScanGuardrail.run({
       agentId,
       userId: 'u',
-      text: 'token: ghp_abcdefghij0123456789ABCDEFGH01234567',
+      text: `token: ${githubPatLike}`,
       platform: 'api',
     });
@@
-    const outcome = await secretScanGuardrail.run({
+    const awsAccessKeyLike = `AKIA${'A'.repeat(16)}`;
+    const outcome = await secretScanGuardrail.run({
       agentId,
       userId: 'u',
-      text: 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE',
+      text: `AWS_ACCESS_KEY_ID=${awsAccessKeyLike}`,
       platform: 'api',
     });

Also applies to: 166-167, 177-178

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/__tests__/guardrails-runtime.test.ts` around lines 79 -
80, Replace the hard-coded token-like fixture string (the literal containing
"sk-abcdefghij0123456789AB", e.g. "here is your secret sk-abcdefghij0123456789AB
please") with a runtime-built value to avoid checked-in credential-shaped
literals; create a small test helper (e.g., buildFakeToken or
generatePlaceholderToken) or concatenate parts (e.g., "sk-" + "REDACTED" or join
short random segments) and use that helper wherever the literal appears so the
test payload still looks like a token but no long credential-shaped string is
stored in the repo.

return;
}

const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${Date.now()}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use a deterministic, retry-stable originId instead of Date.now().

Date.now() makes the audit event identifier non-deterministic across retries/replays, which breaks deterministic origin semantics for this layer.

Suggested direction
 interface RecordGuardrailTripParams {
+  tripId: string; // caller-provided stable id (message id / run id / request id)
   organizationId: string | undefined;
   agentId: string;
@@
-  const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${Date.now()}`;
+  const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${params.tripId}`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/gateway/guardrails/audit.ts` at line 41, Replace the
non-deterministic Date.now() in the originId generation with a retry-stable
identifier: use an existing stable field on params (e.g., params.requestId or
params.runId if available) as the suffix for originId, or if none exists compute
a deterministic hash from the stable inputs (e.g., hash of a deterministic
serialization of params.stage, params.guardrail, params.agentId and any
run-specific id) so originId =
`guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${stableId}`;
update the code at the originId assignment (the originId variable using params)
and ensure the hash/serialization routine is deterministic (sorted keys) and
uses only stable params so retries/replays produce the same originId.

* pattern is the slowest, so it runs last on a streaming hot path.
*/
const SECRET_PATTERNS: Array<{ name: string; re: RegExp }> = [
{ name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What are the currently documented OpenAI API key prefixes/formats (including project-scoped keys) in official OpenAI documentation?

💡 Result:

OpenAI’s official documentation does not provide a single authoritative “master list” page enumerating every API key prefix literal string (including all legacy variants) in one place. However, the official OpenAI API reference pages do document the following API key formats via the “redacted_value” examples returned by key-management endpoints: 1) Project-scoped API keys (Project API keys) - Documented redacted format: “sk-…”, as shown in the Project API Key objects’ redacted_value (example: “sk-abc…def”) [1][2]. - These keys are represented as an object type “organization.project.api_key” in the Project API Keys API reference [3][2]. 2) Admin API keys (for Administration API) - Documented redacted format: “sk-admin…”, as shown in the Admin API Key objects’ redacted_value (example: “sk-admin…def”) [2]. - Admin API keys are explicitly for the Administration API and “cannot be used for non-administration endpoints” [4][5]. 3) How keys are used (not a prefix, but the official documented format in requests) - OpenAI authentication is documented as HTTP Bearer auth: Authorization: Bearer $OPENAI_API_KEY [6][7]. What is not explicitly confirmed by the cited official docs above - The official docs pages surfaced here show “sk-…” and “sk-admin…” (redacted examples), but they do not explicitly spell out the longer literal “sk-proj-…” prefix string in the official snippets we captured. (A community thread mentions “sk-proj-” concretely, but it is not official documentation.) [8][9] If you want, I can do one more targeted round to find an official OpenAI doc page that explicitly states the literal “sk-proj-…” prefix string (not just the redacted “sk-…“ form).

Citations:


Broaden OpenAI key pattern to reduce false negatives in the security guardrail.

Official OpenAI documentation does not explicitly detail the full character set allowed in API keys—only redacted examples (sk-…, sk-admin…) are shown. Community discussions mention project-scoped key variants like sk-proj-, suggesting the format may be more varied than the current pattern allows. The current regex restricts characters to alphanumerics only, which risks missing keys that contain hyphens or underscores. Since this is a security guardrail, a defensive stance (broader pattern matching) is appropriate to avoid under-detection.

Suggested fix
-  { name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },
+  { name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{ name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },
{ name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/gateway/guardrails/builtins.ts` at line 31, The
"openai-key" guardrail regex is too restrictive (only alphanumerics) and can
miss valid API keys like "sk-proj-…" or keys containing hyphens/underscores;
update the re value for the object with name "openai-key" in builtins.ts to a
broader pattern that still anchors on the "sk-" prefix but allows additional
hyphens and underscores and a reasonable variable length for the remainder
(i.e., permit characters like [A-Za-z0-9_-] and relax the minimum/maximum count)
so the guardrail catches project-scoped and variant key formats.

@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

buremba added a commit that referenced this pull request May 19, 2026
…allback, awaitable audit, typed metadata, real e2e

Addresses 5 concerns from the codex review of PR #919.

1. Per-stream rolling buffer for output guardrails: secrets split across
   streaming chunks (e.g. "sk-an" then "t-...") used to bypass every regex.
   The bridge now keeps a 256-char tail per stream key and scans
   `(tail + delta)` so a pattern straddling a chunk boundary still matches.
   Tail is dropped on completion / full-replacement / guardrail trip so a
   blocked stream's bytes don't leak into the next stream on the same key.

2. Audit org-id fallback: a guardrail trip with no organizationId in
   platformMetadata used to silently skip the events row. Both the bridge
   and MessageConsumer now fall back to the connection record / agent
   metadata; only when both fail do we log at error level (a security log
   gap, not a warn). McpProxy adds the same fallback path via
   `agentSettingsStore.getMetadata`.

3. Real e2e against the wired services: the prior test only called
   `runGuardrails` directly. The rewritten suite constructs `ChatResponseBridge`,
   `MessageConsumer` (subclassed to expose `handleMessage` to the test), and
   `McpProxy` and drives their public methods. Assertions cover the block
   message delivery, the audit row shape, the split-chunk regression, and
   the org-id fallback.

4. Awaitable audit: `recordGuardrailTrip` now tracks every in-flight insert
   in a module-level set. Production call sites use `void` so the user-facing
   block isn't gated on the DB write; tests await `flushPendingGuardrailAudits()`
   to drain. The 50ms `setTimeout` from the prior test is gone.

5. Typed `PlatformMetadata`: new `packages/server/src/gateway/connections/platform-metadata.ts`
   declares the shape and ships `readPlatformMetadata()` /
   `platformMetadataString()` narrowing helpers. `chat-response-bridge.ts`
   no longer uses `(metadata as any).field` for connectionId, chatId,
   responseThreadId, agentId, organizationId, sessionReset, senderId etc.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/server/src/gateway/auth/mcp/proxy.ts (1)

982-1083: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Extract pre-tool guardrails out of the approval-only branch.

This only runs when grantStore is configured, so deployments without approvals silently skip pre-tool guardrails entirely. It also leaves handleCallTool()'s REST path unguarded, which means policies like forbidden-tools are still bypassable through a different entrypoint. Please move this into a shared pre-tool gate and call it before approval/upstream dispatch in every tool-execution path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/gateway/auth/mcp/proxy.ts` around lines 982 - 1083, The
pre-tool guardrail logic is erroneously nested behind a grantStore check so
deployments without approvals and the REST path handled by handleCallTool() skip
guardrails; extract the pre-tool guardrail block (the logic using
this.guardrailRegistry, this.agentSettingsStore, runGuardrails,
recordGuardrailTrip, and the logger/response behavior) into a reusable helper
(e.g., preToolGuardrailCheck) that accepts agentId, tokenData, toolName and
toolArgs, and returns the guardrail outcome or null; then invoke that helper at
the top of every tool-execution path (both the approval/upstream dispatch flow
and the REST handleCallTool() path) before any approval or dispatch logic so
forbidden-tools and other policies are enforced uniformly across entry points.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/server/src/gateway/auth/mcp/proxy.ts`:
- Around line 982-1083: The pre-tool guardrail logic is erroneously nested
behind a grantStore check so deployments without approvals and the REST path
handled by handleCallTool() skip guardrails; extract the pre-tool guardrail
block (the logic using this.guardrailRegistry, this.agentSettingsStore,
runGuardrails, recordGuardrailTrip, and the logger/response behavior) into a
reusable helper (e.g., preToolGuardrailCheck) that accepts agentId, tokenData,
toolName and toolArgs, and returns the guardrail outcome or null; then invoke
that helper at the top of every tool-execution path (both the approval/upstream
dispatch flow and the REST handleCallTool() path) before any approval or
dispatch logic so forbidden-tools and other policies are enforced uniformly
across entry points.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b2c108ed-9293-4786-b210-650b78f2b87a

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb575f and 8e9508c.

📒 Files selected for processing (4)
  • packages/server/src/gateway/auth/mcp/proxy.ts
  • packages/server/src/gateway/connections/chat-response-bridge.ts
  • packages/server/src/gateway/guardrails/audit.ts
  • packages/server/src/gateway/orchestration/message-consumer.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/server/src/gateway/guardrails/audit.ts
  • packages/server/src/gateway/connections/chat-response-bridge.ts
  • packages/server/src/gateway/orchestration/message-consumer.ts

buremba added 4 commits May 19, 2026 17:02
…tools built-ins

Replaces the three guardrail TODO call sites with real runGuardrails() calls
and ships the first two built-in guardrails registered at gateway boot.

Wired call sites:
- MessageConsumer.handleMessage (input stage) — on trip, dispatch is
  short-circuited and "Message rejected: <reason>" is pushed to the
  thread_response queue so the user sees rejection in-thread.
- ChatResponseBridge.handleDelta (output stage, per streaming delta) — on
  trip, the in-flight platform stream is disposed, "Message blocked by
  guardrail: <reason>" is posted, and the rest of the worker's stream for
  that conversation is suppressed; partial buffer is not written to history.
- McpProxy.handleProxyRequest (pre-tool stage) — on trip, the worker
  receives a JSON-RPC isError reply with the literal "Tool call blocked by
  policy." (specific reason intentionally not surfaced — evasion surface).

All three call sites fail open on infrastructure errors.

Built-ins (packages/server/src/gateway/guardrails/builtins.ts):
- secret-scan (output) — regex scan for OpenAI/GitHub PAT/AWS access key
  and JWT-shaped tokens; cheap enough for per-delta scanning.
- forbidden-tools (pre-tool) — hardcoded deny-list (delete_repo,
  delete_branch, drop_table).

Audit trail: every trip writes a semantic_type='guardrail-trip' event row
(origin_type='guardrail-<stage>', metadata {guardrail, stage, reason,
agent_id, user_id, conversation_id}). Fire-and-forget — enforcement does
not depend on the audit write.

GuardrailRegistry now lives on CoreServices, populated with built-ins
during initialize(). It is injected into McpProxy at construction and into
MessageConsumer / ChatResponseBridge via setter — preserving the existing
post-construction wiring shape.

E2E gate: packages/server/src/__tests__/guardrails-runtime.test.ts boots
PGlite, persists guardrails=['secret-scan'] on an agent via the real
config store, exercises the runGuardrails path, verifies the trip outcome,
and asserts the guardrail-trip event row lands in the events table with
the documented metadata shape. 7/7 tests green.
…allback, awaitable audit, typed metadata, real e2e

Addresses 5 concerns from the codex review of PR #919.

1. Per-stream rolling buffer for output guardrails: secrets split across
   streaming chunks (e.g. "sk-an" then "t-...") used to bypass every regex.
   The bridge now keeps a 256-char tail per stream key and scans
   `(tail + delta)` so a pattern straddling a chunk boundary still matches.
   Tail is dropped on completion / full-replacement / guardrail trip so a
   blocked stream's bytes don't leak into the next stream on the same key.

2. Audit org-id fallback: a guardrail trip with no organizationId in
   platformMetadata used to silently skip the events row. Both the bridge
   and MessageConsumer now fall back to the connection record / agent
   metadata; only when both fail do we log at error level (a security log
   gap, not a warn). McpProxy adds the same fallback path via
   `agentSettingsStore.getMetadata`.

3. Real e2e against the wired services: the prior test only called
   `runGuardrails` directly. The rewritten suite constructs `ChatResponseBridge`,
   `MessageConsumer` (subclassed to expose `handleMessage` to the test), and
   `McpProxy` and drives their public methods. Assertions cover the block
   message delivery, the audit row shape, the split-chunk regression, and
   the org-id fallback.

4. Awaitable audit: `recordGuardrailTrip` now tracks every in-flight insert
   in a module-level set. Production call sites use `void` so the user-facing
   block isn't gated on the DB write; tests await `flushPendingGuardrailAudits()`
   to drain. The 50ms `setTimeout` from the prior test is gone.

5. Typed `PlatformMetadata`: new `packages/server/src/gateway/connections/platform-metadata.ts`
   declares the shape and ships `readPlatformMetadata()` /
   `platformMetadataString()` narrowing helpers. `chat-response-bridge.ts`
   no longer uses `(metadata as any).field` for connectionId, chatId,
   responseThreadId, agentId, organizationId, sessionReset, senderId etc.
…dual platformMetadata casts

1. fullReplacement ordering in chat-response-bridge.ts. The rolling tail was
   being updated and scanned BEFORE the fullReplacement dispose, so when a
   replacement stream landed the scan ran against
   (previous-delta-tail + new-replacement-text). Joining two unrelated
   text bodies can synthesize a regex match that exists in neither piece
   on its own → false-positive trip on the first delta of an unrelated
   reply. Fix: dispose the prior stream and clear the tail BEFORE the
   scan + tail update. Order is now:

     1. Skip if stream is already blocked
     2. Detect isFullReplacement → dispose + clearGuardrailTail
     3. updateGuardrailTail(delta) → scanText
     4. runOutputGuardrails(scanText)
     5. (trip → block) or (pass → forward to strategy.handleDelta)

   Regression test asserts the scenario: a prior delta ending "...sk-a"
   followed by a fullReplacement delta starting "bcdefghij..." (each safe
   in isolation, joined would match openai-key) must NOT trip.

2. Residual `payload.platformMetadata as Record<string, unknown> | undefined`
   casts in production code. Three call sites in chat-response-bridge.ts
   (handleError, handleStatusUpdate, handleEphemeral) plus one
   `data?.platformMetadata?.traceparent as string | undefined` in
   message-consumer.ts. Replaced with `readPlatformMetadata(...)` /
   `platformMetadataString(...)` from connections/platform-metadata.ts.
   resolveTarget and buildCurrentMessageFromMetadata signatures updated
   to accept `PlatformMetadata` directly. No production-code
   `platformMetadata as Record` or `platformMetadata as any` casts
   remain; the two `(chat as any).createThread` calls in resolveTarget
   are unrelated (runtime-accessed Chat SDK method, not metadata).

Validation:
- make build-packages clean
- bun run typecheck clean
- LOBU_TEST_BACKEND=pglite vitest src/__tests__/guardrails-runtime.test.ts → 7/7 pass
- bun test src/gateway/__tests__/ → 887/887 pass (no regressions)
… comments

- chat-response-bridge: collapse updateGuardrailTail + clearGuardrailTail
  into a single scanWindowWithTail() helper. The two-step
  (compute-next-tail then compute-scan-slice) was the same window
  computed twice with slightly different conditionals; one slice does
  it. clearGuardrailTail had three call sites that now inline
  this.guardrailTails.delete(key) directly — the helper added no
  meaning over the underlying Map call.
- Trim doc/comment text that read as "this PR did X" rather than
  describing an invariant: per-method docs on resolveAgentId/
  resolveOrganizationId, the file-level audit.ts block, the
  fire-and-forget repetitions in proxy.ts + message-consumer.ts +
  bridge. The ordering rationale on isFullReplacement is kept
  (terse) because the regression test exercises exactly that order.
- No behavior change. 7/7 guardrails-runtime tests + 887/887 server
  gateway tests green. The pre-fix regression test still fails when
  the fullReplacement ordering is reverted — confirmed.
@buremba buremba force-pushed the feat/guardrails-runtime branch from 8e9508c to b42f368 Compare May 19, 2026 16:03
@buremba buremba merged commit a66c00d into main May 19, 2026
2 checks passed
@buremba buremba deleted the feat/guardrails-runtime branch May 19, 2026 16:03
buremba added a commit that referenced this pull request May 19, 2026
Bumps packages/owletto from 970eb500 → 4f7c7571 (origin/main), bringing
in the per-agent guardrail toggle + recent trips view (owletto PR #194).

Companion to the guardrails runtime work merged in #919 (call sites +
secret-scan + forbidden-tools built-ins) and #915 (schema + judge engine
+ pii-scan). With this bump the guardrails UI ships end-to-end.
buremba added a commit that referenced this pull request May 19, 2026
…lock image builds (#927)

PR #911 added `examples/personal-finance` to root `package.json`'s
`workspaces` field but didn't update the Dockerfiles, which only COPY
`packages/*/package.json` for the install layer. `bun install` inside
the Docker build then errored:

    error: Workspace not found "examples/personal-finance"
        at /app/package.json:8:5

Every image build on `main` since #911 merged (13:25 UTC today) has
been red: #911#913 (+revert) → #914#915#919#923#924#912#925 — all sitting on `main` un-deployable, including the
`principal_kind` migration from #923 and my own loading-skeletons
shipping artifacts.

Two ways to fix it:

1. **Add stubs to all three Dockerfiles** for the example. Treats the
   symptom; couples prod build pipeline to whatever's under `examples/`,
   wrong direction.
2. **Take the example out of root workspaces.** Examples are
   documentation/demos for users to clone + run; they don't belong in
   the prod build graph. Cleaner separation.

Going with (2). Side effects:

- Example's dependency on `@lobu/promptfoo-provider` switched from
  `workspace:*` (workspace-protocol-only) to
  `file:../../packages/promptfoo-provider`. Resolves locally without
  requiring the example to be in a workspace; consumers run
  `cd examples/personal-finance && bun install` standalone (after
  building the provider once: `cd packages/promptfoo-provider && bun
  run build`).
- `bun.lock` regenerated. Most of the diff is bun's "linked
  workspaces" table shrinking — no upstream version churn.

Verified: simulated Docker build context (root files + stubbed
packages/* manifests + provider stub, no examples/) runs `bun install`
cleanly. No "Workspace not found" error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants