feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins by buremba · Pull Request #919 · lobu-ai/lobu

buremba · 2026-05-19T14:31:07Z

Summary

PR A of the 3-PR guardrails rollout — wires the runtime end-to-end and ships the first two real guardrails. The guardrail primitive (@lobu/core Guardrail<stage> / GuardrailRegistry / runGuardrails) and the config/storage plumbing (agents.guardrails jsonb, AgentSettings.guardrails, lobu apply) were already in place. This PR replaces the three TODO call sites with real runGuardrails() invocations and ships the first built-ins. Sibling PRs cover schema extensions (PR B) and owletto UI (PR C).

Wired call sites:
- MessageConsumer.handleMessage (input) — dispatch is short-circuited on trip, Message rejected: <reason> pushed to thread_response.
- ChatResponseBridge.handleDelta (output, per delta) — in-flight stream disposed on trip, Message blocked by guardrail: <reason> posted, partial buffer not written to history.
- McpProxy.handleProxyRequest (pre-tool) — JSON-RPC isError with generic Tool call blocked by policy.; specific reason intentionally not leaked.
Built-ins (packages/server/src/gateway/guardrails/builtins.ts):
- secret-scan (output) — OpenAI/GitHub PAT/AWS/JWT regex.
- forbidden-tools (pre-tool) — deny-list [delete_repo, delete_branch, drop_table].
Registered at boot in CoreServices.initialize() via registerBuiltinGuardrails(...); exposed via getGuardrailRegistry().
Every trip writes one semantic_type='guardrail-trip' event with {guardrail, stage, reason, agent_id, user_id, conversation_id} metadata.
AGENTS.md #### Guardrails updated to reflect what ships.

Codex review rounds

Round 1 (commit `b90dbdc0`)

Five concerns from the high-effort codex review, all fixed:

Split-chunk regression in chat-response-bridge.ts — per-delta regex scan missed secrets straddling chunk boundaries ("sk-an" then "t-..."). Fixed via a per-stream rolling tail (OUTPUT_GUARDRAIL_TAIL_CHARS = 256); the scan operates on (previous tail + current delta) so the regex still matches on the second chunk. Tail is dropped on completion, full-replacement, and trip so a blocked stream's bytes never leak into the next stream on the same key.
Audit silently dropped trips on missing organizationId — security log gap. Bridge falls back to instance.connection.organizationId; MessageConsumer and McpProxy fall back to agentSettingsStore.getMetadata(agentId). Audit logs at error when both fail.
Fake e2e in guardrails-runtime.test.ts — prior test only called runGuardrails(...) directly. Rewritten to instantiate real ChatResponseBridge, MessageConsumer (subclassed TestableMessageConsumer), McpProxy and drive their public methods.
Flaky setTimeout(50) — replaced with flushPendingGuardrailAudits(), a production helper that drains in-flight audit writes. Wired call sites still use void recordGuardrailTrip(...).
(platformMetadata as any).field — new connections/platform-metadata.ts declares PlatformMetadata + readPlatformMetadata + platformMetadataString.

Round 2 (commit `2eb575f7`)

Two remaining items, both fixed:

handleDelta ordering with fullReplacement — the rolling tail was updated/scanned BEFORE the replacement was handled, so the scan ran on (previous-delta-tail + new-replacement-text). Joining two unrelated text bodies can synthesize a regex match that exists in neither piece on its own — a false-positive trip on the first delta of an unrelated reply. Fixed by reordering: detect isFullReplacement first, dispose the prior stream + clear the tail, then scan. A regression test asserts the scenario: prior delta ending "...sk-a" followed by a fullReplacement delta starting "bcdefghij..." (each safe alone, naive concat would match openai-key) must NOT trip. Verified to fail against the pre-fix code before the reorder.
Residual payload.platformMetadata as Record<...> casts — three call sites in chat-response-bridge.ts (handleError, handleStatusUpdate, handleEphemeral) plus one data?.platformMetadata?.traceparent as string | undefined in message-consumer.ts. Replaced with readPlatformMetadata(...) / platformMetadataString(...). resolveTarget and buildCurrentMessageFromMetadata signatures now take PlatformMetadata directly. No production-code platformMetadata as Record / platformMetadata as any casts remain.

Round 3 simplification pass (commit `8e9508ca`)

Post-review cleanup — no behavior change, 7/7 + 887/887 tests still green.

Collapsed the rolling-tail helpers in chat-response-bridge.ts. updateGuardrailTail(key, delta) and clearGuardrailTail(key) became one scanWindowWithTail(key, delta) plus inlined this.guardrailTails.delete(key) at the three call sites. The previous updateGuardrailTail computed nextTail = combined.slice(-512) AND separately computed scanSlice = (previous + delta).slice(-512) — the same window twice with slightly different conditionals. One slice now serves both roles. clearGuardrailTail added no meaning over the Map call it wrapped.
Stripped review-justifying comments. Per-method docs on resolveAgentId/resolveOrganizationId, the file-level audit.ts block, the Fire-and-forget — Tests use flushPendingGuardrailAudits to drain repeats in bridge/proxy/message-consumer, the Fail open — same rationale as the pre-tool path in McpProxy comments. Kept the terse invariant on the isFullReplacement ordering (the regression test exercises exactly that order).
Left intentionally as-is: audit.ts 3-concept shape (recordGuardrailTrip + pendingAudits Set + flushPendingGuardrailAudits). The user prompt's suggestion (return promise from recordGuardrailTrip, callers void) is already what it does. Removing the tracking would force the test to either change the production call sites to expose the promise or accept a sleep(50) — both are worse than the current private-tracker shape.

Reproducer

$ cd packages/server && LOBU_TEST_BACKEND=pglite bunx vitest run src/__tests__/guardrails-runtime.test.ts

✅ PGlite ready
🗄️  Running migrations...
✅ Test database ready.

[guardrail-runner] Guardrail tripped {"guardrail":"secret-scan","stage":"output","reason":"... openai-key"}
[guardrail-runner] Guardrail tripped {"guardrail":"secret-scan","stage":"output","reason":"... openai-key"}    ← split-chunk
[guardrail-runner] Guardrail tripped {"guardrail":"secret-scan","stage":"output","reason":"... aws-access-key"} ← connection org-id fallback
[orchestrator] Input guardrail tripped — message dropped {"guardrail":"test-input-tripper"}
[orchestrator] Input guardrail tripped — message dropped {"guardrail":"test-input-tripper"} ← metadata orgId fallback
[mcp-proxy] Pre-tool guardrail tripped — returning generic policy block to worker {"guardrail":"forbidden-tools"}

 ✓ src/__tests__/guardrails-runtime.test.ts (7 tests) 601ms
 Test Files  1 passed (1)      Tests  7 passed (7)

Tests:

ChatResponseBridge — full-secret-in-one-delta, split-chunk-across-two-deltas, fullReplacement-clears-tail (no false positive), connection.organizationId fallback
MessageConsumer — trip blocks worker enqueue + emits thread_response, metadata-orgId fallback
McpProxy — delete_repo returns Tool call blocked by policy. (no reason leak), audit row records reason for operators

Test plan

make build-packages clean
bun run typecheck clean
bun test src/gateway/__tests__/ — 887/887 tests pass (no regressions)
LOBU_TEST_BACKEND=pglite bunx vitest run src/__tests__/guardrails-runtime.test.ts — 7/7 pass
Regression test (isFullReplacement clears the rolling tail before scanning) verified to FAIL against pre-fix chat-response-bridge.ts before the reorder
Core guardrail tests — 28/28 pass
AGENTS.md #### Guardrails describes the shipped behavior

Summary by CodeRabbit

New Features
- Runtime guardrails enabled: secret-scanning for outputs, forbidden-tool blocking for tool calls, and input-stage checks. Blocked outputs post a user-facing notice; blocked tool calls return a generic "Tool call blocked by policy" response; rejected inputs enqueue a thread rejection.
Documentation
- Updated guardrail configuration and per-stage semantics documented.

Summary by CodeRabbit

New Features
- Runtime guardrails enabled: secret-scanning for outputs, forbidden-tool blocking for tool calls, and input-stage checks; blocked outputs post a user notice, blocked tool calls return a generic "Tool call blocked by policy", and rejected inputs enqueue a thread rejection.
Documentation
- Guardrail configuration and per-stage semantics updated and clarified.
Tests
- End-to-end guardrails test suite added with database-backed audit verification.
Other
- Guardrail trips are recorded as append-only audit events; infra errors fail open.

coderabbitai · 2026-05-19T14:32:42Z

Warning

Rate limit exceeded

@buremba has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 814ea101-62ba-4360-8605-43990aea233b

📥 Commits

Reviewing files that changed from the base of the PR and between 8e9508c and b42f368.

📒 Files selected for processing (11)

AGENTS.md
packages/server/src/__tests__/guardrails-runtime.test.ts
packages/server/src/gateway/auth/mcp/proxy.ts
packages/server/src/gateway/connections/chat-response-bridge.ts
packages/server/src/gateway/connections/platform-metadata.ts
packages/server/src/gateway/guardrails/audit.ts
packages/server/src/gateway/guardrails/builtins.ts
packages/server/src/gateway/orchestration/index.ts
packages/server/src/gateway/orchestration/message-consumer.ts
packages/server/src/gateway/services/core-services.ts
packages/server/src/lobu/gateway.ts

📝 Walkthrough

Walkthrough

This PR implements a three-stage guardrails enforcement system for the Lobu gateway: input-stage blocking (message consumer), output-stage redaction (chat stream), and pre-tool-stage denial (MCP proxy). The system includes audit trails, two built-in guardrails (secret-scan and forbidden-tools), core bootstrap wiring, and comprehensive E2E tests.

Changes

Guardrails Enforcement System

Layer / File(s)	Summary
Guardrail trip auditing foundation `packages/server/src/gateway/guardrails/audit.ts`	`RecordGuardrailTripParams` interface, `pendingAudits` tracking, `flushPendingGuardrailAudits()`, and `recordGuardrailTrip()` write append-only `guardrail-trip` events with deterministic `originId`; failures are caught and logged and the API returns a Promise.
Built-in guardrails: secret-scan and forbidden-tools `packages/server/src/gateway/guardrails/builtins.ts`	`secretScanGuardrail` (output stage) detects credential-shaped text via prioritized regexes; `forbiddenToolsGuardrail` (pre-tool stage) denies destructive tool names; `registerBuiltinGuardrails()` registers both on a provided `GuardrailRegistry`.
Core services guardrail registry initialization `packages/server/src/gateway/services/core-services.ts`	`CoreServices` constructs a `GuardrailRegistry` in `initialize()`, registers built-ins, exposes `getGuardrailRegistry()`, and passes the registry into `McpProxy` on construction.
Orchestrator integration of guardrails `packages/server/src/gateway/orchestration/index.ts`	`Orchestrator.injectCoreServices()` signature extended with optional `guardrailRegistry` and `agentSettingsStore`; when provided, wires them into `MessageConsumer` via `setGuardrails()`.
Gateway bootstrap and bridge wiring `packages/server/src/lobu/gateway.ts`	`initLobuGateway()` passes `getGuardrailRegistry()` and `getAgentSettingsStore()` into `injectCoreServices()` and configures `ChatResponseBridge` with guardrails before attaching it to the unified consumer.
Platform metadata helpers `packages/server/src/gateway/connections/platform-metadata.ts`	Adds `PlatformMetadata` type, `readPlatformMetadata()` and `platformMetadataString()` helpers for safe metadata extraction used across bridge/handlers.
Output-stage guardrails in chat stream `packages/server/src/gateway/connections/chat-response-bridge.ts`	`ChatResponseBridge.setGuardrails()` injects registry and settings; `handleDelta()` runs per-delta output guardrails, marks blocked streams, disposes in-flight strategy streams, posts a block notice to the resolved response thread/channel, and `handleCompletion()` suppresses buffered output when blocked.
Input-stage guardrails in message handler `packages/server/src/gateway/orchestration/message-consumer.ts`	`MessageConsumer.setGuardrails()` injects dependencies; `handleMessage()` runs input-stage guardrails, records trips, enqueues a thread_response rejection payload, and skips worker dispatch on trip; guardrail infra errors fail open.
Pre-tool guardrails in MCP proxy `packages/server/src/gateway/auth/mcp/proxy.ts`	`McpProxy` constructor accepts `agentSettingsStore` and `guardrailRegistry`; during JSON-RPC `tools/call`, pre-tool guardrails are run before approval evaluation, trips are recorded, and a generic JSON-RPC error is returned to block the tool call; infra errors fail open.
End-to-end testing and documentation `AGENTS.md`, `packages/server/src/__tests__/guardrails-runtime.test.ts`	Adds E2E vitest file exercising output/input/pre-tool stages against a Postgres test DB and updates AGENTS.md to document per-agent `lobu.toml` configuration, wired call sites, stage-specific trip semantics, and the `guardrail-trip` event shape.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant McpProxy
  participant GuardrailRegistry
  participant AuditModule
  participant PostgresDB
  Client->>McpProxy: tools/call request (toolName, args)
  McpProxy->>GuardrailRegistry: runGuardrails("pre-tool", ctx)
  GuardrailRegistry-->>McpProxy: tripped / pass
  McpProxy->>AuditModule: recordGuardrailTrip(params) (if tripped)
  AuditModule->>PostgresDB: insertEvent(semanticType="guardrail-trip", metadata)
  PostgresDB-->>AuditModule: insert result
  McpProxy-->>Client: JSON-RPC error "Tool call blocked by policy." (if tripped) or continue flow

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

skip-size-check

Poem

🐰 I hop through code and sniff the trails,
Input gates, output nets, and pre-tool rails,
Secrets shushed, tools kept tame, audits take a note,
Streams that trip get gently smote,
Hooray — the rabbit signs the log with a joyful quote.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins' accurately reflects the main changes: implementing guardrails runtime across three call sites and shipping two built-in guardrails. It is specific, concise, and clearly summarizes the primary deliverables.
Description check	✅ Passed	The description comprehensively covers objectives (wiring three call sites, built-ins, audit events), addresses test plan items (build, typecheck, unit tests, e2e test reproducer), documents known issues and fixes across three review rounds, and updates AGENTS.md. All required template sections are complete.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/guardrails-runtime

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/server/src/lobu/gateway.ts (1)

149-150: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guardrails are wired after queue consumption starts, creating a boot-time bypass window.

orchestrator.start() starts MessageConsumer first, then guardrails are injected later. Any queued message processed in that window can skip input-stage guardrails.

Suggested reorder

-    logger.info('[Lobu] Starting embedded orchestrator');
-    orchestrator = new Orchestrator(gatewayConfig.orchestration);
-    await orchestrator.start();
-    logger.info('[Lobu] Embedded orchestrator started');
+    logger.info('[Lobu] Creating embedded orchestrator');
+    orchestrator = new Orchestrator(gatewayConfig.orchestration);
@@
     coreServices = gateway.getCoreServices();
     await orchestrator.injectCoreServices(
       coreServices.getSecretStore(),
       coreServices.getProviderCatalogService(),
       coreServices.getGrantStore() ?? undefined,
       coreServices.getPolicyStore() ?? undefined,
       coreServices.getGuardrailRegistry() ?? undefined,
       coreServices.getAgentSettingsStore() ?? undefined
     );
+    await orchestrator.start();
+    logger.info('[Lobu] Embedded orchestrator started');

As per coding guidelines: Guardrail wired call sites: MessageConsumer.handleMessage (input), ChatResponseBridge.handleDelta (output), and McpProxy.handleProxyRequest (pre-tool).

Also applies to: 175-182

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/lobu/gateway.ts` around lines 149 - 150, The orchestrator
currently calls orchestrator.start() which begins the MessageConsumer before
guardrails are attached, allowing a boot-time bypass; move the guardrail wiring
so it runs before starting the orchestrator/consumer. Specifically, ensure calls
that inject/input-guardrails (the wiring code that sets up
MessageConsumer.handleMessage), output-guardrails for
ChatResponseBridge.handleDelta, and pre-tool guards for
McpProxy.handleProxyRequest are executed prior to invoking orchestrator.start()
(and similarly before any start/launch calls currently around the other block
referenced), so consumers cannot process messages before guardrails are in
place.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/server/src/__tests__/guardrails-runtime.test.ts`:
- Line 18: Replace the timing-sensitive setTimeout/sleep usage in the
guardrails-runtime.test.ts tests with a deterministic polling wait (e.g., a
waitFor helper) that repeatedly queries for the expected audit row until it
appears or a timeout elapses; specifically, find the test(s) that call
setTimeout(50) or sleep (and the occurrences around the audit insert assertions)
and change them to call waitFor(async () => { const row = await
<your-audit-query>(); expect(row).toBeDefined(); }); so the test actively polls
the DB for the inserted row instead of sleeping (apply the same replacement for
the other occurrences noted near the end of the file).
- Around line 79-80: Replace the hard-coded token-like fixture string (the
literal containing "sk-abcdefghij0123456789AB", e.g. "here is your secret
sk-abcdefghij0123456789AB please") with a runtime-built value to avoid
checked-in credential-shaped literals; create a small test helper (e.g.,
buildFakeToken or generatePlaceholderToken) or concatenate parts (e.g., "sk-" +
"REDACTED" or join short random segments) and use that helper wherever the
literal appears so the test payload still looks like a token but no long
credential-shaped string is stored in the repo.

In `@packages/server/src/gateway/guardrails/audit.ts`:
- Line 41: Replace the non-deterministic Date.now() in the originId generation
with a retry-stable identifier: use an existing stable field on params (e.g.,
params.requestId or params.runId if available) as the suffix for originId, or if
none exists compute a deterministic hash from the stable inputs (e.g., hash of a
deterministic serialization of params.stage, params.guardrail, params.agentId
and any run-specific id) so originId =
`guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${stableId}`;
update the code at the originId assignment (the originId variable using params)
and ensure the hash/serialization routine is deterministic (sorted keys) and
uses only stable params so retries/replays produce the same originId.

In `@packages/server/src/gateway/guardrails/builtins.ts`:
- Line 31: The "openai-key" guardrail regex is too restrictive (only
alphanumerics) and can miss valid API keys like "sk-proj-…" or keys containing
hyphens/underscores; update the re value for the object with name "openai-key"
in builtins.ts to a broader pattern that still anchors on the "sk-" prefix but
allows additional hyphens and underscores and a reasonable variable length for
the remainder (i.e., permit characters like [A-Za-z0-9_-] and relax the
minimum/maximum count) so the guardrail catches project-scoped and variant key
formats.

---

Outside diff comments:
In `@packages/server/src/lobu/gateway.ts`:
- Around line 149-150: The orchestrator currently calls orchestrator.start()
which begins the MessageConsumer before guardrails are attached, allowing a
boot-time bypass; move the guardrail wiring so it runs before starting the
orchestrator/consumer. Specifically, ensure calls that inject/input-guardrails
(the wiring code that sets up MessageConsumer.handleMessage), output-guardrails
for ChatResponseBridge.handleDelta, and pre-tool guards for
McpProxy.handleProxyRequest are executed prior to invoking orchestrator.start()
(and similarly before any start/launch calls currently around the other block
referenced), so consumers cannot process messages before guardrails are in
place.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1ba66f5d-4bef-4773-94e3-2e60c484daef

📥 Commits

Reviewing files that changed from the base of the PR and between f8f087b and 7501ce5.

📒 Files selected for processing (10)

AGENTS.md
packages/server/src/__tests__/guardrails-runtime.test.ts
packages/server/src/gateway/auth/mcp/proxy.ts
packages/server/src/gateway/connections/chat-response-bridge.ts
packages/server/src/gateway/guardrails/audit.ts
packages/server/src/gateway/guardrails/builtins.ts
packages/server/src/gateway/orchestration/index.ts
packages/server/src/gateway/orchestration/message-consumer.ts
packages/server/src/gateway/services/core-services.ts
packages/server/src/lobu/gateway.ts

coderabbitai · 2026-05-19T14:41:46Z

+      text: 'here is your secret sk-abcdefghij0123456789AB please',
+      platform: 'api',


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid checked-in credential-shaped literals in test fixtures.

These literals can trip secret scanners/push protection and create noisy security alerts. Build them at runtime instead of embedding full token-like strings.

Suggested diff

@@ - const outcome = await runGuardrails(registry, 'output', enabled, { + const openAiLike = `sk-${'a'.repeat(24)}`; + const outcome = await runGuardrails(registry, 'output', enabled, { agentId, userId: 'user-1', - text: 'here is your secret sk-abcdefghij0123456789AB please', + text: `here is your secret ${openAiLike} please`, platform: 'api', conversationId: 'conv-1', }); @@ - const outcome = await secretScanGuardrail.run({ + const githubPatLike = `ghp_${'a'.repeat(36)}`; + const outcome = await secretScanGuardrail.run({ agentId, userId: 'u', - text: 'token: ghp_abcdefghij0123456789ABCDEFGH01234567', + text: `token: ${githubPatLike}`, platform: 'api', }); @@ - const outcome = await secretScanGuardrail.run({ + const awsAccessKeyLike = `AKIA${'A'.repeat(16)}`; + const outcome = await secretScanGuardrail.run({ agentId, userId: 'u', - text: 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE', + text: `AWS_ACCESS_KEY_ID=${awsAccessKeyLike}`, platform: 'api', });

Also applies to: 166-167, 177-178

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/__tests__/guardrails-runtime.test.ts` around lines 79 - 80, Replace the hard-coded token-like fixture string (the literal containing "sk-abcdefghij0123456789AB", e.g. "here is your secret sk-abcdefghij0123456789AB please") with a runtime-built value to avoid checked-in credential-shaped literals; create a small test helper (e.g., buildFakeToken or generatePlaceholderToken) or concatenate parts (e.g., "sk-" + "REDACTED" or join short random segments) and use that helper wherever the literal appears so the test payload still looks like a token but no long credential-shaped string is stored in the repo.

coderabbitai · 2026-05-19T14:41:46Z

+    return;
+  }
+
+  const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${Date.now()}`;


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use a deterministic, retry-stable originId instead of Date.now().

Date.now() makes the audit event identifier non-deterministic across retries/replays, which breaks deterministic origin semantics for this layer.

Suggested direction

interface RecordGuardrailTripParams { + tripId: string; // caller-provided stable id (message id / run id / request id) organizationId: string | undefined; agentId: string; @@ - const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${Date.now()}`; + const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${params.tripId}`;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/gateway/guardrails/audit.ts` at line 41, Replace the non-deterministic Date.now() in the originId generation with a retry-stable identifier: use an existing stable field on params (e.g., params.requestId or params.runId if available) as the suffix for originId, or if none exists compute a deterministic hash from the stable inputs (e.g., hash of a deterministic serialization of params.stage, params.guardrail, params.agentId and any run-specific id) so originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${stableId}`; update the code at the originId assignment (the originId variable using params) and ensure the hash/serialization routine is deterministic (sorted keys) and uses only stable params so retries/replays produce the same originId.

coderabbitai · 2026-05-19T14:41:46Z

+ * pattern is the slowest, so it runs last on a streaming hot path.
+ */
+const SECRET_PATTERNS: Array<{ name: string; re: RegExp }> = [
+  { name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What are the currently documented OpenAI API key prefixes/formats (including project-scoped keys) in official OpenAI documentation?

💡 Result:

OpenAI’s official documentation does not provide a single authoritative “master list” page enumerating every API key prefix literal string (including all legacy variants) in one place. However, the official OpenAI API reference pages do document the following API key formats via the “redacted_value” examples returned by key-management endpoints: 1) Project-scoped API keys (Project API keys) - Documented redacted format: “sk-…”, as shown in the Project API Key objects’ redacted_value (example: “sk-abc…def”) [1][2]. - These keys are represented as an object type “organization.project.api_key” in the Project API Keys API reference [3][2]. 2) Admin API keys (for Administration API) - Documented redacted format: “sk-admin…”, as shown in the Admin API Key objects’ redacted_value (example: “sk-admin…def”) [2]. - Admin API keys are explicitly for the Administration API and “cannot be used for non-administration endpoints” [4][5]. 3) How keys are used (not a prefix, but the official documented format in requests) - OpenAI authentication is documented as HTTP Bearer auth: Authorization: Bearer $OPENAI_API_KEY [6][7]. What is not explicitly confirmed by the cited official docs above - The official docs pages surfaced here show “sk-…” and “sk-admin…” (redacted examples), but they do not explicitly spell out the longer literal “sk-proj-…” prefix string in the official snippets we captured. (A community thread mentions “sk-proj-” concretely, but it is not official documentation.) [8][9] If you want, I can do one more targeted round to find an official OpenAI doc page that explicitly states the literal “sk-proj-…” prefix string (not just the redacted “sk-…“ form).

Citations:

1: https://developers.openai.com/api/reference/resources/admin/subresources/organization/subresources/projects/subresources/api_keys/methods/retrieve

2: https://platform.openai.com/docs/api-reference/admin-api-keys/list?referral_type=blog

3: https://platform.openai.com/docs/api-reference/project-api-keys

4: http://platform.openai.com/docs/api-reference/administration

5: https://developers.openai.com/api/docs/guides/admin-apis

6: https://platform.openai.com/docs/quickstart?api-=

7: https://platform.openai.com/docs/api-reference/assistants/createAssistant.html

8: https://community.openai.com/t/how-to-create-an-api-secret-key-with-prefix-sk-only-always-creates-sk-proj-keys/1263531

9: https://community.openai.com/t/openai-api-key-cannot-use-project-key-but-i-cannot-create-a-non-project-key/1286693

Broaden OpenAI key pattern to reduce false negatives in the security guardrail.

Official OpenAI documentation does not explicitly detail the full character set allowed in API keys—only redacted examples (sk-…, sk-admin…) are shown. Community discussions mention project-scoped key variants like sk-proj-, suggesting the format may be more varied than the current pattern allows. The current regex restricts characters to alphanumerics only, which risks missing keys that contain hyphens or underscores. Since this is a security guardrail, a defensive stance (broader pattern matching) is appropriate to avoid under-detection.

Suggested fix

- { name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ }, + { name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

{ name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },

{ name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ },

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/gateway/guardrails/builtins.ts` at line 31, The "openai-key" guardrail regex is too restrictive (only alphanumerics) and can miss valid API keys like "sk-proj-…" or keys containing hyphens/underscores; update the re value for the object with name "openai-key" in builtins.ts to a broader pattern that still anchors on the "sk-" prefix but allows additional hyphens and underscores and a reasonable variable length for the remainder (i.e., permit characters like [A-Za-z0-9_-] and relax the minimum/maximum count) so the guardrail catches project-scoped and variant key formats.

codecov-commenter · 2026-05-19T14:43:37Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…allback, awaitable audit, typed metadata, real e2e Addresses 5 concerns from the codex review of PR #919. 1. Per-stream rolling buffer for output guardrails: secrets split across streaming chunks (e.g. "sk-an" then "t-...") used to bypass every regex. The bridge now keeps a 256-char tail per stream key and scans `(tail + delta)` so a pattern straddling a chunk boundary still matches. Tail is dropped on completion / full-replacement / guardrail trip so a blocked stream's bytes don't leak into the next stream on the same key. 2. Audit org-id fallback: a guardrail trip with no organizationId in platformMetadata used to silently skip the events row. Both the bridge and MessageConsumer now fall back to the connection record / agent metadata; only when both fail do we log at error level (a security log gap, not a warn). McpProxy adds the same fallback path via `agentSettingsStore.getMetadata`. 3. Real e2e against the wired services: the prior test only called `runGuardrails` directly. The rewritten suite constructs `ChatResponseBridge`, `MessageConsumer` (subclassed to expose `handleMessage` to the test), and `McpProxy` and drives their public methods. Assertions cover the block message delivery, the audit row shape, the split-chunk regression, and the org-id fallback. 4. Awaitable audit: `recordGuardrailTrip` now tracks every in-flight insert in a module-level set. Production call sites use `void` so the user-facing block isn't gated on the DB write; tests await `flushPendingGuardrailAudits()` to drain. The 50ms `setTimeout` from the prior test is gone. 5. Typed `PlatformMetadata`: new `packages/server/src/gateway/connections/platform-metadata.ts` declares the shape and ships `readPlatformMetadata()` / `platformMetadataString()` narrowing helpers. `chat-response-bridge.ts` no longer uses `(metadata as any).field` for connectionId, chatId, responseThreadId, agentId, organizationId, sessionReset, senderId etc.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/server/src/gateway/auth/mcp/proxy.ts (1)
982-1083: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Extract pre-tool guardrails out of the approval-only branch.

This only runs when grantStore is configured, so deployments without approvals silently skip pre-tool guardrails entirely. It also leaves handleCallTool()'s REST path unguarded, which means policies like forbidden-tools are still bypassable through a different entrypoint. Please move this into a shared pre-tool gate and call it before approval/upstream dispatch in every tool-execution path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/gateway/auth/mcp/proxy.ts` around lines 982 - 1083, The
pre-tool guardrail logic is erroneously nested behind a grantStore check so
deployments without approvals and the REST path handled by handleCallTool() skip
guardrails; extract the pre-tool guardrail block (the logic using
this.guardrailRegistry, this.agentSettingsStore, runGuardrails,
recordGuardrailTrip, and the logger/response behavior) into a reusable helper
(e.g., preToolGuardrailCheck) that accepts agentId, tokenData, toolName and
toolArgs, and returns the guardrail outcome or null; then invoke that helper at
the top of every tool-execution path (both the approval/upstream dispatch flow
and the REST handleCallTool() path) before any approval or dispatch logic so
forbidden-tools and other policies are enforced uniformly across entry points.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/server/src/gateway/auth/mcp/proxy.ts`:
- Around line 982-1083: The pre-tool guardrail logic is erroneously nested
behind a grantStore check so deployments without approvals and the REST path
handled by handleCallTool() skip guardrails; extract the pre-tool guardrail
block (the logic using this.guardrailRegistry, this.agentSettingsStore,
runGuardrails, recordGuardrailTrip, and the logger/response behavior) into a
reusable helper (e.g., preToolGuardrailCheck) that accepts agentId, tokenData,
toolName and toolArgs, and returns the guardrail outcome or null; then invoke
that helper at the top of every tool-execution path (both the approval/upstream
dispatch flow and the REST handleCallTool() path) before any approval or
dispatch logic so forbidden-tools and other policies are enforced uniformly
across entry points.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b2c108ed-9293-4786-b210-650b78f2b87a

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb575f and 8e9508c.

📒 Files selected for processing (4)

packages/server/src/gateway/auth/mcp/proxy.ts
packages/server/src/gateway/connections/chat-response-bridge.ts
packages/server/src/gateway/guardrails/audit.ts
packages/server/src/gateway/orchestration/message-consumer.ts

🚧 Files skipped from review as they are similar to previous changes (3)

packages/server/src/gateway/guardrails/audit.ts
packages/server/src/gateway/connections/chat-response-bridge.ts
packages/server/src/gateway/orchestration/message-consumer.ts

…tools built-ins Replaces the three guardrail TODO call sites with real runGuardrails() calls and ships the first two built-in guardrails registered at gateway boot. Wired call sites: - MessageConsumer.handleMessage (input stage) — on trip, dispatch is short-circuited and "Message rejected: <reason>" is pushed to the thread_response queue so the user sees rejection in-thread. - ChatResponseBridge.handleDelta (output stage, per streaming delta) — on trip, the in-flight platform stream is disposed, "Message blocked by guardrail: <reason>" is posted, and the rest of the worker's stream for that conversation is suppressed; partial buffer is not written to history. - McpProxy.handleProxyRequest (pre-tool stage) — on trip, the worker receives a JSON-RPC isError reply with the literal "Tool call blocked by policy." (specific reason intentionally not surfaced — evasion surface). All three call sites fail open on infrastructure errors. Built-ins (packages/server/src/gateway/guardrails/builtins.ts): - secret-scan (output) — regex scan for OpenAI/GitHub PAT/AWS access key and JWT-shaped tokens; cheap enough for per-delta scanning. - forbidden-tools (pre-tool) — hardcoded deny-list (delete_repo, delete_branch, drop_table). Audit trail: every trip writes a semantic_type='guardrail-trip' event row (origin_type='guardrail-<stage>', metadata {guardrail, stage, reason, agent_id, user_id, conversation_id}). Fire-and-forget — enforcement does not depend on the audit write. GuardrailRegistry now lives on CoreServices, populated with built-ins during initialize(). It is injected into McpProxy at construction and into MessageConsumer / ChatResponseBridge via setter — preserving the existing post-construction wiring shape. E2E gate: packages/server/src/__tests__/guardrails-runtime.test.ts boots PGlite, persists guardrails=['secret-scan'] on an agent via the real config store, exercises the runGuardrails path, verifies the trip outcome, and asserts the guardrail-trip event row lands in the events table with the documented metadata shape. 7/7 tests green.

…allback, awaitable audit, typed metadata, real e2e Addresses 5 concerns from the codex review of PR #919. 1. Per-stream rolling buffer for output guardrails: secrets split across streaming chunks (e.g. "sk-an" then "t-...") used to bypass every regex. The bridge now keeps a 256-char tail per stream key and scans `(tail + delta)` so a pattern straddling a chunk boundary still matches. Tail is dropped on completion / full-replacement / guardrail trip so a blocked stream's bytes don't leak into the next stream on the same key. 2. Audit org-id fallback: a guardrail trip with no organizationId in platformMetadata used to silently skip the events row. Both the bridge and MessageConsumer now fall back to the connection record / agent metadata; only when both fail do we log at error level (a security log gap, not a warn). McpProxy adds the same fallback path via `agentSettingsStore.getMetadata`. 3. Real e2e against the wired services: the prior test only called `runGuardrails` directly. The rewritten suite constructs `ChatResponseBridge`, `MessageConsumer` (subclassed to expose `handleMessage` to the test), and `McpProxy` and drives their public methods. Assertions cover the block message delivery, the audit row shape, the split-chunk regression, and the org-id fallback. 4. Awaitable audit: `recordGuardrailTrip` now tracks every in-flight insert in a module-level set. Production call sites use `void` so the user-facing block isn't gated on the DB write; tests await `flushPendingGuardrailAudits()` to drain. The 50ms `setTimeout` from the prior test is gone. 5. Typed `PlatformMetadata`: new `packages/server/src/gateway/connections/platform-metadata.ts` declares the shape and ships `readPlatformMetadata()` / `platformMetadataString()` narrowing helpers. `chat-response-bridge.ts` no longer uses `(metadata as any).field` for connectionId, chatId, responseThreadId, agentId, organizationId, sessionReset, senderId etc.

…dual platformMetadata casts 1. fullReplacement ordering in chat-response-bridge.ts. The rolling tail was being updated and scanned BEFORE the fullReplacement dispose, so when a replacement stream landed the scan ran against (previous-delta-tail + new-replacement-text). Joining two unrelated text bodies can synthesize a regex match that exists in neither piece on its own → false-positive trip on the first delta of an unrelated reply. Fix: dispose the prior stream and clear the tail BEFORE the scan + tail update. Order is now: 1. Skip if stream is already blocked 2. Detect isFullReplacement → dispose + clearGuardrailTail 3. updateGuardrailTail(delta) → scanText 4. runOutputGuardrails(scanText) 5. (trip → block) or (pass → forward to strategy.handleDelta) Regression test asserts the scenario: a prior delta ending "...sk-a" followed by a fullReplacement delta starting "bcdefghij..." (each safe in isolation, joined would match openai-key) must NOT trip. 2. Residual `payload.platformMetadata as Record<string, unknown> | undefined` casts in production code. Three call sites in chat-response-bridge.ts (handleError, handleStatusUpdate, handleEphemeral) plus one `data?.platformMetadata?.traceparent as string | undefined` in message-consumer.ts. Replaced with `readPlatformMetadata(...)` / `platformMetadataString(...)` from connections/platform-metadata.ts. resolveTarget and buildCurrentMessageFromMetadata signatures updated to accept `PlatformMetadata` directly. No production-code `platformMetadata as Record` or `platformMetadata as any` casts remain; the two `(chat as any).createThread` calls in resolveTarget are unrelated (runtime-accessed Chat SDK method, not metadata). Validation: - make build-packages clean - bun run typecheck clean - LOBU_TEST_BACKEND=pglite vitest src/__tests__/guardrails-runtime.test.ts → 7/7 pass - bun test src/gateway/__tests__/ → 887/887 pass (no regressions)

… comments - chat-response-bridge: collapse updateGuardrailTail + clearGuardrailTail into a single scanWindowWithTail() helper. The two-step (compute-next-tail then compute-scan-slice) was the same window computed twice with slightly different conditionals; one slice does it. clearGuardrailTail had three call sites that now inline this.guardrailTails.delete(key) directly — the helper added no meaning over the underlying Map call. - Trim doc/comment text that read as "this PR did X" rather than describing an invariant: per-method docs on resolveAgentId/ resolveOrganizationId, the file-level audit.ts block, the fire-and-forget repetitions in proxy.ts + message-consumer.ts + bridge. The ordering rationale on isFullReplacement is kept (terse) because the regression test exercises exactly that order. - No behavior change. 7/7 guardrails-runtime tests + 887/887 server gateway tests green. The pre-fix regression test still fails when the fullReplacement ordering is reverted — confirmed.

Bumps packages/owletto from 970eb500 → 4f7c7571 (origin/main), bringing in the per-agent guardrail toggle + recent trips view (owletto PR #194). Companion to the guardrails runtime work merged in #919 (call sites + secret-scan + forbidden-tools built-ins) and #915 (schema + judge engine + pii-scan). With this bump the guardrails UI ships end-to-end.

…lock image builds (#927) PR #911 added `examples/personal-finance` to root `package.json`'s `workspaces` field but didn't update the Dockerfiles, which only COPY `packages/*/package.json` for the install layer. `bun install` inside the Docker build then errored: error: Workspace not found "examples/personal-finance" at /app/package.json:8:5 Every image build on `main` since #911 merged (13:25 UTC today) has been red: #911 → #913 (+revert) → #914 → #915 → #919 → #923 → #924 → #912 → #925 — all sitting on `main` un-deployable, including the `principal_kind` migration from #923 and my own loading-skeletons shipping artifacts. Two ways to fix it: 1. **Add stubs to all three Dockerfiles** for the example. Treats the symptom; couples prod build pipeline to whatever's under `examples/`, wrong direction. 2. **Take the example out of root workspaces.** Examples are documentation/demos for users to clone + run; they don't belong in the prod build graph. Cleaner separation. Going with (2). Side effects: - Example's dependency on `@lobu/promptfoo-provider` switched from `workspace:*` (workspace-protocol-only) to `file:../../packages/promptfoo-provider`. Resolves locally without requiring the example to be in a workspace; consumers run `cd examples/personal-finance && bun install` standalone (after building the provider once: `cd packages/promptfoo-provider && bun run build`). - `bun.lock` regenerated. Most of the diff is bun's "linked workspaces" table shrinking — no upstream version churn. Verified: simulated Docker build context (root files + stubbed packages/* manifests + provider stub, no examples/) runs `bun install` cleanly. No "Workspace not found" error.

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

buremba added 4 commits May 19, 2026 17:02

buremba force-pushed the feat/guardrails-runtime branch from 8e9508c to b42f368 Compare May 19, 2026 16:03

buremba merged commit a66c00d into main May 19, 2026
2 checks passed

buremba deleted the feat/guardrails-runtime branch May 19, 2026 16:03

This was referenced May 19, 2026

chore(main): release lobu 8.0.0 #897

Merged

chore: bump owletto submodule to ship guardrails UI #924

Merged

buremba mentioned this pull request May 19, 2026

fix(build): drop examples/personal-finance from root workspaces — unblock image builds #927

Merged

2 tasks

coderabbitai Bot mentioned this pull request May 20, 2026

fix(gateway): surface worker failures to chat clients as terminal errors (#946) #971

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins#919

feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins#919
buremba merged 4 commits into
mainfrom
feat/guardrails-runtime

buremba commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot May 19, 2026

Uh oh!

coderabbitai Bot May 19, 2026

Uh oh!

coderabbitai Bot May 19, 2026

Uh oh!

codecov-commenter commented May 19, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		text: 'here is your secret sk-abcdefghij0123456789AB please',
		platform: 'api',

	{ name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },
	{ name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ },

Conversation

buremba commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Codex review rounds

Round 1 (commit b90dbdc0)

Round 2 (commit 2eb575f7)

Round 3 simplification pass (commit 8e9508ca)

Reproducer

Test plan

Summary by CodeRabbit

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented May 19, 2026

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

buremba commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Round 1 (commit `b90dbdc0`)

Round 2 (commit `2eb575f7`)

Round 3 simplification pass (commit `8e9508ca`)

coderabbitai Bot commented May 19, 2026 •

edited

Loading