feat: wire guardrails runtime end-to-end + secret-scan and forbidden-tools built-ins#919
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (11)
📝 WalkthroughWalkthroughThis PR implements a three-stage guardrails enforcement system for the Lobu gateway: input-stage blocking (message consumer), output-stage redaction (chat stream), and pre-tool-stage denial (MCP proxy). The system includes audit trails, two built-in guardrails (secret-scan and forbidden-tools), core bootstrap wiring, and comprehensive E2E tests. ChangesGuardrails Enforcement System
Sequence Diagram(s)sequenceDiagram
participant Client
participant McpProxy
participant GuardrailRegistry
participant AuditModule
participant PostgresDB
Client->>McpProxy: tools/call request (toolName, args)
McpProxy->>GuardrailRegistry: runGuardrails("pre-tool", ctx)
GuardrailRegistry-->>McpProxy: tripped / pass
McpProxy->>AuditModule: recordGuardrailTrip(params) (if tripped)
AuditModule->>PostgresDB: insertEvent(semanticType="guardrail-trip", metadata)
PostgresDB-->>AuditModule: insert result
McpProxy-->>Client: JSON-RPC error "Tool call blocked by policy." (if tripped) or continue flow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/server/src/lobu/gateway.ts (1)
149-150:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuardrails are wired after queue consumption starts, creating a boot-time bypass window.
orchestrator.start()startsMessageConsumerfirst, then guardrails are injected later. Any queued message processed in that window can skip input-stage guardrails.Suggested reorder
- logger.info('[Lobu] Starting embedded orchestrator'); - orchestrator = new Orchestrator(gatewayConfig.orchestration); - await orchestrator.start(); - logger.info('[Lobu] Embedded orchestrator started'); + logger.info('[Lobu] Creating embedded orchestrator'); + orchestrator = new Orchestrator(gatewayConfig.orchestration); @@ coreServices = gateway.getCoreServices(); await orchestrator.injectCoreServices( coreServices.getSecretStore(), coreServices.getProviderCatalogService(), coreServices.getGrantStore() ?? undefined, coreServices.getPolicyStore() ?? undefined, coreServices.getGuardrailRegistry() ?? undefined, coreServices.getAgentSettingsStore() ?? undefined ); + await orchestrator.start(); + logger.info('[Lobu] Embedded orchestrator started');As per coding guidelines:
Guardrail wired call sites: MessageConsumer.handleMessage (input), ChatResponseBridge.handleDelta (output), and McpProxy.handleProxyRequest (pre-tool).Also applies to: 175-182
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/lobu/gateway.ts` around lines 149 - 150, The orchestrator currently calls orchestrator.start() which begins the MessageConsumer before guardrails are attached, allowing a boot-time bypass; move the guardrail wiring so it runs before starting the orchestrator/consumer. Specifically, ensure calls that inject/input-guardrails (the wiring code that sets up MessageConsumer.handleMessage), output-guardrails for ChatResponseBridge.handleDelta, and pre-tool guards for McpProxy.handleProxyRequest are executed prior to invoking orchestrator.start() (and similarly before any start/launch calls currently around the other block referenced), so consumers cannot process messages before guardrails are in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/server/src/__tests__/guardrails-runtime.test.ts`:
- Line 18: Replace the timing-sensitive setTimeout/sleep usage in the
guardrails-runtime.test.ts tests with a deterministic polling wait (e.g., a
waitFor helper) that repeatedly queries for the expected audit row until it
appears or a timeout elapses; specifically, find the test(s) that call
setTimeout(50) or sleep (and the occurrences around the audit insert assertions)
and change them to call waitFor(async () => { const row = await
<your-audit-query>(); expect(row).toBeDefined(); }); so the test actively polls
the DB for the inserted row instead of sleeping (apply the same replacement for
the other occurrences noted near the end of the file).
- Around line 79-80: Replace the hard-coded token-like fixture string (the
literal containing "sk-abcdefghij0123456789AB", e.g. "here is your secret
sk-abcdefghij0123456789AB please") with a runtime-built value to avoid
checked-in credential-shaped literals; create a small test helper (e.g.,
buildFakeToken or generatePlaceholderToken) or concatenate parts (e.g., "sk-" +
"REDACTED" or join short random segments) and use that helper wherever the
literal appears so the test payload still looks like a token but no long
credential-shaped string is stored in the repo.
In `@packages/server/src/gateway/guardrails/audit.ts`:
- Line 41: Replace the non-deterministic Date.now() in the originId generation
with a retry-stable identifier: use an existing stable field on params (e.g.,
params.requestId or params.runId if available) as the suffix for originId, or if
none exists compute a deterministic hash from the stable inputs (e.g., hash of a
deterministic serialization of params.stage, params.guardrail, params.agentId
and any run-specific id) so originId =
`guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${stableId}`;
update the code at the originId assignment (the originId variable using params)
and ensure the hash/serialization routine is deterministic (sorted keys) and
uses only stable params so retries/replays produce the same originId.
In `@packages/server/src/gateway/guardrails/builtins.ts`:
- Line 31: The "openai-key" guardrail regex is too restrictive (only
alphanumerics) and can miss valid API keys like "sk-proj-…" or keys containing
hyphens/underscores; update the re value for the object with name "openai-key"
in builtins.ts to a broader pattern that still anchors on the "sk-" prefix but
allows additional hyphens and underscores and a reasonable variable length for
the remainder (i.e., permit characters like [A-Za-z0-9_-] and relax the
minimum/maximum count) so the guardrail catches project-scoped and variant key
formats.
---
Outside diff comments:
In `@packages/server/src/lobu/gateway.ts`:
- Around line 149-150: The orchestrator currently calls orchestrator.start()
which begins the MessageConsumer before guardrails are attached, allowing a
boot-time bypass; move the guardrail wiring so it runs before starting the
orchestrator/consumer. Specifically, ensure calls that inject/input-guardrails
(the wiring code that sets up MessageConsumer.handleMessage), output-guardrails
for ChatResponseBridge.handleDelta, and pre-tool guards for
McpProxy.handleProxyRequest are executed prior to invoking orchestrator.start()
(and similarly before any start/launch calls currently around the other block
referenced), so consumers cannot process messages before guardrails are in
place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1ba66f5d-4bef-4773-94e3-2e60c484daef
📒 Files selected for processing (10)
AGENTS.mdpackages/server/src/__tests__/guardrails-runtime.test.tspackages/server/src/gateway/auth/mcp/proxy.tspackages/server/src/gateway/connections/chat-response-bridge.tspackages/server/src/gateway/guardrails/audit.tspackages/server/src/gateway/guardrails/builtins.tspackages/server/src/gateway/orchestration/index.tspackages/server/src/gateway/orchestration/message-consumer.tspackages/server/src/gateway/services/core-services.tspackages/server/src/lobu/gateway.ts
| text: 'here is your secret sk-abcdefghij0123456789AB please', | ||
| platform: 'api', |
There was a problem hiding this comment.
Avoid checked-in credential-shaped literals in test fixtures.
These literals can trip secret scanners/push protection and create noisy security alerts. Build them at runtime instead of embedding full token-like strings.
Suggested diff
@@
- const outcome = await runGuardrails(registry, 'output', enabled, {
+ const openAiLike = `sk-${'a'.repeat(24)}`;
+ const outcome = await runGuardrails(registry, 'output', enabled, {
agentId,
userId: 'user-1',
- text: 'here is your secret sk-abcdefghij0123456789AB please',
+ text: `here is your secret ${openAiLike} please`,
platform: 'api',
conversationId: 'conv-1',
});
@@
- const outcome = await secretScanGuardrail.run({
+ const githubPatLike = `ghp_${'a'.repeat(36)}`;
+ const outcome = await secretScanGuardrail.run({
agentId,
userId: 'u',
- text: 'token: ghp_abcdefghij0123456789ABCDEFGH01234567',
+ text: `token: ${githubPatLike}`,
platform: 'api',
});
@@
- const outcome = await secretScanGuardrail.run({
+ const awsAccessKeyLike = `AKIA${'A'.repeat(16)}`;
+ const outcome = await secretScanGuardrail.run({
agentId,
userId: 'u',
- text: 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE',
+ text: `AWS_ACCESS_KEY_ID=${awsAccessKeyLike}`,
platform: 'api',
});Also applies to: 166-167, 177-178
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/__tests__/guardrails-runtime.test.ts` around lines 79 -
80, Replace the hard-coded token-like fixture string (the literal containing
"sk-abcdefghij0123456789AB", e.g. "here is your secret sk-abcdefghij0123456789AB
please") with a runtime-built value to avoid checked-in credential-shaped
literals; create a small test helper (e.g., buildFakeToken or
generatePlaceholderToken) or concatenate parts (e.g., "sk-" + "REDACTED" or join
short random segments) and use that helper wherever the literal appears so the
test payload still looks like a token but no long credential-shaped string is
stored in the repo.
| return; | ||
| } | ||
|
|
||
| const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${Date.now()}`; |
There was a problem hiding this comment.
Use a deterministic, retry-stable originId instead of Date.now().
Date.now() makes the audit event identifier non-deterministic across retries/replays, which breaks deterministic origin semantics for this layer.
Suggested direction
interface RecordGuardrailTripParams {
+ tripId: string; // caller-provided stable id (message id / run id / request id)
organizationId: string | undefined;
agentId: string;
@@
- const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${Date.now()}`;
+ const originId = `guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${params.tripId}`;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/gateway/guardrails/audit.ts` at line 41, Replace the
non-deterministic Date.now() in the originId generation with a retry-stable
identifier: use an existing stable field on params (e.g., params.requestId or
params.runId if available) as the suffix for originId, or if none exists compute
a deterministic hash from the stable inputs (e.g., hash of a deterministic
serialization of params.stage, params.guardrail, params.agentId and any
run-specific id) so originId =
`guardrail_trip_${params.stage}_${params.guardrail}_${params.agentId}_${stableId}`;
update the code at the originId assignment (the originId variable using params)
and ensure the hash/serialization routine is deterministic (sorted keys) and
uses only stable params so retries/replays produce the same originId.
| * pattern is the slowest, so it runs last on a streaming hot path. | ||
| */ | ||
| const SECRET_PATTERNS: Array<{ name: string; re: RegExp }> = [ | ||
| { name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ }, |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What are the currently documented OpenAI API key prefixes/formats (including project-scoped keys) in official OpenAI documentation?
💡 Result:
OpenAI’s official documentation does not provide a single authoritative “master list” page enumerating every API key prefix literal string (including all legacy variants) in one place. However, the official OpenAI API reference pages do document the following API key formats via the “redacted_value” examples returned by key-management endpoints: 1) Project-scoped API keys (Project API keys) - Documented redacted format: “sk-…”, as shown in the Project API Key objects’ redacted_value (example: “sk-abc…def”) [1][2]. - These keys are represented as an object type “organization.project.api_key” in the Project API Keys API reference [3][2]. 2) Admin API keys (for Administration API) - Documented redacted format: “sk-admin…”, as shown in the Admin API Key objects’ redacted_value (example: “sk-admin…def”) [2]. - Admin API keys are explicitly for the Administration API and “cannot be used for non-administration endpoints” [4][5]. 3) How keys are used (not a prefix, but the official documented format in requests) - OpenAI authentication is documented as HTTP Bearer auth: Authorization: Bearer $OPENAI_API_KEY [6][7]. What is not explicitly confirmed by the cited official docs above - The official docs pages surfaced here show “sk-…” and “sk-admin…” (redacted examples), but they do not explicitly spell out the longer literal “sk-proj-…” prefix string in the official snippets we captured. (A community thread mentions “sk-proj-” concretely, but it is not official documentation.) [8][9] If you want, I can do one more targeted round to find an official OpenAI doc page that explicitly states the literal “sk-proj-…” prefix string (not just the redacted “sk-…“ form).
Citations:
- 1: https://developers.openai.com/api/reference/resources/admin/subresources/organization/subresources/projects/subresources/api_keys/methods/retrieve
- 2: https://platform.openai.com/docs/api-reference/admin-api-keys/list?referral_type=blog
- 3: https://platform.openai.com/docs/api-reference/project-api-keys
- 4: http://platform.openai.com/docs/api-reference/administration
- 5: https://developers.openai.com/api/docs/guides/admin-apis
- 6: https://platform.openai.com/docs/quickstart?api-=
- 7: https://platform.openai.com/docs/api-reference/assistants/createAssistant.html
- 8: https://community.openai.com/t/how-to-create-an-api-secret-key-with-prefix-sk-only-always-creates-sk-proj-keys/1263531
- 9: https://community.openai.com/t/openai-api-key-cannot-use-project-key-but-i-cannot-create-a-non-project-key/1286693
Broaden OpenAI key pattern to reduce false negatives in the security guardrail.
Official OpenAI documentation does not explicitly detail the full character set allowed in API keys—only redacted examples (sk-…, sk-admin…) are shown. Community discussions mention project-scoped key variants like sk-proj-, suggesting the format may be more varied than the current pattern allows. The current regex restricts characters to alphanumerics only, which risks missing keys that contain hyphens or underscores. Since this is a security guardrail, a defensive stance (broader pattern matching) is appropriate to avoid under-detection.
Suggested fix
- { name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ },
+ { name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ },📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| { name: "openai-key", re: /sk-[a-zA-Z0-9]{20,}/ }, | |
| { name: "openai-key", re: /\bsk-[A-Za-z0-9_-]{20,}\b/ }, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/gateway/guardrails/builtins.ts` at line 31, The
"openai-key" guardrail regex is too restrictive (only alphanumerics) and can
miss valid API keys like "sk-proj-…" or keys containing hyphens/underscores;
update the re value for the object with name "openai-key" in builtins.ts to a
broader pattern that still anchors on the "sk-" prefix but allows additional
hyphens and underscores and a reasonable variable length for the remainder
(i.e., permit characters like [A-Za-z0-9_-] and relax the minimum/maximum count)
so the guardrail catches project-scoped and variant key formats.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
…allback, awaitable audit, typed metadata, real e2e Addresses 5 concerns from the codex review of PR #919. 1. Per-stream rolling buffer for output guardrails: secrets split across streaming chunks (e.g. "sk-an" then "t-...") used to bypass every regex. The bridge now keeps a 256-char tail per stream key and scans `(tail + delta)` so a pattern straddling a chunk boundary still matches. Tail is dropped on completion / full-replacement / guardrail trip so a blocked stream's bytes don't leak into the next stream on the same key. 2. Audit org-id fallback: a guardrail trip with no organizationId in platformMetadata used to silently skip the events row. Both the bridge and MessageConsumer now fall back to the connection record / agent metadata; only when both fail do we log at error level (a security log gap, not a warn). McpProxy adds the same fallback path via `agentSettingsStore.getMetadata`. 3. Real e2e against the wired services: the prior test only called `runGuardrails` directly. The rewritten suite constructs `ChatResponseBridge`, `MessageConsumer` (subclassed to expose `handleMessage` to the test), and `McpProxy` and drives their public methods. Assertions cover the block message delivery, the audit row shape, the split-chunk regression, and the org-id fallback. 4. Awaitable audit: `recordGuardrailTrip` now tracks every in-flight insert in a module-level set. Production call sites use `void` so the user-facing block isn't gated on the DB write; tests await `flushPendingGuardrailAudits()` to drain. The 50ms `setTimeout` from the prior test is gone. 5. Typed `PlatformMetadata`: new `packages/server/src/gateway/connections/platform-metadata.ts` declares the shape and ships `readPlatformMetadata()` / `platformMetadataString()` narrowing helpers. `chat-response-bridge.ts` no longer uses `(metadata as any).field` for connectionId, chatId, responseThreadId, agentId, organizationId, sessionReset, senderId etc.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/server/src/gateway/auth/mcp/proxy.ts (1)
982-1083:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExtract pre-tool guardrails out of the approval-only branch.
This only runs when
grantStoreis configured, so deployments without approvals silently skip pre-tool guardrails entirely. It also leaveshandleCallTool()'s REST path unguarded, which means policies likeforbidden-toolsare still bypassable through a different entrypoint. Please move this into a shared pre-tool gate and call it before approval/upstream dispatch in every tool-execution path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/gateway/auth/mcp/proxy.ts` around lines 982 - 1083, The pre-tool guardrail logic is erroneously nested behind a grantStore check so deployments without approvals and the REST path handled by handleCallTool() skip guardrails; extract the pre-tool guardrail block (the logic using this.guardrailRegistry, this.agentSettingsStore, runGuardrails, recordGuardrailTrip, and the logger/response behavior) into a reusable helper (e.g., preToolGuardrailCheck) that accepts agentId, tokenData, toolName and toolArgs, and returns the guardrail outcome or null; then invoke that helper at the top of every tool-execution path (both the approval/upstream dispatch flow and the REST handleCallTool() path) before any approval or dispatch logic so forbidden-tools and other policies are enforced uniformly across entry points.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/server/src/gateway/auth/mcp/proxy.ts`:
- Around line 982-1083: The pre-tool guardrail logic is erroneously nested
behind a grantStore check so deployments without approvals and the REST path
handled by handleCallTool() skip guardrails; extract the pre-tool guardrail
block (the logic using this.guardrailRegistry, this.agentSettingsStore,
runGuardrails, recordGuardrailTrip, and the logger/response behavior) into a
reusable helper (e.g., preToolGuardrailCheck) that accepts agentId, tokenData,
toolName and toolArgs, and returns the guardrail outcome or null; then invoke
that helper at the top of every tool-execution path (both the approval/upstream
dispatch flow and the REST handleCallTool() path) before any approval or
dispatch logic so forbidden-tools and other policies are enforced uniformly
across entry points.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b2c108ed-9293-4786-b210-650b78f2b87a
📒 Files selected for processing (4)
packages/server/src/gateway/auth/mcp/proxy.tspackages/server/src/gateway/connections/chat-response-bridge.tspackages/server/src/gateway/guardrails/audit.tspackages/server/src/gateway/orchestration/message-consumer.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/server/src/gateway/guardrails/audit.ts
- packages/server/src/gateway/connections/chat-response-bridge.ts
- packages/server/src/gateway/orchestration/message-consumer.ts
…tools built-ins
Replaces the three guardrail TODO call sites with real runGuardrails() calls
and ships the first two built-in guardrails registered at gateway boot.
Wired call sites:
- MessageConsumer.handleMessage (input stage) — on trip, dispatch is
short-circuited and "Message rejected: <reason>" is pushed to the
thread_response queue so the user sees rejection in-thread.
- ChatResponseBridge.handleDelta (output stage, per streaming delta) — on
trip, the in-flight platform stream is disposed, "Message blocked by
guardrail: <reason>" is posted, and the rest of the worker's stream for
that conversation is suppressed; partial buffer is not written to history.
- McpProxy.handleProxyRequest (pre-tool stage) — on trip, the worker
receives a JSON-RPC isError reply with the literal "Tool call blocked by
policy." (specific reason intentionally not surfaced — evasion surface).
All three call sites fail open on infrastructure errors.
Built-ins (packages/server/src/gateway/guardrails/builtins.ts):
- secret-scan (output) — regex scan for OpenAI/GitHub PAT/AWS access key
and JWT-shaped tokens; cheap enough for per-delta scanning.
- forbidden-tools (pre-tool) — hardcoded deny-list (delete_repo,
delete_branch, drop_table).
Audit trail: every trip writes a semantic_type='guardrail-trip' event row
(origin_type='guardrail-<stage>', metadata {guardrail, stage, reason,
agent_id, user_id, conversation_id}). Fire-and-forget — enforcement does
not depend on the audit write.
GuardrailRegistry now lives on CoreServices, populated with built-ins
during initialize(). It is injected into McpProxy at construction and into
MessageConsumer / ChatResponseBridge via setter — preserving the existing
post-construction wiring shape.
E2E gate: packages/server/src/__tests__/guardrails-runtime.test.ts boots
PGlite, persists guardrails=['secret-scan'] on an agent via the real
config store, exercises the runGuardrails path, verifies the trip outcome,
and asserts the guardrail-trip event row lands in the events table with
the documented metadata shape. 7/7 tests green.
…allback, awaitable audit, typed metadata, real e2e Addresses 5 concerns from the codex review of PR #919. 1. Per-stream rolling buffer for output guardrails: secrets split across streaming chunks (e.g. "sk-an" then "t-...") used to bypass every regex. The bridge now keeps a 256-char tail per stream key and scans `(tail + delta)` so a pattern straddling a chunk boundary still matches. Tail is dropped on completion / full-replacement / guardrail trip so a blocked stream's bytes don't leak into the next stream on the same key. 2. Audit org-id fallback: a guardrail trip with no organizationId in platformMetadata used to silently skip the events row. Both the bridge and MessageConsumer now fall back to the connection record / agent metadata; only when both fail do we log at error level (a security log gap, not a warn). McpProxy adds the same fallback path via `agentSettingsStore.getMetadata`. 3. Real e2e against the wired services: the prior test only called `runGuardrails` directly. The rewritten suite constructs `ChatResponseBridge`, `MessageConsumer` (subclassed to expose `handleMessage` to the test), and `McpProxy` and drives their public methods. Assertions cover the block message delivery, the audit row shape, the split-chunk regression, and the org-id fallback. 4. Awaitable audit: `recordGuardrailTrip` now tracks every in-flight insert in a module-level set. Production call sites use `void` so the user-facing block isn't gated on the DB write; tests await `flushPendingGuardrailAudits()` to drain. The 50ms `setTimeout` from the prior test is gone. 5. Typed `PlatformMetadata`: new `packages/server/src/gateway/connections/platform-metadata.ts` declares the shape and ships `readPlatformMetadata()` / `platformMetadataString()` narrowing helpers. `chat-response-bridge.ts` no longer uses `(metadata as any).field` for connectionId, chatId, responseThreadId, agentId, organizationId, sessionReset, senderId etc.
…dual platformMetadata casts
1. fullReplacement ordering in chat-response-bridge.ts. The rolling tail was
being updated and scanned BEFORE the fullReplacement dispose, so when a
replacement stream landed the scan ran against
(previous-delta-tail + new-replacement-text). Joining two unrelated
text bodies can synthesize a regex match that exists in neither piece
on its own → false-positive trip on the first delta of an unrelated
reply. Fix: dispose the prior stream and clear the tail BEFORE the
scan + tail update. Order is now:
1. Skip if stream is already blocked
2. Detect isFullReplacement → dispose + clearGuardrailTail
3. updateGuardrailTail(delta) → scanText
4. runOutputGuardrails(scanText)
5. (trip → block) or (pass → forward to strategy.handleDelta)
Regression test asserts the scenario: a prior delta ending "...sk-a"
followed by a fullReplacement delta starting "bcdefghij..." (each safe
in isolation, joined would match openai-key) must NOT trip.
2. Residual `payload.platformMetadata as Record<string, unknown> | undefined`
casts in production code. Three call sites in chat-response-bridge.ts
(handleError, handleStatusUpdate, handleEphemeral) plus one
`data?.platformMetadata?.traceparent as string | undefined` in
message-consumer.ts. Replaced with `readPlatformMetadata(...)` /
`platformMetadataString(...)` from connections/platform-metadata.ts.
resolveTarget and buildCurrentMessageFromMetadata signatures updated
to accept `PlatformMetadata` directly. No production-code
`platformMetadata as Record` or `platformMetadata as any` casts
remain; the two `(chat as any).createThread` calls in resolveTarget
are unrelated (runtime-accessed Chat SDK method, not metadata).
Validation:
- make build-packages clean
- bun run typecheck clean
- LOBU_TEST_BACKEND=pglite vitest src/__tests__/guardrails-runtime.test.ts → 7/7 pass
- bun test src/gateway/__tests__/ → 887/887 pass (no regressions)
… comments - chat-response-bridge: collapse updateGuardrailTail + clearGuardrailTail into a single scanWindowWithTail() helper. The two-step (compute-next-tail then compute-scan-slice) was the same window computed twice with slightly different conditionals; one slice does it. clearGuardrailTail had three call sites that now inline this.guardrailTails.delete(key) directly — the helper added no meaning over the underlying Map call. - Trim doc/comment text that read as "this PR did X" rather than describing an invariant: per-method docs on resolveAgentId/ resolveOrganizationId, the file-level audit.ts block, the fire-and-forget repetitions in proxy.ts + message-consumer.ts + bridge. The ordering rationale on isFullReplacement is kept (terse) because the regression test exercises exactly that order. - No behavior change. 7/7 guardrails-runtime tests + 887/887 server gateway tests green. The pre-fix regression test still fails when the fullReplacement ordering is reverted — confirmed.
8e9508c to
b42f368
Compare
Bumps packages/owletto from 970eb500 → 4f7c7571 (origin/main), bringing in the per-agent guardrail toggle + recent trips view (owletto PR #194). Companion to the guardrails runtime work merged in #919 (call sites + secret-scan + forbidden-tools built-ins) and #915 (schema + judge engine + pii-scan). With this bump the guardrails UI ships end-to-end.
…lock image builds (#927) PR #911 added `examples/personal-finance` to root `package.json`'s `workspaces` field but didn't update the Dockerfiles, which only COPY `packages/*/package.json` for the install layer. `bun install` inside the Docker build then errored: error: Workspace not found "examples/personal-finance" at /app/package.json:8:5 Every image build on `main` since #911 merged (13:25 UTC today) has been red: #911 → #913 (+revert) → #914 → #915 → #919 → #923 → #924 → #912 → #925 — all sitting on `main` un-deployable, including the `principal_kind` migration from #923 and my own loading-skeletons shipping artifacts. Two ways to fix it: 1. **Add stubs to all three Dockerfiles** for the example. Treats the symptom; couples prod build pipeline to whatever's under `examples/`, wrong direction. 2. **Take the example out of root workspaces.** Examples are documentation/demos for users to clone + run; they don't belong in the prod build graph. Cleaner separation. Going with (2). Side effects: - Example's dependency on `@lobu/promptfoo-provider` switched from `workspace:*` (workspace-protocol-only) to `file:../../packages/promptfoo-provider`. Resolves locally without requiring the example to be in a workspace; consumers run `cd examples/personal-finance && bun install` standalone (after building the provider once: `cd packages/promptfoo-provider && bun run build`). - `bun.lock` regenerated. Most of the diff is bun's "linked workspaces" table shrinking — no upstream version churn. Verified: simulated Docker build context (root files + stubbed packages/* manifests + provider stub, no examples/) runs `bun install` cleanly. No "Workspace not found" error.
Summary
PR A of the 3-PR guardrails rollout — wires the runtime end-to-end and ships the first two real guardrails. The guardrail primitive (
@lobu/coreGuardrail<stage>/GuardrailRegistry/runGuardrails) and the config/storage plumbing (agents.guardrails jsonb,AgentSettings.guardrails,lobu apply) were already in place. This PR replaces the three TODO call sites with realrunGuardrails()invocations and ships the first built-ins. Sibling PRs cover schema extensions (PR B) and owletto UI (PR C).MessageConsumer.handleMessage(input) — dispatch is short-circuited on trip,Message rejected: <reason>pushed tothread_response.ChatResponseBridge.handleDelta(output, per delta) — in-flight stream disposed on trip,Message blocked by guardrail: <reason>posted, partial buffer not written to history.McpProxy.handleProxyRequest(pre-tool) — JSON-RPCisErrorwith genericTool call blocked by policy.; specific reason intentionally not leaked.packages/server/src/gateway/guardrails/builtins.ts):secret-scan(output) — OpenAI/GitHub PAT/AWS/JWT regex.forbidden-tools(pre-tool) — deny-list[delete_repo, delete_branch, drop_table].CoreServices.initialize()viaregisterBuiltinGuardrails(...); exposed viagetGuardrailRegistry().semantic_type='guardrail-trip'event with{guardrail, stage, reason, agent_id, user_id, conversation_id}metadata.AGENTS.md#### Guardrailsupdated to reflect what ships.Codex review rounds
Round 1 (commit
b90dbdc0)Five concerns from the high-effort codex review, all fixed:
chat-response-bridge.ts— per-delta regex scan missed secrets straddling chunk boundaries ("sk-an"then"t-..."). Fixed via a per-stream rolling tail (OUTPUT_GUARDRAIL_TAIL_CHARS = 256); the scan operates on(previous tail + current delta)so the regex still matches on the second chunk. Tail is dropped on completion, full-replacement, and trip so a blocked stream's bytes never leak into the next stream on the same key.organizationId— security log gap. Bridge falls back toinstance.connection.organizationId; MessageConsumer and McpProxy fall back toagentSettingsStore.getMetadata(agentId). Audit logs at error when both fail.guardrails-runtime.test.ts— prior test only calledrunGuardrails(...)directly. Rewritten to instantiate realChatResponseBridge,MessageConsumer(subclassedTestableMessageConsumer),McpProxyand drive their public methods.setTimeout(50)— replaced withflushPendingGuardrailAudits(), a production helper that drains in-flight audit writes. Wired call sites still usevoid recordGuardrailTrip(...).(platformMetadata as any).field— newconnections/platform-metadata.tsdeclaresPlatformMetadata+readPlatformMetadata+platformMetadataString.Round 2 (commit
2eb575f7)Two remaining items, both fixed:
handleDeltaordering withfullReplacement— the rolling tail was updated/scanned BEFORE the replacement was handled, so the scan ran on(previous-delta-tail + new-replacement-text). Joining two unrelated text bodies can synthesize a regex match that exists in neither piece on its own — a false-positive trip on the first delta of an unrelated reply. Fixed by reordering: detectisFullReplacementfirst, dispose the prior stream + clear the tail, then scan. A regression test asserts the scenario: prior delta ending"...sk-a"followed by a fullReplacement delta starting"bcdefghij..."(each safe alone, naive concat would matchopenai-key) must NOT trip. Verified to fail against the pre-fix code before the reorder.Residual
payload.platformMetadata as Record<...>casts — three call sites inchat-response-bridge.ts(handleError,handleStatusUpdate,handleEphemeral) plus onedata?.platformMetadata?.traceparent as string | undefinedinmessage-consumer.ts. Replaced withreadPlatformMetadata(...)/platformMetadataString(...).resolveTargetandbuildCurrentMessageFromMetadatasignatures now takePlatformMetadatadirectly. No production-codeplatformMetadata as Record/platformMetadata as anycasts remain.Round 3 simplification pass (commit
8e9508ca)Post-review cleanup — no behavior change, 7/7 + 887/887 tests still green.
chat-response-bridge.ts.updateGuardrailTail(key, delta)andclearGuardrailTail(key)became onescanWindowWithTail(key, delta)plus inlinedthis.guardrailTails.delete(key)at the three call sites. The previousupdateGuardrailTailcomputednextTail = combined.slice(-512)AND separately computedscanSlice = (previous + delta).slice(-512)— the same window twice with slightly different conditionals. One slice now serves both roles.clearGuardrailTailadded no meaning over the Map call it wrapped.resolveAgentId/resolveOrganizationId, the file-level audit.ts block, theFire-and-forget — Tests use flushPendingGuardrailAudits to drainrepeats in bridge/proxy/message-consumer, theFail open — same rationale as the pre-tool path in McpProxycomments. Kept the terse invariant on theisFullReplacementordering (the regression test exercises exactly that order).audit.ts3-concept shape (recordGuardrailTrip+pendingAuditsSet +flushPendingGuardrailAudits). The user prompt's suggestion (return promise fromrecordGuardrailTrip, callersvoid) is already what it does. Removing the tracking would force the test to either change the production call sites to expose the promise or accept asleep(50)— both are worse than the current private-tracker shape.Reproducer
Tests:
ChatResponseBridge— full-secret-in-one-delta, split-chunk-across-two-deltas, fullReplacement-clears-tail (no false positive), connection.organizationId fallbackMessageConsumer— trip blocks worker enqueue + emits thread_response, metadata-orgId fallbackMcpProxy—delete_reporeturnsTool call blocked by policy.(no reason leak), audit row records reason for operatorsTest plan
make build-packagescleanbun run typecheckcleanbun test src/gateway/__tests__/— 887/887 tests pass (no regressions)LOBU_TEST_BACKEND=pglite bunx vitest run src/__tests__/guardrails-runtime.test.ts— 7/7 passisFullReplacement clears the rolling tail before scanning) verified to FAIL against pre-fixchat-response-bridge.tsbefore the reorder#### Guardrailsdescribes the shipped behaviorSummary by CodeRabbit
New Features
Documentation
Summary by CodeRabbit
New Features
Documentation
Tests
Other