refactor: runtime adapter pattern + domain restructuring#58
Conversation
Extract a pluggable RuntimeAdapter interface so the worker is no longer coupled to Firecracker. The executor shrinks from 465 to ~160 lines and delegates all VM lifecycle to the adapter. New packages: - @paws/runtime — RuntimeAdapter interface, registry, PawsError base - @paws/runtime-firecracker — Firecracker adapter (12-step choreography) - @paws/domain-agent — AgentConfig, framework enum, script generation - @paws/domain-policy — GovernanceSchema, GovernanceChecker - @paws/domain-access — Generic ExposedPort (no Pangolin coupling) - @paws/domain-credential — CredentialTransformer, polymorphic injection Domain type cleanup: - Remove tapDevice from NetworkAllocation (moved to FirecrackerAllocation) - Make SnapshotSize.memory optional (not all runtimes have memory snapshots) - Add WorkerType enum, make Worker.snapshot optional - Delete agent.ts and governance.ts from daemon (no re-export shims) Security fixes: - Proxy fetch() now catches errors to prevent credential header leakage - Fix durationMs bug in routes.ts (was Date.now() - Date.now() = always 0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR introduces a runtime abstraction layer for the worker executor and extracts governance and agent functionality from Changes
Sequence DiagramsequenceDiagram
participant Executor
participant RuntimeRegistry
participant RuntimeAdapter as Firecracker<br/>RuntimeAdapter
participant Network as Network<br/>Allocation
participant Proxy as TLS<br/>Proxy
participant VM as Firecracker<br/>VM
participant SSH as SSH
participant PortExposure as Port<br/>Exposure
Executor->>RuntimeRegistry: resolve(request)
RuntimeRegistry-->>Executor: RuntimeAdapter
Executor->>RuntimeAdapter: execute(sessionId, request, credentials, options)
RuntimeAdapter->>Network: allocateSubnet()
Network-->>RuntimeAdapter: allocation (IP, TAP)
RuntimeAdapter->>Proxy: start (with CA)
Proxy-->>RuntimeAdapter: proxy handle
RuntimeAdapter->>VM: restore snapshot
VM-->>RuntimeAdapter: VM handle
RuntimeAdapter->>SSH: waitForSsh()
SSH-->>RuntimeAdapter: connected
RuntimeAdapter->>SSH: injectCA certificate
SSH-->>RuntimeAdapter: done
RuntimeAdapter->>PortExposure: expose(ports)
PortExposure-->>RuntimeAdapter: ExposedPortResult[]
RuntimeAdapter->>SSH: write workload script
RuntimeAdapter->>SSH: execute workload (timeout wrapped)
SSH-->>RuntimeAdapter: exitCode, stdout, stderr
RuntimeAdapter->>PortExposure: cleanup(sessionId)
PortExposure-->>RuntimeAdapter: done
RuntimeAdapter->>VM: stop
RuntimeAdapter->>Proxy: stop
RuntimeAdapter->>Network: release
RuntimeAdapter-->>Executor: SessionResult { exitCode, stdout, stderr, output?, durationMs, exposedPorts? }
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Deploying getpaws with
|
| Latest commit: |
e831974
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://58f5646f.getpaws-6m4.pages.dev |
| Branch Preview URL: | https://refactor-runtime-adapter-dom.getpaws-6m4.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/firecracker/src/network/ip-pool.ts (1)
70-89:⚠️ Potential issue | 🟠 MajorValidate
maxSlotsagainst the real subnet capacity.If
maxSlotsis larger thanMAX_SUBNET_INDEX + 1,allocated.size >= maxSlotsnever becomes true after the real 16,384 subnets are exhausted, and thewhile (allocated.has(nextCandidate))loop spins forever. Reject or clamp out-of-range values when constructing the pool.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/firecracker/src/network/ip-pool.ts` around lines 70 - 89, The createIpPool constructor currently accepts a maxSlots larger than the real capacity (MAX_SUBNET_INDEX + 1) which can cause an infinite loop; update createIpPool to validate or clamp maxSlots: inside the createIpPool function (referencing parameters maxSlots, constants MAX_SUBNET_INDEX and allocation state allocated/nextCandidate) either reject invalid values by throwing or returning an err when maxSlots > MAX_SUBNET_INDEX + 1 or silently clamp maxSlots = Math.min(maxSlots, MAX_SUBNET_INDEX + 1) so allocated.size >= maxSlots will reflect the true subnet capacity and prevent the while(allocated.has(nextCandidate)) spin.packages/domains/agent/src/script.ts (1)
31-32:⚠️ Potential issue | 🟠 MajorPotential shell injection via unescaped prompt.
The prompt is wrapped in double quotes but not escaped. If
agent.promptcontains double quotes, backslashes, or command substitution sequences like$(...), the generated script will be malformed or could execute unintended commands.🛡️ Proposed fix to escape the prompt
+function escapeShellArg(arg: string): string { + // Escape single quotes and wrap in single quotes for safe shell argument + return `'${arg.replace(/'/g, "'\\''")}'`; +} + function generateClaudeCodeScript(agent: AgentConfig): string { // ... - const prompt = agent.prompt || '$TRIGGER_PAYLOAD'; - args.push(`"${prompt}"`); + const prompt = agent.prompt || '$TRIGGER_PAYLOAD'; + // Use single quotes for literal strings, but allow $TRIGGER_PAYLOAD expansion + if (prompt === '$TRIGGER_PAYLOAD') { + args.push('"$TRIGGER_PAYLOAD"'); + } else { + args.push(escapeShellArg(prompt)); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/domains/agent/src/script.ts` around lines 31 - 32, The prompt is interpolated directly into a shell argument (args.push(`"${prompt}"`)) risking shell injection; replace this with a proper escaping helper (e.g., implement escapeShellArg(prompt) or shellEscape(prompt)) and use args.push(`"${escapeShellArg(prompt)}"` ) or better wrap with single-quote-safe escaping (replace ' with '"'"') to ensure quotes, backslashes, $(), backticks etc. are escaped; add and use the new helper where the code currently references agent.prompt / prompt in args.push so all special characters are sanitized before emitting the script.
🧹 Nitpick comments (9)
packages/domains/session/src/types.ts (1)
68-86: Align the remaining session-level doc wording.This runtime-agnostic update is good, but Line 110 still mentions “VM via Pangolin tunnel.” Consider updating that comment too to keep the model consistent across the file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/domains/session/src/types.ts` around lines 68 - 86, The field comments were updated to be runtime-agnostic for ExposedPortSchema, but another comment later still references "VM via Pangolin tunnel"; find that comment (related to ExposedPortSchema / the session-level docstring or any constant describing exposed ports or tunnel access) and reword it to match the generic phrasing used here (e.g., remove "VM via Pangolin tunnel" and replace with a runtime-agnostic description like "publicly accessible runtime port" or "exposed port via tunnel/provider"), keeping references to access/pin/shareLink semantics intact.packages/proxy/src/server.ts (1)
69-81: Good 502 fallback; consider adding redacted telemetry on this path.This safely handles upstream failures. Consider adding structured/sanitized logging for the rejection path so outages can be diagnosed without exposing credential-bearing request context.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/proxy/src/server.ts` around lines 69 - 81, The rejection branch of the fetch promise (the second .then handler where upstream is rejected and you return new Response(...)) currently swallows the error but emits no safe telemetry; add a structured, redacted telemetry/log call there that records non-sensitive context (hostname, route/path, timestamp, and a sanitized/truncated error message or error.code) and increments a failure metric, ensuring you never log request headers, bodies, auth tokens, or full URLs containing credentials; use the same symbols in this file (the rejection handler for upstream, hostname) and a project telemetry/logger API (e.g., telemetry.record or processLogger.error) to capture the event while explicitly redacting sensitive fields before sending.packages/runtime/tsconfig.tsbuildinfo (1)
1-1: Add*.tsbuildinfoto.gitignore.Two tsbuildinfo files are currently tracked in the repository (
packages/runtime/tsconfig.tsbuildinfoandpackages/runtime-firecracker/tsconfig.tsbuildinfo). These are auto-generated TypeScript incremental build cache files containing local machine-specific paths and file hashes. They should be excluded from version control to avoid unnecessary merge conflicts and repository bloat.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/runtime/tsconfig.tsbuildinfo` at line 1, The repo is tracking TypeScript incremental build files (tsconfig.tsbuildinfo); add a .gitignore entry for *.tsbuildinfo, then remove the currently tracked tsbuildinfo files from Git's index (e.g., git rm --cached <tracked tsbuildinfo filenames>) and commit the change so those files (like tsconfig.tsbuildinfo) are no longer versioned; ensure .gitignore contains the literal pattern *.tsbuildinfo and push the commit.packages/domains/fleet/src/worker.ts (1)
29-41: Move.default()to the field level to keepWorkerTypea plain enum schema.Exporting
z.enum(...).default('firecracker')creates aZodDefaultwrapper, making thetypefield acceptundefinedand apply the default at the input validation level. Since this schema is used only once inWorkerSchemaand is not re-exported, move the default to the field itself to preserve the enum's type strictness and intent.♻️ Proposed fix
-export const WorkerType = z - .enum(['firecracker', 'container', 'lightweight']) - .default('firecracker'); +export const WorkerType = z.enum(['firecracker', 'container', 'lightweight']); @@ - type: WorkerType, + type: WorkerType.default('firecracker'),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/domains/fleet/src/worker.ts` around lines 29 - 41, The WorkerType export currently wraps the enum in a Zod default; change WorkerType to be a plain enum schema (remove .default('firecracker')) and then apply the default at the WorkerSchema field level by setting the type property to use WorkerType with .default('firecracker'); update nothing else—keep export type WorkerType = z.infer<typeof WorkerType> and adjust the WorkerSchema entry for "type" to apply the default so the enum stays strict while the field still defaults to 'firecracker'.apps/worker/src/server.ts (1)
208-208: Consider reusing the resolved runtime reference.The runtime is resolved twice: once on line 127 for the executor and again here for logging. You could store the resolved runtime in a variable and reuse it.
♻️ Suggested refactor
+// Resolve runtime once and reuse +const runtime = runtimeRegistry.resolve(); + // Create the executor (thin wrapper around runtime) const executor = createExecutor({ - runtime: runtimeRegistry.resolve(), + runtime, semaphore, workerName: WORKER_NAME, portExposure, llmGateway, }); // ... later in logging ... -const runtimeName = runtimeRegistry.resolve().name; +const runtimeName = runtime.name;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/server.ts` at line 208, The code calls runtimeRegistry.resolve() twice (once when creating the executor and again to set runtimeName); capture the result once in a local variable (e.g., const resolvedRuntime = runtimeRegistry.resolve()) and reuse resolvedRuntime when constructing the executor and when deriving runtimeName (e.g., const runtimeName = resolvedRuntime.name) to avoid duplicate resolution and ensure consistency between executor and logging.apps/worker/src/tunnel/port-exposure.ts (1)
31-37: Unused_tunnelsparameter in cleanup signature.The
_tunnelsparameter is ignored; cleanup relies on the internalsessionTunnelsmap. This is fine if the interface contract requires the parameter, but consider documenting why it's unused or verifying thePortExposureProviderinterface signature.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tunnel/port-exposure.ts` around lines 31 - 37, The cleanup method's signature includes an unused _tunnels parameter; update the PortExposureProvider implementation to either remove the parameter if the interface allows, or keep it but document/underscore it consistently and satisfy the interface. Locate the async cleanup(sessionId, _tunnels) method and either (a) change the signature to async cleanup(sessionId) and update the interface/type for PortExposureProvider, or (b) keep async cleanup(sessionId, _tunnels) but add a brief comment explaining it's intentionally ignored and ensure the parameter name uses the conventional underscore prefix (or adjust the interface to mark it optional) so linters and readers know it's intentionally unused.packages/runtime/src/registry.ts (2)
39-48: Consider returningResultinstead of throwing.Per coding guidelines, packages should use neverthrow (
Result/ResultAsync) for operations that can fail. Theresolve()method throws aRuntimeErrorwhen no runtime is registered, which is inconsistent with the guideline.♻️ Suggested refactor using neverthrow
+import { ok, err, type Result } from 'neverthrow'; + // In interface: - resolve(request?: RuntimeSessionRequest): RuntimeAdapter; + resolve(request?: RuntimeSessionRequest): Result<RuntimeAdapter, RuntimeError>; // In implementation: - resolve(_request) { + resolve(_request): Result<RuntimeAdapter, RuntimeError> { const first = runtimes.values().next(); if (first.done) { - throw new RuntimeError( + return err(new RuntimeError( RuntimeErrorCode.NO_RUNTIME, 'No runtime adapter registered. Configure at least one runtime (e.g., firecracker).', - ); + )); } - return first.value; + return ok(first.value); },As per coding guidelines: "Use neverthrow (ResultAsync/Result) in packages/ for operations that can fail".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/runtime/src/registry.ts` around lines 39 - 48, The resolve method currently throws RuntimeError when runtimes is empty; change its API to return a neverthrow Result instead: update the resolve signature to return Result<RuntimeAdapter, RuntimeError>, stop throwing, and return ok(first.value) when a runtime exists or err(new RuntimeError(RuntimeErrorCode.NO_RUNTIME, 'No runtime adapter registered. Configure at least one runtime (e.g., firecracker).')) when first.done is true; update any callers of resolve to handle the Result (or wrap with ResultAsync where needed). Ensure you import/return the correct types from neverthrow and keep references to resolve, runtimes, RuntimeError, and RuntimeErrorCode to locate the change.
31-33: Silent overwrite on duplicate registration.
register()silently replaces an existing adapter if one with the same name is already registered. Consider logging a warning or returning a boolean to indicate whether a new adapter was added vs. replaced.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/runtime/src/registry.ts` around lines 31 - 33, The register(adapter) call currently overwrites an existing entry in the runtimes Map without notice; modify the register function to detect duplicates by checking runtimes.has(adapter.name) before setting, then either log a warning via your logger (e.g., processLogger.warn or similar) or return a boolean indicating whether it was a new add (true) or a replacement (false); make sure to reference runtimes and adapter.name in the check and preserve existing behavior unless the caller needs replacement semantics.packages/domains/agent/src/script.ts (1)
11-13: Use typed error instead of genericError.Per coding guidelines, use typed errors with error codes. Consider creating an
AgentErrorwith a specific error code for unknown frameworks.♻️ Suggested refactor
+import { AgentError, AgentErrorCode } from './errors.js'; + // In generateAgentScript: default: - throw new Error(`Unknown agent framework: ${agent.framework}`); + throw new AgentError( + AgentErrorCode.UNKNOWN_FRAMEWORK, + `Unknown agent framework: ${agent.framework}`, + );As per coding guidelines: "Use typed errors with error codes (e.g., DaemonsError with error codes) instead of generic Error".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/domains/agent/src/script.ts` around lines 11 - 13, Replace the generic throw in the switch default with a typed error: create or use an AgentError (or add a new class AgentError) and throw new AgentError('UNKNOWN_FRAMEWORK', `Unknown agent framework: ${agent.framework}`) from the default branch in packages/domains/agent/src/script.ts (the switch handling agent.framework); ensure the error class accepts an error code and message per project conventions and import or export it appropriately so callers can handle the specific error code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/worker/src/session/executor.ts`:
- Around line 71-93: The finally block unconditionally calls
config.semaphore.release() even if config.semaphore.acquire() failed; wrap the
acquire in a guarded flag (e.g., let acquired = false) and set acquired = true
immediately after await config.semaphore.acquire(); then in the finally only
call config.semaphore.release(), set session.status = 'stopping', and
sessions.delete(sessionId) if acquired is true; update references in executor.ts
around toRuntimeRequest, resolveCredentials, and config.runtime.execute to
preserve behavior but ensure release and session cleanup are conditional on the
successful acquire.
- Around line 155-165: The current loop that applies the LLM gateway settings
(variable gateway and gateway.domains) unconditionally adds entries to domains,
effectively expanding the allowlist; instead, restrict changes to only rewrite
destinations that were already permitted by the session (check existing domains
entries or consult network.credentials / network.allowOut) and skip any
gateway.domains that do not already exist in domains or are not explicitly
allowed by network settings. Update the loop in executor.ts so it first verifies
domains[domain] exists or the domain is present in the session's allowed lists
before setting headers/target on that domain, leaving unknown domains untouched.
In `@packages/domains/credential/src/types.ts`:
- Around line 9-45: DomainCredentialSchema and the credential injection types
are only defined in TypeScript, so malformed payloads escape runtime validation;
add Zod schemas for each variant (e.g., HeaderCredentialInjectionSchema,
OAuthCredentialInjectionSchema, MtlsCredentialInjectionSchema) matching the
existing interfaces (including the literal "type" field), then create a
discriminated union schema (using z.discriminatedUnion or equivalent) named
CredentialInjectionSchema and export it alongside DomainCredentialSchema and
CredentialInjectionType; update the exported types to infer from these schemas
(e.g., export type HeaderCredentialInjection = z.infer<typeof
HeaderCredentialInjectionSchema>, etc.) so all external credential payloads are
validated at parse time.
In `@packages/runtime-firecracker/src/adapter.ts`:
- Around line 317-320: The code only saves tunnelResults on the session but not
the PortExposureProvider, so cleanupSession() cannot call cleanup() and tunnels
are orphaned; fix by also saving the provider (e.g.,
session.portExposureProvider = options.portExposure or similar) when you call
options.portExposure.expose(sessionId, exposePorts, hostPorts) in adapter.ts,
and update cleanupSession()/dispose logic to read that stored provider and call
its cleanup()/dispose() when the session ends or on error; reference
options.portExposure, expose(), exposedPortResults/session.exposedPortResults,
PortExposureProvider, and cleanupSession()/cleanup() when making the change.
- Around line 128-138: resolveSnapshotDir currently interpolates snapshotId into
config.snapshotBaseDir and silently falls back to config.snapshotDir when
vmstate.snap is missing, allowing path-traversal and accidental fallback; change
resolveSnapshotDir to validate snapshotId (reject any values containing "..",
path separators or null bytes), construct the candidate path safely (use a safe
join and then verify the resulting resolved path is a child of
config.snapshotBaseDir), confirm the file `${resolved}/vmstate.snap` exists and
has size > 0, and if any check fails throw an error (fail closed) instead of
returning config.snapshotDir so callers cannot boot an unintended default image.
- Around line 404-425: The teardown block currently fails fast if any await
rejects, risking leaked resources; update cleanup in the adapter (around
teardownInboundPort, config.portPool.release, stopVm, session.proxyHandle.stop,
teardownIptables, deleteTap, ipPool.release) to be best-effort: either wrap each
await call in its own try/catch and log errors to avoid aborting subsequent
steps, or collect the individual cleanup promises and run Promise.allSettled to
ensure every step is attempted and failures are logged but do not prevent
further cleanup.
- Around line 351-364: The code treats sshExec transport failures as a
successful run (leaving exitCode at 0); change the logic around
sshExec/execResult so that if execResult.isErr() you propagate/throw the
transport error (or return a SessionResult that represents a failure) instead of
continuing with exitCode 0; keep the existing handling that reads guest exit
codes from /tmp/exit_code only when sshExec succeeded (execResult.isOk()) and
include execResult.error text in the propagated error/failed SessionResult to
make transport errors visible.
In `@packages/runtime-firecracker/src/ssh.ts`:
- Around line 95-137: sshWriteFile currently bypasses the injected executor and
calls Bun.spawn directly; change it to use the injected shell abstraction by
adding a stdin-capable exec to SshOptions (e.g., opts.execWithStdin or a
dedicated opts.spawn wrapper) and update sshWriteFile to call that injected
function (passing command args and the content as stdin) instead of Bun.spawn,
preserving the same SSH args, exit-code check, and RuntimeError mapping
(RuntimeErrorCode.SSH_FAILED) so behavior and error semantics remain unchanged;
ensure callers/tests are updated to provide the new execWithStdin/spawn or
fallback to a default Bun.spawn-based implementation.
- Around line 47-73: waitForSsh() currently underreports the real timeout
because each ssh probe uses ConnectTimeout=30s from baseSshArgs(), so 60
attempts can take ~1800s; fix by bounding probe time or using an absolute
deadline: either pass a short per-probe timeout override (e.g., add/override '-o
ConnectTimeout=<probeSec>' to the args passed in exec inside waitForSsh) or
implement a wall-clock deadline (record start time and stop once Date.now()
exceeds start+totalTimeoutMs) and use that when looping; also update the thrown
Error message to compute and report the actual total wait time (use tracked
start+end or maxAttempts*(probeTimeout+intervalMs)) and reference waitForSsh,
baseSshArgs, maxAttempts, intervalMs, and ConnectTimeout when making the change.
- Around line 95-145: The remotePath is passed directly into guest shell
commands in sshWriteFile and sshReadFile (the `cat > ${remotePath}` and `cat
${remotePath}` usages), which lets spaces or shell metacharacters break or
hijack the command; import and use the existing shellEscape helper from
adapter.ts to sanitize remotePath before interpolating it into the command
strings in sshWriteFile and sshReadFile, updating the command arguments to use
the escaped path and adding the import at the top of the file.
In `@packages/runtime/src/types.ts`:
- Around line 55-60: PortExposureProvider.expose currently accepts ports as
Array<{ port; protocol?; label? }>, which drops ACL fields from
RuntimeSessionRequest.exposePorts; update the provider contract so the ports
parameter preserves ACL metadata (e.g., include access and allowedEmails on the
port descriptor) and adapt the related export/interface at the other occurrence
(the second PortExposureProvider declaration) to the same shape; ensure
implementations and callers (RuntimeSessionRequest.exposePorts and any callers
of expose) are updated to pass through the augmented port objects without
casting or discarding ACL info so providers can enforce restricted shares
end-to-end.
---
Outside diff comments:
In `@packages/domains/agent/src/script.ts`:
- Around line 31-32: The prompt is interpolated directly into a shell argument
(args.push(`"${prompt}"`)) risking shell injection; replace this with a proper
escaping helper (e.g., implement escapeShellArg(prompt) or shellEscape(prompt))
and use args.push(`"${escapeShellArg(prompt)}"` ) or better wrap with
single-quote-safe escaping (replace ' with '"'"') to ensure quotes, backslashes,
$(), backticks etc. are escaped; add and use the new helper where the code
currently references agent.prompt / prompt in args.push so all special
characters are sanitized before emitting the script.
In `@packages/firecracker/src/network/ip-pool.ts`:
- Around line 70-89: The createIpPool constructor currently accepts a maxSlots
larger than the real capacity (MAX_SUBNET_INDEX + 1) which can cause an infinite
loop; update createIpPool to validate or clamp maxSlots: inside the createIpPool
function (referencing parameters maxSlots, constants MAX_SUBNET_INDEX and
allocation state allocated/nextCandidate) either reject invalid values by
throwing or returning an err when maxSlots > MAX_SUBNET_INDEX + 1 or silently
clamp maxSlots = Math.min(maxSlots, MAX_SUBNET_INDEX + 1) so allocated.size >=
maxSlots will reflect the true subnet capacity and prevent the
while(allocated.has(nextCandidate)) spin.
---
Nitpick comments:
In `@apps/worker/src/server.ts`:
- Line 208: The code calls runtimeRegistry.resolve() twice (once when creating
the executor and again to set runtimeName); capture the result once in a local
variable (e.g., const resolvedRuntime = runtimeRegistry.resolve()) and reuse
resolvedRuntime when constructing the executor and when deriving runtimeName
(e.g., const runtimeName = resolvedRuntime.name) to avoid duplicate resolution
and ensure consistency between executor and logging.
In `@apps/worker/src/tunnel/port-exposure.ts`:
- Around line 31-37: The cleanup method's signature includes an unused _tunnels
parameter; update the PortExposureProvider implementation to either remove the
parameter if the interface allows, or keep it but document/underscore it
consistently and satisfy the interface. Locate the async cleanup(sessionId,
_tunnels) method and either (a) change the signature to async cleanup(sessionId)
and update the interface/type for PortExposureProvider, or (b) keep async
cleanup(sessionId, _tunnels) but add a brief comment explaining it's
intentionally ignored and ensure the parameter name uses the conventional
underscore prefix (or adjust the interface to mark it optional) so linters and
readers know it's intentionally unused.
In `@packages/domains/agent/src/script.ts`:
- Around line 11-13: Replace the generic throw in the switch default with a
typed error: create or use an AgentError (or add a new class AgentError) and
throw new AgentError('UNKNOWN_FRAMEWORK', `Unknown agent framework:
${agent.framework}`) from the default branch in
packages/domains/agent/src/script.ts (the switch handling agent.framework);
ensure the error class accepts an error code and message per project conventions
and import or export it appropriately so callers can handle the specific error
code.
In `@packages/domains/fleet/src/worker.ts`:
- Around line 29-41: The WorkerType export currently wraps the enum in a Zod
default; change WorkerType to be a plain enum schema (remove
.default('firecracker')) and then apply the default at the WorkerSchema field
level by setting the type property to use WorkerType with
.default('firecracker'); update nothing else—keep export type WorkerType =
z.infer<typeof WorkerType> and adjust the WorkerSchema entry for "type" to apply
the default so the enum stays strict while the field still defaults to
'firecracker'.
In `@packages/domains/session/src/types.ts`:
- Around line 68-86: The field comments were updated to be runtime-agnostic for
ExposedPortSchema, but another comment later still references "VM via Pangolin
tunnel"; find that comment (related to ExposedPortSchema / the session-level
docstring or any constant describing exposed ports or tunnel access) and reword
it to match the generic phrasing used here (e.g., remove "VM via Pangolin
tunnel" and replace with a runtime-agnostic description like "publicly
accessible runtime port" or "exposed port via tunnel/provider"), keeping
references to access/pin/shareLink semantics intact.
In `@packages/proxy/src/server.ts`:
- Around line 69-81: The rejection branch of the fetch promise (the second .then
handler where upstream is rejected and you return new Response(...)) currently
swallows the error but emits no safe telemetry; add a structured, redacted
telemetry/log call there that records non-sensitive context (hostname,
route/path, timestamp, and a sanitized/truncated error message or error.code)
and increments a failure metric, ensuring you never log request headers, bodies,
auth tokens, or full URLs containing credentials; use the same symbols in this
file (the rejection handler for upstream, hostname) and a project
telemetry/logger API (e.g., telemetry.record or processLogger.error) to capture
the event while explicitly redacting sensitive fields before sending.
In `@packages/runtime/src/registry.ts`:
- Around line 39-48: The resolve method currently throws RuntimeError when
runtimes is empty; change its API to return a neverthrow Result instead: update
the resolve signature to return Result<RuntimeAdapter, RuntimeError>, stop
throwing, and return ok(first.value) when a runtime exists or err(new
RuntimeError(RuntimeErrorCode.NO_RUNTIME, 'No runtime adapter registered.
Configure at least one runtime (e.g., firecracker).')) when first.done is true;
update any callers of resolve to handle the Result (or wrap with ResultAsync
where needed). Ensure you import/return the correct types from neverthrow and
keep references to resolve, runtimes, RuntimeError, and RuntimeErrorCode to
locate the change.
- Around line 31-33: The register(adapter) call currently overwrites an existing
entry in the runtimes Map without notice; modify the register function to detect
duplicates by checking runtimes.has(adapter.name) before setting, then either
log a warning via your logger (e.g., processLogger.warn or similar) or return a
boolean indicating whether it was a new add (true) or a replacement (false);
make sure to reference runtimes and adapter.name in the check and preserve
existing behavior unless the caller needs replacement semantics.
In `@packages/runtime/tsconfig.tsbuildinfo`:
- Line 1: The repo is tracking TypeScript incremental build files
(tsconfig.tsbuildinfo); add a .gitignore entry for *.tsbuildinfo, then remove
the currently tracked tsbuildinfo files from Git's index (e.g., git rm --cached
<tracked tsbuildinfo filenames>) and commit the change so those files (like
tsconfig.tsbuildinfo) are no longer versioned; ensure .gitignore contains the
literal pattern *.tsbuildinfo and push the commit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 858fa947-6f80-46f8-bb5c-90e1cbd5d289
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (53)
apps/control-plane/package.jsonapps/control-plane/src/app.tsapps/control-plane/src/governance.test.tsapps/control-plane/src/index.tsapps/control-plane/src/store/daemons.tsapps/worker/package.jsonapps/worker/src/routes.tsapps/worker/src/server.tsapps/worker/src/session/executor.tsapps/worker/src/tunnel/port-exposure.tspackages/domains/access/package.jsonpackages/domains/access/src/index.tspackages/domains/access/src/types.tspackages/domains/access/tsconfig.jsonpackages/domains/agent/package.jsonpackages/domains/agent/src/index.tspackages/domains/agent/src/script.tspackages/domains/agent/src/types.tspackages/domains/agent/tsconfig.jsonpackages/domains/common/src/errors.tspackages/domains/credential/package.jsonpackages/domains/credential/src/index.tspackages/domains/credential/src/types.tspackages/domains/credential/tsconfig.jsonpackages/domains/daemon/package.jsonpackages/domains/daemon/src/index.tspackages/domains/daemon/src/store.tspackages/domains/daemon/src/types.tspackages/domains/fleet/src/worker.tspackages/domains/network/src/types.tspackages/domains/policy/package.jsonpackages/domains/policy/src/checker.tspackages/domains/policy/src/index.tspackages/domains/policy/src/types.tspackages/domains/policy/tsconfig.jsonpackages/domains/session/src/types.tspackages/domains/snapshot/src/types.tspackages/firecracker/src/index.tspackages/firecracker/src/network/ip-pool.tspackages/proxy/src/server.tspackages/runtime-firecracker/package.jsonpackages/runtime-firecracker/src/adapter.tspackages/runtime-firecracker/src/index.tspackages/runtime-firecracker/src/ssh.tspackages/runtime-firecracker/tsconfig.jsonpackages/runtime-firecracker/tsconfig.tsbuildinfopackages/runtime/package.jsonpackages/runtime/src/errors.tspackages/runtime/src/index.tspackages/runtime/src/registry.tspackages/runtime/src/types.tspackages/runtime/tsconfig.jsonpackages/runtime/tsconfig.tsbuildinfo
💤 Files with no reviewable changes (1)
- packages/domains/daemon/src/index.ts
| try { | ||
| // 1. Acquire semaphore slot | ||
| // Acquire semaphore slot | ||
| await config.semaphore.acquire(); | ||
|
|
||
| // 2. Allocate network | ||
| const allocResult = ipPool.allocate(); | ||
| if (allocResult.isErr()) { | ||
| throw new WorkerError( | ||
| WorkerErrorCode.CAPACITY_EXHAUSTED, | ||
| allocResult.error.message, | ||
| allocResult.error, | ||
| ); | ||
| } | ||
| allocation = allocResult.value; | ||
| session.allocation = allocation; | ||
|
|
||
| const vmDir = `${config.vmBaseDir}/${sessionId}`; | ||
|
|
||
| // 3. Create TAP device | ||
| const tapResult = await createTap(allocation); | ||
| if (tapResult.isErr()) { | ||
| throw new WorkerError( | ||
| WorkerErrorCode.EXECUTION_FAILED, | ||
| `TAP creation failed: ${tapResult.error.message}`, | ||
| tapResult.error, | ||
| ); | ||
| } | ||
|
|
||
| // 4. Generate session CA | ||
| const caResult = await generateSessionCa({ dir: `${vmDir}/ca` }); | ||
| if (caResult.isErr()) throw caResult.error; | ||
| ca = caResult.value; | ||
| session.ca = ca; | ||
| // Convert domain request to runtime-agnostic format | ||
| const runtimeRequest = toRuntimeRequest(request); | ||
| const credentials = resolveCredentials(request.network, config.llmGateway); | ||
|
|
||
| // 5. Setup iptables rules | ||
| const iptResult = await setupIptables(allocation); | ||
| if (iptResult.isErr()) { | ||
| throw new WorkerError( | ||
| WorkerErrorCode.EXECUTION_FAILED, | ||
| `iptables setup failed: ${iptResult.error.message}`, | ||
| iptResult.error, | ||
| ); | ||
| } | ||
|
|
||
| // 6. Spawn TLS proxy | ||
| proxyHandle = createProxy({ | ||
| listen: { host: allocation.hostIp, port: 8080 }, | ||
| domains: networkConfigToDomains(network, config.llmGateway), | ||
| ca: { cert: ca.cert, key: ca.key }, | ||
| // Delegate to runtime adapter | ||
| const result = await config.runtime.execute(sessionId, runtimeRequest, credentials, { | ||
| portExposure: config.portExposure, | ||
| }); | ||
| await proxyHandle.start(); | ||
| session.proxyHandle = proxyHandle; | ||
|
|
||
| // 7. Restore VM from snapshot (resolve snapshot ID → directory) | ||
| const snapshotDir = resolveSnapshotDir(config, request.snapshot); | ||
| const restoreOpts = { | ||
| snapshotDir, | ||
| vmDir, | ||
| ...(config.firecrackerBin ? { firecrackerBin: config.firecrackerBin } : {}), | ||
| }; | ||
| const restoreResult = await restoreVm(restoreOpts); | ||
| if (restoreResult.isErr()) { | ||
| throw new WorkerError( | ||
| WorkerErrorCode.EXECUTION_FAILED, | ||
| `VM restore failed: ${restoreResult.error.message}`, | ||
| restoreResult.error, | ||
| ); | ||
| } | ||
| vmHandle = restoreResult.value; | ||
| session.vmHandle = vmHandle; | ||
|
|
||
| // 8. Wait for SSH | ||
| const sshOpts = { | ||
| host: allocation.guestIp, | ||
| keyPath: config.sshKeyPath, | ||
| }; | ||
|
|
||
| const sshResult = await waitForSsh(sshOpts); | ||
| if (sshResult.isErr()) throw sshResult.error; | ||
|
|
||
| // 9. Inject CA cert into VM trust store | ||
| const injectCaResult = await sshWriteFile( | ||
| sshOpts, | ||
| '/usr/local/share/ca-certificates/paws-session.crt', | ||
| ca.cert, | ||
| ); | ||
| if (injectCaResult.isErr()) throw injectCaResult.error; | ||
|
|
||
| const updateCaResult = await sshExec(sshOpts, 'update-ca-certificates 2>/dev/null || true'); | ||
| if (updateCaResult.isErr()) throw updateCaResult.error; | ||
|
|
||
| // 9.5. Set up port exposure (if configured) | ||
| const exposePorts = network.expose ?? []; | ||
| let exposedPortUrls: SessionResult['exposedPorts']; | ||
|
|
||
| if (exposePorts.length > 0 && config.portPool) { | ||
| const portResult = config.portPool.allocate(exposePorts.length); | ||
| if (portResult.isErr()) { | ||
| throw new WorkerError( | ||
| WorkerErrorCode.CAPACITY_EXHAUSTED, | ||
| `Port allocation failed: ${portResult.error.message}`, | ||
| portResult.error, | ||
| ); | ||
| } | ||
| const hostPorts = portResult.value; | ||
| session.inboundPorts = exposePorts.map((ep, i) => ({ | ||
| hostPort: hostPorts[i]!, | ||
| guestPort: ep.port, | ||
| })); | ||
|
|
||
| // Set up inbound iptables rules | ||
| for (const mapping of session.inboundPorts) { | ||
| const iptResult = await setupInboundPort( | ||
| allocation, | ||
| mapping.hostPort, | ||
| mapping.guestPort, | ||
| ); | ||
| if (iptResult.isErr()) { | ||
| throw new WorkerError( | ||
| WorkerErrorCode.EXECUTION_FAILED, | ||
| `Inbound iptables failed: ${iptResult.error.message}`, | ||
| iptResult.error, | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| // Create Pangolin tunnels (if configured) or use direct URLs | ||
| if (config.pangolinResources) { | ||
| const tunnels = await config.pangolinResources.expose( | ||
| sessionId, | ||
| exposePorts, | ||
| hostPorts, | ||
| ); | ||
| session.exposedTunnels = tunnels; | ||
| exposedPortUrls = tunnels.map((t) => ({ | ||
| port: t.port, | ||
| url: t.publicUrl, | ||
| label: t.label, | ||
| access: t.access, | ||
| pin: t.pin, | ||
| shareLink: t.shareLink, | ||
| })); | ||
| } else if (config.workerExternalUrl) { | ||
| // Fallback: direct host port URLs | ||
| exposedPortUrls = exposePorts.map((ep, i) => ({ | ||
| port: ep.port, | ||
| url: `${config.workerExternalUrl}:${hostPorts[i]}`, | ||
| label: ep.label, | ||
| })); | ||
| } | ||
| } | ||
|
|
||
| // 10. Write and execute workload script | ||
| const envExports = Object.entries(request.workload.env) | ||
| .map(([k, v]) => `export ${k}=${shellEscape(v)}`) | ||
| .join('\n'); | ||
|
|
||
| const script = [ | ||
| '#!/bin/bash', | ||
| 'set -euo pipefail', | ||
| envExports, | ||
| request.workload.script, | ||
| ].join('\n'); | ||
|
|
||
| const writeResult = await sshWriteFile(sshOpts, '/tmp/workload.sh', script); | ||
| if (writeResult.isErr()) throw writeResult.error; | ||
|
|
||
| await sshExec(sshOpts, 'chmod +x /tmp/workload.sh'); | ||
|
|
||
| // Execute with timeout | ||
| const timeoutSecs = Math.ceil(request.timeoutMs / 1000); | ||
| const execResult = await sshExec( | ||
| sshOpts, | ||
| `timeout ${timeoutSecs} /tmp/workload.sh 2>/tmp/stderr.log || echo "EXIT:$?" > /tmp/exit_code`, | ||
| ); | ||
|
|
||
| // 11. Collect results | ||
| let stdout = ''; | ||
| let stderr = ''; | ||
| let exitCode = 0; | ||
| let output: unknown = undefined; | ||
|
|
||
| if (execResult.isOk()) { | ||
| stdout = execResult.value.stdout; | ||
| if (result.isErr()) { | ||
| throw result.error; | ||
| } | ||
|
|
||
| // Read stderr | ||
| const stderrResult = await sshReadFile(sshOpts, '/tmp/stderr.log'); | ||
| if (stderrResult.isOk()) { | ||
| stderr = stderrResult.value; | ||
| } | ||
|
|
||
| // Check for non-zero exit code | ||
| const exitCodeResult = await sshReadFile(sshOpts, '/tmp/exit_code'); | ||
| if (exitCodeResult.isOk()) { | ||
| const match = exitCodeResult.value.match(/EXIT:(\d+)/); | ||
| if (match?.[1]) { | ||
| exitCode = parseInt(match[1], 10); | ||
| } | ||
| } | ||
|
|
||
| // Try to read structured output | ||
| const outputResult = await sshReadFile(sshOpts, '/output/result.json'); | ||
| if (outputResult.isOk()) { | ||
| try { | ||
| output = JSON.parse(outputResult.value); | ||
| } catch { | ||
| // Not valid JSON — ignore | ||
| } | ||
| } | ||
|
|
||
| const durationMs = Date.now() - startedAt.getTime(); | ||
|
|
||
| return { exitCode, stdout, stderr, output, durationMs, exposedPorts: exposedPortUrls }; | ||
| session.exposedPorts = result.value.exposedPorts; | ||
| return result.value; | ||
| } finally { | ||
| // Cleanup — always runs | ||
| session.status = 'stopping'; | ||
|
|
||
| // Clean up Pangolin resources | ||
| if (session.exposedTunnels?.length && config.pangolinResources) { | ||
| await config.pangolinResources.cleanup(session.exposedTunnels); | ||
| } | ||
|
|
||
| // Clean up inbound iptables rules and release host ports | ||
| if (session.inboundPorts?.length && allocation) { | ||
| for (const mapping of session.inboundPorts) { | ||
| await teardownInboundPort(allocation, mapping.hostPort, mapping.guestPort); | ||
| } | ||
| if (config.portPool) { | ||
| config.portPool.release(session.inboundPorts.map((m) => m.hostPort)); | ||
| } | ||
| } | ||
|
|
||
| if (vmHandle) { | ||
| await stopVm(vmHandle); | ||
| } | ||
|
|
||
| if (proxyHandle) { | ||
| await proxyHandle.stop(); | ||
| } | ||
|
|
||
| if (allocation) { | ||
| await teardownIptables(allocation); | ||
| await deleteTap(allocation.tapDevice); | ||
| ipPool.release(allocation.subnetIndex); | ||
| } | ||
|
|
||
| config.semaphore.release(); | ||
| sessions.delete(sessionId); |
There was a problem hiding this comment.
Only release the semaphore after acquire() succeeds.
finally calls release() unconditionally. If acquire() rejects or throws, the permit count is incremented without ever being taken, which lets later sessions exceed the configured concurrency limit.
Suggested change
+ let acquired = false;
try {
// Acquire semaphore slot
await config.semaphore.acquire();
+ acquired = true;
// Convert domain request to runtime-agnostic format
const runtimeRequest = toRuntimeRequest(request);
const credentials = resolveCredentials(request.network, config.llmGateway);
@@
} finally {
session.status = 'stopping';
- config.semaphore.release();
+ if (acquired) config.semaphore.release();
sessions.delete(sessionId);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/session/executor.ts` around lines 71 - 93, The finally block
unconditionally calls config.semaphore.release() even if
config.semaphore.acquire() failed; wrap the acquire in a guarded flag (e.g., let
acquired = false) and set acquired = true immediately after await
config.semaphore.acquire(); then in the finally only call
config.semaphore.release(), set session.status = 'stopping', and
sessions.delete(sessionId) if acquired is true; update references in executor.ts
around toRuntimeRequest, resolveCredentials, and config.runtime.execute to
preserve behavior but ensure release and session cleanup are conditional on the
successful acquire.
| // If an LLM gateway is configured, override matching domains to route through it | ||
| if (gateway) { | ||
| for (const domain of gateway.domains) { | ||
| domains[domain] = { | ||
| headers: { | ||
| Authorization: `Bearer ${gateway.apiKey}`, | ||
| // Preserve existing credential headers (merged, gateway key takes priority) | ||
| ...(domains[domain]?.headers ?? {}), | ||
| // Override Authorization with gateway key | ||
| ...(gateway.apiKey ? { Authorization: `Bearer ${gateway.apiKey}` } : {}), | ||
| Authorization: `Bearer ${gateway.apiKey}`, | ||
| }, | ||
| target: gateway.url, | ||
| }; | ||
| } |
There was a problem hiding this comment.
Don't turn llmGateway into an implicit allowlist expansion.
The comment says "override matching domains", but this loop adds every gateway.domains entry to domains even when the session never requested it via network.credentials or network.allowOut. That broadens outbound access instead of just rewriting approved destinations.
Suggested change
// If an LLM gateway is configured, override matching domains to route through it
if (gateway) {
+ const allowedDomains = new Set([...allowlist, ...Object.keys(domains)]);
for (const domain of gateway.domains) {
+ if (!allowedDomains.has(domain)) continue;
domains[domain] = {
headers: {
...(domains[domain]?.headers ?? {}),
Authorization: `Bearer ${gateway.apiKey}`,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/session/executor.ts` around lines 155 - 165, The current loop
that applies the LLM gateway settings (variable gateway and gateway.domains)
unconditionally adds entries to domains, effectively expanding the allowlist;
instead, restrict changes to only rewrite destinations that were already
permitted by the session (check existing domains entries or consult
network.credentials / network.allowOut) and skip any gateway.domains that do not
already exist in domains or are not explicitly allowed by network settings.
Update the loop in executor.ts so it first verifies domains[domain] exists or
the domain is present in the session's allowed lists before setting
headers/target on that domain, leaving unknown domains untouched.
| export const DomainCredentialSchema = z.object({ | ||
| headers: z.record(z.string(), z.string()), | ||
| }); | ||
|
|
||
| export type DomainCredential = z.infer<typeof DomainCredentialSchema>; | ||
|
|
||
| /** Credential injection type discriminator */ | ||
| export const CredentialInjectionType = z.enum(['headers', 'oauth', 'mtls']); | ||
|
|
||
| export type CredentialInjectionType = z.infer<typeof CredentialInjectionType>; | ||
|
|
||
| /** Header-based credential injection */ | ||
| export interface HeaderCredentialInjection { | ||
| type: 'headers'; | ||
| headers: Record<string, string>; | ||
| } | ||
|
|
||
| /** OAuth token-based credential injection (future) */ | ||
| export interface OAuthCredentialInjection { | ||
| type: 'oauth'; | ||
| tokenEndpoint: string; | ||
| clientId: string; | ||
| clientSecret: string; | ||
| } | ||
|
|
||
| /** Mutual TLS credential injection (future) */ | ||
| export interface MtlsCredentialInjection { | ||
| type: 'mtls'; | ||
| cert: string; | ||
| key: string; | ||
| } | ||
|
|
||
| /** Polymorphic credential injection result */ | ||
| export type CredentialInjection = | ||
| | HeaderCredentialInjection | ||
| | OAuthCredentialInjection | ||
| | MtlsCredentialInjection; |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Back the credential injection contract with runtime schemas.
DomainCredentialSchema currently accepts arbitrary header maps, and HeaderCredentialInjection / OAuthCredentialInjection / MtlsCredentialInjection are TS-only. That leaves malformed credential payloads to fail later at request/proxy boundaries instead of at parse time. Export Zod schemas for these variants and a discriminated union from this package.
As per coding guidelines, "Use Zod for all external data validation (API requests, config, env via @t3-oss/env-core)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/domains/credential/src/types.ts` around lines 9 - 45,
DomainCredentialSchema and the credential injection types are only defined in
TypeScript, so malformed payloads escape runtime validation; add Zod schemas for
each variant (e.g., HeaderCredentialInjectionSchema,
OAuthCredentialInjectionSchema, MtlsCredentialInjectionSchema) matching the
existing interfaces (including the literal "type" field), then create a
discriminated union schema (using z.discriminatedUnion or equivalent) named
CredentialInjectionSchema and export it alongside DomainCredentialSchema and
CredentialInjectionType; update the exported types to infer from these schemas
(e.g., export type HeaderCredentialInjection = z.infer<typeof
HeaderCredentialInjectionSchema>, etc.) so all external credential payloads are
validated at parse time.
| function resolveSnapshotDir(config: FirecrackerRuntimeConfig, snapshotId: string): string { | ||
| if (config.snapshotBaseDir) { | ||
| const resolved = `${config.snapshotBaseDir}/${snapshotId}`; | ||
| try { | ||
| const stat = Bun.file(`${resolved}/vmstate.snap`); | ||
| if (stat.size > 0) return resolved; | ||
| } catch { | ||
| // Fall back to default | ||
| } | ||
| } | ||
| return config.snapshotDir; |
There was a problem hiding this comment.
Validate snapshotId and fail closed when the requested snapshot is missing.
resolveSnapshotDir() interpolates snapshotId directly into a filesystem path and then falls back to config.snapshotDir if that directory does not contain a valid vmstate.snap. That means a crafted ../... value can escape snapshotBaseDir, and a typoed snapshot boots the default image instead of rejecting the request.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime-firecracker/src/adapter.ts` around lines 128 - 138,
resolveSnapshotDir currently interpolates snapshotId into config.snapshotBaseDir
and silently falls back to config.snapshotDir when vmstate.snap is missing,
allowing path-traversal and accidental fallback; change resolveSnapshotDir to
validate snapshotId (reject any values containing "..", path separators or null
bytes), construct the candidate path safely (use a safe join and then verify the
resulting resolved path is a child of config.snapshotBaseDir), confirm the file
`${resolved}/vmstate.snap` exists and has size > 0, and if any check fails throw
an error (fail closed) instead of returning config.snapshotDir so callers cannot
boot an unintended default image.
| if (options?.portExposure) { | ||
| const tunnelResults = await options.portExposure.expose(sessionId, exposePorts, hostPorts); | ||
| exposedPortResults = tunnelResults; | ||
| session.exposedPortResults = tunnelResults; |
There was a problem hiding this comment.
Track the exposure provider so tunnel cleanup can actually run.
expose() allocates provider-owned resources, but only tunnelResults are saved on the session. cleanupSession() never receives the PortExposureProvider, so cleanup() cannot be called and exposed tunnels remain orphaned after success, failure, or dispose().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime-firecracker/src/adapter.ts` around lines 317 - 320, The code
only saves tunnelResults on the session but not the PortExposureProvider, so
cleanupSession() cannot call cleanup() and tunnels are orphaned; fix by also
saving the provider (e.g., session.portExposureProvider = options.portExposure
or similar) when you call options.portExposure.expose(sessionId, exposePorts,
hostPorts) in adapter.ts, and update cleanupSession()/dispose logic to read that
stored provider and call its cleanup()/dispose() when the session ends or on
error; reference options.portExposure, expose(),
exposedPortResults/session.exposedPortResults, PortExposureProvider, and
cleanupSession()/cleanup() when making the change.
| // Clean up inbound iptables rules and release host ports | ||
| if (session.inboundPorts?.length && session.allocation) { | ||
| for (const mapping of session.inboundPorts) { | ||
| await teardownInboundPort(session.allocation, mapping.hostPort, mapping.guestPort); | ||
| } | ||
| if (config.portPool) { | ||
| config.portPool.release(session.inboundPorts.map((m) => m.hostPort)); | ||
| } | ||
| } | ||
|
|
||
| if (session.vmHandle) { | ||
| await stopVm(session.vmHandle); | ||
| } | ||
|
|
||
| if (session.proxyHandle) { | ||
| await session.proxyHandle.stop(); | ||
| } | ||
|
|
||
| if (session.allocation) { | ||
| await teardownIptables(session.allocation); | ||
| await deleteTap(session.allocation.tapDevice); | ||
| ipPool.release(session.allocation.subnetIndex); |
There was a problem hiding this comment.
Make teardown best-effort instead of fail-fast.
Any rejection in this block aborts the remaining teardown steps, and activeSessions.delete(sessionId) never runs because it sits after cleanupSession() in the caller's finally. One bad teardown can therefore leak host resources and leave the session stuck as active. Catch/log per step, or use an allSettled-style cleanup pass.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime-firecracker/src/adapter.ts` around lines 404 - 425, The
teardown block currently fails fast if any await rejects, risking leaked
resources; update cleanup in the adapter (around teardownInboundPort,
config.portPool.release, stopVm, session.proxyHandle.stop, teardownIptables,
deleteTap, ipPool.release) to be best-effort: either wrap each await call in its
own try/catch and log errors to avoid aborting subsequent steps, or collect the
individual cleanup promises and run Promise.allSettled to ensure every step is
attempted and failures are logged but do not prevent further cleanup.
| export function waitForSsh( | ||
| opts: SshOptions, | ||
| maxAttempts: number = 60, | ||
| intervalMs: number = 500, | ||
| ): ResultAsync<void, RuntimeError> { | ||
| const exec = opts.exec ?? defaultExecFn; | ||
| const user = opts.user ?? DEFAULT_SSH_USER; | ||
|
|
||
| return ResultAsync.fromPromise( | ||
| (async () => { | ||
| for (let i = 0; i < maxAttempts; i++) { | ||
| try { | ||
| await exec('ssh', [...baseSshArgs(opts), `${user}@${opts.host}`, 'true']); | ||
| return; | ||
| } catch { | ||
| await new Promise((resolve) => setTimeout(resolve, intervalMs)); | ||
| } | ||
| } | ||
| throw new Error(`SSH not available after ${maxAttempts * intervalMs}ms`); | ||
| })(), | ||
| (e) => | ||
| new RuntimeError( | ||
| RuntimeErrorCode.SSH_FAILED, | ||
| `SSH connection to ${opts.host} timed out: ${e}`, | ||
| e, | ||
| ), | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In OpenSSH, does the ConnectTimeoutoption apply to each individualsshinvocation? If a loop runsssh60 times withConnectTimeout=30, what is the worst-case total wait time when the host is unreachable?
💡 Result:
Yes, the ConnectTimeout option applies to each individual ssh invocation. When running ssh 60 times in a loop with ConnectTimeout=30 to an unreachable host, each invocation times out independently after approximately 30 seconds. The worst-case total wait time is 60 * 30 = 1800 seconds (30 minutes), assuming no connection multiplexing (via -M or ControlMaster) and default ConnectionAttempts=1.
Citations:
- 1: https://man7.org/linux/man-pages/man5/ssh_config.5.html
- 2: https://man.openbsd.org/i386/ssh_config
- 3: https://serverfault.com/questions/835104/what-does-connecttimeout-in-ssh-config-mean
- 4: https://man.openbsd.org/OpenBSD-current/man5/ssh_config.5
- 5: https://manpages.ubuntu.com/manpages/jammy/en/man5/ssh_config.5.html
🏁 Script executed:
cd packages/runtime-firecracker && rg -A 10 "function baseSshArgs|const baseSshArgs|export.*baseSshArgs" src/ssh.tsRepository: arek-e/paws
Length of output: 373
🏁 Script executed:
cd packages/runtime-firecracker && rg "DEFAULT_CONNECT_TIMEOUT" src/ssh.ts -B 2 -A 2Repository: arek-e/paws
Length of output: 541
🏁 Script executed:
cd packages/runtime-firecracker && sed -n '47,73p' src/ssh.tsRepository: arek-e/paws
Length of output: 857
waitForSsh() error message is misleading; actual timeout can be ~30 minutes, not 30 seconds.
Each ssh invocation uses ConnectTimeout=30 (from baseSshArgs()). With default maxAttempts=60, an unreachable host triggers 60 sequential attempts of up to 30 seconds each—totaling ~1800 seconds. The error message at line 65 reports only maxAttempts * intervalMs = 30000ms, ignoring the per-invocation timeouts entirely. Use an absolute deadline or per-probe timeout much shorter than 30 seconds to bound the actual wait time and make the error message accurate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime-firecracker/src/ssh.ts` around lines 47 - 73, waitForSsh()
currently underreports the real timeout because each ssh probe uses
ConnectTimeout=30s from baseSshArgs(), so 60 attempts can take ~1800s; fix by
bounding probe time or using an absolute deadline: either pass a short per-probe
timeout override (e.g., add/override '-o ConnectTimeout=<probeSec>' to the args
passed in exec inside waitForSsh) or implement a wall-clock deadline (record
start time and stop once Date.now() exceeds start+totalTimeoutMs) and use that
when looping; also update the thrown Error message to compute and report the
actual total wait time (use tracked start+end or
maxAttempts*(probeTimeout+intervalMs)) and reference waitForSsh, baseSshArgs,
maxAttempts, intervalMs, and ConnectTimeout when making the change.
| export function sshWriteFile( | ||
| opts: SshOptions, | ||
| remotePath: string, | ||
| content: string, | ||
| ): ResultAsync<void, RuntimeError> { | ||
| const user = opts.user ?? DEFAULT_SSH_USER; | ||
| const port = opts.port ?? DEFAULT_SSH_PORT; | ||
| const timeout = opts.connectTimeoutSecs ?? DEFAULT_CONNECT_TIMEOUT; | ||
|
|
||
| return ResultAsync.fromPromise( | ||
| (async () => { | ||
| const proc = Bun.spawn( | ||
| [ | ||
| 'ssh', | ||
| '-o', | ||
| 'StrictHostKeyChecking=no', | ||
| '-o', | ||
| 'UserKnownHostsFile=/dev/null', | ||
| '-o', | ||
| `ConnectTimeout=${timeout}`, | ||
| '-o', | ||
| 'LogLevel=ERROR', | ||
| '-i', | ||
| opts.keyPath, | ||
| '-p', | ||
| String(port), | ||
| `${user}@${opts.host}`, | ||
| `cat > ${remotePath}`, | ||
| ], | ||
| { stdin: new TextEncoder().encode(content) }, | ||
| ); | ||
| const exitCode = await proc.exited; | ||
| if (exitCode !== 0) { | ||
| throw new Error(`SSH write to ${remotePath} failed with exit code ${exitCode}`); | ||
| } | ||
| })(), | ||
| (e) => | ||
| new RuntimeError( | ||
| RuntimeErrorCode.SSH_FAILED, | ||
| `Failed to write ${remotePath} on ${opts.host}: ${e}`, | ||
| e, | ||
| ), | ||
| ); |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Keep sshWriteFile() on the injected shell abstraction.
This is the one helper here that bypasses opts.exec and hard-codes Bun.spawn, so it cannot be exercised through the same test seam and it forces a Bun-specific path even when the caller already injected an executor. Either extend the injected runner to support stdin or inject a dedicated spawn wrapper.
Based on learnings, "Use dependency injection for testability: firecracker client accepts injected request, shell wrappers accept injected exec".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime-firecracker/src/ssh.ts` around lines 95 - 137, sshWriteFile
currently bypasses the injected executor and calls Bun.spawn directly; change it
to use the injected shell abstraction by adding a stdin-capable exec to
SshOptions (e.g., opts.execWithStdin or a dedicated opts.spawn wrapper) and
update sshWriteFile to call that injected function (passing command args and the
content as stdin) instead of Bun.spawn, preserving the same SSH args, exit-code
check, and RuntimeError mapping (RuntimeErrorCode.SSH_FAILED) so behavior and
error semantics remain unchanged; ensure callers/tests are updated to provide
the new execWithStdin/spawn or fallback to a default Bun.spawn-based
implementation.
| export function sshWriteFile( | ||
| opts: SshOptions, | ||
| remotePath: string, | ||
| content: string, | ||
| ): ResultAsync<void, RuntimeError> { | ||
| const user = opts.user ?? DEFAULT_SSH_USER; | ||
| const port = opts.port ?? DEFAULT_SSH_PORT; | ||
| const timeout = opts.connectTimeoutSecs ?? DEFAULT_CONNECT_TIMEOUT; | ||
|
|
||
| return ResultAsync.fromPromise( | ||
| (async () => { | ||
| const proc = Bun.spawn( | ||
| [ | ||
| 'ssh', | ||
| '-o', | ||
| 'StrictHostKeyChecking=no', | ||
| '-o', | ||
| 'UserKnownHostsFile=/dev/null', | ||
| '-o', | ||
| `ConnectTimeout=${timeout}`, | ||
| '-o', | ||
| 'LogLevel=ERROR', | ||
| '-i', | ||
| opts.keyPath, | ||
| '-p', | ||
| String(port), | ||
| `${user}@${opts.host}`, | ||
| `cat > ${remotePath}`, | ||
| ], | ||
| { stdin: new TextEncoder().encode(content) }, | ||
| ); | ||
| const exitCode = await proc.exited; | ||
| if (exitCode !== 0) { | ||
| throw new Error(`SSH write to ${remotePath} failed with exit code ${exitCode}`); | ||
| } | ||
| })(), | ||
| (e) => | ||
| new RuntimeError( | ||
| RuntimeErrorCode.SSH_FAILED, | ||
| `Failed to write ${remotePath} on ${opts.host}: ${e}`, | ||
| e, | ||
| ), | ||
| ); | ||
| } | ||
|
|
||
| /** Read a file from the guest via SSH */ | ||
| export function sshReadFile( | ||
| opts: SshOptions, | ||
| remotePath: string, | ||
| ): ResultAsync<string, RuntimeError> { | ||
| return sshExec(opts, `cat ${remotePath}`).map((r) => r.stdout); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect current call sites to see whether remote paths are always safe literals.
rg -nP --type=ts -C2 '\bssh(?:ReadFile|WriteFile)\s*\('Repository: arek-e/paws
Length of output: 4108
🏁 Script executed:
#!/bin/bash
# Get the full ssh.ts file to see sshExec and other functions
cat -n packages/runtime-firecracker/src/ssh.tsRepository: arek-e/paws
Length of output: 5575
🏁 Script executed:
#!/bin/bash
# Check if remotePath could come from user input or external sources
rg -nP --type=ts 'remotePath\s*:' packages/Repository: arek-e/paws
Length of output: 283
🏁 Script executed:
#!/bin/bash
# Look for existing shell escape utilities in the codebase
rg -nP 'shellQuote|escapeShell|shellEscape' --type=tsRepository: arek-e/paws
Length of output: 248
🏁 Script executed:
#!/bin/bash
# Check apps/worker/src/ssh/client.ts to see if it has similar issues
cat -n apps/worker/src/ssh/client.ts | head -200Repository: arek-e/paws
Length of output: 6524
🏁 Script executed:
#!/bin/bash
# See the shellEscape function implementation
sed -n '425,445p' packages/runtime-firecracker/src/adapter.tsRepository: arek-e/paws
Length of output: 250
🏁 Script executed:
#!/bin/bash
# Check if apps/worker/src/ssh/client.ts is a separate implementation or related
head -10 apps/worker/src/ssh/client.ts
git log --oneline -n 5 -- apps/worker/src/ssh/client.ts 2>/dev/null | head -5Repository: arek-e/paws
Length of output: 396
🏁 Script executed:
#!/bin/bash
# Check the file structure to confirm if apps/worker is separate from packages
ls -la apps/worker/src/ssh/ packages/runtime-firecracker/src/Repository: arek-e/paws
Length of output: 638
🏁 Script executed:
#!/bin/bash
# Verify if there are imports from one to the other
rg -nP 'from.*ssh.*client|from.*@paws' packages/runtime-firecracker/src/ssh.ts apps/worker/src/ssh/client.tsRepository: arek-e/paws
Length of output: 332
Escape remotePath before building the remote shell command.
Both cat > ${remotePath} (line 122) and cat ${remotePath} (line 145) are parsed by the guest shell. Spaces or shell metacharacters in the path can fail the transfer or run the wrong command inside the VM.
The codebase already has a shellEscape function in adapter.ts that handles this correctly. Reuse it here instead of creating a duplicate function:
Suggested implementation
Import or define a shell escape helper, then apply it to both functions:
+function shellEscape(value: string): string {
+ return `'${value.replace(/'/g, "'\\''")}'`;
+}
+
@@
- `cat > ${remotePath}`,
+ `cat > ${shellEscape(remotePath)}`,
@@
- return sshExec(opts, `cat ${remotePath}`).map((r) => r.stdout);
+ return sshExec(opts, `cat ${shellEscape(remotePath)}`).map((r) => r.stdout);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function sshWriteFile( | |
| opts: SshOptions, | |
| remotePath: string, | |
| content: string, | |
| ): ResultAsync<void, RuntimeError> { | |
| const user = opts.user ?? DEFAULT_SSH_USER; | |
| const port = opts.port ?? DEFAULT_SSH_PORT; | |
| const timeout = opts.connectTimeoutSecs ?? DEFAULT_CONNECT_TIMEOUT; | |
| return ResultAsync.fromPromise( | |
| (async () => { | |
| const proc = Bun.spawn( | |
| [ | |
| 'ssh', | |
| '-o', | |
| 'StrictHostKeyChecking=no', | |
| '-o', | |
| 'UserKnownHostsFile=/dev/null', | |
| '-o', | |
| `ConnectTimeout=${timeout}`, | |
| '-o', | |
| 'LogLevel=ERROR', | |
| '-i', | |
| opts.keyPath, | |
| '-p', | |
| String(port), | |
| `${user}@${opts.host}`, | |
| `cat > ${remotePath}`, | |
| ], | |
| { stdin: new TextEncoder().encode(content) }, | |
| ); | |
| const exitCode = await proc.exited; | |
| if (exitCode !== 0) { | |
| throw new Error(`SSH write to ${remotePath} failed with exit code ${exitCode}`); | |
| } | |
| })(), | |
| (e) => | |
| new RuntimeError( | |
| RuntimeErrorCode.SSH_FAILED, | |
| `Failed to write ${remotePath} on ${opts.host}: ${e}`, | |
| e, | |
| ), | |
| ); | |
| } | |
| /** Read a file from the guest via SSH */ | |
| export function sshReadFile( | |
| opts: SshOptions, | |
| remotePath: string, | |
| ): ResultAsync<string, RuntimeError> { | |
| return sshExec(opts, `cat ${remotePath}`).map((r) => r.stdout); | |
| function shellEscape(value: string): string { | |
| return `'${value.replace(/'/g, "'\\''")}'`; | |
| } | |
| export function sshWriteFile( | |
| opts: SshOptions, | |
| remotePath: string, | |
| content: string, | |
| ): ResultAsync<void, RuntimeError> { | |
| const user = opts.user ?? DEFAULT_SSH_USER; | |
| const port = opts.port ?? DEFAULT_SSH_PORT; | |
| const timeout = opts.connectTimeoutSecs ?? DEFAULT_CONNECT_TIMEOUT; | |
| return ResultAsync.fromPromise( | |
| (async () => { | |
| const proc = Bun.spawn( | |
| [ | |
| 'ssh', | |
| '-o', | |
| 'StrictHostKeyChecking=no', | |
| '-o', | |
| 'UserKnownHostsFile=/dev/null', | |
| '-o', | |
| `ConnectTimeout=${timeout}`, | |
| '-o', | |
| 'LogLevel=ERROR', | |
| '-i', | |
| opts.keyPath, | |
| '-p', | |
| String(port), | |
| `${user}@${opts.host}`, | |
| `cat > ${shellEscape(remotePath)}`, | |
| ], | |
| { stdin: new TextEncoder().encode(content) }, | |
| ); | |
| const exitCode = await proc.exited; | |
| if (exitCode !== 0) { | |
| throw new Error(`SSH write to ${remotePath} failed with exit code ${exitCode}`); | |
| } | |
| })(), | |
| (e) => | |
| new RuntimeError( | |
| RuntimeErrorCode.SSH_FAILED, | |
| `Failed to write ${remotePath} on ${opts.host}: ${e}`, | |
| e, | |
| ), | |
| ); | |
| } | |
| /** Read a file from the guest via SSH */ | |
| export function sshReadFile( | |
| opts: SshOptions, | |
| remotePath: string, | |
| ): ResultAsync<string, RuntimeError> { | |
| return sshExec(opts, `cat ${shellEscape(remotePath)}`).map((r) => r.stdout); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime-firecracker/src/ssh.ts` around lines 95 - 145, The
remotePath is passed directly into guest shell commands in sshWriteFile and
sshReadFile (the `cat > ${remotePath}` and `cat ${remotePath}` usages), which
lets spaces or shell metacharacters break or hijack the command; import and use
the existing shellEscape helper from adapter.ts to sanitize remotePath before
interpolating it into the command strings in sshWriteFile and sshReadFile,
updating the command arguments to use the escaped path and adding the import at
the top of the file.
| export interface PortExposureProvider { | ||
| expose( | ||
| sessionId: string, | ||
| ports: Array<{ port: number; protocol?: string; label?: string }>, | ||
| hostPorts: number[], | ||
| ): Promise<ExposedPortResult[]>; |
There was a problem hiding this comment.
Preserve exposure ACL metadata in the provider contract.
RuntimeSessionRequest.exposePorts includes access and allowedEmails, but PortExposureProvider.expose() narrows the argument back to { port, protocol, label }. That forces providers to cast or ignore the ACL fields, so restricted shares cannot be enforced end-to-end.
Suggested change
+export interface RuntimeExposePortRequest {
+ port: number;
+ protocol?: string;
+ label?: string;
+ access?: string;
+ allowedEmails?: string[];
+}
+
/** Provider for exposing ports from the runtime to the outside world */
export interface PortExposureProvider {
expose(
sessionId: string,
- ports: Array<{ port: number; protocol?: string; label?: string }>,
+ ports: RuntimeExposePortRequest[],
hostPorts: number[],
): Promise<ExposedPortResult[]>;
cleanup(sessionId: string, tunnels: ExposedPortResult[]): Promise<void>;
}
@@
/** Port exposure requests */
- exposePorts?: Array<{
- port: number;
- protocol?: string;
- label?: string;
- access?: string;
- allowedEmails?: string[];
- }>;
+ exposePorts?: RuntimeExposePortRequest[];
}Also applies to: 82-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/runtime/src/types.ts` around lines 55 - 60,
PortExposureProvider.expose currently accepts ports as Array<{ port; protocol?;
label? }>, which drops ACL fields from RuntimeSessionRequest.exposePorts; update
the provider contract so the ports parameter preserves ACL metadata (e.g.,
include access and allowedEmails on the port descriptor) and adapt the related
export/interface at the other occurrence (the second PortExposureProvider
declaration) to the same shape; ensure implementations and callers
(RuntimeSessionRequest.exposePorts and any callers of expose) are updated to
pass through the augmented port objects without casting or discarding ACL info
so providers can enforce restricted shares end-to-end.
Summary
RuntimeAdapterinterface (@paws/runtime) so the worker is decoupled from Firecracker. The executor shrinks from 465→160 lines and delegates all VM lifecycle to the adapter. Future runtimes (Agent OS, Docker, etc.) plug in without touching the worker.fetch()error handling prevents credential header leakage; fixdurationMsbug that always reported 0.New packages
@paws/runtimeRuntimeAdapterinterface,RuntimeRegistry,PawsErrorbase class,RuntimeError@paws/runtime-firecracker@paws/domain-agentAgentConfig,AgentFramework,generateAgentScript()(extracted from daemon)@paws/domain-policyGovernanceSchema,GovernanceChecker(extracted from daemon)@paws/domain-accessExposedPort+ExposedPortWithAccess(no Pangolin coupling)@paws/domain-credentialCredentialTransformer, polymorphic injection types (headers, OAuth, mTLS)Domain type cleanup
NetworkAllocation: removedtapDevice(moved toFirecrackerAllocationin firecracker package)SnapshotSize.memory: now optional (not all runtimes have memory snapshots)Worker: addedWorkerTypeenum (firecracker | container | lightweight),snapshotnow optionalagent.tsandgovernance.ts— no backward-compat shims, all consumers import from canonical packagesArchitecture (before → after)
Security fixes
Date.now() - Date.now()(always 0) →Date.now() - sessionStartTimeTest plan
tsc --noEmit)bun run testpasses (exit 0)🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Refactor