fix(execute): run backend under Node so isolated-vm loads#430
Conversation
Bun uses JavaScriptCore and exposes a partial V8 ABI shim — enough that node-gyp-build picks the abi137 prebuild for isolated-vm at install time, but not enough that the addon dlopens at runtime. Loading throws 'undefined symbol: v8::ValueSerializer::Delegate::HasCustomHostObject'. The previous fix (#427) caught this and returned RuntimeUnavailable instead of crashing — but the execute MCP tool still silently failed in prod. Fix the underlying mismatch by running the backend under Node: the source layout, dependencies, and Hono server entry are already Node-style ('Node.js Server Entry Point' is in the docblock); the only Bun-specific thing was 'exec bun src/server.ts' in start.sh. Switch start.sh to 'exec node --import tsx src/server.ts'. tsx is already installed (devDep, copied into the runtime image via the existing node_modules pipeline) and handles ESM + TypeScript loading without a precompile step. Verified end-to-end inside the live prod image: node --import tsx loads isolated-vm, runs runScript() and returns the expected value. The regression silently shipped because the existing sandbox tests ran under bun test with a 'skipIfRuntimeUnavailable' helper that turned the load failure into a no-op pass. Replace with a vitest integration test that fails loudly when isolated-vm can't load, and wire it into CI invoking vitest under node directly. SKIP_TEST_DB_SETUP=1 keeps it fast — the test uses stub SDKs and doesn't need Postgres. The broader vitest suite (~38 integration files) is not yet wired into CI; many are stale after #348's MCP tool consolidation. Tracked separately.
|
Triage decision: Reasons:
Next: Human review required due to infrastructure changes. Assigned to @buremba for review and manual merge after validation. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d275a27130
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # RuntimeUnavailable under bun. tsx provides the TS loader so the | ||
| # source layout stays uncompiled. cwd needs to be the package so | ||
| # `--import tsx` resolves from owletto-backend's node_modules. | ||
| cd /app/packages/owletto-backend |
There was a problem hiding this comment.
Preserve repo-root cwd before starting backend
Changing the startup script to cd /app/packages/owletto-backend regresses components that resolve config paths from process.cwd(): for example, gateway startup constructs ProviderRegistryService("config/providers.json") (packages/gateway/src/services/core-services.ts) and the embedded agent routes default to resolve(process.cwd(), 'config/providers.json') (packages/owletto-backend/src/lobu/agent-routes.ts). With the new cwd, both resolve to /app/packages/owletto-backend/config/providers.json instead of /app/config/providers.json, so when LOBU_PROVIDER_REGISTRY_PATH is not explicitly set the bundled provider registry is silently not loaded.
Useful? React with 👍 / 👎.
|
Triage decision: Reasons:
Next: Human review required due to infrastructure changes and P1 issue flagged in review. The P1 comment highlights that changing the startup script working directory may break components that resolve config paths from |
Codex review on #430 caught that 'cd /app/packages/owletto-backend' breaks bundled config resolution: gateway's ProviderRegistryService and the embedded agent routes both resolve 'config/providers.json' from process.cwd(). Under the previous cwd of /app, this finds /app/config/providers.json. Under /app/packages/owletto-backend, it resolves to a path that doesn't exist and the bundled provider registry is silently empty. Pass tsx as an absolute file URL so the loader resolves regardless of cwd, and run server.ts by absolute path. Verified in the live prod image: runScript() returns success with cwd preserved at /app.
PR #430 switched docker/app/start.sh from 'exec bun src/server.ts' to 'exec node --import tsx src/server.ts' so isolated-vm could load. End-to-end smoke in the live image confirmed the runtime swap itself worked, but the resulting prod image crashloops on boot: SyntaxError: The requested module '@lobu/core' does not provide an export named 'createBuiltinSecretRef' at packages/owletto-backend/src/lobu/stores/postgres-secret-store.ts:13 The export exists in @lobu/core/dist/secret-refs.js. Node's cjs-module-lexer detects it when the file is the sole entry, but fails to detect it when reached via the full server.ts boot chain. Bun has its own CJS↔ESM interop and didn't hit this. Fixing properly requires shipping @lobu/core as dual ESM+CJS instead of CJS-only behind an 'import' condition. Reverting unblocks the partial deploy. Old pods are still serving traffic (CrashLoopBackOff prevented rollover). The execute tool returns to its pre-#430 state — same RuntimeUnavailable behavior that's been in prod since #427 hardened the load failure. No user-facing regression beyond what we already had this morning. Tracked: convert @lobu/core (and other workspace packages with the same shape) to dual ESM+CJS so the runtime swap can land cleanly.
) Execute MCP tool has been broken in prod since launch: the backend runs under Bun (JSC, partial V8 ABI shim) but isolated-vm is a V8 native addon that fails to dlopen under Bun. Returns RuntimeUnavailable to every caller. #430 tried 'exec node --import tsx src/server.ts' to switch the runtime. The runtime swap mechanism worked end-to-end (verified isolated-vm + runScript() in the live image), but the actual server boot crashed because Node's cjs-module-lexer couldn't detect named exports of @lobu/core's CJS dist barrel when reached via the full server.ts import chain. Reverted in #431. Right fix per second-opinion review: bundle the backend at build time. esbuild resolves all workspace imports inline (using the 'bun' condition to pick up TS source rather than CJS dist), so Node never has to bind named imports against cross-package CJS barrels — the failure mode that broke #430 simply doesn't exist for the bundle. Native addons and packages with require-in-the-middle hooks (Sentry, OpenTelemetry, pino) stay external so their runtime hooks keep working. Implementation: - packages/owletto-backend/scripts/build-server-bundle.mjs: esbuild config that bundles workspace + relative imports, externalizes every bare specifier. ~2.4 MB self-contained ESM, ~100 ms build. - packages/owletto-backend/package.json: 'build:server' npm script. - docker/app/Dockerfile: switch 'bun install' to 'bun install --linker=hoisted' so node_modules is flat (Node's resolver doesn't understand bun's .bun/<pkg>@<ver>/ isolated layout). Add a 'bun run build:server' step in the builder stage. - docker/app/start.sh: 'exec node /app/.../dist/server.bundle.mjs' instead of 'exec bun src/server.ts'. Verified end-to-end against a locally built docker image: basic: {success: true, returnValue: 3} chaining: {success: true, returnValue: {slug: 'atlas', ok: true}} The CI sandbox-runtime test added in #430 stays as the regression guard - it'll keep failing until the runtime can actually load isolated-vm. Tradeoffs: - Adds ~5 s to the docker build (esbuild bundle). - Hoisted layout is npm-style flat: increases build-stage install time slightly but matches what Node expects natively. - Native addons and Sentry/OpenTelemetry/pino externalized; if any of those moves are wrong they'll surface at boot, not silently.
Why
The execute MCP tool has been silently broken in prod since launch.
isolated-vmis a V8 native addon; the prod backend runs under Bun (JavaScriptCore). Bun reportsprocess.versions.modules=137which makes node-gyp-build pick the abi137 prebuild at install — but loading the addon at runtime throws:PR #427 caught this and returned
RuntimeUnavailableinstead of crashing the MCP server. The hardening worked. But the tool itself never ran.What
docker/app/start.shfromexec bun src/server.tstoexec node --import tsx src/server.ts. tsx is already a devDep; the existing node_modules pipeline copies it into the runtime image.server.tsis already a Node-style entry (its docblock literally says "Node.js Server Entry Point", uses@hono/node-server).node --import tsxloads isolated-vm, runScript() returns the expected value.Regression guard
The bug shipped because the existing sandbox tests ran under
bun testwith askipIfRuntimeUnavailablehelper that turned the load failure into a no-op pass. Replaced with a vitest integration test (run-script-runtime.test.ts) that fails loudly when isolated-vm can't load, wired into CI invoking vitest undernodedirectly.SKIP_TEST_DB_SETUP=1keeps it fast — the test uses stub SDKs and doesn't need Postgres.Out of scope
The broader vitest suite (~38 integration files) isn't wired into CI today. ~40 of them are stale after #348's MCP tool consolidation (
manage_*→execute/search); fixing them deserves its own PR.Test plan
runScript()end-to-end in live prod image: succeeds under node + tsx, returns{success: true, returnValue: 3}