Skip to content

fix(execute): run backend under Node so isolated-vm loads#430

Merged
buremba merged 3 commits into
mainfrom
fix/execute-runtime-node
Apr 27, 2026
Merged

fix(execute): run backend under Node so isolated-vm loads#430
buremba merged 3 commits into
mainfrom
fix/execute-runtime-node

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented Apr 27, 2026

Why

The execute MCP tool has been silently broken in prod since launch. isolated-vm is a V8 native addon; the prod backend runs under Bun (JavaScriptCore). Bun reports process.versions.modules=137 which makes node-gyp-build pick the abi137 prebuild at install — but loading the addon at runtime throws:

undefined symbol: _ZN2v815ValueSerializer8Delegate19HasCustomHostObjectEPNS_7IsolateE

PR #427 caught this and returned RuntimeUnavailable instead of crashing the MCP server. The hardening worked. But the tool itself never ran.

What

  • Switch docker/app/start.sh from exec bun src/server.ts to exec node --import tsx src/server.ts. tsx is already a devDep; the existing node_modules pipeline copies it into the runtime image. server.ts is already a Node-style entry (its docblock literally says "Node.js Server Entry Point", uses @hono/node-server).
  • Verified end-to-end inside the live prod image: node --import tsx loads isolated-vm, runScript() returns the expected value.

Regression guard

The bug shipped because the existing sandbox tests ran under bun test with a skipIfRuntimeUnavailable helper that turned the load failure into a no-op pass. Replaced with a vitest integration test (run-script-runtime.test.ts) that fails loudly when isolated-vm can't load, wired into CI invoking vitest under node directly. SKIP_TEST_DB_SETUP=1 keeps it fast — the test uses stub SDKs and doesn't need Postgres.

Out of scope

The broader vitest suite (~38 integration files) isn't wired into CI today. ~40 of them are stale after #348's MCP tool consolidation (manage_*execute/search); fixing them deserves its own PR.

Test plan

  • runScript() end-to-end in live prod image: succeeds under node + tsx, returns {success: true, returnValue: 3}
  • New vitest test fails on Node 25 (correct — runtime guard rejects it) and passes on Node 22 (verified via docker)
  • CI on this PR runs the new sandbox-runtime test and passes
  • After merge: build-images succeeds, deploy rolls, MCP execute call returns a real result instead of RuntimeUnavailable

Bun uses JavaScriptCore and exposes a partial V8 ABI shim — enough
that node-gyp-build picks the abi137 prebuild for isolated-vm at install
time, but not enough that the addon dlopens at runtime. Loading throws
'undefined symbol: v8::ValueSerializer::Delegate::HasCustomHostObject'.

The previous fix (#427) caught this and returned RuntimeUnavailable
instead of crashing — but the execute MCP tool still silently failed
in prod. Fix the underlying mismatch by running the backend under Node:
the source layout, dependencies, and Hono server entry are already
Node-style ('Node.js Server Entry Point' is in the docblock); the only
Bun-specific thing was 'exec bun src/server.ts' in start.sh.

Switch start.sh to 'exec node --import tsx src/server.ts'. tsx is
already installed (devDep, copied into the runtime image via the
existing node_modules pipeline) and handles ESM + TypeScript loading
without a precompile step. Verified end-to-end inside the live prod
image: node --import tsx loads isolated-vm, runs runScript() and
returns the expected value.

The regression silently shipped because the existing sandbox tests
ran under bun test with a 'skipIfRuntimeUnavailable' helper that
turned the load failure into a no-op pass. Replace with a vitest
integration test that fails loudly when isolated-vm can't load, and
wire it into CI invoking vitest under node directly. SKIP_TEST_DB_SETUP=1
keeps it fast — the test uses stub SDKs and doesn't need Postgres.

The broader vitest suite (~38 integration files) is not yet wired into
CI; many are stale after #348's MCP tool consolidation. Tracked
separately.
@github-actions github-actions Bot added the triage:needs-human Triage agent escalated for human review label Apr 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Triage decision: needs-human

Reasons:

  • PR modifies infrastructure files under docker/ (docker/app/start.sh) - infra blast radius requires human review
  • Changes to Docker runtime configuration (switching from Bun to Node) need careful verification

Next: Human review required due to infrastructure changes. Assigned to @buremba for review and manual merge after validation.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d275a27130

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docker/app/start.sh Outdated
# RuntimeUnavailable under bun. tsx provides the TS loader so the
# source layout stays uncompiled. cwd needs to be the package so
# `--import tsx` resolves from owletto-backend's node_modules.
cd /app/packages/owletto-backend
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve repo-root cwd before starting backend

Changing the startup script to cd /app/packages/owletto-backend regresses components that resolve config paths from process.cwd(): for example, gateway startup constructs ProviderRegistryService("config/providers.json") (packages/gateway/src/services/core-services.ts) and the embedded agent routes default to resolve(process.cwd(), 'config/providers.json') (packages/owletto-backend/src/lobu/agent-routes.ts). With the new cwd, both resolve to /app/packages/owletto-backend/config/providers.json instead of /app/config/providers.json, so when LOBU_PROVIDER_REGISTRY_PATH is not explicitly set the bundled provider registry is silently not loaded.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Triage decision: needs-human

Reasons:

  • PR modifies infrastructure files under docker/ (docker/app/start.sh) - infra blast radius requires human review
  • Review comment contains P1 escalation keyword indicating a priority issue with working directory changes
  • Changes to Docker runtime configuration (switching from Bun to Node) need careful verification
  • P1 comment specifically flags concern about config path resolution after working directory change

Next: Human review required due to infrastructure changes and P1 issue flagged in review. The P1 comment highlights that changing the startup script working directory may break components that resolve config paths from process.cwd(). Assigned to @buremba for review and manual merge after addressing the working directory concern.

Codex review on #430 caught that 'cd /app/packages/owletto-backend'
breaks bundled config resolution: gateway's ProviderRegistryService and
the embedded agent routes both resolve 'config/providers.json' from
process.cwd(). Under the previous cwd of /app, this finds
/app/config/providers.json. Under /app/packages/owletto-backend, it
resolves to a path that doesn't exist and the bundled provider
registry is silently empty.

Pass tsx as an absolute file URL so the loader resolves regardless of
cwd, and run server.ts by absolute path. Verified in the live prod
image: runScript() returns success with cwd preserved at /app.
@buremba buremba merged commit 71c74e1 into main Apr 27, 2026
14 checks passed
@buremba buremba deleted the fix/execute-runtime-node branch April 27, 2026 22:42
buremba added a commit that referenced this pull request Apr 27, 2026
PR #430 switched docker/app/start.sh from 'exec bun src/server.ts' to
'exec node --import tsx src/server.ts' so isolated-vm could load.
End-to-end smoke in the live image confirmed the runtime swap itself
worked, but the resulting prod image crashloops on boot:

  SyntaxError: The requested module '@lobu/core' does not provide an
  export named 'createBuiltinSecretRef'
    at packages/owletto-backend/src/lobu/stores/postgres-secret-store.ts:13

The export exists in @lobu/core/dist/secret-refs.js. Node's
cjs-module-lexer detects it when the file is the sole entry, but
fails to detect it when reached via the full server.ts boot chain.
Bun has its own CJS↔ESM interop and didn't hit this. Fixing properly
requires shipping @lobu/core as dual ESM+CJS instead of CJS-only
behind an 'import' condition.

Reverting unblocks the partial deploy. Old pods are still serving
traffic (CrashLoopBackOff prevented rollover). The execute tool
returns to its pre-#430 state — same RuntimeUnavailable behavior
that's been in prod since #427 hardened the load failure. No
user-facing regression beyond what we already had this morning.

Tracked: convert @lobu/core (and other workspace packages with the
same shape) to dual ESM+CJS so the runtime swap can land cleanly.
buremba added a commit that referenced this pull request Apr 28, 2026
)

Execute MCP tool has been broken in prod since launch: the backend runs
under Bun (JSC, partial V8 ABI shim) but isolated-vm is a V8 native
addon that fails to dlopen under Bun. Returns RuntimeUnavailable to
every caller.

#430 tried 'exec node --import tsx src/server.ts' to switch the runtime.
The runtime swap mechanism worked end-to-end (verified isolated-vm +
runScript() in the live image), but the actual server boot crashed
because Node's cjs-module-lexer couldn't detect named exports of
@lobu/core's CJS dist barrel when reached via the full server.ts
import chain. Reverted in #431.

Right fix per second-opinion review: bundle the backend at build time.
esbuild resolves all workspace imports inline (using the 'bun' condition
to pick up TS source rather than CJS dist), so Node never has to bind
named imports against cross-package CJS barrels — the failure mode that
broke #430 simply doesn't exist for the bundle. Native addons and
packages with require-in-the-middle hooks (Sentry, OpenTelemetry, pino)
stay external so their runtime hooks keep working.

Implementation:
- packages/owletto-backend/scripts/build-server-bundle.mjs: esbuild
  config that bundles workspace + relative imports, externalizes every
  bare specifier. ~2.4 MB self-contained ESM, ~100 ms build.
- packages/owletto-backend/package.json: 'build:server' npm script.
- docker/app/Dockerfile: switch 'bun install' to 'bun install
  --linker=hoisted' so node_modules is flat (Node's resolver doesn't
  understand bun's .bun/<pkg>@<ver>/ isolated layout). Add a 'bun run
  build:server' step in the builder stage.
- docker/app/start.sh: 'exec node /app/.../dist/server.bundle.mjs'
  instead of 'exec bun src/server.ts'.

Verified end-to-end against a locally built docker image:
  basic: {success: true, returnValue: 3}
  chaining: {success: true, returnValue: {slug: 'atlas', ok: true}}

The CI sandbox-runtime test added in #430 stays as the regression guard
- it'll keep failing until the runtime can actually load isolated-vm.

Tradeoffs:
- Adds ~5 s to the docker build (esbuild bundle).
- Hoisted layout is npm-style flat: increases build-stage install time
  slightly but matches what Node expects natively.
- Native addons and Sentry/OpenTelemetry/pino externalized; if any of
  those moves are wrong they'll surface at boot, not silently.
@buremba buremba restored the fix/execute-runtime-node branch May 12, 2026 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

triage:needs-human Triage agent escalated for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant