Skip to content

fix: actionable error + docs for pnpm dlx / pnpx native-load crash (#307)#1967

Merged
abhigyanpatwari merged 7 commits into
abhigyanpatwari:mainfrom
tupe12334:fix/pnpm-dlx-native-load-guard
Jun 2, 2026
Merged

fix: actionable error + docs for pnpm dlx / pnpx native-load crash (#307)#1967
abhigyanpatwari merged 7 commits into
abhigyanpatwari:mainfrom
tupe12334:fix/pnpm-dlx-native-load-guard

Conversation

@tupe12334

Copy link
Copy Markdown
Contributor

The bug

Running pnpm dlx gitnexus@latest serve (or pnpx gitnexus) crashes immediately with a raw native stack trace:

Error: dlopen(.../node_modules/@ladybugdb/core/lbugjs.node, 0x0001): tried: '.../@ladybugdb/core/lbugjs.node' (no such file) ...
    at Object.<anonymous> (.../@ladybugdb/core/lbug_native.js:22:11)
  code: 'ERR_DLOPEN_FAILED'

Reproduced on Node v25.9.0, macOS arm64, gitnexus 1.6.5, @ladybugdb/core 0.16.1/0.17.0.

Root cause

gitnexus depends on @ladybugdb/core, whose native addon (lbugjs.node) is copied into place by a postinstall script (install.js) out of the per-platform sub-package (@ladybugdb/core-darwin-arm64, etc.). pnpm dlx / pnpx (and any install with --ignore-scripts) run ephemerally and do not execute lifecycle scripts, so the binary is never placed and the runtime dlopen fails. The prebuilt binary genuinely exists inside the core-<platform> sub-package — it just never gets linked into place.

Repro

pnpm dlx gitnexus@latest serve
# or equivalently:
npm i gitnexus@latest --ignore-scripts && npx gitnexus serve

What this PR changes

serve / mcp / analyze already route through checkLbugNative() (via createLbugLazyAction), which detects the missing/unloadable binary and prints guidance instead of letting dlopen throw. However, that guidance only mentioned bun and --ignore-scripts — not the most common real-world trigger, pnpm dlx / pnpx.

  • src/core/lbug/native-check.ts: extend the "binary missing" guidance to call out the pnpm dlx / pnpx case and the fix:
    • install non-ephemerally and approve the build: pnpm add -g gitnexus && pnpm approve-builds -g
    • or run via npx / npm, which execute install scripts.
  • README.md: add an ERR_DLOPEN_FAILED / lbugjs.node troubleshooting section covering pnpm dlx/pnpx and pnpm approve-builds.
  • test/unit/lbug-native-check.test.ts: assert the new pnpm approve-builds guidance is present.

Verified end-to-end: with the binary removed, tsx src/cli/index.ts serve now prints the actionable message and exits non-zero (code 1) instead of dumping ERR_DLOPEN_FAILED.

What this PR does NOT do

It does not make pnpm dlx gitnexus work directly — dlx runs ephemerally and never executes the dependency's build script, so there is nothing gitnexus can do at that layer. The deeper fix (a runtime fallback that loads the binary directly from the @ladybugdb/core-<platform> sub-package even when postinstall was skipped) belongs in @ladybugdb/core and is being filed separately there. This PR turns the crash into clear, actionable guidance.

Refs #307 (doesn't work with pnpx). Related: #1939 (npx gitnexus crash on npm 11.x).

🤖 Generated with Claude Code

`pnpm dlx gitnexus serve` (and `pnpx gitnexus`) crash with a raw
`ERR_DLOPEN_FAILED` stack trace because @ladybugdb/core's native addon
(lbugjs.node) is placed by a postinstall script, and dlx/pnpx run
ephemerally without executing lifecycle scripts.

The existing checkLbugNative() guard already catches the missing binary
for serve/mcp/analyze, but its guidance only mentioned bun and
--ignore-scripts. Extend the message to call out the common pnpm dlx /
pnpx case and the fix (`pnpm add -g gitnexus && pnpm approve-builds -g`,
or use npx/npm). Add a matching README troubleshooting section.

This does not make `pnpm dlx` itself work — that requires a runtime
fallback in @ladybugdb/core. It turns the crash into actionable guidance.

Refs abhigyanpatwari#307

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 1, 2026

Copy link
Copy Markdown

@tupe12334 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo

Copy link
Copy Markdown
Collaborator

@tupe12334 I have a solution for this but I need somebody to test it #1945

@magyargergo

Copy link
Copy Markdown
Collaborator

btw, for pnpm you need this command because of the security model they decided to go with:

pnpm --allow-build=@ladybugdb/core --allow-build=gitnexus --allow-build=tree-sitter dlx gitnexus@latest analyze

Incorporates collaborator feedback (magyargergo): pnpm's security model
allows `dlx` to run build scripts when you pass `--allow-build` for each
native dep. Add this as the first/preferred pnpm-dlx path in the error
message, README troubleshooting section, and test assertion. Drop the
now-incorrect claim that `pnpm dlx` "cannot be made to work directly".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR tri-review (#1967, framed vs #1945)

Methods: GitNexus risk-architect + test-ci-verifier, Compound-Engineering correctness + adversarial (4 Claude persona lanes), Codex (codex-rescue, live). Engines: 4× Claude + 1× Codex.

Verdict: production-ready with minor follow-ups — the change correctly turns ERR_DLOPEN_FAILED into actionable stderr for the #307 repro path; fix the P1 global-install line before merge on pnpm 11 hosts.

Branch hygiene: clean feature/fix PR — 2 commits, 3 files, no unrelated churn.

Merge state: checks failing or incomplete — only Vercel (org auth gate) visible via gh pr checks; no Actions rollup in this environment. Branch is behind main. Local vitest run test/unit/lbug-native-check.test.ts: 3/3 pass.


What is solid [code-read]

  • createLbugLazyAction (gitnexus/src/cli/lazy-action.ts:32-36) calls checkLbugNative() before serve/mcp/analyze — raw dlopen should not reach users on the missing-binary path.
  • --allow-build flags are before dlx (native-check.ts:49-50) — matches PR #1945’s resolve-analyze-cmd.cjs comment (issue-1939 branch, lines 12-15).
  • Allow-build package triple matches PR #1945 PNPM_ALLOW_BUILD_BASE (@ladybugdb/core, gitnexus, tree-sitter).

Cross-PR context (#1945 “in the sense of”)

PR #1945 steers hooks/runner toward pnpm --allow-build=… dlx (not bare dlx) to dodge npm 11 npx (#1939). #1967 is the safety net when users still run pnpm dlx gitnexus@latest serve (#307) and hit missing lbugjs.node. The two PRs are complementary, not contradictory — but README will need a merge-time dedupe with #1945’s rewritten npm-11 section, and #1945’s skill-file steering commit (3f2434f9 on issue-1939) still emits bare pnpm dlx gitnexus@latest analyze without flags (predictable first-failure loop until stderr from this PR).


Headline findings (inline comments on verified lines)

Sev Theme Blocks merge?
P1 pnpm approve-builds -g invalid on pnpm 11+ yes (broken repair path on target audience)
P2 Unqualified npx escape vs npm 11 arborist crash no
P3 Load-failure branch still bun-only; examples use analyze not serve no

P1 — global install workaround

  • Risk: Users on pnpm 11 copy pnpm add -g gitnexus && pnpm approve-builds -g and hit an unsupported flag.
  • Evidence: native-check.ts:52, README.md:317-318; pnpm approve-builds docs: “--global, -g — Removed in v11.0.0”.
  • Recommended fix: pnpm add -g --allow-build=@ladybugdb/core --allow-build=gitnexus --allow-build=tree-sitter gitnexus (flags before subcommand, same triple as #1945).

P2 — npx/npm fallback

  • Risk: npm 11 npx crashes in arborist before gitnexus runs (#1939) — worse than this message.
  • Evidence: native-check.ts:53, README.md:313-314; README L274+ already documents node.target is null.
  • Recommended fix: Qualify (“npm < 11 or after npm i -g gitnexus”) or prefer global npm / flagged pnpm dlx.

P3 — nits

  • Load-failure branch (native-check.ts:79-81) unchanged — partial pnpm installs that leave a bad lbugjs.node get no pnpm/--allow-build hints.
  • Examples (native-check.ts:50, README.md:310-311) say analyze; issue repro is serve — add serve/mcp or a <command> placeholder.
  • Tests (lbug-native-check.test.ts:27-28) pin substrings only — consider asserting --allow-build index < dlx index.

Refuted / lower priority (body only)

  • Refuted: This PR cannot make bare pnpm dlx work without --allow-build — second commit documents the correct flagged command (pnpm 10.2+).
  • Refuted as blocking: Contradicts #1945 runner — #1967 does not touch hooks/resolver.
  • Advisory: Windows users copying POSIX \ continuation from stderr (native-check.ts:49); pnpm < 10.2 lacks --allow-build ( #1945 gates; this message does not).

Automated multi-tool digest — verify before acting. Coordinator tags: inline findings are [code-read] unless noted.

Comment thread gitnexus/src/core/lbug/native-check.ts Outdated
Comment thread gitnexus/README.md Outdated
Comment thread gitnexus/src/core/lbug/native-check.ts Outdated
Comment thread gitnexus/README.md Outdated
magyargergo and others added 5 commits June 1, 2026 19:15
Replace removed pnpm approve-builds -g with add -g --allow-build flags,
qualify npm 11 npx caveats, use serve in examples, extend load-failure hints,
and assert --allow-build precedes dlx in tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@abhigyanpatwari abhigyanpatwari merged commit bfe8a87 into abhigyanpatwari:main Jun 2, 2026
9 of 10 checks passed
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10771 10761 0 10 662s

✅ All 10761 tests passed

10 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 80.35% 37498/46667 79.84% 📈 +0.5 🟢 ████████████████░░░░
Branches 68.9% 23870/34642 68.5% 📈 +0.4 🟢 █████████████░░░░░░░
Functions 85.45% 3885/4546 84.94% 📈 +0.5 🟢 █████████████████░░░
Lines 83.92% 33741/40205 83.36% 📈 +0.6 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@tupe12334 tupe12334 deleted the fix/pnpm-dlx-native-load-guard branch June 2, 2026 06:18
magyargergo added a commit to magyargergo/GitNexus that referenced this pull request Jun 10, 2026
…dlx (abhigyanpatwari#307)

`@huggingface/transformers` does a bare `import 'onnxruntime-common'` from its
shipped `dist/transformers.node.mjs`, but never declares onnxruntime-common in
its own `dependencies`. npm's flat node_modules (and pnpm with hoisting) place
it on transformers' resolution path by accident; pnpm's isolated store only
links a package's declared deps into its scope, so under pnpm-strict /
`pnpm dlx` / `pnpx` the import dies with ERR_MODULE_NOT_FOUND before
`analyze --embeddings` can run.

Declaring onnxruntime-common in gitnexus' own deps (abhigyanpatwari#2074) does not fix this
under pnpm: Node resolves the bare specifier from transformers' module scope,
not ours, and overrides/resolutions can only re-version an existing edge, never
add the missing one (verified against a real `hoist=false` install — the
declaration only changes which version wins the hoist, never whether the import
resolves).

Fix: install a synchronous, in-thread ESM resolution hook
(`module.registerHooks`) right before the lazy transformers import that
redirects `onnxruntime-common` to the copy gitnexus depends on — but only when
the default resolver fails. On npm / hoisted layouts the default resolver
succeeds first and the hook never fires, so working setups are unchanged. The
hook only intercepts the exact `onnxruntime-common` specifier on failure, so it
can never mask an unrelated resolution error; onnxruntime-node's native binding
still loads normally from transformers' own scope.

`registerHooks` (sync, in-thread, single inline closure) is preferred over the
older `module.register` (async, off-thread, now deprecated — DEP0205, removed in
Node 26): the redirect is a one-line conditional that needs no worker thread, no
separate hook module, and no `data` marshalling. It is available on Node >= 22.15;
on older runtimes the helper is a graceful no-op (the gitnexus engines floor is
>= 22.0.0, and the import still resolves on hoisted layouts there).

Chosen over bundling transformers (the build is tsc-only, and transformers
carries native onnxruntime-node + WASM onnxruntime-web assets that bundle
poorly). Installation is idempotent, best-effort, and lazy — only on the
local-embedding path, so it never affects analysis, the parse workers, or HTTP
embedding mode.

Validated end-to-end: the compiled resolver fixes a real pnpm `hoist=false`
transformers install (ERR_MODULE_NOT_FOUND -> resolved). The separate
`@ladybugdb/core` native-binary path under pure `pnpm dlx` is unchanged (abhigyanpatwari#1967
handles that gracefully).

Refs abhigyanpatwari#307, abhigyanpatwari#2069

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
magyargergo added a commit that referenced this pull request Jun 10, 2026
…dlx (#307) (#2139)

* fix(embeddings): resolve onnxruntime-common under pnpm-strict / pnpm dlx (#307)

`@huggingface/transformers` does a bare `import 'onnxruntime-common'` from its
shipped `dist/transformers.node.mjs`, but never declares onnxruntime-common in
its own `dependencies`. npm's flat node_modules (and pnpm with hoisting) place
it on transformers' resolution path by accident; pnpm's isolated store only
links a package's declared deps into its scope, so under pnpm-strict /
`pnpm dlx` / `pnpx` the import dies with ERR_MODULE_NOT_FOUND before
`analyze --embeddings` can run.

Declaring onnxruntime-common in gitnexus' own deps (#2074) does not fix this
under pnpm: Node resolves the bare specifier from transformers' module scope,
not ours, and overrides/resolutions can only re-version an existing edge, never
add the missing one (verified against a real `hoist=false` install — the
declaration only changes which version wins the hoist, never whether the import
resolves).

Fix: install a synchronous, in-thread ESM resolution hook
(`module.registerHooks`) right before the lazy transformers import that
redirects `onnxruntime-common` to the copy gitnexus depends on — but only when
the default resolver fails. On npm / hoisted layouts the default resolver
succeeds first and the hook never fires, so working setups are unchanged. The
hook only intercepts the exact `onnxruntime-common` specifier on failure, so it
can never mask an unrelated resolution error; onnxruntime-node's native binding
still loads normally from transformers' own scope.

`registerHooks` (sync, in-thread, single inline closure) is preferred over the
older `module.register` (async, off-thread, now deprecated — DEP0205, removed in
Node 26): the redirect is a one-line conditional that needs no worker thread, no
separate hook module, and no `data` marshalling. It is available on Node >= 22.15;
on older runtimes the helper is a graceful no-op (the gitnexus engines floor is
>= 22.0.0, and the import still resolves on hoisted layouts there).

Chosen over bundling transformers (the build is tsc-only, and transformers
carries native onnxruntime-node + WASM onnxruntime-web assets that bundle
poorly). Installation is idempotent, best-effort, and lazy — only on the
local-embedding path, so it never affects analysis, the parse workers, or HTTP
embedding mode.

Validated end-to-end: the compiled resolver fixes a real pnpm `hoist=false`
transformers install (ERR_MODULE_NOT_FOUND -> resolved). The separate
`@ladybugdb/core` native-binary path under pure `pnpm dlx` is unchanged (#1967
handles that gracefully).

Refs #307, #2069

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(embeddings): version-match the onnxruntime-common redirect target (#307)

Prefer the onnxruntime-common that onnxruntime-node (the native binding
transformers actually loads) depends on, so the redirected copy is version-
matched to that binding even under `pnpm dlx` — where gitnexus' npm-style
`overrides` block does not apply, because it is honoured only from a root
manifest and gitnexus is a transitive dependency there. The walk resolves
transformers' main entry (not its `exports`-blocked package.json) ->
onnxruntime-node -> its onnxruntime-common, and falls back to gitnexus' own
direct dependency when the chain can't be walked. Also corrects the doc comment
that claimed the gitnexus copy was already "version-aligned".

Addresses a PR #2139 tri-review finding (P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(embeddings): narrow the onnxruntime-common resolve fallback to absence errors (#307)

The resolve closure's `catch` swallowed every error from `nextResolve` and
redirected, which would silently paper over a genuinely present-but-broken
onnxruntime-common install. Only substitute gitnexus' copy when the specifier is
actually absent (ERR_MODULE_NOT_FOUND, or ERR_PACKAGE_PATH_NOT_EXPORTED for an
exports-broken copy); rethrow anything else. Adds a test that an unrelated error
code rethrows.

Addresses a PR #2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(embeddings): cover the onnxruntime-common resolver best-effort swallow path (#307)

The outer try/catch in ensureOnnxRuntimeCommonResolvable() was untested. A
throwing registerHooks spy drives it; the call must not throw (initEmbedder does
not guard the return, so a throw would break `analyze --embeddings`). The vitest
quirk that surfaced an earlier attempt applies to throwing mock factories, not a
throwing spy implementation, so this is testable cleanly.

Addresses a PR #2139 tri-review finding (P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(embeddings): tighten the onnxruntime-common redirect-URL assertion (#307)

`/^file:\/\/.*onnxruntime-common/` matched the substring anywhere, so a lookalike
path (e.g. `/x/onnxruntime-common-fake/`) would pass. Require an actual
`/node_modules/onnxruntime-common/...js` segment so the assertion proves the
redirect resolves to the real package, not just a string match.

Addresses a PR #2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(embeddings): drop the no-op __resetOnnxRuntimeCommonResolverForTests seam (#307)

The test helper reloads the resolver via vi.resetModules() + a fresh import(),
which already re-initialises the module-level one-shot `attempted` flag to false.
The __reset export it then called was therefore a no-op. Remove the test-only
export and its call; isolation now rests solely on vi.resetModules().

Addresses a PR #2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(embeddings): correct the onnxruntime-common resolver isolation comment (#307)

The doc comment claimed the hook "never affects other tools' resolution". Once
installed, `module.registerHooks` is process-global and its resolve closure runs
for every subsequent resolution — it passes them all through untouched and only
substitutes the exact `onnxruntime-common` specifier on genuine absence, at a
cost of one string comparison. Also note `registerHooks` is @experimental and
requires Node >= 22.15 (graceful no-op below that). Comment-only.

Addresses a PR #2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants