Skip to content

fix(embeddings): resolve onnxruntime-common under pnpm-strict / pnpm dlx (#307)#2139

Merged
magyargergo merged 9 commits into
abhigyanpatwari:mainfrom
magyargergo:fix/307-pnpx
Jun 10, 2026
Merged

fix(embeddings): resolve onnxruntime-common under pnpm-strict / pnpm dlx (#307)#2139
magyargergo merged 9 commits into
abhigyanpatwari:mainfrom
magyargergo:fix/307-pnpx

Conversation

@magyargergo

@magyargergo magyargergo commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

The bug (#307 / #2069)

pnpm dlx gitnexus@latest analyze --embeddings (and pnpm add -g strict installs) crash with:

Cannot find package 'onnxruntime-common' imported from
.../@huggingface/transformers/dist/transformers.node.mjs

@huggingface/transformers@4.2.0 does a bare import 'onnxruntime-common' from its shipped transformers.node.mjs, but never declares onnxruntime-common in its own dependencies (it lists onnxruntime-node / onnxruntime-web / sharp). It's a phantom dependency: npm's flat node_modules — and pnpm with hoisting — place it on transformers' resolution path by accident, so the import resolves. Under pnpm's isolated store a package only sees its declared deps, so the import dies.

Why #2074 didn't fix it for pnpm

#2074 declared onnxruntime-common in gitnexus' own deps. Node resolves the bare specifier from transformers' module scope, not ours, so a sibling-scope declaration is invisible to it under pnpm-strict. Verified empirically against real pnpm installs (10.x and 11.x): under default hoisting the import already resolves (so most users are fine, and #2074 only changes which onnxruntime-common version wins the hoist); under a non-hoisting / strict layout it fails even with #2074.

The fix

Install a synchronous, in-thread ESM resolution hook (module.registerHooks) immediately before the lazy import('@huggingface/transformers'). The hook:

  • tries normal resolution first and only redirects onnxruntime-common when the default resolver fails — so on npm / hoisted layouts it never fires and behaviour is unchanged;
  • redirects to the copy gitnexus depends on (a direct dep since fix: declare onnxruntime-common runtime dependency #2074; onnxruntime-common is a stable pure-JS package whose Tensor API is unchanged across 1.24–1.26, version-aligned with the onnxruntime-node transformers loads), always resolvable from gitnexus' own scope regardless of how the consumer's package manager linked things;
  • only ever touches the exact onnxruntime-common specifier on failure, so it can't mask an unrelated resolution error. onnxruntime-node's native binding still loads normally from transformers' own scope.

Installation is idempotent, best-effort, and lazy — only on the local-embedding code path, so it never affects analysis, the parse workers, or HTTP embedding mode. If anything goes wrong it's swallowed and embeddings proceed exactly as before.

Why registerHooks (and not register, or bundling)

  • module.register() (async, off-thread) is deprecatedDEP0205, slated for removal in Node 26 — and would force a separate hook module + cross-thread data plumbing. module.registerHooks() is the supported replacement: one inline synchronous closure, no separate file, no worker thread. It's available on Node ≥ 22.15; gitnexus' engine floor is >=22.0.0, so on older 22.x the helper is a graceful no-op (the import still resolves on hoisted layouts there). I kept the floor at >=22.0.0 rather than bumping the whole package's engines for one best-effort fallback.
  • Bundling transformers with esbuild was rejected: the build is tsc-only, and transformers carries a native onnxruntime-node binding + WASM onnxruntime-web assets that bundle poorly.

Validation

  • End-to-end with the compiled code: the real dist/ resolver fixes a genuine pnpm hoist=false transformers install — ERR_MODULE_NOT_FOUND → resolved.
  • Embedding unit tests pass (tsc --noEmit clean). New unit test covers the resolve closure's redirect / passthrough / rethrow logic, one-shot installation, and graceful degradation when registerHooks is unavailable.

Scope / what this does NOT cover

The separate @ladybugdb/core native-binary failure under pure pnpm dlx/pnpx (its lbugjs.node is staged by a postinstall that ephemeral runners skip) is unchanged — owned upstream in @ladybugdb/core, and #1967 already makes it fail gracefully. This PR restores --embeddings under pnpm-strict / global installs.

Suggested follow-ups (not in this PR)

🤖 Generated with Claude Code

@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 16 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2139,"changed_lines":16,"head_sha":"33654364f5dd33410160f991b8c44a3dca0f2c7a","run_id":"27264946037","apply_command":"/autofix"}

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10993 10977 0 16 564s

✅ All 10977 tests passed

16 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 75.27% 35909/47702 N/A% 🟢 ███████████████░░░░░
Branches 63.05% 22250/35286 N/A% 🟢 ████████████░░░░░░░░
Functions 80.98% 3872/4781 N/A% 🟢 ████████████████░░░░
Lines 79.07% 32467/41061 N/A% 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

…dlx (abhigyanpatwari#307)

`@huggingface/transformers` does a bare `import 'onnxruntime-common'` from its
shipped `dist/transformers.node.mjs`, but never declares onnxruntime-common in
its own `dependencies`. npm's flat node_modules (and pnpm with hoisting) place
it on transformers' resolution path by accident; pnpm's isolated store only
links a package's declared deps into its scope, so under pnpm-strict /
`pnpm dlx` / `pnpx` the import dies with ERR_MODULE_NOT_FOUND before
`analyze --embeddings` can run.

Declaring onnxruntime-common in gitnexus' own deps (abhigyanpatwari#2074) does not fix this
under pnpm: Node resolves the bare specifier from transformers' module scope,
not ours, and overrides/resolutions can only re-version an existing edge, never
add the missing one (verified against a real `hoist=false` install — the
declaration only changes which version wins the hoist, never whether the import
resolves).

Fix: install a synchronous, in-thread ESM resolution hook
(`module.registerHooks`) right before the lazy transformers import that
redirects `onnxruntime-common` to the copy gitnexus depends on — but only when
the default resolver fails. On npm / hoisted layouts the default resolver
succeeds first and the hook never fires, so working setups are unchanged. The
hook only intercepts the exact `onnxruntime-common` specifier on failure, so it
can never mask an unrelated resolution error; onnxruntime-node's native binding
still loads normally from transformers' own scope.

`registerHooks` (sync, in-thread, single inline closure) is preferred over the
older `module.register` (async, off-thread, now deprecated — DEP0205, removed in
Node 26): the redirect is a one-line conditional that needs no worker thread, no
separate hook module, and no `data` marshalling. It is available on Node >= 22.15;
on older runtimes the helper is a graceful no-op (the gitnexus engines floor is
>= 22.0.0, and the import still resolves on hoisted layouts there).

Chosen over bundling transformers (the build is tsc-only, and transformers
carries native onnxruntime-node + WASM onnxruntime-web assets that bundle
poorly). Installation is idempotent, best-effort, and lazy — only on the
local-embedding path, so it never affects analysis, the parse workers, or HTTP
embedding mode.

Validated end-to-end: the compiled resolver fixes a real pnpm `hoist=false`
transformers install (ERR_MODULE_NOT_FOUND -> resolved). The separate
`@ladybugdb/core` native-binary path under pure `pnpm dlx` is unchanged (abhigyanpatwari#1967
handles that gracefully).

Refs abhigyanpatwari#307, abhigyanpatwari#2069

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review of #2139 — onnxruntime-common pnpm-strict fix

3 methods: GitNexus swarm + Compound-Engineering personas + Codex. Engine breakdown: 6 Claude lanes + Codex (the one independent engine; live). Independence is asymmetric — Claude-lane agreement is "consistent across personas," not independent confirmation; the strong signal is Codex agreeing with a Claude lane.

The fix is sound — no P0/P1

The correctness and adversarial lanes empirically executed the redirect on Node 22.16 against the real installed packages and refuted the scariest hypotheses:

  • No new dual-instance Tensor identity break — the redirect makes transformers use the same CJS onnxruntime-common build that onnxruntime-node already loads, and the feed path into InferenceSession.run is duck-typed, not instanceof-gated. (Codex raised this as a P2; two Claude lanes ran it and refuted it.)
  • The CJS-entry redirect still yields the named Tensor export via cjs-module-lexer; and (correctness lane) omitting format is safe because onnxruntime-common ships a nested dist/cjs/package.json {"type":"commonjs"}.
  • The helper is fully wrapped in try/catch, so a failure degrades to a logged no-op rather than throwing into the embedder (that swallow path is itself untested — see Test gaps).

Inline findings (2)

See the two inline comments (:70, :79).

Body / lower-priority (not inlined)

  • [P2] Test gap (test/CI + testing lanes): the resolver's outer try/catch swallow path is untested. The reason given for dropping the test (vitest surfaces a throwing mock) applies to a throwing factory, not a throwing spy — verified loadResolver(vi.fn(() => { throw … })) + not.toThrow() passes. Worth adding.
  • [P3] Weak test assertion (onnxruntime-common-resolver.test.ts): /^file:\/\/.*onnxruntime-common/ matches loosely; anchor to /node_modules/onnxruntime-common/.
  • [P3] __resetOnnxRuntimeCommonResolverForTests is effectively a no-op after vi.resetModules() + a fresh import (maintainability + testing lanes) — consider removing the seam.
  • [P3] Comment overstates isolation: "never affects other tools' resolution" — the hook is process-global and fires (cheaply, inert) for every resolution in the long-lived MCP server (adversarial; Codex refuted any real overhead concern).
  • [design] Node 22.0–22.14: registerHooks needs ≥22.15 while the engines floor is ≥22.0.0, so the fix is a silent no-op there (documented in the file; registerHooks is also @experimental).
  • Refuted/contested: the adversarial lane raised "attempted=true before the try poisons the fallback on a transient require.resolve failure"; Codex refuted it (a failed require.resolve has no valid redirect target to retry anyway).

CI

Real test/build checks pending as of a880f31b; the Vercel fail is deploy-auth, not code. The PR head is a "Merge main" commit (branch updated with main) — verified the net PR diff is exactly the 4 source files (merge-base 7eaeb0a0; 425809cf..HEAD shows no change to those 4 files).

Automated multi-tool digest (GitNexus swarm + CE personas + Codex) — verify before acting.

const require = createRequire(import.meta.url);
// Resolve from gitnexus' own scope — onnxruntime-common is a direct
// dependency, so this succeeds under npm and pnpm-strict alike.
const redirectUrl = pathToFileURL(require.resolve('onnxruntime-common')).href;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2 · code-read · Codex + adversarial converge] Redirect version can skew from the native binding under pnpm dlx, and the comment at :25 is inaccurate.

The comment claims the redirected copy is "version-aligned with the onnxruntime-node transformers loads." But gitnexus' npm-style overrides block (which pins onnxruntime-node) is honored only from the root manifest — so under pnpm dlx/pnpx, where gitnexus is a transitive dep, it does not apply. transformers then loads its own pinned onnxruntime-node@1.24.3 (→ wants onnxruntime-common@1.24.3) while require.resolve('onnxruntime-common') here supplies gitnexus' ^1.26.0.

Not a current bug — it works because the Tensor feed surface is stable/duck-typed across 1.24–1.26 (verified). But the comment's rationale is wrong, and the skew is latent on every onnxruntime-common bump.

  • Confirmed defect: correct the :25 comment.
  • Recommended hardening: resolve the redirect target from onnxruntime-node's own scope (the version the native binding actually expects), falling back to gitnexus' copy.

// (npm / hoisted pnpm); only substitute ours when resolution fails.
try {
return nextResolve(specifier, context);
} catch {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3 · code-read · Codex + correctness + adversarial] catch {} is too broad.

This swallows every error from nextResolve for the onnxruntime-common specifier — not just ERR_MODULE_NOT_FOUND — and redirects. A genuinely present-but-broken install (e.g. ERR_PACKAGE_PATH_NOT_EXPORTED, a corrupt exports map) would be silently papered over with gitnexus' copy instead of surfaced.

Narrow it:

} catch (err) {
  if (err?.code === 'ERR_MODULE_NOT_FOUND' || err?.code === 'ERR_PACKAGE_PATH_NOT_EXPORTED') {
    return { url: redirectUrl, shortCircuit: true };
  }
  throw err;
}

magyargergo and others added 7 commits June 10, 2026 11:30
…abhigyanpatwari#307)

Prefer the onnxruntime-common that onnxruntime-node (the native binding
transformers actually loads) depends on, so the redirected copy is version-
matched to that binding even under `pnpm dlx` — where gitnexus' npm-style
`overrides` block does not apply, because it is honoured only from a root
manifest and gitnexus is a transitive dependency there. The walk resolves
transformers' main entry (not its `exports`-blocked package.json) ->
onnxruntime-node -> its onnxruntime-common, and falls back to gitnexus' own
direct dependency when the chain can't be walked. Also corrects the doc comment
that claimed the gitnexus copy was already "version-aligned".

Addresses a PR abhigyanpatwari#2139 tri-review finding (P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sence errors (abhigyanpatwari#307)

The resolve closure's `catch` swallowed every error from `nextResolve` and
redirected, which would silently paper over a genuinely present-but-broken
onnxruntime-common install. Only substitute gitnexus' copy when the specifier is
actually absent (ERR_MODULE_NOT_FOUND, or ERR_PACKAGE_PATH_NOT_EXPORTED for an
exports-broken copy); rethrow anything else. Adds a test that an unrelated error
code rethrows.

Addresses a PR abhigyanpatwari#2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wallow path (abhigyanpatwari#307)

The outer try/catch in ensureOnnxRuntimeCommonResolvable() was untested. A
throwing registerHooks spy drives it; the call must not throw (initEmbedder does
not guard the return, so a throw would break `analyze --embeddings`). The vitest
quirk that surfaced an earlier attempt applies to throwing mock factories, not a
throwing spy implementation, so this is testable cleanly.

Addresses a PR abhigyanpatwari#2139 tri-review finding (P2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on (abhigyanpatwari#307)

`/^file:\/\/.*onnxruntime-common/` matched the substring anywhere, so a lookalike
path (e.g. `/x/onnxruntime-common-fake/`) would pass. Require an actual
`/node_modules/onnxruntime-common/...js` segment so the assertion proves the
redirect resolves to the real package, not just a string match.

Addresses a PR abhigyanpatwari#2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ForTests seam (abhigyanpatwari#307)

The test helper reloads the resolver via vi.resetModules() + a fresh import(),
which already re-initialises the module-level one-shot `attempted` flag to false.
The __reset export it then called was therefore a no-op. Remove the test-only
export and its call; isolation now rests solely on vi.resetModules().

Addresses a PR abhigyanpatwari#2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…omment (abhigyanpatwari#307)

The doc comment claimed the hook "never affects other tools' resolution". Once
installed, `module.registerHooks` is process-global and its resolve closure runs
for every subsequent resolution — it passes them all through untouched and only
substitutes the exact `onnxruntime-common` specifier on genuine absence, at a
cost of one string comparison. Also note `registerHooks` is @experimental and
requires Node >= 22.15 (graceful no-op below that). Comment-only.

Addresses a PR abhigyanpatwari#2139 tri-review finding (P3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo magyargergo merged commit 7f0ab87 into abhigyanpatwari:main Jun 10, 2026
27 of 28 checks passed
@magyargergo magyargergo deleted the fix/307-pnpx branch June 10, 2026 11:00
@magyargergo magyargergo linked an issue Jun 10, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

doesn't work with pnpx

1 participant