Skip to content

fix: probe CUDA availability in MCP embedder before requesting it#534

Closed
davidbeesley wants to merge 2 commits into
abhigyanpatwari:mainfrom
davidbeesley:fix/mcp-cuda-probe
Closed

fix: probe CUDA availability in MCP embedder before requesting it#534
davidbeesley wants to merge 2 commits into
abhigyanpatwari:mainfrom
davidbeesley:fix/mcp-cuda-probe

Conversation

@davidbeesley

Copy link
Copy Markdown

Summary

Add CUDA availability probe to the MCP embedder so semantic search works on systems without CUDA.

Motivation / context

The MCP embedder blindly attempted CUDA initialization on Linux. When CUDA libraries were absent, the failed attempt poisoned ONNX Runtime's native state, causing the CPU fallback to fail with the same CUDA error. This silently disabled all semantic vector search — the catch in semanticSearch() swallowed the error and returned [], so queries fell back to BM25-only with no visible indication.

The core embedder (src/core/embeddings/embedder.ts) already had isCudaAvailable() / hasOrtCudaProvider() guards that probe for CUDA libraries before requesting them. The MCP embedder was missing these guards.

Areas touched

  • gitnexus/ (CLI / core / MCP server)
  • gitnexus-web/ (Vite / React UI)
  • .github/ (workflows, actions)
  • eval/ or other tooling
  • Docs / agent config only (AGENTS.md, CLAUDE.md, .cursor/, llms.txt, etc.)

Scope & constraints

In scope

  • Port hasOrtCudaProvider() and isCudaAvailable() from the core embedder to the MCP embedder
  • Skip CUDA attempt entirely when system lacks required libraries
  • Affects both MCP server and CLI query command (both use the same MCP embedder)

Explicitly out of scope / not done here

  • Refactoring to share the CUDA probe between core and MCP embedders

Implementation notes

  • hasOrtCudaProvider() resolves onnxruntime-node from transformers.js's module scope and checks for libonnxruntime_providers_cuda.so
  • isCudaAvailable() additionally probes ldconfig -p and CUDA_PATH/LD_LIBRARY_PATH for libcublasLt.so.12
  • Device list construction now skips CUDA when probe returns false, going straight to CPU

Testing & verification

  • cd gitnexus && npx tsc --noEmit
  • cd gitnexus && npm test — 128 files, 4457 tests passed
  • Manual: npx gitnexus query "retry with exponential backoff" -r GitNexus now prints "Embedding model loaded (cpu)" and returns semantic results (previously returned BM25-only)

Risk & rollout

  • No behavioral change on systems with CUDA — the probe returns true and CUDA is attempted as before
  • On systems without CUDA, semantic search now works instead of silently failing
  • No migration or config changes needed

Checklist

  • PR body meets repo minimum length
  • If AGENTS.md / overlays changed: N/A
  • No secrets, tokens, or machine-specific paths committed

@vercel

vercel Bot commented Mar 27, 2026

Copy link
Copy Markdown

@davidbeesley is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude could this be supported on Windows and MacOS as well?

@github-actions

github-actions Bot commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 56s —— View job


Cross-platform support analysis

Short answer: Windows is already handled; macOS works correctly (CPU-only, by design).

Here's the breakdown:


Windows ✅ Already supported

The MCP embedder already has explicit Windows handling at line 93–94:

const isWindows = process.platform === 'win32';
const gpuDevice = isWindows ? 'dml' : (isCudaAvailable() ? 'cuda' : 'cpu');

On Windows, DirectML (dml) is used for GPU acceleration (via DirectX 12) — isCudaAvailable() is never called. If DML fails, it falls back to CPU. This mirrors exactly how the core embedder (src/core/embeddings/embedder.ts) handles Windows.

macOS ✅ Already works correctly (CPU-only)

The probe functions are Linux-specific by design:

  • hasOrtCudaProvider() checks for libonnxruntime_providers_cuda.so — a Linux shared object. On macOS this path doesn't exist, so it returns false.
  • isCudaAvailable() calls ldconfig -p (a Linux tool that doesn't exist on macOS), which throws and is caught. It then checks LD_LIBRARY_PATH (also unused on macOS where DYLD_LIBRARY_PATH is the equivalent).

The result: on macOS, isCudaAvailable() safely returns false → device falls back to CPU. This is correct — onnxruntime-node's CUDA provider only ships binaries for Linux/x64.

What's missing: macOS GPU acceleration (Metal/MPS)

Neither this PR nor the existing core embedder attempts Metal Performance Shaders on macOS. The onnxruntime-node package doesn't currently expose an MPS backend in a way that's safely probed the same way CUDA is. This is a pre-existing limitation, and explicitly out of scope for this PR (which is specifically fixing the CUDA-poisoning-ORT-state bug on Linux).


Summary:

Platform GPU Path Probe Needed? Status
Linux CUDA Yes — ORT state can be poisoned ✅ Fixed by this PR
Windows DirectML Not critical (DML is stable) ✅ Already handled
macOS None (CPU only) N/A ✅ Works correctly

No changes are needed in this PR for Windows or macOS support. A future enhancement could add a safe hasDmlProvider() probe for Windows (to mirror the CUDA pattern), and MPS support on macOS would require upstream onnxruntime-node changes.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude check you previous comment and I meant the ORT because all the checks are performed on linux kernel.

@github-actions

github-actions Bot commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 1m 4s —— View job


Corrected cross-platform ORT analysis

You're right to call that out. Here's the more precise picture:


hasOrtCudaProvider() is hardcoded to Linux

The key line (MCP embedder L29, core embedder L45):

return existsSync(join(ortPath, 'bin', 'napi-v6', 'linux', arch, 'libonnxruntime_providers_cuda.so'));

The linux segment is hardcoded. On Windows or macOS this path never exists, so hasOrtCudaProvider() always returns false — which makes isCudaAvailable() return false immediately as well. Neither probe does any real work on non-Linux.

What actually happens per platform

Platform CUDA probe called? DML probe? Net behavior
Linux Yes — ldconfig -p + LD_LIBRARY_PATH scan N/A ✅ Fixed by this PR
Windows No — isWindows check skips it entirely ❌ None DML is attempted blindly; falls back to CPU on failure
macOS hasOrtCudaProvider() returns false immediately (hardcoded linux path) N/A Falls straight to CPU — correct, but via a Linux-only guard

The real gap: Windows has no hasDmlProvider() probe

On Windows, dml is attempted unconditionally. If ORT's DML initialization fails, the code catches the error and retries with CPU (L95–127 in the MCP embedder). The question is whether a failed DML attempt poisons ORT's native state the same way a failed CUDA attempt does on Linux.

The original CUDA bug was specifically that onnxruntime-node makes an uncatchable native crash when CUDA libs are missing (per the core embedder comments, L144). DML may fail more gracefully (a catchable JS exception) since DirectX 12 is a standard Windows component — but this hasn't been explicitly verified.

Summary

My previous comment was correct in outcome (Windows and macOS produce working embeddings) but imprecise in reasoning. The probe functions are Linux-only by construction. The Windows DML path relies on ORT's own error handling rather than a pre-flight check. If Windows DML failures turn out to exhibit the same native state-poisoning as Linux CUDA failures, a hasDmlProvider() guard would be the right follow-up — but that's out of scope for this PR, which is narrowly fixing the Linux CUDA bug.


@github-actions

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
4669 4668 0 1 184s

✅ All 4668 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 71.99% 12770/17738 72.24% 📉 -0.2 🔴 ██████████████░░░░░░
Branches 61.58% 8472/13756 62.16% 📉 -0.6 🔴 ████████████░░░░░░░░
Functions 76.09% 1114/1464 75.86% 📈 +0.2 🟢 ███████████████░░░░░
Lines 74.44% 11439/15365 74.72% 📉 -0.3 🔴 ██████████████░░░░░░

📋 View full run · Generated by CI

@magyargergo

Copy link
Copy Markdown
Collaborator

⚠️ Upcoming Prettier formatting — rebase instructions

PR #563 adds Prettier as the code formatter for the repo. When it merges, the bulk format commit will touch ~350 files (style-only: whitespace, quotes, trailing commas). Your branch will likely conflict.

After #563 merges, rebase your branch:

git fetch origin
git checkout <your-branch>
git rebase origin/main

# Conflicts will be formatting-only — accept your version:
git checkout --theirs .
git add .
git rebase --continue

# Then re-format your branch to match the new style:
npx prettier --write .
git add -A
git commit -m "style: apply prettier formatting"
git push --force-with-lease

New setup step: Run npm install at the repo root (not just in gitnexus/) to get prettier + activate the pre-commit hook. The hook auto-formats staged files on every commit going forward.

@magyargergo

Copy link
Copy Markdown
Collaborator

Please submit a new PR if this is still relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants