fix: probe CUDA availability in MCP embedder before requesting it #534
davidbeesley wants to merge 2 commits into
@claude could this be supported on Windows and macOS as well?
Claude finished @magyargergo's task in 56s

Cross-platform support analysis

Short answer: Windows is already handled; macOS works correctly (CPU-only, by design). Here's the breakdown:

Windows ✅ Already supported

The MCP embedder already has explicit Windows handling at line 93–94:

```typescript
const isWindows = process.platform === 'win32';
const gpuDevice = isWindows ? 'dml' : (isCudaAvailable() ? 'cuda' : 'cpu');
```

On Windows, DirectML (

macOS ✅ Already works correctly (CPU-only)

The probe functions are Linux-specific by design:

The result: on macOS,

What's missing: macOS GPU acceleration (Metal/MPS)

Neither this PR nor the existing core embedder attempts Metal Performance Shaders on macOS. The

Summary:

No changes are needed in this PR for Windows or macOS support. A future enhancement could add a safe
@claude check your previous comment. I meant ORT, because all the checks are performed against the Linux kernel.
Claude finished @magyargergo's task in 1m 4s

Corrected cross-platform ORT analysis

You're right to call that out. Here's the more precise picture:

| Platform | CUDA probe called? | DML probe? | Net behavior |
|---|---|---|---|
| Linux | Yes: `ldconfig -p` + `LD_LIBRARY_PATH` scan | N/A | ✅ Fixed by this PR |
| Windows | No: `isWindows` check skips it entirely | ❌ None | DML is attempted blindly; falls back to CPU on failure |
| macOS | `hasOrtCudaProvider()` returns `false` immediately (hardcoded `linux` path) | N/A | Falls straight to CPU, which is correct, but via a Linux-only guard |

The real gap: Windows has no hasDmlProvider() probe
On Windows, dml is attempted unconditionally. If ORT's DML initialization fails, the code catches the error and retries with CPU (L95–127 in the MCP embedder). The question is whether a failed DML attempt poisons ORT's native state the same way a failed CUDA attempt does on Linux.
The original CUDA bug was specifically that `onnxruntime-node` triggers an uncatchable native crash when CUDA libs are missing (per the core embedder comments, L144). DML may fail more gracefully (with a catchable JS exception), since DirectX 12 is a standard Windows component, but this hasn't been explicitly verified.
Summary
My previous comment was correct in outcome (Windows and macOS produce working embeddings) but imprecise in reasoning. The probe functions are Linux-only by construction. The Windows DML path relies on ORT's own error handling rather than a pre-flight check. If Windows DML failures turn out to exhibit the same native state-poisoning as Linux CUDA failures, a hasDmlProvider() guard would be the right follow-up — but that's out of scope for this PR, which is narrowly fixing the Linux CUDA bug.
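The selection and fallback flow discussed above can be sketched roughly as follows. This is a minimal illustration, not the MCP embedder's actual code: `pickDevice` and `loadWithFallback` are hypothetical names, and the loader is passed in as a parameter so the sketch does not depend on any real model-loading API. Note that the `try`/`catch` only helps when the failed attempt throws a catchable JS exception; it cannot recover from a native crash or from poisoned native state, which is why the probe has to run before the GPU device is requested.

```typescript
type Device = "dml" | "cuda" | "cpu";

/**
 * Mirrors the device choice quoted in the thread: DirectML on Windows,
 * otherwise CUDA only when the probe reports it as available.
 * `cudaAvailable` stands in for the real probe (an assumption here).
 */
function pickDevice(platform: string, cudaAvailable: () => boolean): Device {
  if (platform === "win32") return "dml"; // no pre-flight probe on Windows
  return cudaAvailable() ? "cuda" : "cpu"; // macOS ends up on "cpu"
}

/**
 * Tries the preferred device first and retries once on CPU if the
 * loader throws. `load` is a hypothetical callback supplied by the caller.
 */
async function loadWithFallback<T>(
  preferred: Device,
  load: (device: Device) => Promise<T>
): Promise<{ device: Device; model: T }> {
  try {
    return { device: preferred, model: await load(preferred) };
  } catch (err) {
    if (preferred === "cpu") throw err; // nothing left to fall back to
    return { device: "cpu", model: await load("cpu") };
  }
}
```

Keeping the probe as an injected function makes the Windows, Linux, and macOS branches easy to exercise without touching ONNX Runtime.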
CI Report
✅ All checks passed

Test Results
✅ All 4668 tests passed, 1 test(s) skipped
Please submit a new PR if this is still relevant.
Summary
Add CUDA availability probe to the MCP embedder so semantic search works on systems without CUDA.
Motivation / context
The MCP embedder blindly attempted CUDA initialization on Linux. When CUDA libraries were absent, the failed attempt poisoned ONNX Runtime's native state, causing the CPU fallback to fail with the same CUDA error. This silently disabled all semantic vector search: the `catch` in `semanticSearch()` swallowed the error and returned `[]`, so queries fell back to BM25-only with no visible indication.

The core embedder (`src/core/embeddings/embedder.ts`) already had `isCudaAvailable()` / `hasOrtCudaProvider()` guards that probe for CUDA libraries before requesting them. The MCP embedder was missing these guards.

Areas touched
- `gitnexus/` (CLI / core / MCP server)
- `gitnexus-web/` (Vite / React UI)
- `.github/` (workflows, actions)
- `eval/` or other tooling
- `AGENTS.md`, `CLAUDE.md`, `.cursor/`, `llms.txt`, etc.)

Scope & constraints
In scope
- `hasOrtCudaProvider()` and `isCudaAvailable()` from the core embedder to the MCP embedder
- `query` command (both use the same MCP embedder)

Explicitly out of scope / not done here
Implementation notes
- `hasOrtCudaProvider()` resolves `onnxruntime-node` from transformers.js's module scope and checks for `libonnxruntime_providers_cuda.so`
- `isCudaAvailable()` additionally probes `ldconfig -p` and `CUDA_PATH` / `LD_LIBRARY_PATH` for `libcublasLt.so.12`

Testing & verification
- `cd gitnexus && npx tsc --noEmit`
- `cd gitnexus && npm test`: 128 files, 4457 tests passed
- `npx gitnexus query "retry with exponential backoff" -r GitNexus` now prints "Embedding model loaded (cpu)" and returns semantic results (previously returned BM25-only)

Risk & rollout
Checklist
AGENTS.md/ overlays changed: N/A
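
For readers unfamiliar with the library search described under Implementation notes, here is a minimal sketch of how such a probe might look. It is not the code from this PR or from the core embedder. The function names (`ldconfigLists`, `libInPathList`, `cudaLibsLookAvailable`) are hypothetical, the `lib64` subdirectory under `CUDA_PATH` is an assumption, and the sketch covers only the `libcublasLt.so.12` search, not the `onnxruntime-node` provider check. The command runner and file-existence check are injectable so the logic can be exercised without a CUDA install.

```typescript
import { existsSync } from "node:fs";
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Library name taken from the PR description.
const CUBLAS_LIB = "libcublasLt.so.12";

/** Runs `ldconfig -p` and returns its stdout; throws if the command fails. */
function runLdconfig(): string {
  return execFileSync("ldconfig", ["-p"], {
    encoding: "utf8",
    timeout: 3000,
    stdio: ["ignore", "pipe", "ignore"],
  });
}

/** True if the linker cache listing mentions `libName`. */
export function ldconfigLists(
  libName: string,
  run: () => string = runLdconfig
): boolean {
  try {
    return run().includes(libName);
  } catch {
    return false; // ldconfig missing or failed: treat as "not found"
  }
}

/** True if `libName` exists in any directory of a colon-separated list. */
export function libInPathList(
  libName: string,
  pathList: string | undefined,
  exists: (p: string) => boolean = existsSync
): boolean {
  if (!pathList) return false;
  return pathList
    .split(":")
    .filter((dir) => dir.length > 0)
    .some((dir) => exists(path.posix.join(dir, libName)));
}

/** Linux-only pre-flight check; every other platform reports false. */
export function cudaLibsLookAvailable(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  run: () => string = runLdconfig,
  exists: (p: string) => boolean = existsSync
): boolean {
  if (platform !== "linux") return false;
  if (ldconfigLists(CUBLAS_LIB, run)) return true;
  // Assumption: CUDA toolkits commonly place libraries under <CUDA_PATH>/lib64.
  const cudaLibDir = env.CUDA_PATH
    ? path.posix.join(env.CUDA_PATH, "lib64")
    : undefined;
  return [cudaLibDir, env.LD_LIBRARY_PATH].some((dirs) =>
    libInPathList(CUBLAS_LIB, dirs, exists)
  );
}
```

A caller would run `cudaLibsLookAvailable()` once before requesting the `cuda` device and use `cpu` when it returns `false`, so the failing CUDA initialization is never attempted.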