fix(lbug): resolve non-ASCII paths for KuzuDB on Windows (#1811)#1817
Conversation
KuzuDB's native C++ layer uses ANSI file APIs (fopen) on Windows. When the repo path contains CJK or other non-ASCII characters, the UTF-8 bytes from Node.js are misinterpreted as the system's Active Code Page (e.g. GBK), producing a garbled path — "Error 3: The system cannot find the path specified." Add `toNativeSafePath()` which converts non-ASCII paths to their Windows 8.3 short-name form (all-ASCII) before passing them to the native layer. Applied to both the database open path and the COPY CSV paths. No-ops on non-Windows and on all-ASCII paths. Closes #1811
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 9854 tests passed 3 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
…1811) - Unit tests for toNativeSafePath: ASCII passthrough, non-Windows no-op, Windows short-path conversion, nonexistent-path fallback - Integration test: full initLbug + loadGraphToLbug round-trip with CJK characters in the storage path — runs on all platforms - Fix toNativeSafePath to reject cmd.exe output containing '?' chars (replacement for unrepresentable Unicode in the console code page) - Register integration test in vitest lbug-db project and cross-platform-tests.ts matrix
✨ PR AutofixFound fixable formatting / unused-import issues across 16 changed lines. Comment |
|
/autofix |
|
✅ Applied autofix and pushed a commit. (apply run) |
…age (#1811) U1+U4: toNativeSafePath now tries 8.3 short path → NTFS junction fallback → diagnostic warning. Junctions target path.dirname(p) and reconstruct the leaf. Handles EEXIST races. Registers cleanup on exit/SIGTERM/SIGINT. Orphan scan on first call removes stale junctions from prior crashes. U2: loadGraphToLbug redirects csvDir to os.tmpdir() when storagePath contains non-ASCII on Windows, avoiding non-ASCII characters in COPY FROM paths entirely. U3: All 4 createLbugDatabase call sites in pool-adapter.ts now wrap dbPath with toNativeSafePath.
- Fix lbug-non-ascii-path integration test: use CodeRelation (actual relationship table name) instead of CALLS - Add toNativeSafePath to lbug-config.js mocks in pool-wal-recovery and lbug-pool-win-fts-probe tests — pool-adapter now imports it
Reject paths containing cmd.exe metacharacters (" % | & < > ^)
before interpolating into the `for %I` short-path command.
Prevents command injection via crafted path names.
- U1: Use process.exit(0) on Windows instead of process.kill re-raise (SIGTERM forcefully kills on Windows, handlers never fire) - U2: Pass safePath to openWithLockRetry so sidecar sweep targets the path KuzuDB actually opened, not the original non-ASCII path - U3: Skip junction creation in worker threads (isMainThread guard) to prevent junction leaks from pool-adapter workers - U4: Replace existsSync with lstatSync in orphan scan to avoid 30s blocking on unreachable UNC network targets
|
Claude finished @magyargergo's task in 5m 37s —— View job PR #1817 Production-Readiness ReviewTodo List
Review BarReviewer persona: Senior production-readiness reviewer for GitNexus. Adversarial but fair. The bar is whether Windows users with non-ASCII repo paths can run Problem Being SolvedIssue #1811: Current PR State
Merge Status and MergeabilityCI Gate is currently red. The only failing check is Repository History Considered
Branch Hygiene Assessment✅ Clean. All 7 commits are scoped to the #1811 fix, tests, CI fixes, CodeQL hardening, and autofix. No unrelated domains, workflow churn, or mixed refactors. Understanding of the Change
Tests — integration test ( FindingsFinding 1 — CI Gate is red: Prettier format check failsSeverity: HIGH (blocks merge) Risk: The autofix commit ( Evidence: CI status — Recommended fix: Run Blocks merge: YES Finding 2 — CMD_UNSAFE_RE missing parentheses (functional gap, not injection)Severity: LOW Risk: CR/LF in paths is theoretical only — Windows filesystem prohibits control characters ( Evidence: Recommended fix: Add Blocks merge: NO Finding 3 — SIGTERM handler exits with code 0 on WindowsSeverity: LOW-MEDIUM Risk:
On POSIX, the existing approach ( Evidence: Recommended fix: Use Blocks merge: MAYBE — risk is real if existing CLI/server has registered its own SIGTERM handlers before the first non-ASCII path is processed. In practice, on Windows the first Finding 4 — Junction orphan scanner has a TOCTOU race windowSeverity: LOW Risk: Windows tmpdir is per-user ( Evidence: Recommended fix: The race window is narrow and the fallback is benign (Process A falls back to warning). No code change strictly required, but adding a creation-time marker (e.g., checking if the junction link itself is new rather than the target) would eliminate the race. Document the race in a comment. Blocks merge: NO Finding 5 — Worker-thread junction skip: pool paths may not be covered with 8.3 disabledSeverity: LOW (likely NO impact in practice) Risk: However, if the pool adapter is ever invoked from a worker thread context (e.g., via a worker-thread–based MCP transport in the future), and 8.3 short names are disabled, the junction fallback is skipped and the original non-ASCII path would reach LadybugDB. Evidence: Recommended fix: Add a comment at each Blocks merge: NO Finding 6 — Integration test doesn't exercise junction fallback on Windows CISeverity: LOW Risk: The integration test at The test is correctly wired into the Evidence: Recommended fix: No immediate action required. If a Windows CI runner with Blocks merge: NO Finding 7 — Unicode scan: no dangerous charactersSeverity: NONE Non-ASCII characters in Blocks merge: NO PR-Specific Assessment SectionsLane A — Windows path conversion: Logic is correct and layered. 8.3 short path first (fast, stateless), junction fallback (persistent but cleaned up), warn-and-return-original (last resort). The Lane B — LBUG DB open, sidecar, pool: Lane C — COPY CSV tempdir staging: Lane D — Tests and CI wiring: Integration test correctly placed in Lane E — Security and hygiene: CodeQL ✅, Gitleaks ✅, no hidden Unicode, no unrelated churn. Branch is clean. Back-and-Forth Avoided
Final Verdict❌ not production-readyThe single hard blocker is the failing CI Gate ( To unblock this PR:
· Branch |
|
/autofix |
|
✅ Applied autofix and pushed a commit. (apply run) |
- Use exit code 143 (SIGTERM) / 130 (SIGINT) on Windows instead of 0 so termination is not masked as success - Run Prettier to fix formatting (CI Gate blocker)
Pass the path via GITNEXUS_SP environment variable instead of interpolating it into the cmd.exe command string. The FOR loop reads %GITNEXUS_SP% from the environment, so the command text is entirely static — no user-controlled data in the shell command. Also removes CMD_UNSAFE_RE since the env var approach makes character-level sanitization unnecessary.
Summary
COPY failed for File: IO exception: Cannot open filewhen the repo path contains Chinese (or other non-ASCII) characters on WindowsKuzuDB's native C++ layer uses
CreateFileA(ANSI), notCreateFileW. UTF-8 bytes from Node.js are misinterpreted via the system's Active Code Page (e.g. GBK on Chinese Windows), producing a garbled path — "Error 3: The system cannot find the path specified."Approach
Layered workaround in
toNativeSafePath():cmd.exeforexpansion. Path passed via environment variable (%GITNEXUS_SP%) to avoid command injection (CodeQL clean).os.tmpdir()/gitnexus-junction-<hash>) topath.dirname(p), returnsjunction/basename(p).Applied at all paths into KuzuDB's native layer:
openLbugConnection()converts beforecreateLbugDatabase()normalizeCopyPath()converts before embedding in SQL;loadGraphToLbug()stages CSVs inos.tmpdir()whenstoragePathis non-ASCII on WindowscreateLbugDatabase()call sites wrappedSafety
process.exit/SIGTERM/SIGINT(correct exit codes: 143/130 on Windows)lstatSyncto avoid UNC blocking)isMainThreadguard) — use 8.3 onlysafePath(the path KuzuDB opened), not the originalFiles changed
gitnexus/src/core/lbug/lbug-config.tstoNativeSafePath(), junction lifecycle, signal handlers, orphan scangitnexus/src/core/lbug/lbug-adapter.tsnormalizeCopyPath()integration, tmpdir CSV staginggitnexus/src/core/lbug/pool-adapter.tstoNativeSafePath()at all 4createLbugDatabasesitesgitnexus/test/unit/lbug-native-safe-path.test.tsgitnexus/test/integration/lbug-non-ascii-path.test.tsgitnexus/test/unit/pool-wal-recovery.test.tstoNativeSafePathexportgitnexus/test/unit/lbug-pool-win-fts-probe.test.tstoNativeSafePathexportgitnexus/scripts/cross-platform-tests.tsgitnexus/vitest.config.tsTest plan
npx vitest run test/unit/lbug-native-safe-path.test.ts— 5 passingnpx vitest run test/integration/lbug-non-ascii-path.test.ts— 3 passing (all platforms)npx vitest run test/unit/pool-wal-recovery.test.ts test/unit/lbug-pool-win-fts-probe.test.ts— 20 passing (no regression)npx gitnexus analyzecompletes without COPY ENOENT