feat: add IncludeExtractor for C++ cross-repo include tracking (group)#1156
Conversation
|
Someone is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
|
Hi @abhigyanpatwari @magyargergo, this PR is ready for review when you have a moment. What it does: Adds Validation done:
The branch is up to date with Best regards, |
- Remove unused HEADER_GLOB constant in include-extractor.ts - Use fs.mkdtempSync for secure temp dir creation in tests (CodeQL: 'Insecure temporary file')
|
I've pushed a fix commit fcc4319 addressing the 4 CodeQL warnings (1 unused variable + 3 insecure temp file).Thanks! |
|
Claude finished @magyargergo's task in 5m 44s —— View job PR #1156 Deep Review — IncludeExtractor for C++ cross-repo include trackingReview bar for this PR
Current PR state
Understanding of the changeIssue #1142 reports that header-only interface repos show 0 cross-repo links with their consumers because the Group sync pipeline has no concept of C/C++ Findings[BLOCKER] Syntax error in manifest-extractor.ts — missing closing parenthesis
} else if (link.type === 'include') {
rows = await executor(
`MATCH (f:File) WHERE f.filePath = $contract
...
LIMIT 1`,
{ contract: link.contract }, // ← closing ) is missing here
} else if (link.type === 'custom') {ESLint reports
[BLOCKER] CI Gate is failing — not production-ready in current state
[HIGH] Path lowercasing can cause collisions on case-sensitive Linux filesystems
return raw.replace(/\\/g, '/').replace(/^\.\//, '').replace(/\/+/g, '/').toLowerCase();This means
[HIGH] suffixResolve ambiguity can silently suppress true cross-repo includes
[MEDIUM] Regex fallback matches
|
| Test area | Coverage |
|---|---|
Provider header registration (.h, .hpp) |
✅ |
.cpp not registered as provider |
✅ |
| Unresolved includes → consumer contracts | ✅ |
| Local include suppression | ✅ |
| Angle-bracket includes filtered | ✅ |
| System headers filtered | ✅ |
| System path prefixes filtered | ✅ |
| Provider/consumer ID round-trip matching | ✅ |
normalizeContractId for include (backslash, casing, leading ./, double slashes`) |
✅ |
| Deduplication (same include in multiple files) | ✅ (though semantics allow N consumer contracts) |
| Block comment false positive (regex fallback) | ❌ Missing |
Ambiguous suffix resolution (local view.h suppresses cross-repo map/base/view.h) |
❌ Missing |
Macro includes (#include MACRO) |
❌ Missing |
Case collision (Foo.h vs foo.h) |
❌ Missing |
detect.includes: false disables extraction |
❌ Missing |
| Group sync integration (syncGroup → crossLinks) | ❌ Missing |
| regex fallback path (via large file or forced parse failure) | ❌ Missing (tree-sitter path always succeeds on small test files) |
The test for deduplication (lines 202–214) has a semantic confusion: the test comment says "Both files include ext/api.h — each should produce a separate consumer contract (different symbolRef.filePath)" with toHaveLength(2). This is intentional but contradicts the section title "deduplication". The actual dedup behavior (same file including the same header twice) is not tested.
Also: expect(consumers[0].confidence).toBe(0.85) — DoD §2.7 warns against toBeGreaterThanOrEqual but also cautions that exact assertions should be meaningful. A hardcoded 0.85 confidence is a magic number; its business meaning is not tested.
Performance assessment
Default-on scanning: Non-C++ repos still trigger IncludeExtractor.extract() because canExtract() always returns true. Each repo pays for 2 glob calls. For a group with 20 repos, this is 40 extra filesystem scans per sync even if none are C++ repos. Acceptable at current scale, but detect.includes: false provides an escape hatch.
Parser reuse: new Parser() is created once per extractConsumers() call, shared across all files. Parser.Query is cached per language. ✅
Suffix index: Built once in extract(), reused across all source files. O(N_files) construction. ✅
Regex fallback performance: INCLUDE_REGEX is stateful (g flag), correctly reset with lastIndex = 0. Each fallback call scans the file linearly. For a 500KB file (large C++ source), this is fast. ✅
Memory: All files are read into memory via readSafe() before parsing. A repo with 200 large .cpp files could transiently hold significant memory if tree-sitter fails on all of them. For typical repos this is acceptable.
GHAS / hidden Unicode / hygiene assessment
GHAS findings: The fcc4319 commit addresses all 4 GHAS findings: HEADER_GLOB unused variable removed, and insecure fs.mkdtempSync in tests replaced with crypto-random paths. CodeQL check is now ✅ green.
Hidden Unicode: No non-ASCII characters found in include-extractor.ts or include-extractor.test.ts. Bidi controls absent. ✅
Lint errors introduced by this PR: The ESLint error (not warning) at manifest-extractor.ts:283 was introduced by this PR's include branch addition. The file's other warnings are pre-existing.
Scope: The diff is appropriately scoped. No unrelated reformatting. sync.ts additions are minimal and proportionate. ✅
Elegance / maintainability assessment
The overall design is clean and follows the existing extractor pattern (ContractExtractor interface). The dual strategy (graph-assisted providers → filesystem fallback) mirrors other extractors in the pipeline. The SOURCE_GLOB constant cleanly bounds what gets scanned. The SYSTEM_HEADERS set is comprehensive and readable.
One design concern: Provider graph strategy (extractProvidersGraph) uses a regex '.*\\.(h|hpp|hxx|hh)$' that likely won't work on Linux (backslash in a regex matched against forward-slash paths). On Linux, file paths in the graph would look like /repo/path/include/foo.h — the Cypher regex '.*\\.(h|hpp|hxx|hh)$' with escaped backslash-dot would work, but using \\. in a JavaScript template literal inside a Cypher string needs careful escaping. In practice extractProvidersGraph returns 0 results and falls through to the filesystem fallback, so this is latent rather than active.
Future extensibility: The extractor follows the ContractExtractor interface cleanly. Adding support for -I include directories or conditional compilation blocks would be natural extensions without requiring architectural changes. ✅
Final verdict
❌ Not production-ready
The PR introduces a syntax error in manifest-extractor.ts (missing closing parenthesis in the include branch of resolveSymbol) that causes ESLint to report a parse error, prevents Prettier from formatting the file, and fails TypeScript compilation. This cascades to all test, typecheck, and CI gate jobs failing. A PR that cannot compile is not production-ready by any reasonable bar.
Beyond the syntax error, there are two high-priority design issues that stop short of blocking on their own but together suggest the implementation needs one more iteration: (1) suffixResolve ambiguity can silently suppress true cross-repo includes when a local file shares a suffix with the external header — a systematic false negative risk in large C++ repos; and (2) the regex fallback can produce false positive contracts from #include directives inside block comments. Neither has a test. The missing integration test for the syncGroup → matchingn → CrossLink → writeBridge path means the end-to-end contract matching and bridge generation code is untested in CI.
Required before merge: Fix the syntax error in manifest-extractor.ts:283. Once CI is green, the remaining issues (suffix ambiguity test, regex fallback comment test, source metadata, integration test) should be addressed in a follow-up commit or treated as required follow-ups in this PR depending on the team's bar.
The 'include' branch in ManifestExtractor.resolveSymbol was missing the closing ); for the executor() call, causing a syntax error that broke ESLint, Prettier, and the full test CI on all platforms. Reported by Claude PR review on abhigyanpatwari#1156.
Adding 'include' pushed the array over prettier's 100-char limit, so prettier prefers multi-line. Apply the reformat to unbreak ci-quality/format job.
… findings abhigyanpatwari#3-abhigyanpatwari#7 Claude Deep Review raised 7 findings on the IncludeExtractor. #1/abhigyanpatwari#2 (BLOCKERs) were fixed earlier. This commit closes the remaining five. abhigyanpatwari#3 HIGH case-sensitive FS -> provider contract-id collision Document the deliberate case-folding trade-off on normalizeIncludePath (matches C/C++ convention on Windows/macOS; collapses Foo.h & foo.h on Linux). Add a unit test pinning the behavior. abhigyanpatwari#4 HIGH suffixResolve short-suffix match silently drops cross-repo include When a local file ends with the same basename as an external include (e.g. local internal/api.h vs. #include "ext/api.h"), suffixResolve returned a bogus local hit and suppressed the cross-repo consumer. Replace the suffixResolve lookup inside include-extractor with a strict isLocalInclude() that only accepts full-path hits via SuffixIndex.get / getInsensitive. Callers of suffixResolve elsewhere are unaffected. Add 3 unit tests covering the regression. abhigyanpatwari#5 MEDIUM regex fallback matched #include inside /* ... */ Strip block comments before running the fallback regex scan. Add a unit test. abhigyanpatwari#6 MEDIUM meta.source was hard-coded to 'tree_sitter' Track the actual extraction path with an extractionSource local and write it into meta.source so downstream audits can distinguish tree-sitter parses from regex fallbacks. Add 2 unit tests. abhigyanpatwari#7 MEDIUM missing end-to-end coverage Add test/integration/group/include-extractor-sync.test.ts with 3 cases exercising extractor -> syncGroup -> CrossLink (mocked contracts, mixed-case/backslash normalization, real temp repos). Tests: 21 unit + 3 integration, all green.
Review response + CI statusThanks @magyargergo @claude for the deep review. All 7 findings are addressed below, and I want to flag one CI signal that is unrelated to this PR. Claude review findings — status
Verification
CI status note:
|
| Failure | Root cause |
|---|---|
test/integration/resolvers/kotlin.test.ts (3 cases) |
Kotlin fixtures produce empty results on Windows |
test/unit/type-env.test.ts:21, test/unit/call-form.test.ts:13 |
Cannot find package 'tree-sitter-kotlin' — optionalDependencies fails to build on the Windows runner, so the subsequent static import 'tree-sitter-kotlin' throws ERR_MODULE_NOT_FOUND |
test/integration/cross-file-binding.test.ts (5 cases) |
Cascaded: the fixture relies on Kotlin resolution |
All other platforms are green:
- ✅
tests / ubuntu-latest - ✅
tests / macos-latest - ✅
quality(prettier / eslint / tsc / tsc-web / workflow-convention) - ✅
e2e - ✅
scope-parity
Happy to iterate further — let me know if any of the above needs adjustment. 🙏
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 8343 tests passed 1 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
LadybugDB's `new Database()` raises `Could not set lock on file` from local_file_system.cpp synchronously inside the constructor — before any query is issued, so `withLbugDb`'s query-time retry never sees it. On Windows CI this surfaces as flaky integration tests due to AV-scanner holds, libuv handle-release lag, and stale `.wal` sidecars from aborted prior runs. This change closes the gap at *open time*: - `openLbugConnection` now wraps `new lbug.Database()` in a bounded busy-retry (5x100ms back-off) inside `lbug-config.ts`. Errors that exhaust the budget are tagged via `LBUG_OPEN_RETRY_EXHAUSTED` so `withLbugDb`'s outer 3x retry skips re-retrying a freshly-exhausted path (eliminates the 3x5=15-attempt / ~6s tail latency). - For recognized test fixtures only (immediate-parent dir matches a known prefix AND resolves under `os.tmpdir()`), one final stale- sidecar sweep removes `.wal`/`.lock` and retries once. Production paths never enter this branch. - `safeClose` on Windows runs a bounded `fs.open` probe to absorb native handle-release lag; logs a warning if the probe exhausts so operators can spot AV interference. - `isDbBusyError` is now defined in `lbug-config.ts` as the single source of truth, re-exported from `lbug-adapter.ts` for compatibility. - New tests cover open-time retry (happy/retry/exhaust/non-busy/tag), stale-sidecar sweep (test-fixture-only, production-rejection, preserves-original-error), `isTestFixturePath` direct unit suite (accept/reject/traversal/nested/trailing-sep), and `waitForWindowsHandleRelease` (openable/ENOENT/no-leak). - The two new test files are added to vitest's existing serialized `lbug-db` project (already `fileParallelism: false`). Closes the chronic Windows CI flake on lbug-touching integration tests while preserving the existing single-writable-Database-per-process LadybugDB contract. No public API surface changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… directly The re-export from lbug-adapter.ts was a transitional convenience — with the matcher now living in lbug-config.ts, having two import paths for the same symbol invites future drift. Updated the two real consumers (lbug-lock-retry.test.ts, lbug-open-retry.test.ts) to import from lbug-config directly, removed the re-export equality test (now vacuous), and refreshed the explanatory comment so it no longer references a re-export pattern that doesn't exist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… on Windows doInitLbug logs "⚠️ Schema creation warning: ... Could not set lock on file" on every CREATE NODE TABLE call after the first init on a given dbPath, on Windows. The lock is internal to LadybugDB v0.16.1 and is resolved before the table is created — same tolerance pattern as the existing "already exists" filter. Genuine cross-process lock contention still surfaces on the next operation through withLbugDb's retry, so filtering at the schema-init catch only suppresses noise, not signal. Also extend the safeClose Windows handle-release probe to cover the .wal sidecar (the previous Database's WAL handle was the slowest to release, surfacing as the schema-query lock contention) and switch the probe back to 'r+' so it actually detects exclusive locks. Test loop in lbug-close-handle-release.test.ts simplified to 10 plain iterations now that the underlying noise is filtered upstream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop redundant `could not set lock` term — already subsumed by `lock`.
- Document the intentionally-broad matcher: graph-DB lock-shaped errors
("deadlock", "unlock failed", "lock contention", "could not open lock
file") are all treated as transient. If a non-transient surfaces,
tighten the matcher rather than raise the retry budget.
- Add positive test cases covering those lock-shaped strings so the
intent is visible and a future tightening would deliberately break
these.
- Fix the open-retry back-off comment: max sleep is 100+200+300+400 =
1000ms (no sleep after the final attempt), not 1.5s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @magyargergo's task in 8m 41s —— View job PR #1156 Deep Review — IncludeExtractor for C++ cross-repo include trackingReview tasks:
Review bar for this PR
Current PR state
Branch hygiene assessmentmerge-from-main commits present but harmless and merge-safe — with one caveat that warrants explanation. The 6 merge commits are standard GitHub merge-branch operations that bring upstream fixes in. The Understanding of the changeIntra-repo C/C++ include resolution in the single-repo ingestion pipeline creates This PR adds
Findings[BLOCKER] Windows CI regression —
|
| Test area | Status |
|---|---|
Provider .h/.hpp/.hxx/.hh |
✅ |
Provider ignores .cpp |
✅ |
| Consumer emits unresolved includes | ✅ |
| Local include resolution suppresses consumer | ✅ |
| System headers (quoted form) filtered | ✅ |
| System path prefixes filtered | ✅ |
| Angle-bracket filtered | ✅ |
| Provider/consumer round-trip normalization | ✅ |
normalizeContractId (backslash, case, ./, //) |
✅ |
| Suffix ambiguity fix (finding #4) | ✅ — 3 tests |
| Block comment false positive (finding #5) | ✅ |
meta.source reflects path (finding #6) |
✅ — 2 tests |
| Case-folding collision documented (finding #3) | ✅ |
| Deduplication semantics | ✅ |
syncGroup → CrossLink (finding #7) |
✅ — 3 integration tests |
detect.includes: false |
❌ Missing |
Macro includes #include HEADER |
❌ Missing |
../ relative include noise |
❌ Missing |
| Graph strategy absolute path | ❌ Missing |
The integration tests properly exercise the extraction → matching → CrossLink pipeline via extractorOverride. The real-filesystem round-trip test (third test) is the strongest signal. The test for detect.includes: false is notably absent.
GHAS / hidden Unicode assessment
GHAS: All 4 CodeQL findings (1 unused var HEADER_GLOB, 3 insecure temp file) were addressed in fcc4319. CodeQL is green on latest head. The integration test file does not have the same insecure temp file pattern — it uses fs.mkdtempSync correctly.
Hidden Unicode: … and — in comments only. No bidi controls. No impact on parser behavior, regex semantics, or contract IDs.
Elegance / maintainability assessment
The extractor follows the existing ContractExtractor interface pattern cleanly. The dual-strategy (graph → filesystem fallback) mirrors other extractors. The STANDARD_IGNORES constant is readable. The isLocalInclude function is well-documented with a reference to the review finding. stripBlockComments is a clean, focused helper. The meta.source tracking is a nice observability touch.
The extractProvidersGraph strategy has a latent bug (absolute paths) but is otherwise well-structured. The separation of extractProviders / extractConsumers / dedupe is clean.
The commit history is a bit noisy (multiple rounds of fixes) but the final state of the code is readable. Comments are on-point and reference review findings rather than being noise.
Future extensibility: adding -I include directories or selective vendor exclusion would fit naturally into the existing structure. The ContractExtractor interface is preserved correctly.
Final verdict
❌ Not production-ready
Two blockers require fixes:
-
Windows CI regression (BLOCKER): Adding
writeBridgetosync.tscauses the existing testsync.test.ts > writes registry to groupDir when skipWrite is falseto fail ontests/windows-latestwithENOTEMPTY. LadybugDB's background checkpoint thread briefly holds a Win32 lock aftercloseBridgeDb()returns; when test cleanup runsrmSyncimmediately after, Windows refuses to delete the locked file. This is a direct regression introduced by this PR — the test passed onmainbeforewriteBridgewas added. The fix is straightforward: addskipWrite: trueto that specific test (it only needs to verify the registry JSON, not the bridge) or wrap the test's cleanup in a retry/delay. -
Graph provider absolute-path bug (BLOCKER):
extractProvidersGraphreturns rows wheref.filePathis an absolute path from the DB (e.g.,/home/user/repos/interface/map/base/view.h).normalizeIncludePathdoes not strip the repo root, so provider contract IDs include the absolute path and never match consumer's repo-relative IDs. The guardif (graphProviders.length > 0) return graphProvidersprevents the correct filesystem fallback from running. For indexed C++ repos (the primary use case), this silently produces 0 cross-links. The fix is either to passrepoPathintoextractProvidersGraphand applypath.relative(), or to remove the graph strategy entirely and always use the filesystem fallback.
Beyond these two blockers, the overall design and implementation quality is high. The finding #4 fix (exact-suffix local resolution) is elegant and correct. The block-comment stripping for the regex fallback is a good defensive improvement. The three integration tests cover the key wiring. With the two fixes applied and CI green on all platforms, this PR should be in good shape for final approval.
|
I'll continue this! Thanks for your PR |
|
@magyargergo Thanks for continuing with this — much appreciated! :) |
|
@SZU-WenjieHuang would you be interested in migrating C/C++ to scope resolution? |
Addresses two blockers and two mediums from the deep review.
BLOCKER 1: Windows CI ENOTEMPTY in sync.test.ts
After this PR added writeBridge() to syncGroup, the existing test
"writes registry to groupDir when skipWrite is false" fails on
windows-latest. LadybugDB's checkpoint thread briefly outlives
closeBridgeDb, holding a Win32 lock on bridge.lbug; the test's
fs.rmSync then fails with ENOTEMPTY. Switched the test cleanup to
cleanupTempDir from test/helpers/test-db.ts which already tolerates
EBUSY/EPERM/EACCES/ENOTEMPTY with bounded retries — same pattern
used elsewhere for LadybugDB-touching tests.
BLOCKER 2: Graph provider absolute-path bug
extractProvidersGraph queried File.filePath from the LadybugDB graph
but never stripped the repo root, so provider contract IDs ended up
as include::/abs/path/foo.h while consumers emitted include::foo.h.
These never matched through runExactMatch — silently producing 0
cross-links for any indexed C++ repo (the primary use case).
Now passes repoPath into extractProvidersGraph and applies
path.relative(); rows that resolve outside repoPath (stale absolute
paths from another machine, system headers somehow indexed) are
dropped instead of polluting the registry.
MEDIUM: `../` relative includes produce spurious noise
`#include "../foo.h"` is almost always intra-repo, but the suffix
index can never match a `..`-prefixed path so it became a consumer
contract no provider could satisfy. Now skipped before matching;
covers both forward-slash and backslash forms.
MEDIUM: writeBridge error in sync.ts propagates uncaught
contracts.json is the canonical source of truth and was just written
successfully when writeBridge runs. A bridge-only failure (disk full,
schema error, permission denied) shouldn't mask the registry. Wrapped
writeBridge in try/catch with a logger.warn surfacing the path and
recovery instructions.
Tests added:
- extractProvidersGraph repo-relative ID generation (stub Cypher
executor returns absolute paths)
- extractProvidersGraph drops rows whose path resolves outside repo
- `../foo.h` forward-slash skip
- `..\foo.h` backslash-form skip
Skipped findings:
- canExtract() removal (abhigyanpatwari#5, low): canExtract is part of the
ContractExtractor interface; every other extractor implements the
same `return true` shape. Removing it from IncludeExtractor would
break the interface contract — keeping for consistency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two HIGH findings from the Codex adversarial review on
feat/group-include-extractor:
1. Default-on extraction silently changes existing groups (BLOCKER)
DEFAULT_DETECT.includes was true, so any pre-existing group.yaml
that omits the new field would gain a wave of include::* contracts
on the next sync after upgrade. Flipped to false (opt-in). The
integration test already declares includes: true explicitly so it
survives unchanged; the unit extractor tests bypass parseGroupConfig
entirely; the sync test uses extractorOverride. Only config-parser
needed regression tests covering omitted/explicit/false variants.
2. IncludeExtractor scans outside the indexed file universe (BLOCKER)
The extractor was running glob('**/*', { ignore: STANDARD_IGNORES })
twice with a hand-rolled 9-pattern list, no .gitignore/.gitnexusignore
honoring, and no max-file-size cap. That meant File:<path> contracts
could appear for files ingestion would never index, producing
cross-links group impact cannot fan out to (silent false-negatives).
Refactored to a single discoverIndexableFiles() helper that mirrors
walkRepositoryPaths exactly: createIgnoreFilter + getMaxFileSizeBytes,
one discovery pass shared by provider and consumer paths. Dropped
STANDARD_IGNORES and SOURCE_GLOB entirely.
third_party and 3rdparty (the C/C++ vendored-deps conventions) were
in the local ignore list but not in the canonical DEFAULT_IGNORE_LIST
used by ingestion. Folded both into the canonical set rather than
keep a parallel list — the whole point of the Codex finding is that
two file-discovery implementations drift. Single source of truth.
Tests: 5 new regression tests for the discovery alignment (.gitignore,
.gitnexusignore, max-file-size on both provider and consumer paths)
plus 4 for the opt-in default. All 30 include-extractor tests + the
494-test group suite + ignore-service tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ce-code-review surfaced 6 safe_auto findings on commit a9936a9: - T1 (testing, P2): the sync.ts:174 gate was untested with includes:false. Added a sync-level test mirroring the existing thrift-off pattern at sync.test.ts:545, asserting zero include contracts when the gate is disabled in a real syncGroup call. - T3 (testing, P3): third_party and 3rdparty entries in DEFAULT_IGNORE_LIST had no regression test. Added both to ignore-service.test.ts's dependency-directories it.each block. - M1 (maintainability, P3): discoverIndexableFiles JSDoc lacked a fork-warning relative to walkRepositoryPaths. Added a MAINTENANCE note explaining why the duplication is tolerated and the contract the two implementations must keep. - M2 (maintainability, P3): thrift-extractor still hand-rolls its ignore array with no signal that DEFAULT_IGNORE_LIST additions silently do not apply there. Added TODO(abhigyanpatwari#1156-followup) comments above both call sites. - M3 (maintainability, P3): SOURCE_EXTENSIONS duplicated the four HEADER_EXTENSIONS entries with no expressed subset relationship. Spread HEADER_EXTENSIONS into SOURCE_EXTENSIONS so future header- extension additions propagate. - C1+T4 (correctness+testing, P3, cross-reviewer corroborated): discoverIndexableFiles swallowed all fs.stat errors silently, including EACCES/EMFILE/EIO. Narrowed the catch to ENOENT (the documented benign glob/stat race) and added a logger.warn for any other code so operators can spot permission/resource issues. All 629 tests pass; typecheck + prettier clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s EPERM `storage.ts:62` used raw `fsp.rename` for the contracts.json atomic swap. On Windows, AV scanners and concurrent renames briefly hold the destination handle between rename calls, surfacing as EPERM/EBUSY. The `insecure-tempfile.test.ts > concurrent writes do not collide` test was flaking with `EPERM: operation not permitted, rename` on windows-latest CI. `bridge-db.ts` already has a battle-tested `retryRename(src, dst, 3)` helper used at six call sites for exactly this pattern. Reusing it here keeps the Windows-rename policy single-source-of-truth across the group package. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…closes Windows lbug lock flakes on cli-e2e Brings open-time retry for , post-close handle-release probe (main file + .wal sidecar) on Windows, schema-init lock-warning filter, and serialized lbug-db vitest project. Without this, the cli-e2e tests for cypher/query/impact intermittently fail on windows-latest because the CLI's read-only lbug open hits the documented Windows lock-acquisition race.
|
Hi @magyargergo, thanks for thinking of me! I'm definitely interested in tackling both #935 and #936. My plan would be:
Looking forward to contributing! |
|
Claude finished @magyargergo's task in 6m 48s —— View job PR #1156 Deep Review — IncludeExtractor for C++ cross-repo include trackingReview tasks:
Review bar for this PR
Current PR state
CI coverage note: The last public CI report covers commit Branch hygiene assessmentVerdict: merge-from-main commits present but harmless and merge-safe. Seven Understanding of the changeIntra-repo C/C++ include resolution in the single-repo ingestion pipeline creates This PR adds
Key differences from Findings[MEDIUM] Macro include
|
| Test area | Status |
|---|---|
Provider .h/.hpp/.hxx/.hh |
✅ |
Provider ignores .cpp |
✅ |
| Consumer emits unresolved includes | ✅ |
| Local include (exact full-path) suppression | ✅ |
| System headers (quoted) filtered | ✅ |
| System path prefix filtered | ✅ |
| Angle-bracket filtered | ✅ |
| Provider/consumer round-trip normalization | ✅ |
normalizeContractId (backslash, case, ./, //) |
✅ |
| Suffix ambiguity — cross-repo not suppressed by shorter local suffix | ✅ — 3 tests |
| Block comment false positive (regex fallback) | ✅ |
meta.source reflects path |
✅ — 2 tests |
| Case-folding collision documented | ✅ |
../ forward-slash and backslash skip |
✅ — 2 tests |
| Graph provider absolute-path stripping | ✅ — 2 tests |
File discovery honors .gitignore |
✅ |
File discovery honors .gitnexusignore |
✅ |
File discovery honors GITNEXUS_MAX_FILE_SIZE (provider + consumer) |
✅ — 2 tests |
detect.includes: false gate |
✅ |
syncGroup → CrossLink integration |
✅ — 3 integration tests |
normalizeContractId config parser tests |
✅ |
Macro includes #include HEADER_NAME |
❌ Missing |
.inc/.ipp/.tpp/.inl explicitly out of scope |
✅ by omission (not in SOURCE_EXTENSIONS) |
GitHub Advanced Security / hidden Unicode assessment
GHAS/CodeQL: All 4 CodeQL findings were resolved in fcc4319 (HEADER_GLOB unused variable removed, 3 insecure temp file patterns replaced with fs.mkdtempSync). CodeQL is ✅ green on latest head. The integration test file uses fs.mkdtempSync correctly — no new insecure temp file pattern introduced. ✓
Hidden Unicode: No bidi controls (\u202a–\u202e, \u2066–\u2069) in any changed file. … (U+2026 HORIZONTAL ELLIPSIS) and — (U+2014 EM DASH) appear in code comments only — not in executable code, regex literals, contract IDs, or test fixture strings. These are conventional typographic characters in prose comments and do not affect runtime behavior. Acceptable under the repo's existing non-ASCII-in-comments tolerance. ✓
Elegance / maintainability assessment
The extractor follows the ContractExtractor interface cleanly and consistently with HttpRouteExtractor, GrpcExtractor, ThriftExtractor, and TopicExtractor. The dual strategy (graph-assisted providers → filesystem fallback) matches the existing pattern. discoverIndexableFiles is well-commented with a MAINTENANCE note explaining why duplication with walkRepositoryPaths is tolerated and what the two implementations must agree on.
SOURCE_EXTENSIONS = new Set([...HEADER_EXTENSIONS, ...]) — the spread is elegant: a future contributor adding .hxx to HEADER_EXTENSIONS gets it in SOURCE_EXTENSIONS automatically.
stripBlockComments and isLocalInclude are small, focused helpers with inline references to the review findings that motivated them. Comments explain non-obvious semantics without being noise.
One design note: The extractProvidersGraph method has a complex guard chain (rel.startsWith('..') || path.isAbsolute(rel)) that exists only because the DB stores absolute paths. If extractProvidersGraph ever becomes the primary path (currently it's always overridden by the filesystem fallback for non-indexed repos), the regex '.*\\\\.(h|hpp|hxx|hh)$' embedded in the Cypher query string should be verified on both Linux and Windows DB paths. The current double-backslash escaping in the JS template literal produces the correct Cypher regex .*\\.(h|...)$ (escaped dot), but this is subtle and worth a unit test if the graph path becomes primary.
Future extensibility: adding -I include directory awareness, conditional compilation branch filtering, or .inc/.ipp support would fit naturally without architectural changes. ✓
Final verdict
✅ production-ready with minor follow-ups
All previous blockers from the earlier review rounds are resolved:
- Windows CI ENOTEMPTY fixed via
cleanupTempDirin sync test (fadbb32). - Graph provider absolute-path bug fixed via
path.relative()stripping (fadbb32). - Syntax error in manifest-extractor.ts fixed (
3f5d21c). - CodeQL findings fixed (
fcc4319). suffixResolveambiguity replaced with exact full-path matching (298c067).- Block comment false positive in regex fallback fixed (
298c067). meta.sourcehardcoding fixed (298c067).detect.includes: falsedefault (opt-in) corrected (a9936a9).- File discovery aligned with ingestion pipeline (
a9936a9).
The remaining issues are all LOW or MEDIUM in severity. The macro include false positive (orphaned consumer contracts) adds registry noise but never produces false cross-links — it is the single actionable follow-up worth addressing before a public release. The detect.includes: false default means the feature is opt-in, so existing users are not affected until they explicitly enable it. With a fresh CI run confirming the Windows runner is clean on the latest commits, this PR meets the DoD for an initial release of the IncludeExtractor feature.
Recommended follow-up (not blocking merge): Add a one-line guard in extractConsumers to skip macro-style includes (if (!/[./\\]/.test(cleaned)) continue;) and a covering unit test.
Tree-sitter's `(_) @import.source` wildcard matches the identifier node of `#include PLATFORM_HEADER`, so the cleaned value `PLATFORM_HEADER` slipped past the system-header / `..` filters and was emitted as a permanently orphaned consumer contract (no file is named after a macro identifier, so no provider can ever match). Add a shape guard that skips cleaned values lacking both a path separator and an extension dot, plus regression tests for single and multi-macro files. Also document `IncludeExtractor.canExtract()` as unused by sync.ts (gated via `config.detect.includes` instead) and kept solely for ContractExtractor interface uniformity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@SZU-WenjieHuang fyi, for backward compatibilty this feature is turned off in the defaul config. |
|
@magyargergo Got it. Thanks for the clarification:) |
Summary
Closes #1142
Add cross-repo
#includedependency tracking to the Group feature, enabling impact analysis across C/C++ repositories that share header files (e.g., an InterfaceMap repo providing.hheaders consumed by a VMap repo).What changed
New:
IncludeExtractor(gitnexus/src/core/group/extractors/include-extractor.ts).h/.hpp/.hxx/.hhfile in a repo as a provider contract (include::<relative-path>)#include "…"directives via tree-sitter, resolves against the repo's own file list (reusingsuffixResolvefrom the ingestion pipeline), and emits unresolved paths as consumer contractsparser.parse()throwsInvalid argument), falls back to regex extraction (/^[ \t]*#\s*include\s*"([^"]+)"/gm) so large files are not silently skippedIntegration into Group sync (
sync.ts)IncludeExtractorruns whenconfig.detect.includesis enabled (default:true)writeBridge()is now called afterwriteContractRegistry()to generatebridge.lbugfor cross-repo impact queriesType system & matching (
types.ts,config-parser.ts,matching.ts,manifest-extractor.ts)'include'toContractTypeunion andDetectConfiginclude::normalization innormalizeContractIdincludetype graph-lookup branch inManifestExtractorKey design decisions
buildSuffixIndex/suffixResolvefrom the single-repo ingestion pipeline to distinguish local includes from cross-repo ones — only unresolved includes become consumer contractsinclude::map/base/dice_map_view.h— path-normalized, lowercase, matched viarunExactMatchValidation (on two of my real repos)
Test plan
npx tsc --noEmit)include-extractor.test.ts, 262 lines)gitnexus group syncon real C++ repos produces 100% header coverage vs independentgrepscanimpact({repo: "@group", target: "header.h"})returns non-emptycrossfield with correct downstream VMap files