
style: improve parse-worker JSDoc for isLanguageAvailable#301

Closed
jecanore wants to merge 6 commits into
abhigyanpatwari:mainfrom
jecanore:fix/ignore-and-unsupported-lang

Conversation

@jecanore

@jecanore jecanore commented Mar 16, 2026

Contributor

Summary

Minor JSDoc improvement for the isLanguageAvailable worker function. It clarifies that the worker version's filePath parameter handles the .tsx distinction at the worker level, while the main-thread version does this inside loadLanguage() instead.
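For illustration, a worker-side isLanguageAvailable carrying this kind of JSDoc might look like the sketch below. It assumes a hypothetical `languageMap` keyed by plain strings and placeholder grammar objects. The real worker keys on SupportedLanguages and loads tree-sitter grammar modules.

```typescript
// Hypothetical placeholder for a loaded tree-sitter grammar object.
type Grammar = { name: string };

// Hypothetical worker-local grammar table. `null` models an optional
// grammar (e.g. Swift) whose package failed to load.
const languageMap: Record<string, Grammar | null> = {
  typescript: { name: 'typescript' },
  swift: null,
};

// TSX uses a separate grammar under the same TypeScript language key.
const tsxGrammar: Grammar | null = { name: 'tsx' };

/**
 * Returns `true` if a grammar is loaded for `language` in this worker.
 *
 * Unlike the main-thread version, which takes only `language` and lets
 * `loadLanguage()` pick the TSX grammar, the worker receives `filePath`
 * so it can tell `.tsx` files apart from plain `.ts` files itself.
 */
function isLanguageAvailable(language: string, filePath: string): boolean {
  if (language === 'typescript' && filePath.endsWith('.tsx')) {
    return tsxGrammar !== null;
  }
  // `!= null` treats both a missing key (undefined) and an unloaded grammar (null) as unavailable.
  return languageMap[language] != null;
}
```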

Context

This PR was originally titled "fix: add .gitnexusignore support and complete unsupported language handling". After @magyargergo requested consolidation with #231, all substantive changes (ignore infrastructure + language resilience) were ported into #231 by @ivkond in 505fc9b.

This branch has been rebased on top of #231 HEAD (fdbeb0b) and carries only the remaining JSDoc tweak.

Changes

| File | Change |
|---|---|
| parse-worker.ts | Improved JSDoc on isLanguageAvailable: added backticks, clarified main-thread vs worker distinction |

Test plan

  • vitest run test/unit/language-skip.test.ts — 5 tests pass
  • tsc --noEmit — clean

@vercel

vercel Bot commented Mar 16, 2026


@jecanore is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo

Collaborator

Thank you for your contribution! This is impressive work!

@github-actions

github-actions Bot commented Mar 16, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
|---|---|---|
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Unit Tests | success | 3 platforms |
| ✅ Integration | success | 3 OS x 4 groups = 12 jobs |

Test Results

| Suite | Tests | Passed | Failed | Skipped | Duration |
|---|---|---|---|---|---|
| Unit | 1361 | 1262 | 0 | 0 | 8s |
| Integration | 825 | 806 | 0 | 17 | 53s |
| Total | 2186 | 2068 | 0 | 17 | 61s |

✅ All 2068 tests passed

17 test(s) skipped:

Integration:

  • hooks e2e ('Plugin') > directory without .gitnexus > ignores PostToolUse when no .gitnexus directory exists
  • hooks e2e ('Plugin') > directory without .gitnexus > ignores PreToolUse when no .gitnexus directory exists
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()

Code Coverage

Combined (Unit + Integration)

| Metric | Coverage | Covered | Base | Delta | Status |
|---|---|---|---|---|---|
| Statements | 51.86% | 4823/9299 | 31.28% | 📈 +20.6 | 🟢 ██████████░░░░░░░░░░ |
| Branches | 44.99% | 3009/6687 | 25.58% | 📈 +19.4 | 🟢 ████████░░░░░░░░░░░░ |
| Functions | 53.82% | 492/914 | 34.65% | 📈 +19.2 | 🟢 ██████████░░░░░░░░░░ |
| Lines | 53.5% | 4394/8213 | 33.22% | 📈 +20.3 | 🟢 ██████████░░░░░░░░░░ |
Coverage breakdown by test suite

Unit Tests

| Metric | Coverage | Covered | Base | Delta | Status |
|---|---|---|---|---|---|
| Statements | 42.29% | 3933/9299 | 31.28% | 📈 +11.0 | 🟢 ████████░░░░░░░░░░░░ |
| Branches | 37.7% | 2521/6687 | 25.58% | 📈 +12.1 | 🟢 ███████░░░░░░░░░░░░░ |
| Functions | 43.1% | 394/914 | 34.65% | 📈 +8.5 | 🟢 ████████░░░░░░░░░░░░ |
| Lines | 43.61% | 3582/8213 | 33.22% | 📈 +10.4 | 🟢 ████████░░░░░░░░░░░░ |

Integration Tests

| Metric | Coverage | Covered | Base | Delta | Status |
|---|---|---|---|---|---|
| Statements | 22.84% | 2124/9299 | 31.28% | 📉 -8.4 | 🔴 ████░░░░░░░░░░░░░░░░ |
| Branches | 17.16% | 1148/6687 | 25.58% | 📉 -8.4 | 🔴 ███░░░░░░░░░░░░░░░░░ |
| Functions | 25.82% | 236/914 | 34.65% | 📉 -8.8 | 🔴 █████░░░░░░░░░░░░░░░ |
| Lines | 23.94% | 1967/8213 | 33.22% | 📉 -9.3 | 🔴 ████░░░░░░░░░░░░░░░░ |

📋 View full run · Generated by CI

@magyargergo

Collaborator

Thank you for your contribution!

@magyargergo

Collaborator

@claude try to find gaps and also take a careful look at the description and the problem space. Was this tackled properly, and is it properly covered?

@claude

claude Bot commented Mar 16, 2026


Claude finished @magyargergo's task in 3m 14s


PR Review: .gitnexusignore Support & Unsupported Language Handling

  • Gather context and read changed files
  • Analyze ignore-service implementation
  • Analyze parsing-processor and parse-worker guards
  • Analyze filesystem-walker integration
  • Review test coverage
  • Check for gaps against the described problem space
  • Post comprehensive review

Overall Assessment: The PR is well-structured and correctly solves the core problem. The language guard coverage across all sequential processing paths is complete, and the .gitnexusignore integration is sound. However, there are several notable gaps — one of which is a functional bug, two are correctness issues under specific conditions, and the rest are coverage/quality gaps.


Gaps Found

🔴 Critical: Wiki generator never calls loadUserIgnore

wiki/generator.ts:217 and :624 both call shouldIgnorePath(f), but loadUserIgnore is never called in the wiki generation path. The generator has this.repoPath available but doesn't initialize the user ignore state.

If gitnexus wiki is run in a separate process (e.g. after gitnexus analyze), the module-level userIgnore is null and all .gitnexusignore patterns are silently bypassed. The wiki will include paths the user intended to exclude.

This is a direct gap in the stated goal: "users add a .gitnexusignore to exclude platform directories" — this only works during analysis, not documentation generation.



🔴 Medium: Module-level singleton breaks multi-repo scenarios

ignore-service.ts:15: the early return if (userIgnoreLoaded) return; means that once patterns are loaded for any repo, subsequent calls with a different repoPath are silently ignored. In the same process:

loadUserIgnore('/repo-a');  // ✅ loads .gitnexusignore for repo-a
loadUserIgnore('/repo-b');  // ❌ no-op, silently uses repo-a's patterns

This is a latent bug for the programmatic API (new GitNexusAnalyzer(...)) and any future batch/server use cases. resetUserIgnore exists for tests but is not called between analyses. A simple fix would be to cache the repoPath alongside userIgnoreLoaded and reload when the path changes.
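A minimal sketch of that fix, keying the cache by repoPath instead of a boolean, follows. The injected `PatternReader` is a hypothetical stand-in for reading <repoPath>/.gitnexusignore from disk, and pattern matching is left out entirely.

```typescript
// Hypothetical reader; the real service reads <repoPath>/.gitnexusignore.
type PatternReader = (repoPath: string) => string[];

let userIgnoreLoadedFor: string | null = null; // repo whose patterns are cached
let userPatterns: string[] = [];

function loadUserIgnore(repoPath: string, read: PatternReader): void {
  // Skip the reload only when the cached patterns belong to this same repo.
  if (userIgnoreLoadedFor === repoPath) return;
  userPatterns = read(repoPath);
  userIgnoreLoadedFor = repoPath;
}

// Test helper: clears the cache so the next load always reads again.
function resetUserIgnore(): void {
  userIgnoreLoadedFor = null;
  userPatterns = [];
}
```

Storing the path rather than a flag keeps the cheap "already loaded" fast path for repeated calls on one repo while making a repo switch reload automatically.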



🟡 Medium: Skip logging only works on the sequential (fallback) path

The PR description says "Verbose skip logging — tracks and reports skipped unsupported languages" — but this is only implemented in processParsingSequential. The primary path (processParsingWithWorkers) silently continues in the worker with no logging at parse-worker.ts:316-322.

In a normal gitnexus analyze run, workers are used. The skip summary (Skipped unsupported languages: swift: 47) will never appear. The logging feature effectively doesn't work in the common case.
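One way to make skips visible from the worker path is to count them per language inside the worker and return the counts with the batch result, rather than silently continuing. The sketch below assumes hypothetical `WorkerFile` and `ParseWorkerResult` shapes and a hardcoded availability set; the real result also carries parsed nodes and edges.

```typescript
// Hypothetical shapes; the real ParseWorkerResult carries nodes, edges, etc.
interface WorkerFile { path: string; language: string }
interface ParseWorkerResult {
  parsedPaths: string[];
  skippedLanguages: Record<string, number>; // language -> number of skipped files
}

// Hypothetical availability (e.g. the Swift grammar is not installed).
const availableLanguages = new Set(['typescript', 'kotlin']);

function processBatch(files: WorkerFile[]): ParseWorkerResult {
  // Always initialized, so the main thread never has to check for absence.
  const result: ParseWorkerResult = { parsedPaths: [], skippedLanguages: {} };
  for (const file of files) {
    if (!availableLanguages.has(file.language)) {
      // Count instead of silently continuing, so the main thread can report it.
      result.skippedLanguages[file.language] = (result.skippedLanguages[file.language] ?? 0) + 1;
      continue;
    }
    result.parsedPaths.push(file.path); // stand-in for real parsing
  }
  return result;
}
```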



🟡 Minor: continue in worker skips TSX when regular TS is unavailable

At parse-worker.ts:316:

if (!isLanguageAvailable(language, regularFiles[0].path)) continue; // skips to next language

If regularFiles is empty but tsxFiles is non-empty (TypeScript-only .tsx project), the guard at line 316 is never reached and TSX processes correctly. However, if regularFiles is non-empty and the language is unavailable, the continue also bypasses the TSX block at line 326. For TypeScript this is harmless (TS is always available), but for future languages with file-variant splits this logic is fragile.
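The safer shape guards each file group independently instead of using continue. This sketch is hypothetical throughout: "futurelang" is an invented language whose variant grammar (.jsx-style files) is installed but whose regular grammar is not, which is exactly the case where a continue would wrongly skip the variant block.

```typescript
interface LanguageGroup {
  language: string;
  regularFiles: string[];
  variantFiles: string[]; // e.g. .tsx files for TypeScript
}

// Hypothetical loaded grammars keyed by "language:variant". "futurelang" has
// its variant grammar installed but not its regular one.
const loadedGrammars = new Set(['typescript:regular', 'typescript:variant', 'futurelang:variant']);

function isLanguageAvailable(language: string, filePath: string): boolean {
  const variant = /\.(tsx|jsx)$/.test(filePath) ? 'variant' : 'regular';
  return loadedGrammars.has(`${language}:${variant}`);
}

function collectParsable(groups: LanguageGroup[]): string[] {
  const parsed: string[] = [];
  for (const { language, regularFiles, variantFiles } of groups) {
    // Each group has its own guard. A `continue` on a missing regular
    // grammar would also skip the variant block below.
    if (regularFiles.length > 0 && isLanguageAvailable(language, regularFiles[0])) {
      parsed.push(...regularFiles);
    }
    if (variantFiles.length > 0 && isLanguageAvailable(language, variantFiles[0])) {
      parsed.push(...variantFiles);
    }
  }
  return parsed;
}
```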


🟡 Minor: ignore package loaded via createRequire unnecessarily

ignore-service.ts:3-5: the ignore package (v6+) ships a full ESM build. Using createRequire to load it as CommonJS is unnecessary and weakens type inference (ReturnType<typeof ignore> resolves to any). This is consistent with the tree-sitter-swift optional-dep pattern in the codebase, but the Swift grammar is optional (it may not be installed), whereas ignore is a declared dependency. A standard import ignore from 'ignore' would be cleaner and type-safe.


🟢 Good: Language guard coverage is complete

import-processor.ts:378, call-processor.ts:136, heritage-processor.ts:~107 — all guarded by PR #188. parsing-processor.ts:127 and parse-worker.ts:316,327 — added by this PR. The sequential path coverage is now complete and the worker path has guards. ✓


Test Coverage Gaps

| What's Tested | What's Missing |
|---|---|
| shouldIgnorePath with .gitnexusignore patterns ✓ | Wiki generator shouldIgnorePath with user patterns |
| isLanguageAvailable for installed/fabricated langs ✓ | Worker's local isLanguageAvailable function (only integration coverage) |
| Cache/reset behavior ✓ | Multi-repo singleton behavior (different repoPath on second call) |
| No .gitnexusignore file ✓ | console.warn output for skip logging in sequential path |

The negation test (!Package.swift) correctly validates the ignore package behavior. The "no file" test correctly validates that default ignore rules still apply. These are good.


Summary

The PR correctly and completely addresses the unsupported language crash in both parsing paths. The .gitnexusignore feature works correctly within the analysis pipeline. The main gaps are: (1) wiki generation doesn't respect .gitnexusignore (functional bug), (2) the singleton can silently ignore patterns for a second repo in the same process (latent bug), and (3) the skip-logging feature doesn't work on the primary worker path (documentation mismatch + usability gap).

jecanore added a commit to jecanore/GitNexus that referenced this pull request Mar 16, 2026
… wiki init, worker skip logging

- Wiki generator now calls loadUserIgnore() before shouldIgnorePath() (Fix 1)
- Ignore service caches by repoPath instead of boolean, so switching repos
  reloads .gitnexusignore patterns correctly (Fix 2)
- Worker parse results include skippedLanguages; parsing-processor merges
  and logs the summary -- matching the sequential path behavior (Fix 3)
- Replace continue with if-guard so unavailable regular files don't skip
  the TSX block for the same language (Fix 4)
- Add multi-repo cache reload test
@jecanore

Contributor Author

Addressed all review gaps identified by @magyargergo's Claude bot review in commit aa68021:

| # | Gap | Fix |
|---|---|---|
| 1 | Wiki generator calls shouldIgnorePath without loading .gitnexusignore | Added loadUserIgnore(this.repoPath) call in generator.ts before first shouldIgnorePath usage |
| 2 | Singleton boolean breaks multi-repo (second repo silently uses first repo's patterns) | Replaced userIgnoreLoaded boolean with userIgnoreLoadedFor: string keyed by repoPath; switching repos now reloads patterns |
| 3 | Skip logging only in sequential path, not workers | Added skippedLanguages field to ParseWorkerResult, merged counts in parsing-processor.ts, logs same format as sequential path |
| 4 | continue in worker skips TSX block when regular files unavailable | Replaced continue with if guard so each file group (regular/TSX) is checked independently |
| 5 | createRequire for ignore package | No change needed: confirmed ignore@7.0.5 has no ESM exports field, so createRequire is correct |

Tests: Added multi-repo cache reload test. All 1361 unit tests pass. npx gitnexus analyze --force verified on a live repo.

Repository owner deleted a comment from claude Bot Mar 16, 2026
@magyargergo

Collaborator

@claude try to identify missing gaps and use ultra-thinking to fit this into the architecture in a way that keeps it future-proof

@claude

claude Bot commented Mar 16, 2026


Claude finished @magyargergo's task in 4m 29s


Deep Architectural Review: .gitnexusignore & Language Guards

  • Gather context and read all changed files
  • Audit all consumers of shouldIgnorePath / loadUserIgnore across the codebase
  • Audit all language-loading paths for missing guards
  • Analyze package.json line 61 concern
  • Architectural analysis: singleton design, worker isolation, extensibility
  • Post comprehensive review with future-proofing recommendations

Summary

The round-1 fixes were thorough and correct. There is one new functional bug (a second shouldIgnorePath call site in the wiki generator that was missed), one cosmetic issue (the package.json question), and three architectural debt items that will hurt future contributors if left unaddressed.


🔴 New Functional Bug: incrementalUpdate calls shouldIgnorePath without loadUserIgnore

generator.ts:627:

if (!found && !shouldIgnorePath(fp)) {   // ← no loadUserIgnore called in this path

The previous review gap fix added loadUserIgnore(this.repoPath) at generator.ts:217, which is inside fullGeneration. But incrementalUpdate (the code path that runs on every subsequent gitnexus wiki call after the first) reaches line 627 without ever calling loadUserIgnore. So on the second and all subsequent wiki runs, the .gitnexusignore patterns are silently inactive in the incremental path — new files that match ignore patterns will be added to the graph.

The fullGeneration fix was half of the solution. incrementalUpdate needs the same treatment at its entry point (generator.ts:598).
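An alternative to patching each path separately, and the approach the author later took in cbf52c7 (moving loadUserIgnore() into run()), is to load patterns once at the shared entry point. The sketch below is hypothetical: the ignore-service stand-ins are minimal, the "vendor/" check is a placeholder for user patterns, and `hasExistingWiki` is an invented switch between the two paths.

```typescript
// Hypothetical stand-ins for ignore-service.ts; the real module keeps this
// state at module level and matches patterns with the `ignore` package.
let userIgnoreLoadedFor: string | null = null;

function loadUserIgnore(repoPath: string): void {
  userIgnoreLoadedFor = repoPath;
}

function shouldIgnorePath(filePath: string): boolean {
  if (userIgnoreLoadedFor === null) {
    throw new Error('shouldIgnorePath() called before loadUserIgnore()');
  }
  return filePath.startsWith('vendor/'); // placeholder for user patterns
}

class WikiGenerator {
  private readonly repoPath: string;
  private readonly hasExistingWiki: boolean;

  constructor(repoPath: string, hasExistingWiki: boolean) {
    this.repoPath = repoPath;
    this.hasExistingWiki = hasExistingWiki;
  }

  /** Single entry point: patterns are loaded before either path filters files. */
  run(files: string[]): string[] {
    loadUserIgnore(this.repoPath);
    return this.hasExistingWiki ? this.incrementalUpdate(files) : this.fullGeneration(files);
  }

  private fullGeneration(files: string[]): string[] {
    return files.filter((f) => !shouldIgnorePath(f));
  }

  private incrementalUpdate(files: string[]): string[] {
    // Relies on run() having loaded the patterns; no per-path load call to forget.
    return files.filter((f) => !shouldIgnorePath(f));
  }
}
```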



🟡 package.json line 61 — Accidental @ladybugdb/core reorder

This answers magyargergo's inline comment. The diff shows:

+    "@ladybugdb/core": "^0.15.1",        ← added here (alphabetically "correct")
     "@modelcontextprotocol/sdk": ...
     ...
     "graphology-utils": "^2.3.0",
-    "@ladybugdb/core": "^0.15.1",        ← removed from here (original position)
+    "ignore": "^7.0.5",

An editor auto-sorted the dependencies alphabetically while ignore was being added. The @ladybugdb/core entry (introduced in PR #275) was out of alphabetical order in main and got silently repositioned. Not a functional issue — the package is still present — but it inflates the diff and obscures the actual change (one new dependency: ignore). The reorder should be reverted to keep the diff minimal.



🟡 Architectural: ParseWorkerResult.skippedLanguages type/guard inconsistency

parse-worker.ts:143 declares skippedLanguages: Record<string, number> as a required field (initialized to {} in processBatch). But parsing-processor.ts:98 guards it with if (result.skippedLanguages) as though it might be absent:

if (result.skippedLanguages) {   // ← this guard is logically dead; the field is always present
  for (const [lang, count] of Object.entries(result.skippedLanguages)) {

This is type-level debt: the guard implies the field is optional, the type says it isn't. A future contributor either asks "when can this be undefined?" (confusing) or copies the defensive pattern everywhere (noise). The fix is to decide one way: either make the type skippedLanguages?: Record<string, number> and initialize to undefined when empty, or remove the guard and rely on the always-present empty object. The latter is cleaner given the initialized-to-{} pattern already in place.
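With the "always present" option, the main-thread merge needs no guard, because iterating over an empty object is a no-op. This is a minimal sketch: the one-field `ParseWorkerResult` is a hypothetical reduction of the real type, and joining several languages with ", " in the summary is an assumption. Only the single-language "Skipped unsupported languages: swift: 47" form comes from the review above.

```typescript
interface ParseWorkerResult {
  // Required: the worker always initializes this to {} (possibly empty).
  skippedLanguages: Record<string, number>;
}

function mergeSkipped(results: ParseWorkerResult[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const result of results) {
    // No `if (result.skippedLanguages)` guard: the type says the field is
    // always present, and Object.entries({}) simply yields no pairs.
    for (const [lang, count] of Object.entries(result.skippedLanguages)) {
      totals[lang] = (totals[lang] ?? 0) + count;
    }
  }
  return totals;
}

// Returns null when nothing was skipped, so callers log only when needed.
function formatSkipSummary(totals: Record<string, number>): string | null {
  const parts = Object.entries(totals).map(([lang, count]) => `${lang}: ${count}`);
  return parts.length > 0 ? `Skipped unsupported languages: ${parts.join(', ')}` : null;
}
```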



🟡 Architectural: isLanguageAvailable API divergence is a maintenance trap

The two copies of isLanguageAvailable have diverged signatures:

| Location | Signature |
|---|---|
| parser-loader.ts:41 | isLanguageAvailable(language): boolean |
| parse-worker.ts:179 | isLanguageAvailable(language, filePath): boolean |

The worker's filePath parameter exists to distinguish .tsx from .ts (which use different grammars under the same SupportedLanguages.TypeScript key). The main thread doesn't need this because loadLanguage handles TSX inline.

This divergence is intentional right now, but the comment at parse-worker.ts:177 only says "duplicated because workers can't import from the main thread" — it doesn't explain why the signatures differ. The next contributor who adds a language with file-variant splits (e.g., a future .jsx vs .js grammar split) will likely update only one copy. A concrete safeguard: add a type-level satisfies assertion or a brief comment explaining the extra parameter to prevent silent divergence.
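A sketch of the satisfies idea follows, assuming a hypothetical shared `LanguageAvailabilityCheck` type. In a real codebase this might live in a module both threads can reference through a type-only import, which is erased at compile time; whether that fits the worker's import constraints is an assumption. Both copies stay separate but are checked against one contract. For example, if either copy grows an extra required parameter, the `satisfies` check fails to compile. The operator requires TypeScript 4.9 or later.

```typescript
// Hypothetical shared contract for both copies of the check.
type LanguageAvailabilityCheck = (language: string, filePath?: string) => boolean;

const loadedGrammars = new Set(['typescript', 'tsx']); // hypothetical loaded grammars

// Main-thread style: ignores filePath because loadLanguage() picks the TSX grammar.
const mainThreadIsLanguageAvailable = ((language: string) =>
  loadedGrammars.has(language)) satisfies LanguageAvailabilityCheck;

// Worker style: uses filePath to choose between the .ts and .tsx grammars.
const workerIsLanguageAvailable = ((language: string, filePath?: string) =>
  language === 'typescript' && filePath?.endsWith('.tsx')
    ? loadedGrammars.has('tsx')
    : loadedGrammars.has(language)) satisfies LanguageAvailabilityCheck;
```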


🔴 Pre-existing but now architecturally dangerous: Kotlin hard-imports invalidate the guard pattern

parser-loader.ts:11 and parse-worker.ts:12:

import Kotlin from 'tree-sitter-kotlin';  // hard static import

tree-sitter-kotlin is in optionalDependencies. If it's not installed, the module fails to load entirely — the worker crashes before a single isLanguageAvailable call can execute. This PR correctly adds isLanguageAvailable guards, but for Kotlin those guards are unreachable because the import barrier comes first.

Swift is correctly guarded:

let Swift: any = null;
try { Swift = _require('tree-sitter-swift'); } catch {}

Kotlin should follow the same pattern. This is pre-existing and not introduced by this PR, but the PR's stated goal is "complete unsupported language handling" — and Kotlin on an environment without the optional package is currently handled worse than Swift. A user without tree-sitter-kotlin installed hits a crash at import time, not a graceful skip.
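The Swift-style guard generalizes to a small helper. This is a sketch, not the PR's code: `loadOptionalGrammar` is a hypothetical name, and the require function is built with Node's `createRequire` from an absolute filename under the current directory (the file itself does not need to exist).

```typescript
import { createRequire } from 'node:module';
import * as path from 'node:path';

// createRequire needs an absolute filename to resolve from; it need not exist.
const _require = createRequire(path.join(process.cwd(), 'index.js'));

/**
 * Loads an optional grammar package, returning null instead of throwing
 * when it cannot be loaded, so callers can skip the language gracefully.
 */
function loadOptionalGrammar(packageName: string): unknown {
  try {
    return _require(packageName);
  } catch {
    return null; // not installed (or failed to load): treat as unavailable
  }
}

// Same shape as the existing Swift guard, applied to Kotlin.
const Kotlin = loadOptionalGrammar('tree-sitter-kotlin'); // null when the optional dep is absent
```

Like the existing Swift guard, this catches every error. Narrowing the catch to a MODULE_NOT_FOUND code would surface broken native builds instead of silently skipping them, which is a trade-off worth deciding explicitly.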



Future-proofing: Structural recommendations

On ignore-service.ts — the "call-before-use" coupling is fragile at scale. Currently every consumer must remember to call loadUserIgnore(repoPath) before using shouldIgnorePath. fullGeneration does. incrementalUpdate now doesn't (gap above). The wiki's init flow does. Any new entry point (a future gitnexus lint or gitnexus check command) will silently bypass user patterns until someone notices.

A more future-proof pattern is lazy initialization inside shouldIgnorePath itself — accept repoPath as an optional parameter and auto-load if not yet loaded for that path. This eliminates the call-before-use requirement entirely:

// More robust API:
export const shouldIgnorePath = (filePath: string, repoPath?: string): boolean => {
  if (repoPath) loadUserIgnore(repoPath);
  // ... rest of function
};

Or alternatively, a stronger version: throw (in dev/test) if shouldIgnorePath is called without a prior loadUserIgnore. Silent no-ops are harder to debug than loud failures.
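Both suggestions can be combined: lazy loading when repoPath is passed, and a loud failure outside production when nothing was ever loaded. This is a minimal sketch under stated assumptions: `readPatterns` is a hypothetical stub for reading .gitnexusignore, the NODE_ENV check is one possible "dev/test" signal, and the matcher only handles "dir/" prefixes, unlike the ignore package used by the real service.

```typescript
// Hypothetical pattern source; the real service reads <repoPath>/.gitnexusignore.
const readPatterns = (repoPath: string): string[] => (repoPath === '/demo' ? ['vendor/'] : []);

let loadedFor: string | null = null;
let patterns: string[] = [];

function loadUserIgnore(repoPath: string): void {
  if (loadedFor === repoPath) return; // already loaded for this repo
  patterns = readPatterns(repoPath);
  loadedFor = repoPath;
}

function shouldIgnorePath(filePath: string, repoPath?: string): boolean {
  // Lazy initialization: callers that pass repoPath cannot forget to load.
  if (repoPath) loadUserIgnore(repoPath);
  // Loud failure outside production instead of silently skipping user patterns.
  if (loadedFor === null && process.env.NODE_ENV !== 'production') {
    throw new Error(`shouldIgnorePath("${filePath}") called before loadUserIgnore()`);
  }
  // Simplified matcher: directory prefixes only.
  return patterns.some((p) => p.endsWith('/') && filePath.startsWith(p));
}
```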

On language extensibility: The languageMap pattern in two places (main thread + worker) means adding a new language requires exactly 2 coordinated edits — and for optional languages, exactly 4 coordinated changes (map entry + conditional spread + worker map + worker conditional spread). This is manageable now but error-prone at scale. A registry pattern in supported-languages.ts where each language declares its package name and whether it's optional would give a single source of truth.
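A registry along those lines might look like the sketch below. Everything here is hypothetical (field names, the `buildLanguageMap` helper, the extension lists). The point is that one entry per language drives both the main-thread map and the worker map, and `optional` decides whether a load failure is fatal or a graceful skip.

```typescript
// Hypothetical single source of truth, e.g. in supported-languages.ts.
interface LanguageSpec {
  id: string;
  packageName: string;
  extensions: string[];
  optional: boolean; // true => listed in optionalDependencies, may be missing
}

const LANGUAGE_REGISTRY: readonly LanguageSpec[] = [
  { id: 'typescript', packageName: 'tree-sitter-typescript', extensions: ['.ts', '.tsx'], optional: false },
  { id: 'kotlin', packageName: 'tree-sitter-kotlin', extensions: ['.kt', '.kts'], optional: true },
  { id: 'swift', packageName: 'tree-sitter-swift', extensions: ['.swift'], optional: true },
];

// Both threads could build their grammar maps from the registry with their own
// `load` function, so adding a language is one edit here.
function buildLanguageMap(load: (pkg: string) => unknown): Map<string, unknown> {
  const map = new Map<string, unknown>();
  for (const spec of LANGUAGE_REGISTRY) {
    let grammar: unknown = null;
    try {
      grammar = load(spec.packageName);
    } catch (err) {
      if (!spec.optional) throw err; // required grammars must load
    }
    if (grammar != null) map.set(spec.id, grammar);
  }
  return map;
}
```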

…iscovery

Add support for excluding files from indexing based on .gitignore and
.gitnexusignore patterns. Previously, GitNexus used only a hardcoded
ignore list, causing significant index pollution in repositories with
git-ignored directories containing code (e.g., Docker-mounted volumes).

Changes:
- Add `ignore` package for gitignore-spec pattern matching
- Add `loadIgnoreRules()` to parse .gitignore + .gitnexusignore
- Add `createIgnoreFilter()` returning glob-compatible IgnoreLike object
- Integrate filter into glob's `ignore` option for directory-level pruning
- Remove post-glob `.filter()` call (now handled during traversal)

The hardcoded DEFAULT_IGNORE_LIST remains as fallback for non-git repos.

Closes abhigyanpatwari#228

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
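The createIgnoreFilter() idea in this commit message can be sketched as follows. This is not the PR's code. `PathLike` and `IgnoreLike` are local structural stand-ins modeled on glob's IgnoreLike option, whose `ignored`/`childrenIgnored` callbacks receive path-scurry Path objects; check the exact shape against the installed glob version. `isIgnored` is a hypothetical predicate standing in for the ignore package's matcher. The directory check tries both the bare and trailing-slash forms, as the later review-feedback commit describes for childrenIgnored.

```typescript
// Minimal stand-ins. Assumption: relative() returns a POSIX-style path
// relative to the walk root, as path-scurry's Path does.
interface PathLike { relative(): string }
interface IgnoreLike {
  ignored?: (p: PathLike) => boolean;
  childrenIgnored?: (p: PathLike) => boolean;
}

function createIgnoreFilter(isIgnored: (relPath: string) => boolean): Required<IgnoreLike> {
  return {
    // Per-entry check: exclude this path from the results.
    ignored: (p) => isIgnored(p.relative()),
    // Directory check: returning true prunes the whole subtree, so the walk
    // never descends into it. Try both "dir" and "dir/" forms.
    childrenIgnored: (p) => isIgnored(p.relative()) || isIgnored(`${p.relative()}/`),
  };
}
```

Pruning through childrenIgnored during traversal is what makes the post-glob .filter() call redundant: ignored directories are never read, rather than read and then discarded.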
jecanore added a commit to jecanore/GitNexus that referenced this pull request Mar 16, 2026
…port, wiki ignore, dead code

- Move loadUserIgnore() from fullGeneration() to run() so incrementalUpdate()
  also loads .gitnexusignore patterns before calling shouldIgnorePath()
- Convert Kotlin from hard static import to try/catch dynamic require (matches
  Swift pattern) in both parser-loader.ts and parse-worker.ts — prevents worker
  crash when tree-sitter-kotlin is not installed
- Remove dead `if (result.skippedLanguages)` guard in parsing-processor.ts —
  field is always {} (truthy), guard never filtered anything
- Improve isLanguageAvailable comment in parse-worker.ts explaining why the
  worker version takes an extra filePath parameter vs main-thread version
@jecanore

Contributor Author

Addressed the second review round in cbf52c7:

| # | Gap | Fix |
|---|---|---|
| 6 | incrementalUpdate calls shouldIgnorePath without loadUserIgnore | Moved loadUserIgnore() from fullGeneration() to run(), covering both code paths |
| 7 | Kotlin hard import crashes worker if not installed | Converted to try/catch dynamic require (matches Swift pattern) in parser-loader.ts and parse-worker.ts |
| 8 | Dead if (result.skippedLanguages) guard | Removed: field is always {} (truthy) |
| 9 | Worker isLanguageAvailable comment missing divergence explanation | Added comment explaining why worker version takes extra filePath param |
| 10 | @ladybugdb/core reorder | Skipped: already in correct position |

All unit tests pass (47/49 — 2 pre-existing failures unrelated to this PR). npx gitnexus analyze --force verified on BoomerAI repo.

ivkond and others added 2 commits March 16, 2026 11:24
- Distinguish ENOENT vs EACCES in loadIgnoreRules (warn on permission errors)
- Add GITNEXUS_NO_GITIGNORE env var to bypass .gitignore parsing
- Fix bare-name pattern matching in childrenIgnored (check both with/without trailing slash)
- Rename isIgnoredDirectory to isHardcodedIgnoredDirectory for clarity
- Add clarifying comments for design decisions (D2 negation, D3 dot:false redundancy)
- Add tests for bare-name patterns, file-glob patterns, EACCES handling, env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
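The ENOENT vs EACCES distinction from the first bullet can be sketched as below. `readIgnoreFile` and its injectable `warn` callback are hypothetical names, and rethrowing any other error code is this sketch's own choice; the commit message only says permission errors produce a warning.

```typescript
import { readFileSync } from 'node:fs';

/**
 * Reads an ignore file, treating "missing" as normal and "unreadable" as
 * worth a warning. Returns the contents, or '' when the file is unavailable.
 */
function readIgnoreFile(filePath: string, warn: (msg: string) => void = console.warn): string {
  try {
    return readFileSync(filePath, 'utf8');
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === 'ENOENT') return ''; // no ignore file: nothing to apply, stay quiet
    if (code === 'EACCES') {
      // The file exists but cannot be read, so the user's patterns would be lost silently.
      warn(`Cannot read ${filePath} (permission denied); its patterns will not be applied`);
      return '';
    }
    throw err; // anything else is unexpected
  }
}
```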
- G1: Document GITNEXUS_NO_GITIGNORE in `analyze --help` and log when active
- G2: Add comment clarifying path-scurry POSIX normalization contract
- G3: Add IgnoreOptions interface — env var now falls back, callers can
  pass `noGitignore` explicitly for testability and future CLI flag
- G4: Add integration test verifying walkRepositoryPaths respects the env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
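G3's "env var now falls back" ordering can be expressed as a small resolver. This is a sketch under assumptions: only the `noGitignore` option name and the GITNEXUS_NO_GITIGNORE variable come from the commit message. The `resolveNoGitignore` helper is hypothetical, and treating any non-empty value as "on" is a guess at the real parsing. Passing `env` explicitly is what makes it testable without mutating process.env.

```typescript
/** Hypothetical shape of the IgnoreOptions interface described in G3. */
interface IgnoreOptions {
  /** Bypass .gitignore parsing. */
  noGitignore?: boolean;
}

function resolveNoGitignore(
  options: IgnoreOptions = {},
  env: Record<string, string | undefined> = process.env,
): boolean {
  // An explicit option wins; the env var is only the fallback.
  // Assumption: any non-empty GITNEXUS_NO_GITIGNORE value enables the bypass.
  return options.noGitignore ?? Boolean(env.GITNEXUS_NO_GITIGNORE);
}
```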
@magyargergo

Collaborator

Would you mind rebasing this on top of #231?
I left some context here: #231 (comment)

I think the two PRs are complementary, but #231 feels like the better base for the ignore-handling side, while this PR adds useful resilience for unsupported languages. Rebasing on top of that would help us combine the strengths of both and avoid having two overlapping .gitnexusignore implementations.

@jecanore

Contributor Author

@magyargergo Makes sense — just replied on #231 with a detailed breakdown of where each PR is stronger and a consolidation plan.

Short version: I'll rebase #301 on top of #231 once it merges, dropping the overlapping .gitnexusignore implementation (loadUserIgnore/resetUserIgnore in ignore-service.ts, the filesystem-walker.ts change). The rebased PR would carry only the language resilience work:

No file overlap with #231 after the rebase. I'll update this PR as soon as #231 lands.

ivkond and others added 2 commits March 16, 2026 12:04
…rammars

Port unsupported language resilience from PR abhigyanpatwari#301 by @jecanore.
- Make Kotlin import optional (like Swift) in parser-loader and parse-worker
- Add worker-local isLanguageAvailable() with filePath param for tsx distinction
- Track and log skipped files per language in both sequential and worker paths
- Add skippedLanguages to ParseWorkerResult for worker→main aggregation
- Add isLanguageAvailable unit tests

Refs: abhigyanpatwari#301, abhigyanpatwari#155, abhigyanpatwari#228

Co-Authored-By: jecanore <juan@housingbase.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a fixture repo (test/fixtures/ignore-and-skip-repo/) with .gitignore,
.gitnexusignore, TypeScript source files, and a Swift file to exercise
all three features end-to-end:

- File discovery: verifies .gitignore excludes data/ and *.log,
  .gitnexusignore excludes vendor/, source files are discovered
- Parsing: verifies TypeScript files produce Function nodes and DEFINES
  relationships, Swift files are skipped gracefully when grammar is
  unavailable

Add the test to the standalone group in ci-integration.yml and coverage job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ivkond

ivkond commented Mar 16, 2026

Contributor

Hi @jecanore! 👋

@magyargergo asked us to consolidate our PR #231 (.gitignore/.gitnexusignore support) with the unsupported language resilience work from your PR. We've ported the language-skip logic (optional Kotlin import, isLanguageAvailable guards, skippedLanguages tracking) into our branch, with Co-Authored-By attribution in the commit.

We intentionally kept our ignore-service implementation since it covers .gitignore + .gitnexusignore + env var escape hatch, but your language-skip approach was exactly what was needed — clean and well-structured.

Wanted to check: are you okay with this consolidation? If you'd prefer a different attribution approach or have any concerns, happy to adjust. Your work is credited in the commit and referenced in the PR comment.

Clarify that the filePath parameter handles .tsx distinction at the
worker level, while the main-thread version does this inside
loadLanguage() instead.

Rebased on top of abhigyanpatwari#231 — all ignore and language resilience changes
from abhigyanpatwari#301 were already ported by ivkond in 505fc9b.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jecanore jecanore force-pushed the fix/ignore-and-unsupported-lang branch from cbf52c7 to b6b5648 Compare March 16, 2026 09:40
@jecanore jecanore changed the title fix: add .gitnexusignore support and complete unsupported language handling style: improve parse-worker JSDoc for isLanguageAvailable Mar 16, 2026
@jecanore

Contributor Author

Rebased on top of #231 per @magyargergo's consolidation request.

All language resilience work from this PR was already ported into #231 by @ivkond in 505fc9b (co-authored credit included). The only remaining delta is a minor JSDoc improvement in parse-worker.ts.

This PR can be closed if the maintainers prefer — the substantive work now lives entirely in #231.

magyargergo added a commit that referenced this pull request Mar 16, 2026
…iscovery (#231)

* feat(ingestion): respect .gitignore and .gitnexusignore during file discovery

Add support for excluding files from indexing based on .gitignore and
.gitnexusignore patterns. Previously, GitNexus used only a hardcoded
ignore list, causing significant index pollution in repositories with
git-ignored directories containing code (e.g., Docker-mounted volumes).

Changes:
- Add `ignore` package for gitignore-spec pattern matching
- Add `loadIgnoreRules()` to parse .gitignore + .gitnexusignore
- Add `createIgnoreFilter()` returning glob-compatible IgnoreLike object
- Integrate filter into glob's `ignore` option for directory-level pruning
- Remove post-glob `.filter()` call (now handled during traversal)

The hardcoded DEFAULT_IGNORE_LIST remains as fallback for non-git repos.

Closes #228

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ingestion): address review feedback on ignore filtering

- Distinguish ENOENT vs EACCES in loadIgnoreRules (warn on permission errors)
- Add GITNEXUS_NO_GITIGNORE env var to bypass .gitignore parsing
- Fix bare-name pattern matching in childrenIgnored (check both with/without trailing slash)
- Rename isIgnoredDirectory to isHardcodedIgnoredDirectory for clarity
- Add clarifying comments for design decisions (D2 negation, D3 dot:false redundancy)
- Add tests for bare-name patterns, file-glob patterns, EACCES handling, env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ingestion): address second round of review feedback

- G1: Document GITNEXUS_NO_GITIGNORE in `analyze --help` and log when active
- G2: Add comment clarifying path-scurry POSIX normalization contract
- G3: Add IgnoreOptions interface — env var now falls back, callers can
  pass `noGitignore` explicitly for testability and future CLI flag
- G4: Add integration test verifying walkRepositoryPaths respects the env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(ingestion): gracefully skip files with unavailable tree-sitter grammars

Port unsupported language resilience from PR #301 by @jecanore.
- Make Kotlin import optional (like Swift) in parser-loader and parse-worker
- Add worker-local isLanguageAvailable() with filePath param for tsx distinction
- Track and log skipped files per language in both sequential and worker paths
- Add skippedLanguages to ParseWorkerResult for worker→main aggregation
- Add isLanguageAvailable unit tests

Refs: #301, #155, #228

Co-Authored-By: jecanore <juan@housingbase.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(e2e): add ignore + language-skip end-to-end test with fixture repo

Add a fixture repo (test/fixtures/ignore-and-skip-repo/) with .gitignore,
.gitnexusignore, TypeScript source files, and a Swift file to exercise
all three features end-to-end:

- File discovery: verifies .gitignore excludes data/ and *.log,
  .gitnexusignore excludes vendor/, source files are discovered
- Parsing: verifies TypeScript files produce Function nodes and DEFINES
  relationships, Swift files are skipped gracefully when grammar is
  unavailable

Add the test to the standalone group in ci-integration.yml and coverage job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): move ignore-and-skip-e2e test to e2e group per review feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): use temp directory instead of fixture for e2e ignore test

The fixture's .gitignore prevented data/seed.json and debug.log from
being committed — these files would be missing after checkout in CI.

Switch to creating the entire test structure in a temp directory via
beforeAll (matching filesystem-walker.test.ts pattern). This ensures
all files exist regardless of git ignore rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): correct graph API usage in e2e ignore test

Use graph.nodes property getter instead of graph.getNodes(), and check
Function node filePath instead of non-existent File nodes (File nodes
are created by processStructure, not processParsing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add workflows permission to ci-integration.yml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: change workflows permission to write per review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: move workflows permission from ci-integration.yml to ci.yml caller

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): fix Claude workflows for fork PRs, remove misplaced workflows perm

Three issues prevented Claude from running on fork PRs:

1. claude-code-review.yml lacked workflows:write — push failed when
   fork PRs modify .github/workflows/ files
2. claude.yml had no fork PR support — checked out main and couldn't
   fetch the fork's branch from origin
3. Cleanup step unconditionally deleted branches even when push failed,
   breaking the concurrent claude.yml workflow

Also removes workflows:write from ci.yml's integration job — CI tests
don't need that permission. The permission belongs on the claude
workflows that push fork branches.

Changes:
- Add workflows:write to both claude workflow permissions blocks
- Add fork PR detection + branch push/cleanup to claude.yml
- Add step id to push-fork; cleanup only runs if push succeeded
- Pass branch names via env vars to prevent shell injection (security)
- Add concurrency groups to prevent race conditions between workflows
- Remove misplaced workflows:write from ci.yml integration job
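Fork detection and the guarded cleanup can be expressed as small pure functions over the `pull_request` event payload, whose `head.repo` and `base.repo` objects each carry a `full_name`. This is a hypothetical TypeScript restatement of the logic, not the workflow YAML the commit changed. The `fork-pr-<number>` branch naming is invented for illustration.

```typescript
// Subset of the GitHub `pull_request` event payload used here.
interface PullRequestPayload {
  number: number;
  head: { ref: string; sha: string; repo: { full_name: string } | null };
  base: { repo: { full_name: string } };
}

/** A PR is from a fork when its head repo differs from the base repo. */
function isForkPullRequest(pr: PullRequestPayload): boolean {
  // head.repo is null when the fork has been deleted; treat that as a fork as well.
  return pr.head.repo === null || pr.head.repo.full_name !== pr.base.repo.full_name;
}

// Hypothetical naming scheme for the temporary branch created in the base repo.
function tempBranchName(pr: PullRequestPayload): string {
  return `fork-pr-${pr.number}`;
}

/** Delete the temporary branch only if it was actually created. */
function shouldCleanup(isFork: boolean, pushSucceeded: boolean): boolean {
  return isFork && pushSucceeded;
}

const samplePr: PullRequestPayload = {
  number: 301,
  head: { ref: "fix/ignore-and-unsupported-lang", sha: "0000000", repo: { full_name: "contributor/repo" } },
  base: { repo: { full_name: "owner/repo" } },
};
console.log(isForkPullRequest(samplePr), tempBranchName(samplePr));
```

Gating cleanup on push success mirrors the rule in the list above that cleanup only runs if the push step succeeded.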

* fix(ci): use GitHub API for fork branch refs instead of git push

GITHUB_TOKEN cannot have 'workflows' permission — it's only valid for
PATs and GitHub Apps. This means git push fails whenever a fork PR
modifies .github/workflows/ files.

Replace git push with the GitHub REST API (POST/PATCH /git/refs) to
create temporary branch refs. The API creates a pointer to the
already-existing PR head commit without triggering the workflow file
push protection. Similarly, cleanup uses DELETE /git/refs instead of
git push --delete.

Also removes the invalid 'workflows: write' from permissions blocks.
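The ref-based approach can be sketched in TypeScript with the global `fetch` (Node 18+). The endpoints are from the documented Git References API: `POST /repos/{owner}/{repo}/git/refs` to create, and `PATCH` or `DELETE /repos/{owner}/{repo}/git/refs/heads/{branch}` to update or remove. The helper names, the create-then-update-on-422 strategy, and the example branch name are assumptions, not the workflow's actual steps.

```typescript
// Hypothetical helpers around the GitHub Git References REST API.
interface RefRequest {
  method: "POST" | "PATCH" | "DELETE";
  url: string;
  body?: { ref?: string; sha?: string; force?: boolean };
}

const API = "https://api.github.com";

function createRefRequest(repo: string, branch: string, sha: string): RefRequest {
  return { method: "POST", url: `${API}/repos/${repo}/git/refs`, body: { ref: `refs/heads/${branch}`, sha } };
}

function updateRefRequest(repo: string, branch: string, sha: string): RefRequest {
  // force: true lets the ref move to a commit that is not a descendant of the old one.
  return { method: "PATCH", url: `${API}/repos/${repo}/git/refs/heads/${branch}`, body: { sha, force: true } };
}

function deleteRefRequest(repo: string, branch: string): RefRequest {
  return { method: "DELETE", url: `${API}/repos/${repo}/git/refs/heads/${branch}` };
}

async function send(req: RefRequest, token: string): Promise<Response> {
  return fetch(req.url, {
    method: req.method,
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
      "X-GitHub-Api-Version": "2022-11-28",
    },
    body: req.body ? JSON.stringify(req.body) : undefined,
  });
}

/** Points `branch` at an existing commit: create the ref, or force-update it if it already exists (422). */
async function upsertRef(repo: string, branch: string, sha: string, token: string): Promise<boolean> {
  const created = await send(createRefRequest(repo, branch, sha), token);
  if (created.ok) return true;
  if (created.status !== 422) return false;
  const updated = await send(updateRefRequest(repo, branch, sha), token);
  return updated.ok;
}

console.log(createRefRequest("owner/repo", "fork-pr-301", "abc123"));
```

No file contents are pushed, only a pointer to a commit that already exists, so the request never touches workflow files. Cleanup would call `send(deleteRefRequest(...), token)` only when `upsertRef` returned `true`.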

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: jecanore <juan@housingbase.io>
Co-authored-by: Gergo Magyar <gergomagyar@icloud.com>
@magyargergo

Collaborator

#231 has been merged and, as discussed, we are closing this PR.

@magyargergo

Collaborator

Thank you for your contribution!

motolese pushed a commit to motolese/datamoto-gitnexus that referenced this pull request Apr 23, 2026
…iscovery (abhigyanpatwari#231)
