Skip to content

feat(ingestion): respect .gitignore and .gitnexusignore during file discovery#231

Merged
magyargergo merged 13 commits into
abhigyanpatwari:mainfrom
ivkond:feat/gitignore-support
Mar 16, 2026
Merged

feat(ingestion): respect .gitignore and .gitnexusignore during file discovery#231
magyargergo merged 13 commits into
abhigyanpatwari:mainfrom
ivkond:feat/gitignore-support

Conversation

@ivkond

@ivkond ivkond commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Respect .gitignore: files/dirs listed in the repo root .gitignore are now excluded from indexing
  • Support .gitnexusignore: custom ignore file (gitignore syntax) for project-specific exclusion patterns
  • Directory-level pruning: filtering happens at glob traversal level via childrenIgnored, skipping entire subtrees (no stat() calls on ignored files)
  • Backward compatible: existing hardcoded DEFAULT_IGNORE_LIST still applies as fallback

Motivation

GitNexus currently indexes all files on the filesystem, filtering only through a hardcoded DEFAULT_IGNORE_LIST. This causes significant noise in repos with git-ignored directories containing code.

Real-world example: A repo with Docker-mounted Grafana plugins in core/local/grafana/ (listed in .gitignore) had ~1,950 symbols (~34% of index) from irrelevant plugin code. Top execution flows, clusters, and search results were all polluted.

Changes

File Change
src/config/ignore-service.ts Add loadIgnoreRules(), createIgnoreFilter(), isIgnoredDirectory()
src/core/ingestion/filesystem-walker.ts Use createIgnoreFilter() in glob options, remove post-filter
package.json Add ignore dependency (^7.0.0, 11KB, 0 deps)
test/unit/ignore-service.test.ts 10 new unit tests for isIgnoredDirectory, loadIgnoreRules, createIgnoreFilter
test/integration/filesystem-walker.test.ts 8 new integration tests for .gitignore, .gitnexusignore, and combined

Architecture

Three ignore layers combined into a single glob IgnoreLike filter:

Layer 1: .gitignore patterns       (root only, via `ignore` package)
Layer 2: .gitnexusignore patterns   (root only, via `ignore` package)
Layer 3: DEFAULT_IGNORE_LIST        (existing hardcoded sets — unchanged)

The ignore npm package (~30M weekly downloads, used by eslint) handles gitignore-spec parsing including negation patterns, comments, and glob matching.

Scope limitations (documented for follow-up)

  • Only root-level .gitignore — subdirectory .gitignore files are not loaded (covers 95%+ of real cases)
  • No --no-gitignore CLI flag — can be added in a follow-up
  • Hardcoded lists cannot be overridden by negation patterns — intentional safety measure

Test plan

  • Unit: loadIgnoreRules with .gitignore, .gitnexusignore, both, neither
  • Unit: isIgnoredDirectory for known and unknown dirs
  • Unit: createIgnoreFilter returns valid IgnoreLike object
  • Unit: shouldIgnorePath unchanged behavior (all 86 existing tests pass)
  • Integration: .gitignore excludes directories from scan
  • Integration: .gitnexusignore excludes directories from scan
  • Integration: Both files combined work correctly
  • Integration: Hardcoded ignore list still applies alongside gitignore
  • Build: tsc compiles without errors
  • All 849 unit tests pass, all 18 filesystem-walker integration tests pass

Closes #228

@vercel

vercel Bot commented Mar 10, 2026

Copy link
Copy Markdown

@ivkond is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@jecanore

Copy link
Copy Markdown
Contributor

@ivkond, opened PR #301 combining .gitnexusignore with the unsupported language fix. Your 3-layer approach and subtree pruning are impressive — I kept the implementation simpler (single .gitnexusignore via ignore npm package) to maximize merge likelihood, but if @magyargergo wants the layered approach I'm happy to expand. Your PR was the most thorough analysis of the problem space. 👏

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude try to find gaps in this implementation and see use ultra-thinking if it maintains the architecture and it's future proof and compare it with PR #301

@claude

claude Bot commented Mar 16, 2026

Copy link
Copy Markdown

Claude finished @magyargergo's task in 3m 51s —— View job


Deep Analysis: PR #231 — Gaps, Architecture & Comparison with PR #301


Executive Summary

PR #231 is a solid, well-thought-out implementation with good test coverage and clean architecture. The 3-layer filter design is correct and extensible. I found 5 real bugs/gaps, 3 design concerns, and several test coverage gaps. For the future-proofing dimension, the architecture is good but has one important escape-hatch missing that should block merge.


Bugs & Correctness Gaps

1. Silent failure masks misconfigured .gitignoreignore-service.ts:263

} catch {
  // File doesn't exist or unreadable — skip silently
}

The catch block handles both "file does not exist" (ENOENT) and "file exists but unreadable" (EACCES) identically. In the latter case, the user has a .gitignore that the process can't read, so the filter silently degrades with no warning — the entire purpose of the feature fails without feedback.

Fix: distinguish ENOENT from other errors: Fix this →

2. isIgnoredDirectory is misleading after this PR — ignore-service.ts:246

export const isIgnoredDirectory = (name: string): boolean => {
  return DEFAULT_IGNORE_LIST.has(name);
};

This exported function now only checks the hardcoded list — it has no knowledge of .gitignore patterns. Any caller outside ignore-service.ts that uses this to gate directory traversal would silently bypass the new gitignore support. The function's name implies a complete "is this dir ignored?" answer, but it's now incomplete.

Risk: If other parts of the codebase (or future code) call isIgnoredDirectory() as a guard, gitignore-listed directories won't be filtered correctly. The function should either be made internal, or its signature should accept an optional Ignore instance.

3. childrenIgnored check only appends / — potential mismatch for bare-name patterns

// ignore-service.ts:297
if (rel && ig.ignores(rel + '/')) return true;

If .gitignore contains local (bare name, no trailing slash — valid gitignore syntax meaning "match file or directory named local"), then ig.ignores('local/') may return false in some versions of the ignore package because the path has a trailing slash while the pattern does not. The test suite only tests local/\n patterns (with trailing slash), so this edge case is untested. In contrast, ignored() at line 287 calls ig.ignores(rel) without the slash, so bare-name directory patterns work fine there but not in childrenIgnored. This creates an asymmetry: the subtree-pruning optimization silently degrades for a valid gitignore pattern class.

4. createIgnoreFilter is not memoized — double I/O on re-analysis

// filesystem-walker.ts:35
const ignoreFilter = await createIgnoreFilter(repoPath);

walkRepositoryPaths creates a new filter per call. If the analysis pipeline calls walkRepositoryPaths more than once for the same repo (incremental re-indexing, watch mode, error retry), the .gitignore files are re-read each time with no caching. For large monorepos this is minor, but it's an easy win and the ignore instance is pure/immutable after construction.

5. No handling of .gitignore_global / ~/.config/git/ignore

The global gitignore (git config --global core.excludesFile) is respected by git itself and by tools like ESLint, but not by this implementation. In CI environments or developer machines with a ~/.gitignore_global listing things like *.DS_Store, .idea/, etc., those exclusions won't apply. Low severity since DEFAULT_IGNORE_LIST covers most common cases, but technically incorrect relative to git's actual behavior.


Design Concerns

D1. No --no-gitignore escape hatch (documented gap, but it should block merge)

The PR itself calls this out as a follow-up. But consider: if a project's .gitignore accidentally patterns-match important source files, there is no override. Examples:

  • A repo with *.ts in .gitignore (legitimate — ignoring compiled TS in a mixed TS/JS project) would cause GitNexus to index nothing
  • A monorepo with packages/ in .gitignore (keeping published dist out of git) but packages/ is source

This is the biggest future-proofing concern. It should ship with a GITNEXUS_NO_GITIGNORE=1 env variable or --no-gitignore CLI flag, even as a simple escape valve. Without it, the feature can be a footgun that is impossible to recover from without editing the gitignore.

D2. Negation patterns in .gitnexusignore cannot unignore hardcoded entries

# .gitnexusignore
!vendor/    ← This will NOT work to include vendor/

The ignore package would set ig.ignores('vendor/...') to false, but then shouldIgnorePath (line 289) would still return true via DEFAULT_IGNORE_LIST. This is documented as "intentional safety measure" — but it means .gitnexusignore is a weaker language than standard gitignore, which is unintuitive. Users writing .gitnexusignore will expect full gitignore semantics. The documentation (in the PR description) should be surfaced in a user-visible way, e.g., a comment in .gitnexusignore itself or in gitnexus analyze --help.

D3. dot: false + filter creates redundant checks for dot-directories

walkRepositoryPaths uses dot: false in glob (line 40 of filesystem-walker.ts), which prevents glob from matching or traversing into dot-directories (.git, .vscode, etc.). This means childrenIgnored is never called for these paths — they're excluded at the glob level before the filter runs. The DEFAULT_IGNORE_LIST entries for .git, .vscode, etc. in childrenIgnored (line 293) are dead code paths. This is harmless but creates a maintenance illusion: future developers may think DEFAULT_IGNORE_LIST is what protects against .git being indexed, and remove dot: false, causing a regression.


Test Coverage Gaps

Missing Test Why It Matters
.gitignore with bare-name directory pattern (local not local/) Covers Bug #3 above
.gitnexusignore negation pattern (!vendor/) Should document/verify the limitation
.gitignore with file-glob pattern (*.log) — tests only directory patterns Ensures file-level filtering works, not just directory pruning
Symlinks pointing into ignored directories Edge case that real repos hit
.gitignore with anchored patterns (/root-only-dir/) Anchoring is a gitignore feature the ignore package supports
isIgnoredDirectory with gitignore-only dirs Would expose the misleading export issue (Bug #2)
Permission error on .gitignore file Covers Bug #1 (silent failure)

Architecture & Future-Proofing Assessment

What's good:

  • The IgnoreLike interface integration is correct and glob-version-stable
  • childrenIgnored for true directory-level pruning (not just result filtering) is the right approach — avoids stat() on ignored trees
  • Async factory pattern (createIgnoreFilter) is clean and supports future async sources (network, git config reading)
  • The ignore npm package choice is correct — 30M weekly downloads, gitignore-spec compliant, handles negations/comments/globs

What limits future extensibility:

  1. Nested .gitignore files would require threading the Ignore instance through recursive traversal — doable but requires changing createIgnoreFilter's interface
  2. Per-directory ignore instances (needed for proper nested support) would require a different architecture than the current single-instance approach
  3. The shouldIgnorePath function (extension/filename based filtering) is still called via ignored() on every file — if the filter is later extended to be configurable, this function's coupling to the filter complicates it

Verdict: The architecture correctly separates concerns and the extensibility points needed are reachable without breaking changes. The biggest blocker for "future proof" is the missing escape hatch (D1).


Comparison with PR #301

Based on @jecanore's comment: PR #301 uses a "single .gitnexusignore via ignore npm package" and is "simpler to maximize merge likelihood"

Dimension PR #231 PR #301
.gitignore support ✅ Automatic, zero-config ❌ Not included
.gitnexusignore support ✅ Yes ✅ Yes
Scope Ignore filtering only Ignore + unsupported language fix
Implementation complexity Higher (3-layer) Lower (1 file, simpler)
Real-world problem solved Grafana/Docker-mounted plugin case works out of box Requires user to create .gitnexusignore duplicating .gitignore
Test coverage 18 integration tests Unknown
Merge risk Slightly higher (more change) Lower

Recommendation: PR #231's approach is architecturally superior for the core problem. The real-world motivation (repo with 34% index pollution from gitignored directories) is exactly the case where automatic .gitignore respect is the right default. PR #301's single-file approach forces every affected user to manually create and maintain .gitnexusignore duplicating what's already in .gitignore.

That said, PR #301's bundling with an unsupported-language fix means merging #231 alone still leaves a separate PR outstanding. Ideal outcome: merge PR #231's ignore logic (it's the more complete implementation), and cherry-pick or re-apply the unsupported-language fix from #301 separately.


Summary: What Needs to Happen Before Merge

Priority Issue Blocking?
🔴 High Add --no-gitignore escape hatch (env var or CLI flag) Yes — footgun risk
🟡 Medium Fix silent failure to distinguish ENOENT vs EACCES Recommended
🟡 Medium Add test for bare-name directory patterns in .gitignore Recommended
🟡 Medium Document/rename isIgnoredDirectory to signal it's hardcoded-only Recommended
🟢 Low Memoize createIgnoreFilter per repoPath Nice-to-have
🟢 Low Add negation-pattern limitation notice to user docs Nice-to-have

@github-actions

github-actions Bot commented Mar 16, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Unit Tests success 3 platforms
✅ Integration success 3 OS x 4 groups = 12 jobs

Test Results

Suite Tests Passed Failed Skipped Duration
Unit 1370 1320 0 0 8s
Integration 837 820 0 17 53s
Total 2207 2140 0 17 61s

✅ All 2140 tests passed

17 test(s) skipped — expand for details

Integration:

  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()

Code Coverage

Combined (Unit + Integration)

Metric Coverage Covered Base Delta Status
Statements 52.2% 4862/9313 31.28% 📈 +20.9 🟢 ██████████░░░░░░░░░░
Branches 45.06% 3023/6708 25.58% 📈 +19.5 🟢 █████████░░░░░░░░░░░
Functions 54.36% 498/916 34.65% 📈 +19.7 🟢 ██████████░░░░░░░░░░
Lines 53.89% 4433/8225 33.22% 📈 +20.7 🟢 ██████████░░░░░░░░░░
Coverage breakdown by test suite

Unit Tests

Metric Coverage Covered Base Delta Status
Statements 41.6% 3875/9313 31.28% 📈 +10.3 🟢 ████████░░░░░░░░░░░░
Branches 36.71% 2463/6708 25.58% 📈 +11.1 🟢 ███████░░░░░░░░░░░░░
Functions 43.01% 394/916 34.65% 📈 +8.4 🟢 ████████░░░░░░░░░░░░
Lines 42.9% 3529/8225 33.22% 📈 +9.7 🟢 ████████░░░░░░░░░░░░

Integration Tests

Metric Coverage Covered Base Delta Status
Statements 24.17% 2251/9313 31.28% 📉 -7.1 🔴 ████░░░░░░░░░░░░░░░░
Branches 18.6% 1248/6708 25.58% 📉 -7.0 🔴 ███░░░░░░░░░░░░░░░░░
Functions 27.29% 250/916 34.65% 📉 -7.4 🔴 █████░░░░░░░░░░░░░░░
Lines 25.37% 2087/8225 33.22% 📉 -7.8 🔴 █████░░░░░░░░░░░░░░░

📋 View full run · Generated by CI

…iscovery

Add support for excluding files from indexing based on .gitignore and
.gitnexusignore patterns. Previously, GitNexus used only a hardcoded
ignore list, causing significant index pollution in repositories with
git-ignored directories containing code (e.g., Docker-mounted volumes).

Changes:
- Add `ignore` package for gitignore-spec pattern matching
- Add `loadIgnoreRules()` to parse .gitignore + .gitnexusignore
- Add `createIgnoreFilter()` returning glob-compatible IgnoreLike object
- Integrate filter into glob's `ignore` option for directory-level pruning
- Remove post-glob `.filter()` call (now handled during traversal)

The hardcoded DEFAULT_IGNORE_LIST remains as fallback for non-git repos.

Closes #228

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ivkond ivkond force-pushed the feat/gitignore-support branch from e1b94ca to f0fc601 Compare March 16, 2026 08:11
@ivkond

ivkond commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

Addressed review feedback and rebased onto upstream/main

Changes in f0fc601

Following the thorough analysis by @claude (triggered by @magyargergo), I've addressed the actionable items:

Bug fixes:

  1. ENOENT vs EACCES distinction (ignore-service.ts) — The catch block in loadIgnoreRules now distinguishes between "file not found" (silently skipped) and "file exists but unreadable" (emits console.warn). Previously, a permission-denied .gitignore would cause the feature to silently degrade with no feedback.

  2. Bare-name pattern matching (ignore-service.ts) — childrenIgnored now checks both ig.ignores(rel) and ig.ignores(rel + '/') to correctly handle bare-name gitignore patterns (e.g. local without trailing slash). Previously only the trailing-slash form was checked, which could miss valid gitignore patterns.

  3. Renamed isIgnoredDirectoryisHardcodedIgnoredDirectory — The old name implied a complete "is this directory ignored?" check, but it only consulted the hardcoded DEFAULT_IGNORE_LIST. The new name makes the scope explicit.

Design improvements:
4. Added GITNEXUS_NO_GITIGNORE env var — Escape hatch for cases where .gitignore accidentally excludes source files (e.g. *.ts in a mixed TS/JS project, or packages/ in a monorepo). When set, only .gitnexusignore is read. This addresses the review's top blocking concern (D1) without requiring CLI changes.

  1. Added clarifying comments for two design decisions flagged in the review:
    • D2: Negation patterns in .gitnexusignore intentionally cannot override DEFAULT_IGNORE_LIST (safety measure)
    • D3: dot: false in glob is the primary gate for dot-directories; DEFAULT_IGNORE_LIST check in childrenIgnored is defense-in-depth

New tests (7 added):

  • Bare-name directory pattern (no trailing slash)
  • File-glob pattern (*.log)
  • EACCES error handling (skipped on Windows where chmod doesn't apply)
  • GITNEXUS_NO_GITIGNORE env var — skips .gitignore, still reads .gitnexusignore

Deferred items (agreed to skip for now):

  • Bug 4 (memoization) — createIgnoreFilter is called once per analyze run, not worth the complexity
  • Bug 5 (~/.gitignore_global) — DEFAULT_IGNORE_LIST covers most common global patterns; can be a follow-up
  • D2 (negation override) — intentional by design, documented with comments

Rebase & force push

The branch was rebased onto current upstream/main (6c18ae0) and force-pushed. Rationale:

  1. Upstream diverged significantly — 150+ commits including the KuzuDB → LadybugDB migration (5a58508). Without rebase, the PR diff would include all upstream changes, making review impossible.
  2. Zero conflicts in our code — Our changes touch only ignore-service.ts and its tests, which are in the ingestion layer — completely independent from the storage backend migration.
  3. Only conflict was package.json/package-lock.json — Resolved by keeping upstream's @ladybugdb/core and adding our ignore dependency alongside it.
  4. This is a fork feature branch — No other contributors are working on feat/gitignore-support, so force push carries no risk of overwriting anyone's work.

The branch now contains exactly 2 clean commits on top of upstream/main:

  • 9ef7a3e feat(ingestion): respect .gitignore and .gitnexusignore during file discovery
  • f0fc601 fix(ingestion): address review feedback on ignore filtering

"tree-sitter-python": "^0.21.0",
"tree-sitter-ruby": "^0.23.1",
"tree-sitter-rust": "^0.21.0",
"tree-sitter-swift": "^0.6.0",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert this line please?

- Distinguish ENOENT vs EACCES in loadIgnoreRules (warn on permission errors)
- Add GITNEXUS_NO_GITIGNORE env var to bypass .gitignore parsing
- Fix bare-name pattern matching in childrenIgnored (check both with/without trailing slash)
- Rename isIgnoredDirectory to isHardcodedIgnoredDirectory for clarity
- Add clarifying comments for design decisions (D2 negation, D3 dot:false redundancy)
- Add tests for bare-name patterns, file-glob patterns, EACCES handling, env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ivkond ivkond force-pushed the feat/gitignore-support branch from f0fc601 to 6b580f2 Compare March 16, 2026 08:24
Repository owner deleted a comment from claude Bot Mar 16, 2026
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude try to find gaps in this implementation and see use ultra-thinking if it maintains the architecture and it's future proof.

@claude

claude Bot commented Mar 16, 2026

Copy link
Copy Markdown

Claude finished @magyargergo's task in 3m 51s —— View job


Gap Analysis & Architecture Review — PR #231 (Post-Feedback State)

  • Gather context from PR comments and review history
  • Read current state of all changed files
  • Analyze remaining gaps after feedback was addressed
  • Assess architecture and future-proofing
  • Post complete analysis

Context

The previous Claude analysis identified 5 bugs and 3 design concerns. @ivkond addressed the key items in commit 6b580f2. This review assesses the current state of the code.


What Was Fixed (Verified)

Previous Issue Status Evidence
Silent ENOENT vs EACCES failure ✅ Fixed ignore-service.ts:272–277ENOENT skipped silently, others emit console.warn
Bare-name pattern mismatch in childrenIgnored ✅ Fixed ignore-service.ts:314 — checks both ig.ignores(rel) and ig.ignores(rel + '/')
Misleading isIgnoredDirectory name ✅ Fixed Renamed to isHardcodedIgnoredDirectory at line 250
No escape hatch for .gitignore ✅ Fixed GITNEXUS_NO_GITIGNORE env var at lines 262–265
D2/D3 design intent undocumented ✅ Fixed Comments added at lines 194–197, 304–307

Remaining Gaps

G1. GITNEXUS_NO_GITIGNORE is undiscoverable — no documentation surface

The env var is implemented and tested, but it's not surfaced anywhere a user would look:

  • Not in gitnexus analyze --help
  • Not in README or any docs
  • Not as a comment in generated .gitnexusignore

A user whose .gitignore accidentally excludes source files has no way to know this escape hatch exists unless they read the source code. This defeats the purpose of the escape hatch.

Fix this →

G2. ignored() and childrenIgnored() use p.relative() without normalization

// ignore-service.ts:296, 313
const rel = p.relative();

path-scurry's Path.relative() returns POSIX paths on all platforms, so this works on Windows. But this is an implicit contract — there's no guard or comment. If glob's path-scurry version changes this behavior, the ignore package will fail silently on Windows because it requires forward-slash paths. A one-line comment or explicit .replace(/\\/g, '/') guard would make this resilient.

G3. createIgnoreFilter env var is evaluated per-call, not per-process

// ignore-service.ts:263
const filenames = process.env.GITNEXUS_NO_GITIGNORE
  ? ['.gitnexusignore']
  : ['.gitignore', '.gitnexusignore'];

The env var is read inside loadIgnoreRules on every call. In a future watch-mode or incremental re-index scenario, if the env var changes between calls (unlikely but possible with test harnesses), behavior would be inconsistent within a run. Consider reading GITNEXUS_NO_GITIGNORE once at startup and passing it as a parameter — this also makes the function easier to unit-test without process.env mutation.

G4. Integration tests don't cover GITNEXUS_NO_GITIGNORE end-to-end

The unit tests cover the env var (lines 311–358 of ignore-service.test.ts), but no integration test verifies that walkRepositoryPaths correctly uses it. A directory that should be excluded via .gitignore should be included when GITNEXUS_NO_GITIGNORE=1 is set — this end-to-end path is untested.

G5. package-lock.json open review comment (line 38)

@magyargergo requested reverting line 38. After the rebase and force-push, line 38 in the current file is "tree-sitter-rust": "^0.21.0" — which is unchanged from origin/main. The actual PR-introduced change to this file is the "ignore": "^7.0.5" addition at line 24 and the corresponding node_modules/ignore block.

Likely the reviewer's line 38 reference was against the pre-rebase file and the lines shifted. Needs clarification with @magyargergo: is the concern about the ignore package being added as a dependency, or about something else? If the intent is to remove the ignore package entirely, the gitignore parsing would need to be implemented with manual pattern matching — significantly more complex.


Architecture Assessment

The 3-layer design is sound and extensible:

Layer 1: .gitignore / .gitnexusignore  →  ig.ignores() / ig.ignores(rel + '/')
Layer 2: DEFAULT_IGNORE_LIST           →  isHardcodedIgnoredDirectory() 
Layer 3: shouldIgnorePath()            →  extensions, exact filenames

Each layer has a clear responsibility and can be modified independently. The IgnoreLike interface integration is stable across glob versions.

Future-proofing verdict:

Dimension Assessment
Nested .gitignore support Requires interface change to createIgnoreFilter but architecture allows it — pass per-directory ig instances
--no-gitignore CLI flag Clean add: just thread the boolean from CLI through walkRepositoryPaths to createIgnoreFilter
Watch mode / incremental index G3 above is the only concern — minor
Cross-platform (Windows) Works via path-scurry POSIX normalization, but implicit (G2)
ignore package upgrades Stable API; ^7 semver range is safe

The architecture is future-proof for the documented scope. The two paths most likely to be added (nested .gitignore, --no-gitignore CLI flag) are reachable without breaking changes.


Summary

Priority Gap Blocking?
🔴 Clarify package-lock.json:38 review comment with @magyargergo Yes — open CHANGES_REQUESTED
🟡 Document GITNEXUS_NO_GITIGNORE in CLI help / README Recommended
🟢 Add integration test for GITNEXUS_NO_GITIGNORE + walkRepositoryPaths Nice-to-have
🟢 Add path normalization comment or guard in ignored()/childrenIgnored() Nice-to-have

The implementation is solid and correctly addresses the core problem. The only blocker is the unresolved package-lock.json review comment — once that's clarified, this PR is ready to merge from a code quality standpoint.

@magyargergo

Copy link
Copy Markdown
Collaborator

Out out curiousity: why are you using force-push?

@ivkond

ivkond commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

Out out curiousity: why are you using force-push?

My preference is to keep PR history clean and transparent — 2 focused commits instead of a merge commit pulling in 150 unrelated upstream changes. This makes the diff reviewable and the intent of each commit clear.

That said, if the project prefers merge-based workflows over rebase, I'm happy to adjust for future contributions. Let me know!

- G1: Document GITNEXUS_NO_GITIGNORE in `analyze --help` and log when active
- G2: Add comment clarifying path-scurry POSIX normalization contract
- G3: Add IgnoreOptions interface — env var now falls back, callers can
  pass `noGitignore` explicitly for testability and future CLI flag
- G4: Add integration test verifying walkRepositoryPaths respects the env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@magyargergo

Copy link
Copy Markdown
Collaborator

I think these PRs should be coordinated.

They overlap on .gitnexusignore, but they are not solving exactly the same problem:

My preference would be to use #231 as the base for ignore handling, then extract or fold in the unsupported-language skipping logic from #301, rather than merging two separate ignore implementations.

That would let us:

So from my point of view, these are better treated as complementary pieces that should be consolidated.

@jecanore

Copy link
Copy Markdown
Contributor

@magyargergo Agreed — consolidation is the right call. Here's how I see the two PRs fitting together:

Where #231 is stronger

Automatic .gitignore respect. This is the key differentiator. #301 only reads .gitnexusignore, which means every user with gitignored noise (Docker-mounted plugins, Grafana dirs, build artifacts) must manually duplicate patterns they already have in .gitignore. @ivkond's real-world case — 34% index pollution from a gitignored core/local/grafana/ — would require manual intervention under #301 but works out of the box under #231.

Directory-level pruning via childrenIgnored. #231 integrates with glob's IgnoreLike interface to skip entire subtrees during traversal — no stat() calls on ignored files. #301 loads patterns into shouldIgnorePath() and filters after glob has already walked the tree. For monorepos with large ignored directories, #231 avoids the I/O entirely.

Escape hatch. GITNEXUS_NO_GITIGNORE env var (added in f0fc601 after the first Claude review) handles the footgun case where .gitignore accidentally patterns-match source files. #301 has no equivalent because it doesn't read .gitignore in the first place.

Test coverage. #231 has 25 unit + 8 integration tests across .gitignore, .gitnexusignore, combined, bare-name patterns, EACCES handling, and the env var escape hatch. #301 has 10 unit tests covering .gitnexusignore only.

What #301 adds that #231 doesn't

Unsupported language resilience. isLanguageAvailable() guards in parsing-processor.ts (sequential path) and parse-worker.ts (worker path) prevent crashes when optional grammars like Swift or Kotlin aren't installed. This is a separate concern from ignore filtering — mobile dev repos hit both issues simultaneously, but they're independent bugs.

Kotlin optional import fix. tree-sitter-kotlin was a hard import despite being in optionalDependencies. If not installed, the entire worker module fails to load before any isLanguageAvailable guard runs. #301 converts it to the same try/catch dynamic require pattern used for Swift.

Skip logging. Tracks and reports skipped unsupported languages with file counts (e.g., Skipped unsupported languages: swift: 47) in both sequential and worker paths.

Wiki generator coverage. Moved loadUserIgnore() from fullGeneration() to run() so incrementalUpdate() also loads ignore patterns before calling shouldIgnorePath().

Proposed consolidation path

  1. Merge feat(ingestion): respect .gitignore and .gitnexusignore during file discovery #231 as the ignore foundation — it has the stronger implementation, broader test coverage, and solves the zero-config use case
  2. Rebase style: improve parse-worker JSDoc for isLanguageAvailable #301 on top, dropping the ignore overlap — I'd remove loadUserIgnore/resetUserIgnore from ignore-service.ts and filesystem-walker.ts, keep the isLanguageAvailable guards, Kotlin dynamic import, skip logging, and wiki generator fix
  3. The resulting style: improve parse-worker JSDoc for isLanguageAvailable #301 diff would touch only parser-loader.ts, parse-worker.ts, parsing-processor.ts, and generator.ts — no overlap with feat(ingestion): respect .gitignore and .gitnexusignore during file discovery #231's files

@ivkond happy to coordinate on timing. If #231 merges first, I can rebase #301 within a day to cleanly layer the language resilience on top.

@magyargergo

Copy link
Copy Markdown
Collaborator

One more thing I would like to ask befor I merge this, is to add some e2e test coverage with a small fixture example test set to have this tested end-to-end. Plus you'll need to update the GitHub worlflow to include that new test.

Thank you!

…rammars

Port unsupported language resilience from PR #301 by @jecanore.
- Make Kotlin import optional (like Swift) in parser-loader and parse-worker
- Add worker-local isLanguageAvailable() with filePath param for tsx distinction
- Track and log skipped files per language in both sequential and worker paths
- Add skippedLanguages to ParseWorkerResult for worker→main aggregation
- Add isLanguageAvailable unit tests

Refs: #301, #155, #228

Co-Authored-By: jecanore <juan@housingbase.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ivkond

ivkond commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

Consolidated with PR #301 (unsupported language resilience)

Per @magyargergo's request, this commit ports the language-skip logic from PR #301 by @jecanore into this branch.

Ported from PR #301

  • Kotlin import made optional (try/catch dynamic require, matching Swift pattern) in parser-loader.ts and parse-worker.ts
  • isLanguageAvailable() guards before setLanguage()/loadLanguage() in both worker and sequential paths
  • skippedLanguages tracking in ParseWorkerResult — aggregated across workers, logged as summary (e.g. Skipped unsupported languages: swift: 47)
  • Unit tests for isLanguageAvailable

Not ported

PR #301's ignore-service changes — our implementation is more complete (.gitignore + .gitnexusignore, env var escape hatch, EACCES handling, directory pruning).

Attribution

Commit includes Co-Authored-By: jecanore <juan@housingbase.io>.

This PR now addresses both #228 and #155 in a single branch.

@ivkond

ivkond commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

One more thing I would like to ask befor I merge this, is to add some e2e test coverage with a small fixture example test set to have this tested end-to-end. Plus you'll need to update the GitHub worlflow to include that new test.

Thank you!

Thanks for the feedback! To clarify the scope, do you mean:

  1. A fixture repo (e.g. test/fixtures/sample-repo/) with a .gitignore, .gitnexusignore, and a few source files (including a .swift file to trigger the language skip path), then an E2E test that runs walkRepositoryPathsprocessParsing on it and asserts:

    • Gitignored paths are excluded
    • .gitnexusignore patterns are respected
    • Unsupported language files are skipped (not crashed), with correct skippedLanguages counts
  2. CI workflow update — should this be added to the existing test job, or as a separate step?

I want to make sure the fixture scope and workflow placement match your expectations before implementing.

@magyargergo

magyargergo commented Mar 16, 2026

Copy link
Copy Markdown
Collaborator

One more thing I would like to ask befor I merge this, is to add some e2e test coverage with a small fixture example test set to have this tested end-to-end. Plus you'll need to update the GitHub worlflow to include that new test.
Thank you!

Thanks for the feedback! To clarify the scope, do you mean:

  1. A fixture repo (e.g. test/fixtures/sample-repo/) with a .gitignore, .gitnexusignore, and a few source files (including a .swift file to trigger the language skip path), then an E2E test that runs walkRepositoryPathsprocessParsing on it and asserts:
    ...

Yes please!

  1. CI workflow update — should this be added to the existing test job, or as a separate step?

Already existing test job in .github\workflows\ci-integration.yml

image image

This is a bit of mess but hoping for automatic discovery after my PR gets merged in LadybugDB/ladybug-nodejs#1.

@magyargergo

Copy link
Copy Markdown
Collaborator

Consolidated with PR #301 (unsupported language resilience)

Per @magyargergo's request, this commit ports the language-skip logic from PR #301 by @jecanore into this branch.

Ported from PR #301

  • Kotlin import made optional (try/catch dynamic require, matching Swift pattern) in parser-loader.ts and parse-worker.ts
  • isLanguageAvailable() guards before setLanguage()/loadLanguage() in both worker and sequential paths
  • skippedLanguages tracking in ParseWorkerResult — aggregated across workers, logged as summary (e.g. Skipped unsupported languages: swift: 47)
  • Unit tests for isLanguageAvailable

Not ported

PR #301's ignore-service changes — our implementation is more complete (.gitignore + .gitnexusignore, env var escape hatch, EACCES handling, directory pruning).

Attribution

Commit includes Co-Authored-By: jecanore <juan@housingbase.io>.

This PR now addresses both #228 and #155 in a single branch.

@jecanore arre you happy with this approach? This is not what I had in mind honestly but works either ways.

Add a fixture repo (test/fixtures/ignore-and-skip-repo/) with .gitignore,
.gitnexusignore, TypeScript source files, and a Swift file to exercise
all three features end-to-end:

- File discovery: verifies .gitignore excludes data/ and *.log,
  .gitnexusignore excludes vendor/, source files are discovered
- Parsing: verifies TypeScript files produce Function nodes and DEFINES
  relationships, Swift files are skipped gracefully when grammar is
  unavailable

Add the test to the standalone group in ci-integration.yml and coverage job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread .github/workflows/ci-integration.yml Outdated
Repository owner deleted a comment from claude Bot Mar 16, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@magyargergo

Copy link
Copy Markdown
Collaborator

Your tests failed and please make sure you run this locally!

image

jecanore added a commit to jecanore/GitNexus that referenced this pull request Mar 16, 2026
Clarify that the filePath parameter handles .tsx distinction at the
worker level, while the main-thread version does this inside
loadLanguage() instead.

Rebased on top of abhigyanpatwari#231 — all ignore and language resilience changes
from abhigyanpatwari#301 were already ported by ivkond in 505fc9b.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ivkond and others added 2 commits March 16, 2026 12:43
The fixture's .gitignore prevented data/seed.json and debug.log from
being committed — these files would be missing after checkout in CI.

Switch to creating the entire test structure in a temp directory via
beforeAll (matching filesystem-walker.test.ts pattern). This ensures
all files exist regardless of git ignore rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use graph.nodes property getter instead of graph.getNodes(), and check
Function node filePath instead of non-existent File nodes (File nodes
are created by processStructure, not processParsing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repository owner deleted a comment from claude Bot Mar 16, 2026
@magyargergo

Copy link
Copy Markdown
Collaborator

This is unreleated but can you please add the workflows permission into .github/workflows/ci-integration.yml ?

@ivkond

ivkond commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

@magyargergo, shure, omw.

@ivkond

ivkond commented Mar 16, 2026

Copy link
Copy Markdown
Contributor Author

@magyargergo, just to clarify — ci-integration.yml is a reusable workflow, so permissions are passed from the caller in ci.yml. Should I add workflows: read to the caller block in ci.yml (line 39), or do you want a permissions block inside ci-integration.yml itself?

@magyargergo

Copy link
Copy Markdown
Collaborator

... do you want a permissions block inside ci-integration.yml itself?

This one please 🙏

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repository owner deleted a comment from claude Bot Mar 16, 2026
Comment thread .github/workflows/ci-integration.yml Outdated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repository owner deleted a comment from claude Bot Mar 16, 2026
@magyargergo

Copy link
Copy Markdown
Collaborator

Apologies, could we move this up to the ci.yaml at the beginning? 🙏

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Repository owner deleted a comment from claude Bot Mar 16, 2026
…s perm

Three issues prevented Claude from running on fork PRs:

1. claude-code-review.yml lacked workflows:write — push failed when
   fork PRs modify .github/workflows/ files
2. claude.yml had no fork PR support — checked out main and couldn't
   fetch the fork's branch from origin
3. Cleanup step unconditionally deleted branches even when push failed,
   breaking the concurrent claude.yml workflow

Also removes workflows:write from ci.yml's integration job — CI tests
don't need that permission. The permission belongs on the claude
workflows that push fork branches.

Changes:
- Add workflows:write to both claude workflow permissions blocks
- Add fork PR detection + branch push/cleanup to claude.yml
- Add step id to push-fork; cleanup only runs if push succeeded
- Pass branch names via env vars to prevent shell injection (security)
- Add concurrency groups to prevent race conditions between workflows
- Remove misplaced workflows:write from ci.yml integration job
GITHUB_TOKEN cannot have 'workflows' permission — it's only valid for
PATs and GitHub Apps. This means git push fails whenever a fork PR
modifies .github/workflows/ files.

Replace git push with the GitHub REST API (POST/PATCH /git/refs) to
create temporary branch refs. The API creates a pointer to the
already-existing PR head commit without triggering the workflow file
push protection. Similarly, cleanup uses DELETE /git/refs instead of
git push --delete.

Also removes the invalid 'workflows: write' from permissions blocks.
Repository owner deleted a comment from claude Bot Mar 16, 2026
@magyargergo magyargergo merged commit fbff6d0 into abhigyanpatwari:main Mar 16, 2026
20 of 21 checks passed
@magyargergo

Copy link
Copy Markdown
Collaborator

Thank you for your contribution!

motolese pushed a commit to motolese/datamoto-gitnexus that referenced this pull request Apr 23, 2026
…iscovery (abhigyanpatwari#231)

* feat(ingestion): respect .gitignore and .gitnexusignore during file discovery

Add support for excluding files from indexing based on .gitignore and
.gitnexusignore patterns. Previously, GitNexus used only a hardcoded
ignore list, causing significant index pollution in repositories with
git-ignored directories containing code (e.g., Docker-mounted volumes).

Changes:
- Add `ignore` package for gitignore-spec pattern matching
- Add `loadIgnoreRules()` to parse .gitignore + .gitnexusignore
- Add `createIgnoreFilter()` returning glob-compatible IgnoreLike object
- Integrate filter into glob's `ignore` option for directory-level pruning
- Remove post-glob `.filter()` call (now handled during traversal)

The hardcoded DEFAULT_IGNORE_LIST remains as fallback for non-git repos.

Closes abhigyanpatwari#228

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ingestion): address review feedback on ignore filtering

- Distinguish ENOENT vs EACCES in loadIgnoreRules (warn on permission errors)
- Add GITNEXUS_NO_GITIGNORE env var to bypass .gitignore parsing
- Fix bare-name pattern matching in childrenIgnored (check both with/without trailing slash)
- Rename isIgnoredDirectory to isHardcodedIgnoredDirectory for clarity
- Add clarifying comments for design decisions (D2 negation, D3 dot:false redundancy)
- Add tests for bare-name patterns, file-glob patterns, EACCES handling, env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ingestion): address second round of review feedback

- G1: Document GITNEXUS_NO_GITIGNORE in `analyze --help` and log when active
- G2: Add comment clarifying path-scurry POSIX normalization contract
- G3: Add IgnoreOptions interface — env var now falls back, callers can
  pass `noGitignore` explicitly for testability and future CLI flag
- G4: Add integration test verifying walkRepositoryPaths respects the env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(ingestion): gracefully skip files with unavailable tree-sitter grammars

Port unsupported language resilience from PR abhigyanpatwari#301 by @jecanore.
- Make Kotlin import optional (like Swift) in parser-loader and parse-worker
- Add worker-local isLanguageAvailable() with filePath param for tsx distinction
- Track and log skipped files per language in both sequential and worker paths
- Add skippedLanguages to ParseWorkerResult for worker→main aggregation
- Add isLanguageAvailable unit tests

Refs: abhigyanpatwari#301, abhigyanpatwari#155, abhigyanpatwari#228

Co-Authored-By: jecanore <juan@housingbase.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(e2e): add ignore + language-skip end-to-end test with fixture repo

Add a fixture repo (test/fixtures/ignore-and-skip-repo/) with .gitignore,
.gitnexusignore, TypeScript source files, and a Swift file to exercise
all three features end-to-end:

- File discovery: verifies .gitignore excludes data/ and *.log,
  .gitnexusignore excludes vendor/, source files are discovered
- Parsing: verifies TypeScript files produce Function nodes and DEFINES
  relationships, Swift files are skipped gracefully when grammar is
  unavailable

Add the test to the standalone group in ci-integration.yml and coverage job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): move ignore-and-skip-e2e test to e2e group per review feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): use temp directory instead of fixture for e2e ignore test

The fixture's .gitignore prevented data/seed.json and debug.log from
being committed — these files would be missing after checkout in CI.

Switch to creating the entire test structure in a temp directory via
beforeAll (matching filesystem-walker.test.ts pattern). This ensures
all files exist regardless of git ignore rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): correct graph API usage in e2e ignore test

Use graph.nodes property getter instead of graph.getNodes(), and check
Function node filePath instead of non-existent File nodes (File nodes
are created by processStructure, not processParsing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: add workflows permission to ci-integration.yml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: change workflows permission to write per review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: move workflows permission from ci-integration.yml to ci.yml caller

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): fix Claude workflows for fork PRs, remove misplaced workflows perm

Three issues prevented Claude from running on fork PRs:

1. claude-code-review.yml lacked workflows:write — push failed when
   fork PRs modify .github/workflows/ files
2. claude.yml had no fork PR support — checked out main and couldn't
   fetch the fork's branch from origin
3. Cleanup step unconditionally deleted branches even when push failed,
   breaking the concurrent claude.yml workflow

Also removes workflows:write from ci.yml's integration job — CI tests
don't need that permission. The permission belongs on the claude
workflows that push fork branches.

Changes:
- Add workflows:write to both claude workflow permissions blocks
- Add fork PR detection + branch push/cleanup to claude.yml
- Add step id to push-fork; cleanup only runs if push succeeded
- Pass branch names via env vars to prevent shell injection (security)
- Add concurrency groups to prevent race conditions between workflows
- Remove misplaced workflows:write from ci.yml integration job

* fix(ci): use GitHub API for fork branch refs instead of git push

GITHUB_TOKEN cannot have 'workflows' permission — it's only valid for
PATs and GitHub Apps. This means git push fails whenever a fork PR
modifies .github/workflows/ files.

Replace git push with the GitHub REST API (POST/PATCH /git/refs) to
create temporary branch refs. The API creates a pointer to the
already-existing PR head commit without triggering the workflow file
push protection. Similarly, cleanup uses DELETE /git/refs instead of
git push --delete.

Also removes the invalid 'workflows: write' from permissions blocks.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: jecanore <juan@housingbase.io>
Co-authored-by: Gergo Magyar <gergomagyar@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support .gitignore / .gitnexusignore for excluding directories from indexing

3 participants