Skip to content

feat(cli): infer registry name from git remote.origin.url#981

Merged
magyargergo merged 5 commits into
mainfrom
copilot/fix-index-name-inference
Apr 19, 2026
Merged

feat(cli): infer registry name from git remote.origin.url#981
magyargergo merged 5 commits into
mainfrom
copilot/fix-index-name-inference

Conversation

Copilot AI commented Apr 19, 2026

Copy link
Copy Markdown
Contributor

gitnexus analyze derives the registry name from path.basename(repoPath), which produces useless / colliding names for layouts where the leaf directory is generic — Gas-Town <rig>/refinery/rig/, monorepo subprojects (packages/api, packages/web), git worktrees, vendored mirrors. PR #955 added --name and a collision guard (Fix A + C from the issue) but did nothing for the common case where the user runs analyze with no flags.

This PR adds Fix B: when --name is not supplied, derive the name from git config --get remote.origin.url before falling back to the basename.

Changes

  • gitnexus/src/storage/git.ts — new getRemoteOriginUrl, parseRepoNameFromUrl (handles HTTPS, SSH git@host:, ssh://, git://, file://; strips .git and trailing slashes), and getInferredRepoName wrapper. All defensive — return null on missing git / --skip-git / no origin.
  • gitnexus/src/storage/repo-manager.tsregisterRepo precedence is now explicit --name > preserved alias > remote-inferred > basename. The git config subprocess is skipped when --name is supplied; otherwise the inferred name is computed once and shared between the alias-preservation check and the fallback. hasCustomAlias takes the inferred name as a parameter to avoid re-deriving it. registerRepo now returns the resolved name (was void); existing callers await and discard, so non-breaking.
  • gitnexus/src/core/run-analyze.tsprojectName is now the value returned by registerRepo, so AGENTS.md / skill / AI-context files reference the same name MCP clients will look up. Early-return path uses the same inference for consistency.

Behaviour

~/gt/lume_spark/refinery/rig$ git remote -v
origin  git@github.com:gastown/lume_spark.git

~/gt/lume_spark/refinery/rig$ gitnexus analyze
# before: registered as "rig"  (collides with every other rig/)
# after:  registered as "lume_spark"

Precedence chain in registerRepo:

if (opts?.name !== undefined) {
  name = opts.name;                                  // 1. explicit --name
} else {
  const inferred = getInferredRepoName(resolved);
  if (existing && hasCustomAlias(existing, inferred)) {
    name = existing.name;                            // 2. preserved alias
  } else {
    name = inferred ?? path.basename(resolved);      // 3. remote → 4. basename
  }
}

Notes for reviewers

  • A repo previously registered under its basename will, on next analyze, switch to the remote-derived name when those differ. This is the intended fix; the operator can pin the old value with --name <basename>.
  • The collision guard is intentionally not extended to inferred-name collisions — feat(cli): analyze --name <alias> + duplicate-name guard for the repo registry #955 deliberately keeps un-aliased collisions silent for backward compat. Remote inference makes these collisions much rarer in practice; for the residual cases --name / --allow-duplicate-name remain the escape hatches.
  • hasCustomAlias treats a stored remote-inferred name as an inference (not a sticky alias), so re-analyses keep tracking remote renames.

Tests

gitnexus/test/unit/repo-manager.test.ts — 14 new tests:

  • parseRepoNameFromUrl: HTTPS, SSH (git@host:), ssh://, git://, file://, trailing slashes, mixed-case .GIT, empty / null / undefined.
  • getInferredRepoName against real git init repos: no .git, no origin, with origin.
  • registerRepo precedence with real git init + remote add origin: Gas-Town <rig>/refinery/rig/ repro, two distinctly-remoted rig/ leaves no longer collide, --name wins, preserved alias wins, basename fallback when no remote.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • extension.ladybugdb.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/vitest run test/unit/repo-manager.test.ts /hom�� arget/node_modul-I --64 (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/vitest run test/unit/repo-manager.test.ts -c stdout.buffer.wr--hash-style=gnu (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/node_modules/.bin/vitest run test/unit/run-analyze.test.ts test/unit/index-repo-command.test.ts test/unit/repo-manager.test.ts (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@vercel

vercel Bot commented Apr 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment Apr 19, 2026 7:48am

Request Review

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/95064d2d-b1da-4c89-9069-5b3e9cc2636a

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix index name inference for generic leaf directories feat(cli): infer registry name from git remote.origin.url Apr 19, 2026
Copilot AI requested a review from magyargergo April 19, 2026 07:03
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

  • gitnexus/ → CLI + MCP
  • gitnexus-web/
  • gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.


CONTEXT TO LOAD FIRST
Read these before reviewing:

  • AGENTS.md
  • GUARDRAILS.md
  • CONTRIBUTING.md
  • TESTING.md
  • ARCHITECTURE.md

Additional context:


PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.


REVIEW RULES

  • Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
  • Every finding must include path:line.
  • If you make a behavioral claim, cite the code that proves it.
  • If you make a performance claim, explain the mechanism.
  • If something cannot be verified from the diff alone, explicitly say so.
  • Distinguish clearly between:
    • verified issue
    • plausible risk
    • unverified concern
  • Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
  • Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

  • BLOCKING → must be fixed before merge
  • NON-BLOCKING → valid issue, but merge may still be acceptable
  • NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

  • Does the implementation actually satisfy the PR claim?
    • ManifestExtractor is truly invoked
    • config.links produces non-zero cross-links where expected
  • Resolver contracts are preserved:
    • resolveSymbol remains exact-match
    • label-scoped Cypher remains correct per contract type
    • flag any regression toward fuzzy or unscoped matching
  • Graph schema integrity is preserved:
    • no silent changes to node labels
    • no silent changes to edge types
    • no silent changes to ID generation (generateId)
  • Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

  • naming quality
  • local cohesion
  • dead code
  • unnecessary abstraction
  • hidden control flow
  • confusing indirection
  • adherence to repo conventions:
    • direct imports from gitnexus-shared
    • no barrel re-export regressions
    • no // removed comments
    • no unused re-exports
  • no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

  • Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
  • Is there a regression guard for 0-link → N-link behavior?
  • Are assertions meaningful rather than tautological?
  • Are fixtures realistic for manifest inputs?
  • If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
  • Is there evidence the expected validation path would pass for staged gitnexus/ files?
    • tsc --noEmit
    • vitest run --project default
      If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

  • hot-path overhead in ingestion/group sync
  • excess allocations per manifest entry
  • redundant Cypher round-trips
  • missed batching or missed parallelism (Promise.all) where it materially matters
  • O(n²) or repeated lookup patterns on large repos
  • memoization tradeoffs:
    • correctness
    • invalidation
    • bounded vs unbounded memory growth
      Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

  • Windows/cross-platform safety:
    • stream lifecycle
    • FD/file handle lifecycle
    • path separator assumptions
    • anything resembling prior ENOTEMPTY-style lifecycle regressions
  • LadybugDB single-writer invariant is preserved
  • Embeddings preservation:
    • no silent breakage of --embeddings
    • .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
  • MCP contracts remain compatible:
    • group_*
    • query
    • context
    • impact
    • detect_changes
    • rename
    • cypher
      Flag any schema or contract break without migration note
  • staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
  • rollback safety:
    • can this PR be reverted safely without re-analyze?
    • if not, explain why

6. Maintainability

Check:

  • Does the change respect Pipeline Phase DAG boundaries?
  • Is cross-phase coupling being introduced?
  • Are manifest-extractor entry points discoverable from group/sync.ts?
  • Would the next contributor know where to extend this?
  • Is shared logic placed appropriately:
    • in gitnexus-shared/
    • or gitnexus/src/core/**/utils.ts
    • and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

  • no secrets, tokens, or machine-specific paths committed
  • no blind find-and-replace renames
  • if a symbol rename occurred, rename MCP tool or equivalent careful review was used
  • upstream impact was considered for modified shared symbols
  • no HIGH/CRITICAL impact left unaddressed
  • PR title follows conventional commit style + [area] Short description
  • PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

  • Evidence: path:line, path:line
  • Why it matters: ...
  • Recommendation: ...
  • Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

  • No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

  • ✅ Ready to merge — no blocking issues.
  • ⚠️ Merge after addressing blocking items.
  • ❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.


IMPORTANT REVIEW BEHAVIOR

  • Be strict on correctness and contracts.
  • Be conservative about operational risk.
  • Be explicit about uncertainty.
  • Do not suggest unrelated refactors.
  • Do not turn nits into blockers.
  • Do not reward intent; review actual implementation.

@github-actions

github-actions Bot commented Apr 19, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 3m 47s —— View job


PR Review: feat(cli): infer registry name from git remote.origin.url

Tasks

  • Initialize review
  • Read repo docs (AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md)
  • Read changed files (git.ts, repo-manager.ts, run-analyze.ts, test file, analyze.ts, server-mapping.ts)
  • Phase 1: Define production-ready bar
  • Phase 2: Review diff against bar
  • Post final verdict

Production-ready bar for this PR

This PR touches the gitnexus/src/storage/ layer (registry name resolution) and gitnexus/src/core/run-analyze.ts (orchestrator output). The bar for this area:

  1. Registry correctness — name resolution precedence must be deterministic and non-lossy; existing aliases must survive re-analysis; collision guard behavior is unchanged.
  2. No MCP contract breakageregistry.json discovery key (name) changing on next analyze is intended and documented; no MCP tool signatures changed.
  3. Subprocess safetyexecSync calls are wrapped in try/catch returning null; no unhandled exits.
  4. Cross-platform — Windows path normalization via path.resolve/path.basename is already in use; the new regex operates on URL strings (forward-slash safe).
  5. Test coverage — critical precedence paths (explicit --name, preserved alias, remote inference, basename fallback) tested against real git init repos; parseRepoNameFromUrl covers all documented URL shapes.
  6. No secrets / no blind renames — diff is scoped to storage + orchestrator; no shared symbols renamed without impact analysis.
  7. Backward compat — un-aliased basename collisions still register silently; registerRepo's void → string return is non-breaking at call sites.
  8. Embeddings preservation--embeddings path is unchanged; no silent zeroing of stats.embeddings.

Findings

1. Correctness & functional completeness

The 4-level precedence chain in registerRepo (repo-manager.ts:368–381) is correctly implemented:

opts?.name  →  hasCustomAlias (preserved alias)  →  getInferredRepoName  →  path.basename

hasCustomAlias correctly treats a stored remote-inferred name as non-sticky (repo-manager.ts:316–321): it returns false if entry.name === inferredName, ensuring remote renames propagate on next analyze. If inferredName is null (no git or no remote), it falls through correctly.

The --name supplied path skips getInferredRepoName entirely (repo-manager.ts:368–369). Confirmed no subprocess wasted.

The early-return repoName at run-analyze.ts:155–157 uses getInferredRepoName rather than reading the registry. This appears inconsistent (wouldn't reflect a preserved alias), but is verified harmless: analyze.ts:231–242 shows the CLI discards repoName entirely on the alreadyUpToDate: true path — it just prints "Already up to date" and returns without using result.repoName.

  • No issues found in correctness.

2. Code clarity & clean code

No // removed comments, no barrel re-export regressions, no drive-by refactors. Naming is clear. hasCustomAlias is a module-private function, appropriately so.

[NON-BLOCKING] Embedding projectName still uses basename

  • Evidence: run-analyze.ts:288–289
    const projectName = path.basename(repoPath);    // inside if (!embeddingSkipped) block
    const serverName = await readServerMapping(projectName);
    run-analyze.ts:347 then re-declares const projectName = await registerRepo(...) in the outer scope for AI-context file generation.
  • Why it matters: server-mapping.ts:26 confirms readServerMapping keys its lookup on repoName. For a Gas-Town-style rig/ repo, the registry name becomes lume_spark but server-mapping.json must still be keyed by "rig". A user setting up server mapping post-PR would have a confusing split: registry key = remote-inferred name, server-mapping key = basename.
  • Recommendation: Document in the PR that server-mapping.json remains basename-keyed, or align projectName in the embedding path with the resolved registry name. Not blocking since this was already the existing behavior and does not regress anyone.
  • Confidence: High

3. Test coverage & change safety

  • parseRepoNameFromUrl: 6 test cases covering HTTPS, SSH git@host:, ssh://, git://, file://, trailing slashes, mixed-case .GIT, empty/null/undefined. Comprehensive.
  • getInferredRepoName: no-git, no-origin, with-origin — all exercised against real git init repos. Correct approach for subprocess-dependent logic.
  • registerRepo precedence: Gas-Town repro, two distinct remotes, --name wins, preserved alias wins, basename fallback. The critical regression scenario is covered end-to-end.

The Vitest run had firewall DNS blocks (extension.ladybugdb.com) but those are known CI environment issues, not test failures. The test logic itself is sound.

[NIT] Unquoted URL in shell string in test helper

  • Evidence: test/unit/repo-manager.test.ts:329
    execSync(`git remote add origin ${remoteUrl}`, { cwd: dir });
  • Why it matters: If remoteUrl contained spaces, this would fail. Test fixtures use hardcoded URLs so there is no practical risk here.
  • Recommendation: Use execSync('git remote add origin', { ... }) with args array form, or just leave it — the test URLs are safe.
  • Confidence: High (safe as-is given the test data)

4. Performance

getInferredRepoName spawns one git config --get remote.origin.url subprocess per registerRepo call when --name is not supplied. This is at step 9 of the runFullAnalysis flow (post-pipeline), executes once per analyze, and is a trivially cheap command. No hot-path concern.

The early-return path at run-analyze.ts:155 also calls getInferredRepoName. Since this path exits before opening LadybugDB, the subprocess cost is negligible.

  • No issues found in performance.

5. Operational risk

Windows/cross-platform: parseRepoNameFromUrl uses [/:] splitting on URL strings — forward-slash-only, safe on Windows. path.resolve normalizes registry paths. No file-handle or stream changes.

LadybugDB single-writer invariant: Not touched.

Embeddings: The --embeddings code path is structurally unchanged. stats.embeddings is populated the same way. No silent zeroing.

MCP contracts: No tool signatures changed. The name key in registry.json changes on next analyze for repos where basename ≠ remote-derived name. This is the intended fix and is documented in the PR. After one analyze, both the registry and the regenerated AGENTS.md (via generateAIContextFiles at run-analyze.ts:369) will be consistent. The risk window (stale external repo: "rig" references) is small and documented.

Rollback safety: Reverting this PR does not require re-analyze. The registry entries written with remote-derived names will persist until the next analyze, at which point the reverted code would revert them to basename. Stateless revert is safe.

  • No blocking issues found in operational risk.

6. Maintainability

getRemoteOriginUrl, parseRepoNameFromUrl, and getInferredRepoName are correctly placed in gitnexus/src/storage/git.ts — the existing module for git subprocess utilities. No Pipeline Phase DAG boundaries are crossed (this is storage/metadata, not ingestion).

hasCustomAlias is correctly co-located with registerRepo as a private helper. The comment at repo-manager.ts:307–315 fully explains the invariant, including the parameter change rationale.

The next contributor wanting to extend inference (e.g., support pushurl, multiple remotes) has a clear single entry point: getRemoteOriginUrl / parseRepoNameFromUrl.

  • No issues found in maintainability.

Non-negotiables check

  • ✅ No secrets, tokens, or machine-specific paths committed
  • ✅ No blind find-and-replace — only the new getInferredRepoName import and registerRepo return type changed; no symbol renames
  • registerRepo return type change (void → string) checked: call sites in run-analyze.ts now use the return value; other callers that discarded the result compile cleanly
  • ✅ PR title feat(cli): infer registry name from \git remote.origin.url`` follows conventional commit format
  • ✅ PolyForm license headers/metadata untouched
  • ✅ No HIGH/CRITICAL impact left unaddressed

What I could not fully verify

  • runEmbeddingPipeline's { repoName, serverName } parameter usage — whether repoName is purely metadata/logging or used as a lookup key that could diverge from the registry name for Gas-Town repos. Based on server-mapping.ts, it appears serverName (not repoName) is the operative key, but I did not read embedding-pipeline.ts.
  • Whether any callers of runFullAnalysis outside the CLI (server workers, --skills path) depend on AnalyzeResult.repoName from the early-return path for anything beyond display. Based on analyze.ts:231–242, the CLI does not; server worker callers not checked.
  • CI green status — the test run had DNS blocks that prevented full Vitest execution in the copilot environment.

Final verdict

Ready to merge — no blocking issues.

The implementation is correct, the precedence chain is sound, test coverage is comprehensive against real git repos, and the one NON-BLOCKING finding (embedding projectName still using basename for server-mapping.json lookup) preserves existing behavior rather than breaking it. The operational risk (registry name changes on next analyze for affected repos) is documented and intentional.

@magyargergo

Copy link
Copy Markdown
Collaborator

@copilot please format the code as per CI job requirements

@magyargergo magyargergo marked this pull request as ready for review April 19, 2026 07:46
Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/a4bf631d-ea6b-4d84-b426-29b1e5c3539f

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

Copilot AI commented Apr 19, 2026

Copy link
Copy Markdown
Contributor Author

@copilot please format the code as per CI job requirements

Done in 97ee04dprettier --write reformatted one line in gitnexus/src/core/run-analyze.ts. prettier --check is now clean across all four changed files.

@magyargergo magyargergo merged commit 9926804 into main Apr 19, 2026
14 checks passed
@magyargergo magyargergo deleted the copilot/fix-index-name-inference branch April 19, 2026 08:14
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…patwari#981)

* Initial plan

* Plan: smarter index name inference via git remote URL

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/95064d2d-b1da-4c89-9069-5b3e9cc2636a

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* feat(cli): infer registry name from git remote.origin.url (abhigyanpatwari#979)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/95064d2d-b1da-4c89-9069-5b3e9cc2636a

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* refactor: skip git subprocess when --name was supplied (review feedback)

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/95064d2d-b1da-4c89-9069-5b3e9cc2636a

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* style: prettier --write on run-analyze.ts

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/a4bf631d-ea6b-4d84-b426-29b1e5c3539f

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Index name inference uses leaf directory; collides for monorepo / worktree / agent-workspace layouts

2 participants