Skip to content

fix(wiki): Remove the hidden 60s default timeout, validate gitnexus wiki timeout/retry flags, and surface timeout errors#1651

Merged
magyargergo merged 13 commits into
mainfrom
copilot/fix-gitnexus-wiki-timeout-error
May 17, 2026
Merged

fix(wiki): Remove the hidden 60s default timeout, validate gitnexus wiki timeout/retry flags, and surface timeout errors#1651
magyargergo merged 13 commits into
mainfrom
copilot/fix-gitnexus-wiki-timeout-error

Conversation

Copilot AI commented May 17, 2026

Copy link
Copy Markdown
Contributor

Large repos could fail gitnexus wiki after 60 seconds even though wiki generation is expected to run for many minutes. The command exposed --timeout, but the runtime path still imposed a default 60s per-request LLM timeout.

This PR also tightens the CLI boundary for --timeout based on review feedback so invalid or overflow-prone timeout values fail fast with a clear error instead of being silently ignored, surfaces a dedicated user-facing message when the configured timeout actually engages, and applies the same fail-fast validation to --retries.

  • Runtime behavior

    • Stop applying an implicit 60s AbortSignal.timeout(...) to wiki LLM requests.
    • Treat request timeout as opt-in: if --timeout is not provided, the request runs without a built-in ceiling.
    • Preserve the existing explicit override path so gitnexus wiki --timeout <seconds> still enforces a request timeout.
    • When a configured wiki request timeout fires, surface a clear timeout error telling the user the request timed out and that they can increase --timeout or omit it to disable the request timeout.
    • Avoid mislabeling generic aborted connection errors as timeout errors.
  • CLI / docs

    • Update help text and wiki docs to reflect the real behavior: timeout is now disabled by default, not 60.
    • Correct the timeout wording to describe the actual request-scoped behavior.
    • Validate gitnexus wiki --timeout at the CLI boundary:
      • reject empty, whitespace-only, zero, negative, non-numeric, and fractional values
      • reject values that would overflow when converted from seconds to milliseconds
    • Validate gitnexus wiki --retries at the CLI boundary with the same positive-integer fail-fast behavior.
    • Surface timeout failures in gitnexus wiki as a dedicated Timeout: message instead of a generic error.
  • Regression coverage

    • Add focused tests for the real callLLM path:
      • no timeout signal when requestTimeoutMs is omitted
      • explicit timeout signal when requestTimeoutMs is set
      • clear timeout errors when the configured request timeout fires
      • millisecond-format timeout messaging for non-whole-second values
      • timeout-like non-DOM errors still map to the same user-facing timeout message
      • generic aborted connections are not mislabeled as timeout errors
    • Add command-path tests for wikiCommand:
      • --timeout 120 maps to requestTimeoutMs = 120000
      • omitting --timeout leaves requestTimeoutMs unset
      • --retries 5 maps to maxAttempts = 5
      • timeout failures are surfaced with the dedicated CLI timeout message
    • Add CLI validation tests for invalid timeout inputs, invalid retry inputs, and overflow cases
signal:
  config.requestTimeoutMs !== undefined
    ? AbortSignal.timeout(config.requestTimeoutMs)
    : undefined;

> [!WARNING]
>
>

Copilot AI linked an issue May 17, 2026 that may be closed by this pull request
@vercel

vercel Bot commented May 17, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gitnexus Ready Ready Preview, Comment May 17, 2026 9:48am

Request Review

Copilot AI changed the title [WIP] Fix gitnexus wiki command timeout error Remove the hidden 60s default timeout from gitnexus wiki May 17, 2026
Copilot AI requested a review from magyargergo May 17, 2026 06:59
@magyargergo magyargergo changed the title Remove the hidden 60s default timeout from gitnexus wiki fix(wiki): Remove the hidden 60s default timeout from gitnexus wiki May 17, 2026
@github-actions

github-actions Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 4m 39s —— View job


PR #1651 Review — fix(wiki): Remove hidden 60s default timeout

Review tasks:

  • Read repo context files (DoD.md, GUARDRAILS.md — TESTING.md, CONTRIBUTING.md, ARCHITECTURE.md also available; no DoD gaps)
  • Inspect all changed files and trace runtime flow
  • Search for hidden timeouts in resilientFetch, generator.ts, provider code
  • Assess CLI option parsing and defaults (wiki.ts + index.ts)
  • Assess test quality and Vitest wiring
  • Run hygiene / stale-docs checks
  • Check CI status on head SHA
  • Compile and post final review

Review bar for this PR

  • Default gitnexus wiki must not impose a per-request LLM timeout when --timeout is absent.
  • gitnexus wiki --timeout <seconds> must apply a per-attempt signal to every fetch attempt.
  • No second hidden 60s ceiling anywhere on the wiki LLM path (resilientFetch, generator, provider).
  • Unit tests must exercise the real callLLM implementation and be wired into CI.
  • All public help text and docs must consistently reflect "default: disabled."
  • CI (lint, typecheck, tests) must be green on the final head SHA before merge.

Current PR state

Field Value
Title fix(wiki): Remove the hidden 60s default timeout from gitnexus wiki
State Draft — must be marked ready before merge
Base / Head maincopilot/fix-gitnexus-wiki-timeout-error
Head SHA reviewed d056021
Linked issue #1537
Changed files README.md, gitnexus-claude-plugin/skills/gitnexus-cli/SKILL.md, gitnexus/src/cli/index.ts, gitnexus/src/core/wiki/llm-client.ts, gitnexus/test/unit/wiki-llm-client.test.ts
CI status CodeQL ✅ · Gitleaks ✅ · Dependency Review ✅ · CI workflow in_progress · Docker Build in_progress
Visibility gap Main CI job (typecheck + unit tests) has not yet completed — final verdict cannot be fully confirmed until green

Branch hygiene assessment

Clean feature/fix PR. Two commits (cc7a450 Initial plan, d056021 fix: remove default wiki llm timeout ceiling). The "Initial plan" commit is an empty Copilot agent planning marker with no file changes; the actual diff is entirely in d056021. Five files changed, all directly relevant to the fix. No merge-from-main commits, no unrelated churn.


Understanding of the change

Runtime flow:
gitnexus wiki [path]wikiCommand (cli/wiki.ts) → resolveLLMConfig(overrides) builds llmConfig (no requestTimeoutMs field, since resolveLLMConfig doesn't populate it) → if --timeout N was passed, wiki.ts:353-355 sets llmConfig.requestTimeoutMs = secs * 1000new WikiGenerator(..., llmConfig, ...)generator.invokeLLM()callLLM(prompt, config)resilientFetch(url, { signal: ... }).

Before this PR: signal: AbortSignal.timeout(config.requestTimeoutMs ?? 60_000) — always created a timeout signal, defaulting to 60 s.

After this PR: signal: config.requestTimeoutMs !== undefined ? AbortSignal.timeout(config.requestTimeoutMs) : undefined — no signal when requestTimeoutMs is absent; caller-supplied signal when it's set. resilientFetch passes init straight through to fetchImpl(input, init); when signal is undefined, native fetch runs without an abort deadline. This is the correct fix.


Findings

Finding 1 — No second hidden 60s LLM ceiling

  • Evidence: Searched all 60_000 / AbortSignal.timeout / requestTimeoutMs occurrences in gitnexus/src/. The other 60_000 references are: generator.ts:172 (DB touch interval — not a request timeout), hf-env.ts:19 (embeddings circuit-breaker reset — unrelated to wiki), community-processor.ts:163 (Leiden algorithm timeout — unrelated). resilientFetch has no internal per-request timeout; it accepts and forwards the caller-supplied init.signal. No second ceiling exists.
  • Risk: None.
  • Blocks merge: No.

Finding 2 — resolveLLMConfig correctly omits requestTimeoutMs

  • Evidence: llm-client.ts:44-75resolveLLMConfig return object never includes requestTimeoutMs. The field is undefined on the resolved config unless wiki.ts:353-355 explicitly sets it. saveCLIConfig and the saved ~/.gitnexus/config.json have no requestTimeoutMs field in the saved schema. No saved-config injection path exists.
  • Risk: None.
  • Blocks merge: No.

Finding 3 — Commander has no default value for --timeout

  • Evidence: index.ts:164.option('--timeout <seconds>', '...') — no third argument (no commander default). options.timeout is undefined when not provided. Confirmed correct.
  • Risk: None.
  • Blocks merge: No.

Finding 4 — Invalid --timeout values are silently ignored (pre-existing behavior, not a regression)

  • Evidence: wiki.ts:353-355: const secs = parseInt(options.timeout, 10); if (!isNaN(secs) && secs > 0) llmConfig.requestTimeoutMs = secs * 1000;--timeout 0, --timeout -1, --timeout abc all silently fall through and leave requestTimeoutMs unset. This was also the case pre-PR (the only change is that the old code fell back to the ?? 60_000 default in callLLM, while now it runs without a timeout). A user passing --timeout 0 expecting a tighter bound now gets no timeout instead of 60 s, with no error message.
  • Risk: Medium UX risk — user-visible regression for the invalid-value case. The intent presumably differs ("I want a very short timeout" vs "no timeout"). Not a production hang risk since omitting timeout is the intended default.
  • Recommended fix: Add a clear console.error / early exit for non-positive / NaN values, consistent with how other CLI numeric options are validated. Fix this →
  • Blocks merge: No — not a regression from the fix, but worth a follow-up issue.

Finding 5 — AbortError/TimeoutError is not retried (correct, but a semantic nuance)

  • Evidence: resilient-fetch.ts:92-98TimeoutError and AbortError are classified as terminal-network, routing through breaker.recordNeutral() and rethrowing immediately. This means gitnexus wiki --timeout 30 that fires on a slow LLM will surface after one attempt, not after 3 retries. This is intentional (per comment in resilient-fetch) and documented behavior, but subtly different from the retry behavior of network errors. Docs don't explicitly call this out.
  • Risk: Low — users who set --timeout and hit it won't benefit from retry attempts. Acceptable by design (retrying with the same already-fired signal would fail immediately). Not a blocker.
  • Blocks merge: No.

Finding 6 — No command-path test for CLI option → requestTimeoutMs mapping

  • Evidence: wiki-llm-client.test.ts tests callLLM directly. There is no test for wikiCommand that asserts --timeout 120 sets requestTimeoutMs = 120_000 and omitting --timeout leaves it undefined. DoD item 10 requires "command-path or integration-level verification." The parseInt + > 0 guard in wiki.ts:353-355 is simple, but untested.
  • Risk: Low — the code path is trivially correct by inspection. No production risk, but the DoD is not fully met.
  • Recommended fix: Add a focused unit test for the wikiCommand timeout-mapping path, or a brief note in the PR description that this was verified manually.
  • Blocks merge: No — acceptable as a follow-up.

Finding 7 — Test quality is solid

  • Evidence: New describe('callLLM — timeout handling') block (lines 240–295): uses vi.spyOn(AbortSignal, 'timeout') and vi.stubGlobal('fetch', fetchSpy). afterEach calls vi.restoreAllMocks() + vi.unstubAllGlobals() — clean teardown. No-timeout test asserts timeoutSpy was never called AND init.signal === undefined. Explicit-timeout test asserts timeoutSpy called with 120_000 and init.signal === timeoutSignal. Dynamic import(...) reuses the cached module, which is fine here since no module-level state differs between calls. The tests exercise the real callLLM implementation via a stubbed global fetch.
  • Risk: None.
  • Blocks merge: No.

Finding 8 — Tests ARE wired into CI

  • Evidence: vitest.config.ts:79 — the default project includes test/**/*.test.ts. test/unit/wiki-llm-client.test.ts matches and is not excluded. CI will run these tests.
  • Risk: None.
  • Blocks merge: No.

Finding 9 — Docs consistently updated, no stale references

  • Evidence: All three changed doc surfaces (README.md, SKILL.md, index.ts help text) now say "default: disabled". Searched all .md files for default: 60, 60 seconds, timeout.*60 — no matches found outside the test infrastructure (which uses timeout: 60000 for subprocess timeouts, unrelated to wiki). No stale public references remain.
  • Risk: None.
  • Blocks merge: No.

Finding 10 — CI incomplete, PR is Draft

  • Evidence: CI workflow is in_progress at time of review. Main gates (typecheck, unit tests, lint) have not completed. CodeQL, Gitleaks, and Dependency Review are green. PR is still in Draft state.
  • Risk: Cannot confirm final green.
  • Blocks merge: Yes — merge must wait for CI completion and for PR to be marked ready.

Finding 11 — No hidden Unicode

  • Evidence: git diff --check HEAD~2..HEAD returned no output (no whitespace errors). The changed TypeScript files contain only ASCII characters (confirmed by manual inspection of the diffs). No bidirectional controls or zero-width characters detected.
  • Risk: None.
  • Blocks merge: No.

PR-specific assessment

Issue-fit and repro coverage: ✅ The fix directly removes the ?? 60_000 fallback that caused the reported 60s abort. No second hidden ceiling exists on the wiki LLM path. The fix is exactly scoped to the failure mode described in #1537.

CLI contract and option parsing:--timeout has no commander default; omitting it leaves options.timeout undefined; wiki.ts only sets requestTimeoutMs for positive numeric values; resolveLLMConfig and saved config do not inject requestTimeoutMs. ⚠️ Invalid values (0, negative, non-numeric) are silently ignored — pre-existing behavior, not a regression.

Runtime timeout/retry behavior:signal: undefined is passed through by resilientFetch to native fetch without substituting any ceiling. Explicit timeout creates a fresh AbortSignal.timeout(requestTimeoutMs) per attempt. Timeout errors are terminal (not retried), which is correct and intentional. Retry semantics (maxAttempts, backoff) are unchanged.

Test coverage and CI wiring: ✅ New tests cover the real callLLM path for both the no-timeout and explicit-timeout cases. Tests are wired into the Vitest default project. ⚠️ No command-path test for CLI option mapping. ⚠️ CI still running.

Docs/help consistency: ✅ All three public doc surfaces updated to "default: disabled." No stale references found.

Security/hygiene/hidden Unicode: ✅ No new injection surfaces. validateLLMBaseUrl unchanged and still gates all fetch calls. No Unicode issues. No secrets or credentials in changed files.

Branch hygiene: ✅ Clean feature/fix PR. Two commits, 5 files, focused. The "Initial plan" commit is an empty Copilot planning marker.


Specific likely issues verified

Issue Status Evidence
1. Hidden timeout not fully removed Not present Only one AbortSignal.timeout call in wiki path; all other 60_000 refs are unrelated
2. CLI default still injects timeout Not present --timeout has no commander default; resolveLLMConfig omits requestTimeoutMs; saved config has no field
3. Explicit --timeout path broken Not present wiki.ts:353-355 correctly maps to requestTimeoutMs * 1000; covered by unit test
4. Invalid timeout silently ignored Confirmed --timeout 0/-1/abc all silently produce no timeout; pre-existing behavior, not a regression
5. Unit test fragility from module caching/spies Not present vi.restoreAllMocks() + vi.unstubAllGlobals() in afterEach; module caching is benign here
6. CI not proving the regression test Not present test/unit/wiki-llm-client.test.ts matched by test/**/*.test.ts in the default Vitest project
7. Docs stale outside changed files Not present Full search of .md files found no remaining default: 60 or 60 seconds references
8. Draft PR merged before final review Confirmed PR is still Draft; CI still running — process gate, not a code defect

Final verdict

production-ready with minor follow-ups — once CI is green and draft status is resolved.

The core fix is correct and precisely scoped: the hidden ?? 60_000 fallback is removed from the only location in the wiki LLM call path that created it, resolveLLMConfig and saved config introduce no substitute default, and resilientFetch correctly passes signal: undefined without substituting its own ceiling. The new unit tests execute the real callLLM implementation, assert signal state directly, and are wired into CI via the Vitest default project. All public docs consistently reflect the new "default: disabled" behavior.

Two items should be tracked as follow-ups (neither blocks the fix): (1) invalid --timeout values (0, negative, non-numeric) are silently ignored rather than surfaced as user errors; (2) the CLI option-to-config mapping in wiki.ts is untested at the command level. Neither is a regression introduced by this PR. Merge gate: wait for the CI workflow to complete and the PR to be marked ready for review.
| Branch

@magyargergo

Copy link
Copy Markdown
Collaborator

@copilot fetch all the comment and filter for claude's comments and make sure you get all the findings and they are addressed with tests and each findings is a separate commit. Follow DoD.md and act as a senior principal GitNexus engineer.

Copilot AI commented May 17, 2026

Copy link
Copy Markdown
Contributor Author

@copilot fetch all the comment and filter for claude's comments and make sure you get all the findings and they are addressed with tests and each findings is a separate commit. Follow DoD.md and act a...

Addressed the actionable Claude findings in separate commits: f8981ed (invalid --timeout values now fail fast), 66522c4 (command-path tests for timeout→requestTimeoutMs mapping). I also tightened the follow-up edge cases in 24ad004, e85de16, and d74c5dd to cover empty/whitespace/fractional inputs and overflow-safe timeout validation. Verified with npx vitest run test/unit/wiki-flags.test.ts test/unit/wiki-llm-client.test.ts and npx tsc --noEmit.

Copilot AI changed the title fix(wiki): Remove the hidden 60s default timeout from gitnexus wiki fix(wiki): Remove the hidden 60s default timeout and validate gitnexus wiki --timeout May 17, 2026
@magyargergo

Copy link
Copy Markdown
Collaborator

@copilot we need to surface information for the user if the timeout got engaged

Copilot AI commented May 17, 2026

Copy link
Copy Markdown
Contributor Author

@copilot we need to surface information for the user if the timeout got engaged

Addressed in fae3dba, with follow-up hardening in 523f3a0 and ce17a63, plus regression coverage in e4fbfe7. When the configured wiki request timeout fires, gitnexus wiki now surfaces a dedicated Timeout: message telling the user the request timed out and that they can increase --timeout or omit it to disable the per-attempt timeout. Verified with npx vitest run test/unit/wiki-llm-client.test.ts test/unit/wiki-flags.test.ts and npx tsc --noEmit.

Copilot AI changed the title fix(wiki): Remove the hidden 60s default timeout and validate gitnexus wiki --timeout fix(wiki): Remove the hidden 60s default timeout, validate gitnexus wiki --timeout, and surface engaged timeout errors May 17, 2026
@github-actions

github-actions Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 4m 56s —— View job


PR #1651 Review — fix(wiki): Remove hidden 60s default timeout

Review tasks:

  • Read repo context files (DoD.md, AGENTS.md, GUARDRAILS.md — TESTING.md, CONTRIBUTING.md, ARCHITECTURE.md also available; no DoD gaps)
  • Inspect all changed files and trace runtime flow
  • Search for hidden timeouts in resilientFetch, generator.ts, provider code
  • Assess CLI option parsing and defaults (wiki.ts + index.ts)
  • Assess test quality and Vitest wiring
  • Run hygiene / stale-docs checks
  • Check CI status on head SHA
  • Compile and post final review

Review bar for this PR

  • Default gitnexus wiki must not impose a per-request LLM timeout when --timeout is absent.
  • gitnexus wiki --timeout <seconds> must apply a timeout signal to every fetch call, and invalid values must fail fast.
  • No second hidden 60s ceiling anywhere on the wiki LLM path (resilientFetch, generator, provider).
  • Unit tests must exercise the real callLLM implementation and be wired into CI.
  • All public help text and docs must consistently reflect "default: disabled."
  • CI (lint, typecheck, tests) must be green on the final head SHA before merge.

Current PR state

Field Value
Title fix(wiki): Remove the hidden 60s default timeout, validate gitnexus wiki --timeout, and surface engaged timeout errors
State Draft — must be marked ready before merge
Base / Head maincopilot/fix-gitnexus-wiki-timeout-error
Head SHA reviewed ce17a63
Changed files README.md, SKILL.md, gitnexus/src/cli/index.ts, gitnexus/src/cli/wiki.ts, gitnexus/src/core/wiki/llm-client.ts, gitnexus/test/unit/wiki-flags.test.ts, gitnexus/test/unit/wiki-llm-client.test.ts
CI status ⛔ All 9 workflow runs show conclusion: action_required with 0 jobs executed — CI has never run on the current HEAD

Branch hygiene assessment

Clean feature/fix PR. 11 focused commits, all directly relevant to the fix. Files changed: 2 src files (wiki.ts, llm-client.ts), 1 CLI registration (index.ts), 2 test files, 2 doc files. No merge-from-main commits, no unrelated churn. The "Initial plan" commit is an empty Copilot planning marker with no file changes.


Understanding of the change

Runtime flow: gitnexus wiki [path]wikiCommandparsePositiveIntegerOption(options?.timeout, '--timeout', 1000) (fails fast on invalid input) → resolveLLMConfig(overrides) builds llmConfig (no requestTimeoutMs in return object) → if timeoutSeconds !== undefined, wiki.ts:379-381 sets llmConfig.requestTimeoutMs = timeoutSeconds * 1000new WikiGenerator(..., llmConfig, ...)generator.invokeLLM()callLLM(prompt, config)resilientFetch(url, { signal: config.requestTimeoutMs !== undefined ? AbortSignal.timeout(config.requestTimeoutMs) : undefined }).

Before this PR: signal: AbortSignal.timeout(config.requestTimeoutMs ?? 60_000) — always created a 60s timeout, silently imposing the ceiling even without --timeout.

After this PR: signal: config.requestTimeoutMs !== undefined ? AbortSignal.timeout(config.requestTimeoutMs) : undefined — no signal when requestTimeoutMs is absent; invalid --timeout values now fail fast before generator construction.


Findings

Finding 1 — Core fix is correct and complete

  • Evidence: llm-client.ts:256-259signal is undefined when requestTimeoutMs is absent. resilientFetch passes init straight to fetchImpl(input, init); when signal is undefined, native fetch runs without an abort deadline. resilientFetch has no internal per-request timeout of its own — confirmed by reading gitnexus-shared/src/integrations/resilient-fetch.ts in full.
  • Risk: None.
  • Blocks merge: No.

Finding 2 — No second hidden 60s LLM ceiling

  • Evidence: Searched all 60_000 / AbortSignal.timeout / requestTimeoutMs in gitnexus/src/. Other 60_000 references: generator.ts:172 (DB touch interval), hf-env.ts:19 (embeddings circuit-breaker reset), community-processor.ts:163 (Leiden algorithm timeout), pool-adapter.ts:118 (pool idle check). None are LLM request timeouts. resilientFetch classifies DOMException('TimeoutError') and DOMException('AbortError') as terminal-network (not retried), and rethrows them directly — no hidden ceiling substituted.
  • Risk: None.
  • Blocks merge: No.

Finding 3 — resolveLLMConfig correctly omits requestTimeoutMs

  • Evidence: llm-client.ts:55-74 — return object has no requestTimeoutMs field. Saved config (~/.gitnexus/config.json) schema does not include it. No env var injection path. The field is only set in wiki.ts:379-381 when timeoutSeconds is defined.
  • Risk: None.
  • Blocks merge: No.

Finding 4 — Commander has no default for --timeout; CLI validation is solid

  • Evidence: index.ts:164.option('--timeout <seconds>', 'Per-attempt LLM request timeout in seconds (default: disabled)') — no third argument, so options.timeout is undefined when omitted. wiki.ts:40-55parsePositiveIntegerOption uses /^[1-9]\d*$/ regex to reject empty, whitespace-only, zero, negative, non-numeric, and fractional values, plus an overflow check (parsed > Math.floor(Number.MAX_SAFE_INTEGER / multiplier)). All cases are tested in wiki-flags.test.ts:286-353 with it.each.
  • Risk: None.
  • Blocks merge: No.

Finding 5 — Timeout error surfacing is correct end-to-end

  • Evidence: llm-client.ts:278-283 — when requestTimeoutMs !== undefined and isTimeoutLikeError(err) is true, throws LLM request timed out after Xs. Increase --timeout or omit it.... wiki.ts:591-592 — CLI catch block matches 'LLM request timed out after' and surfaces \n Timeout: ${err.message}\n. wiki-flags.test.ts:479-548 — dedicated wikiCommand timeout messaging describe block asserts the full Timeout: prefix message and process.exitCode = 1.
  • Risk: None.
  • Blocks merge: No.

Finding 6 — isTimeoutLikeError has a broad abort pattern when requestTimeoutMs is set

  • Evidence: llm-client.ts:94/time(d)?\s*out|timeout|abort/i matches any error with "abort" in the message (e.g., "Connection aborted by server") when requestTimeoutMs !== undefined. This would surface a TCP reset as "LLM request timed out after Xs" which is a misleading message. The test at wiki-llm-client.test.ts:338-357 covers non-DOM timeout-like errors (new Error('request timed out while waiting for response')) but does not cover the "abort" leg.
  • Risk: Low. Misattributing a ECONNRESET as a timeout is confusing but not a production hang or data-loss risk. Only affects users who set --timeout.
  • Recommended fix: Either narrow the regex to exclude bare abort or add a test asserting that a generic "connection aborted" error with requestTimeoutMs set produces the same timeout message (to document the intentional behavior). Fix this →
  • Blocks merge: No.

Finding 7 — Signal is per-call, not per-attempt; PR description is inaccurate

  • Evidence: llm-client.ts:256-259AbortSignal.timeout(config.requestTimeoutMs) is created once per callLLM invocation and passed to resilientFetch. If attempt 1 takes 25s against a 30s timeout and gets a 5xx, attempt 2 would only have 5s. However, resilientFetch classifies DOMException('TimeoutError') as terminal-network and does NOT retry — so in practice, the signal fires only once and the loop exits. The behavior is correct; the documentation ("per-attempt timeout") is slightly misleading since it's actually a deadline from the moment callLLM is invoked. This was also the pre-existing behavior; the PR only removed the default.
  • Risk: None in practice (timeout aborts are not retried).
  • Recommended fix: Update the JSDoc/comment in callLLM to say "per-call timeout deadline" rather than "per-attempt". Not a blocker.
  • Blocks merge: No.

Finding 8 — --retries still uses old silent-ignore pattern

  • Evidence: wiki.ts:382-384const n = parseInt(options.retries, 10); if (!isNaN(n) && n > 0) llmConfig.maxAttempts = n; — silently ignores invalid values. Pre-existing behavior; not introduced by this PR. The new parsePositiveIntegerOption helper was not applied to --retries.
  • Risk: Low. Pre-existing UX inconsistency. Not a regression.
  • Recommended fix: Apply parsePositiveIntegerOption to --retries in a follow-up. Fix this →
  • Blocks merge: No.

Finding 9 — Tests are solid and exercise the real callLLM path

  • Evidence: wiki-llm-client.test.ts:240-358describe('callLLM — timeout handling') uses vi.spyOn(AbortSignal, 'timeout') and vi.stubGlobal('fetch', fetchSpy). afterEach calls vi.restoreAllMocks() + vi.unstubAllGlobals() — clean teardown. No-timeout test asserts timeoutSpy not called AND init.signal === undefined. Explicit-timeout test asserts timeoutSpy called with 120_000 and signal passed to fetch. wiki-flags.test.ts:356-459 — command-path tests cover --timeout 120 → requestTimeoutMs = 120_000 and omitted --timeout → undefined. Vitest default project (include: ['test/**/*.test.ts']) picks up both files.
  • Risk: None.
  • Blocks merge: No.

Finding 10 — CI has NOT run on the current HEAD SHA — hard merge gate

  • Evidence: All 9 workflow runs show conclusion: action_required with 0 jobs (confirmed via API). This conclusion is set by GitHub when workflows triggered by external/copilot PRs require maintainer approval before any job executes. The copilot agent reported running npx vitest run ... and npx tsc --noEmit locally in PR comments, but formal CI has not been gated on ce17a63. The DoD §4.2 requires cd gitnexus && npm test and npx tsc --noEmit to pass before merge.
  • Risk: High. Tests could fail for environment reasons, import resolution, or Vitest config mismatches not caught locally.
  • Recommended fix: A maintainer must approve the CI workflow runs for this branch, wait for green, then mark the PR ready for review.
  • Blocks merge: Yes.

Finding 11 — Docs consistently updated; no stale references

  • Evidence: README.md:728--timeout <seconds> # Per-attempt LLM request timeout in seconds (default: disabled). SKILL.md:65| \--timeout ` | Per-attempt LLM request timeout in seconds (default: disabled) |. index.ts:164— help text uses(default: disabled). Full search of .mdfiles found no remainingdefault: 60or60 seconds` references outside unrelated contexts.
  • Risk: None.
  • Blocks merge: No.

Finding 12 — No hidden Unicode

  • Evidence: git diff --check 2376912..HEAD — no output (no whitespace errors). Non-ASCII characters in README.md are all visible emoji and punctuation — intentional and acceptable. No bidirectional controls or zero-width characters found in any changed file.
  • Risk: None.
  • Blocks merge: No.

PR-specific assessment

Issue-fit and repro coverage: ✅ The fix directly removes the ?? 60_000 fallback — the precise cause of #1537. No second hidden 60s ceiling exists on the wiki LLM path. The explicit --timeout override path is preserved and tested end-to-end.

CLI contract and option parsing:--timeout has no commander default. parsePositiveIntegerOption rejects all invalid values with a clear error and early exit before generator construction. ⚠️ --retries still uses the old silent-ignore pattern — pre-existing inconsistency.

Runtime timeout/retry behavior:signal: undefined is passed through by resilientFetch without substituting any ceiling. Explicit timeout creates AbortSignal.timeout(requestTimeoutMs). Timeout errors are terminal-network (not retried), which is correct. ⚠️ Signal is per-call, not per-attempt — PR description is slightly inaccurate, but behavior is correct in practice.

Test coverage and CI wiring: ✅ New tests cover the real callLLM path for no-timeout and explicit-timeout cases. Command-path tests cover wikiCommand option mapping and timeout messaging. Both files are wired into the Vitest default project. ⛔ CI has not run on the current HEAD — action_required with 0 jobs executed.

Docs/help consistency: ✅ All three public doc surfaces updated to (default: disabled). No stale references found.

Security/hygiene/hidden Unicode:validateLLMBaseUrl unchanged and still gates all fetch calls. isTimeoutLikeError does not introduce any injection surface. No secrets in changed files. No Unicode issues.

Branch hygiene: ✅ Clean feature/fix PR. 11 focused commits, 7 directly relevant files. No unrelated churn.


Specific likely issues verified

Issue Status Evidence
1. Hidden timeout not fully removed Not present Only one AbortSignal.timeout call in wiki path; resilientFetch has no internal per-request ceiling
2. CLI default still injects timeout Not present No commander default; resolveLLMConfig omits requestTimeoutMs; saved config has no field
3. Explicit --timeout path broken Not present wiki.ts:379-381 maps correctly; command-path tests in wiki-flags.test.ts assert both cases
4. Invalid timeout silently ignored Fixed in this PR parsePositiveIntegerOption rejects all invalid values; tested with it.each over 7 cases
5. Unit test fragility from module caching/spies Not present vi.restoreAllMocks() + vi.unstubAllGlobals() in afterEach; module caching is benign
6. CI not proving the regression test Not present test/**/*.test.ts glob in default Vitest project covers both test files
7. Docs stale outside changed files Not present Full search found no remaining default: 60 references
8. Draft PR / CI not green Confirmed PR is Draft; CI shows action_required with 0 jobs on current HEAD

Final verdict

not production-ready

The core runtime fix is correct and precisely scoped: the hidden ?? 60_000 fallback is removed, resolveLLMConfig introduces no substitute default, resilientFetch passes signal: undefined without a hidden ceiling, CLI validation is now robust and fail-fast, and all public docs consistently reflect the new behavior. However, CI has not run on the current HEAD commit (ce17a63) — all 9 workflow runs show conclusion: action_required with zero jobs executed, meaning the typecheck, lint, and unit tests have not been formally verified against the current state of the branch. A maintainer must approve the CI workflows, wait for green, and mark the PR ready for review before merge. Findings 6, 7, and 8 (broad isTimeoutLikeError abort match, per-call vs. per-attempt documentation, --retries inconsistency) are minor and should be tracked as follow-ups after merge.
| Branch

@magyargergo

Copy link
Copy Markdown
Collaborator

@copilot get claude's new findings and make sure they are all address from the comments

Copilot AI commented May 17, 2026

Copy link
Copy Markdown
Contributor Author

@copilot get claude's new findings and make sure they are all address from the comments

Addressed the new actionable Claude findings in b83c9cc: narrowed timeout matching so generic aborted connections are not mislabeled as timeouts, made --retries fail fast with the same validation path as --timeout, and updated the timeout wording/docs to match the actual request-scoped behavior. Added regression coverage in the wiki unit tests and re-verified with npx vitest run test/unit/wiki-llm-client.test.ts test/unit/wiki-flags.test.ts and npx tsc --noEmit. The remaining Claude CI note is procedural only: GitHub still shows the PR workflows as action_required with 0 jobs started, so a maintainer needs to approve/run CI on the current head b83c9cc. CLI-only change, so no UI screenshot applies.

Copilot AI changed the title fix(wiki): Remove the hidden 60s default timeout, validate gitnexus wiki --timeout, and surface engaged timeout errors fix(wiki): Remove the hidden 60s default timeout, validate gitnexus wiki timeout/retry flags, and surface timeout errors May 17, 2026
@magyargergo magyargergo marked this pull request as ready for review May 17, 2026 08:35
@magyargergo

Copy link
Copy Markdown
Collaborator

@copilot CI must be healthy

Copilot AI commented May 17, 2026

Copy link
Copy Markdown
Contributor Author

@copilot CI must be healthy

I investigated the GitHub Actions state. No wiki-related code change is needed from this comment: the old failing CI run on b83c9cc failed in a single unrelated test (test/unit/hooks.test.ts, PreToolUse concurrency guard), and the current merged-head run on cb3e653 has the completed checks healthy so far (lint, format, typecheck, typecheck-web, CodeQL, Gitleaks, Dependency Review, and completed scope-parity shards are green) while the long-running test lanes are still in progress. No UI changes here, so no screenshot applies.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • extension.ladybugdb.com
    • Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/scripts/install-duckdb-extension.mjs fts 17179869184 k/Gi�� le.encode(&#39;utf-8-I (dns block)
    • Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/scripts/install-duckdb-extension.mjs fts 17179869184 k/Gi�� le.encode(&#39;utf-8-I et/tree_sitter_k/home/REDACTED/.cache/node-gyp/20.20.2/deps/openssl/config p/bin/python3 (dns block)
    • Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/GitNexus/GitNexus/gitnexus/scripts/install-duckdb-extension.mjs fts 17179869184 et/t�� le.encode(&#39;utf-8&#39;)); (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@github-actions

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
9171 9170 0 1 468s

✅ All 9170 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 78.58% 29429/37447 N/A% 🟢 ███████████████░░░░░
Branches 67.09% 18740/27929 N/A% 🟢 █████████████░░░░░░░
Functions 83.37% 2934/3519 N/A% 🟢 ████████████████░░░░
Lines 81.9% 26554/32421 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo merged commit ed50a67 into main May 17, 2026
32 checks passed
hohaivu pushed a commit to hohaivu/GitNexus that referenced this pull request May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gitnexus wiki command throws timeout error

2 participants