feat(ingestion): log deferred resolution progress when verbose (#1741)#1773
Conversation
Add [deferred-profile] timing logs for post-chunk import, heritage, heritage-map, and legacy call resolution. Enabled on GITNEXUS_VERBOSE / analyze -v (and optionally GITNEXUS_PROFILE_DEFERRED) to diagnose analyze stalls on large repos (issue abhigyanpatwari#1741). Co-authored-by: Cursor <cursoragent@cursor.com>
|
Someone is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
✨ PR AutofixFound fixable formatting / unused-import issues across 29 changed lines. Comment |
|
/autofix |
|
Claude finished @magyargergo's task in 5m 39s —— View job Production-Readiness Review: PR #1773Todo List
Review bar for this PRThis PR is reviewed against DoD.md v2.0.0 §§2.1–2.9 and the PR-specific DoD stated in the review request. The bar is: observability-only change, accurate logs that enable #1741 reporters to pinpoint where deferred resolution time goes, no behavior regression, tests that would catch broken wiring, and clean CI. Problem being solvedIssue #1741: Current PR stateSingle commit Merge status and mergeabilityCould not directly verify required-check conclusions (cannot fetch CI without network access), but the PR Autofix bot output is visible evidence that lint/format checks have unresolved findings in the diff. This is a merge blocker per DoD §4.2, which requires Repository history considered
Branch hygiene assessmentClean. One commit, one domain (ingestion observability), no unrelated formatting churn, no generated artifacts, no dependency bumps, no workflow changes, no release-note edits. Diff maps exactly to #1741's stated need. Understanding of the change
Runtime activation path confirmed: FindingsFinding 1 — Pre-reported: Registry-primary skip counter dead codeRisk: Claimed as a confirmed bug in the pre-analysis. Actual status: NOT a bug in this PR. The diff shows a single However, a related accuracy issue is confirmed: the per-file progress log at A reporter sees 47 call sites attributed to
Finding 2 — Pre-reported: E1 log gated by
|
|
✅ Applied autofix and pushed a commit. (apply run) |
…review Move deferred call progress logs after the registry-primary skip so sites= counts match files actually resolved. Only time buildHeritageMap when heritage records exist; otherwise log an explicit skip. Add wiring tests that assert [deferred-profile] emission from buildHeritageMap and processCallsFromExtracted. Snapshot GITNEXUS_PROFILE_DEFERRED env vars in analyze CLI isolation. Co-authored-by: Cursor <cursoragent@cursor.com>
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 9506 tests passed 1 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
P0 - Replace forbidden toBeGreaterThanOrEqual/toBeLessThan in profileElapsedMs test with exact-arithmetic vi.spyOn(hrtime.bigint) asserting .toBe(2.5) and .toBe(0). DoD §2.7 compliance. P2 - Use Number() (not parseInt) when parsing GITNEXUS_PROFILE_DEFERRED_SLOW_MS so scientific notation like '1e9' doesn't silently parse to 1 and turn the slow-file log into a per-file log storm. - Introduce startTimer(enabled): bigint | null and endTimer(start, format) helpers in deferred-resolution-profile.ts; refactor 6+ timing blocks in parse-impl.ts and call-processor.ts to use them. Removes the 0n sentinel that conflated 'disabled' with 'zero elapsed time' and let TS narrow correctly. - Split the call-processor file counter: filesProcessed (all iterated) vs resolvedFiles (post registry-primary skip). Key the every-N progress log and the start-of-phase log on resolvedFiles so mixed Python+JVM repos where the skipped language sorts first still emit 'calls 1/1 file=...' on the first non-skipped file. Adds a wiring test for the mixed-language ordering case. P3 - Restore the original isDev '🔗 E1: Seeded ...' logger.info line so log scrapers keyed on the emoji marker still match; emit the [deferred-profile] variant only when deferredProfile && !isDev. - Move tFile = startTimer(profileCalls) below the registry-primary skip so skipped files don't trigger an hrtime.bigint() call. - Document GITNEXUS_PROFILE_DEFERRED and GITNEXUS_PROFILE_DEFERRED_SLOW_MS in the README env-var table.
Three narrow-form env-var truthy checkers (verbose.ts, registry-primary-flag.ts, deferred-resolution-profile.ts) each had their own `'1' | 'true' | 'yes'` parser with subtle divergences (trim or no trim, set vs disjunction). Consolidate on a single `parseTruthyEnv(raw)` helper in utils/env.ts — the module already serves as the centralization point for shared ingestion env constants. logger.ts's broader `isTruthyEnv` (negative-list, pino-debug convention) stays untouched — different intent, different semantics. New table-driven test at test/unit/env.test.ts covers case variants, whitespace, and rejection of falsy / unknown tokens.
Replace magic literals 10 / 100 / 3_000 / 5_000 in deferred-resolution-profile.ts with module-private named constants LOG_EVERY_N_VERBOSE, LOG_EVERY_N_PROFILE, DEFAULT_SLOW_MS_VERBOSE, DEFAULT_SLOW_MS. Not exported — internal tuning knobs. Pure refactor; existing tests assert the exact values and still pass unchanged.
The live per-file denominator in processCallsFromExtracted previously read `totalFiles - skippedRegistryPrimaryFiles` at log time. On mixed Python+JVM repos where the skipped language interleaves with the resolved one, the denominator drifts upward as the loop iterates — files iterated before later skips have been seen carry an inflated denominator. The live ratio only self-corrects after the final file has been classified. Fix: one-pass pre-count over byFile.keys() before the work loop computes resolvedTotal once. The denominator is then stable from the first emission onward. The pre-pass runs only on the enabled path (profileCalls=true) so the disabled path keeps zero extra work. Adds a wiring test exercising the alternating [ts, py, ts, py, ...] order that triggered the drift, asserting every emitted line uses `/4` and no other denominator slips through.
… (U2, A2)
The post-chunk E1 enrichment log used `if (isDev) {...} else if
(deferredProfile) {...}` which is mutually exclusive. On combined runs
(NODE_ENV=development + GITNEXUS_PROFILE_DEFERRED=1) the [deferred-
profile] line was silently swallowed — operators grepping that prefix
saw a gap between wildcard-synth and heritage timings, while the
inline comment promised dual emission.
Fix: two independent `if` statements so both branches fire when both
flags are set. The original emoji-prefixed `🔗 E1: Seeded` line keeps
its phrasing for any dev-mode log scrapers that depend on the marker.
Pinning test (parse-impl-e1-emission-shape.test.ts) reads the source
and asserts (a) both branches exist as standalone `if` statements and
(b) the closing `}` of the isDev branch is followed by `if`, not
`else if`. Source-shape pins are the right test scope for a purely
structural change — the regression we are guarding against is exactly
how a future reader greps for it.
The existing maxNameCartesian / ambiguousHeritageRecords counters in buildHeritageMap only observed records where BOTH the child and parent name lookups resolved. On JVM monorepos the actual pathological case is one side empty (typically an unresolved external supertype with many same-named children, or vice versa) — those records were silently dropped from the metric. Add `unresolvedChildLookups` and `unresolvedParentLookups` in a separate `if (profileHeritage)` block placed immediately after the two `lookupClassByName` calls (so it observes the unresolved cases the length-guarded ambiguity block below cannot see). Both counters reuse the existing childDefs / parentDefs values — no additional lookups. Done-summary log extended to include the two new counters. Wiring test covers both directions (unresolved parent, unresolved child) plus the existing "both resolved" baseline now asserts the new counters report zero for that case.
Wrap the format callback in endTimer in a try/catch so a throwing formatter (custom toString, JSON.stringify on a circular object, future heavier serializers) cannot abort the deferred resolution band. Observability code must never escalate to a load-bearing failure mode. On catch we emit a single `[deferred-profile] formatter error: …` line via logDeferredProfile and return; the caller's stage continues as if profiling had no-op'd for this timer. DoD §2.8 is satisfied — the failure is surfaced, not silently swallowed. Tests cover the four cases: happy path emits the formatted line, null start no-ops without invoking the formatter, throwing formatter is caught and surfaces one error line, non-Error throws are coerced via String() in the message.
…Profile (U4) Wrap logger.info inside logDeferredProfile in a try/catch so a throwing underlying logger cannot abort the deferred resolution band. Pino with sync:false (the current SonicBoom destination) does not throw synchronously for `info(string)` calls, but first-use construction paths (pino-pretty resolve, level validation) and any future transport reconfiguration could. The wrap is belt-and-suspenders coverage; the counter makes silent failures visible. A module-private droppedLogLines counter accumulates dropped lines. Two helpers — getDeferredProfileDroppedCount() and resetDeferredProfileDroppedCount() — expose the counter. The handler deliberately does NOT call the failing logger; that would risk an infinite loop if the failure is steady-state. processCallsFromExtracted resets the counter at entry (so each analyze run gets a fresh count rather than accumulating across the process lifetime — relevant for the MCP server, eval harness, integration tests), and surfaces the count in the done-summary as `note: N profile log lines dropped (logger errors)` when greater than zero. DoD §2.8 (no silent diagnostic catches) is satisfied. Tests cover the helper API (zero at entry, idempotent reset) and the happy path; the catch arm is pinned via source-shape assertion since the logger Proxy can't be vi.spyOn'd directly (lazy `get` trap, no own-property to wrap).
The env-var row mentioned integer / scientific notation only, but the underlying parser (`Number(raw)` since the U2 fix in PR abhigyanpatwari#1773) also accepts decimals like `.5` and hex like `0x10`. Document the actual acceptance set plus the non-finite / non-positive fallback so operators setting unusual values know what to expect.
Summary
[deferred-profile]structured logs for the post-chunk deferred resolution band: imports, heritage,buildHeritageMap, and legacyprocessCallsFromExtracted.gitnexus analyze -v/GITNEXUS_VERBOSE=1, so reporters debugging large-repo analyze stalls (issue GitNexus analyze gets stuck at 49% during “Resolving calls”. #1741) do not need a separate env var.lookupClassByNamechild×parent product).Motivation
Issue #1741: analyze appears stuck at Resolving calls (all chunks) on ~25k-file Java/Kotlin repos while CPU stays high. The UI only advances every 100 files and intermediate stages are silent in logs. v1.3.11 completes quickly; 1.6.x defers full-repo call resolution.
These logs pinpoint whether time is spent in imports, heritage map build, or specific pathological files.
How to use
Optional overrides:
GITNEXUS_PROFILE_DEFERRED=1— enable without full verbose ingestion noise (100-file log interval)GITNEXUS_PROFILE_DEFERRED_SLOW_MS=3000— slow-file thresholdLook for lines prefixed with
[deferred-profile].Test plan
npx vitest run test/unit/deferred-resolution-profile.test.tsnpx tsc --noEmit(gitnexus package)-vand confirm logs advance through deferred bandmax productis identified