fix(community): improve heuristic cluster labeling (#413) by ShunsukeHayashi · Pull Request #457 · abhigyanpatwari/GitNexus

ShunsukeHayashi · 2026-03-22T11:55:30Z

Summary

Most clusters get generic Cluster_N names because the heuristic labeling only looks at the immediate parent directory and skips a short list of 7 common folder names. This PR improves labeling with three strategies.

Addresses #413

Changes

1. Expanded generic folder skip list (7 → 28)

Added: internal, pkg, app, modules, components, services, controllers, models, views, types, interfaces, test(s), __tests__, spec(s), dist, build, out, bin, scripts

2. Grandparent directory fallback

For deeply nested structures like src/auth/middleware/validate.ts, the parent directory is middleware (generic). Now also considers auth (grandparent) as a candidate label, with 0.5 weight.
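The weighting described above can be sketched as follows. This is a minimal illustration, not the code in community-processor.ts: `pickFolderLabel` and `GRANDPARENT_WEIGHT` are hypothetical names, and the sketch assumes the grandparent is scored for every member rather than only when the parent is generic. The generic set is the 28-entry list from this PR.

```typescript
// Hypothetical sketch: names and structure are illustrative, not the PR's code.
const GENERIC_FOLDERS: ReadonlySet<string> = new Set([
  'src', 'lib', 'core', 'utils', 'common', 'shared', 'helpers',
  'internal', 'pkg', 'app', 'modules', 'components', 'services',
  'controllers', 'models', 'views', 'types', 'interfaces',
  'test', 'tests', '__tests__', 'spec', 'specs',
  'dist', 'build', 'out', 'bin', 'scripts',
]);

// Assumed weight for a grandparent hit, taken from the PR description.
const GRANDPARENT_WEIGHT = 0.5;

/** Score candidate folder names for a cluster and return the best one, capitalized. */
function pickFolderLabel(filePaths: string[]): string | null {
  const scores = new Map<string, number>();
  const bump = (folder: string | undefined, weight: number): void => {
    if (!folder || GENERIC_FOLDERS.has(folder.toLowerCase())) return;
    scores.set(folder, (scores.get(folder) ?? 0) + weight);
  };

  for (const filePath of filePaths) {
    // parts: [..., grandparent, parent, fileName]
    const parts = filePath.split('/').filter(Boolean);
    bump(parts[parts.length - 2], 1);
    bump(parts[parts.length - 3], GRANDPARENT_WEIGHT);
  }

  let best: string | null = null;
  let bestScore = 0;
  for (const [folder, score] of Array.from(scores)) {
    if (score > bestScore) {
      best = folder;
      bestScore = score;
    }
  }
  return best === null ? null : best.charAt(0).toUpperCase() + best.slice(1);
}
```

With paths such as `src/auth/services/validate.ts`, the parent `services` is skipped and `auth` collects 0.5 per member, so the sketch returns `Auth`. A folder that is not in the set, such as `middleware`, would score as an ordinary parent.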

3. Dominant node type + symbol name

When no folder-based label works and no common name prefix exists, instead of Cluster_7:

Check the dominant node type in the cluster (Function, Class, etc.)
Combine with the shortest symbol name: Function_validateUser
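A minimal sketch of this fallback, assuming each cluster member exposes a `name` and a `type`. The `Member` interface and `dominantTypeLabel` are hypothetical names used only for illustration; the actual function in the PR reads these values from the graph.

```typescript
// Hypothetical member shape; the PR reads these values from graph node attributes.
interface Member {
  name: string;
  type: string;
}

/** Return "<DominantType>_<shortestName>", or null when either signal is missing. */
function dominantTypeLabel(members: Member[]): string | null {
  const typeCounts = new Map<string, number>();
  const names: string[] = [];
  for (const m of members) {
    if (m.type) typeCounts.set(m.type, (typeCounts.get(m.type) ?? 0) + 1);
    if (m.name) names.push(m.name);
  }

  // Pick the type with the highest count; the first one seen wins ties.
  let dominantType = '';
  let bestCount = 0;
  for (const [type, count] of Array.from(typeCounts)) {
    if (count > bestCount) {
      dominantType = type;
      bestCount = count;
    }
  }
  if (!dominantType || names.length === 0) return null;

  // Copy before sorting so the caller's array order is untouched.
  const shortest = [...names].sort((a, b) => a.length - b.length)[0];
  return `${dominantType}_${shortest}`;
}
```

Note that the shortest name need not belong to a node of the dominant type: two classes and one short function name would yield a `Class_` label carrying the function's name.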

Labeling cascade

1. Most-common non-generic parent folder → "Auth"
2. Grandparent directory (0.5 weight) → "Auth"  
3. Common name prefix → "validate"
4. Dominant type + shortest name → "Function_validateUser"
5. Fallback → "Cluster_7"
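Step 3 of the cascade can be sketched as below. The helper names `commonPrefix` and `prefixLabel` are hypothetical, and two details are assumptions taken from later comments in this thread rather than from the description above: the step is skipped for clusters with two or fewer names, and a prefix of two characters or fewer is rejected. Capitalizing the result follows the proposed tests, whereas the cascade above shows it in lower case.

```typescript
/** Longest prefix shared by every name; empty string when there is none. */
function commonPrefix(names: string[]): string {
  if (names.length === 0) return '';
  let prefix = names[0] ?? '';
  for (const name of names) {
    let i = 0;
    while (i < prefix.length && i < name.length && prefix[i] === name[i]) i++;
    prefix = prefix.slice(0, i);
  }
  return prefix;
}

/** Hypothetical wrapper applying the guards discussed in the review. */
function prefixLabel(names: string[]): string | null {
  if (names.length <= 2) return null;       // too few names for a meaningful prefix
  const prefix = commonPrefix(names);
  if (prefix.length <= 2) return null;      // very short prefixes are noise
  return prefix.charAt(0).toUpperCase() + prefix.slice(1);
}
```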

Tests

1900 passed | 1 skipped
npx tsc --noEmit  # No errors

…N names (#413)

community-processor.ts:
- Expand generic folder skip list from 7 to 22 entries (adds: internal, pkg, app, modules, components, services, controllers, models, views, types, interfaces, test, tests, __tests__, spec, specs, dist, build, out, bin, scripts)
- Add grandparent directory fallback for deeply nested structures (e.g., src/auth/middleware/ now considers "auth" as a candidate)
- Add dominant node type + symbol name as label before Cluster_N fallback (e.g., "Function_validateUser" instead of "Cluster_7")

Before: Clusters defaulting to Cluster_N when parent folder was generic.
After: Three additional labeling strategies before falling back to Cluster_N.

Addresses #413

vercel · 2026-03-22T11:55:35Z

@ShunsukeHayashi is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

github-actions · 2026-03-22T12:01:30Z

CI Report

✅ All checks passed 6117f09

Pipeline

Stage	Status	Ubuntu	Windows	macOS
Typecheck	✅ `success`	—	—	—
Tests	✅ `success`	✅	✅	✅

Tests

Metric	Value
Total	3694
Passed	3674
Skipped	20
Files	1059
Duration	2m 0s

✅ All 3674 tests passed across 1059 files

20 test(s) skipped

buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
Python match/case as-pattern type binding > resolves u.save() to User#save via match/case as-pattern binding
Python match/case as-pattern type binding > does NOT resolve u.save() to Repo#save (negative disambiguation)
Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
Swift self resolution > detects User and Repo classes, each with a save function
Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
Swift parent resolution > emits EXTENDS edge: User → BaseModel
Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
Swift return type inference > detects User class and getUser function
Swift return type inference > detects save function on User (Swift class methods are Function nodes)
Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()

Coverage

Metric	Coverage	Covered	Base (main)	Delta
Statements	69.29%	9253/13354	69.06%	📈 +0.2%
Branches	60.2%	6258/10394	60.01%	📈 +0.2%
Functions	72.39%	826/1141	71.88%	📈 +0.5%
Lines	71.54%	8256/11539	71.34%	📈 +0.2%

📋 Full run · Coverage from Ubuntu · Generated by CI

magyargergo · 2026-03-22T13:48:20Z

@claude review performance and check what's stated in the summary is actually delivered in this PR.

github-actions · 2026-03-22T13:48:37Z

Claude finished @magyargergo's task in 1m 24s

PR Review: Performance & Summary Verification

Gather context
Read changed file (community-processor.ts)
Verify PR summary claims are delivered in code
Analyze performance implications
Post full review

Summary Verification

All three strategies described in the PR summary are implemented correctly.

Claim	Delivered?	Notes
Expanded generic folder skip list	✅ Yes	But count is wrong — see below
Grandparent directory fallback with 0.5 weight	✅ Yes	Implemented at lines 322–328
Dominant node type + shortest symbol name	✅ Yes	Implemented at lines 363–388
Labeling cascade (5-step order)	✅ Yes	Order matches description

Correctness Issues

1. Skip list count is wrong in the PR description

The PR claims the list grows from 7 → 22 entries, but the actual Set in the code contains 28 entries:

src, lib, core, utils, common, shared, helpers,        ← original 7
internal, pkg, app, modules, components, services,
controllers, models, views, types, interfaces,
test, tests, __tests__, spec, specs,
dist, build, out, bin, scripts                         ← 21 additions = 28 total

The PR description uses test(s) and spec(s) as shorthand, but the code correctly adds both test/tests and spec/specs as separate entries (which is good). The count just needs correcting in the description — minor doc issue.

Performance Issues

2. GENERIC_FOLDERS Set is recreated on every loop iteration — highest impact issue

// community-processor.ts line 304
memberIds.forEach(nodeId => {
  // ...
  const GENERIC_FOLDERS = new Set([  // ← new Set() on EVERY node
    'src', 'lib', 'core', // ... 28 items in total
  ]);
  // ...
});

For a cluster with N members, this allocates N Set objects, each performing 28 string insertions during construction. This should be a module-level constant, since it never changes:

// At module scope, outside the function
const GENERIC_FOLDERS = new Set([
  'src', 'lib', 'core', 'utils', 'common', 'shared', 'helpers',
  // ...
]);


3. Three separate passes over memberIds

memberIds is iterated three times in sequence:

Line 304: folder counting
Line 350: name collection
Line 365: type counting

For large clusters these can be merged into a single pass. That said, community sizes are typically bounded and the per-iteration work is cheap (map lookups/sets), so this is a minor concern compared to item 2 above.
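A merged pass might look like the sketch below. It is an assumption-laden illustration, not the PR's code: `MemberInfo`, `ClusterSignals`, and `collectSignals` are hypothetical names, members are passed as plain objects rather than read from the graph, and grandparent weighting is left out for brevity.

```typescript
// Hypothetical shapes for illustration only.
interface MemberInfo {
  name?: string;
  filePath?: string;
  type?: string;
}

interface ClusterSignals {
  folderCounts: Map<string, number>;
  names: string[];
  typeCounts: Map<string, number>;
}

/** Gather folder, name, and type signals in one iteration over the members. */
function collectSignals(
  members: MemberInfo[],
  genericFolders: ReadonlySet<string>,
): ClusterSignals {
  const folderCounts = new Map<string, number>();
  const names: string[] = [];
  const typeCounts = new Map<string, number>();

  for (const m of members) {
    // Folder signal: immediate parent directory, unless it is generic.
    const parts = (m.filePath ?? '').split('/').filter(Boolean);
    const parent = parts[parts.length - 2];
    if (parent && !genericFolders.has(parent.toLowerCase())) {
      folderCounts.set(parent, (folderCounts.get(parent) ?? 0) + 1);
    }
    // Name signal, used later for prefix and shortest-name steps.
    if (m.name) names.push(m.name);
    // Type signal, used later for the dominant-type step.
    if (m.type) typeCounts.set(m.type, (typeCounts.get(m.type) ?? 0) + 1);
  }
  return { folderCounts, names, typeCounts };
}
```

The trade-off is that the cheap early return after the folder step is lost: all three signals are collected even when a folder label would have been enough.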

4. names.sort() mutates the array in-place

// line 382
const sortedNames = names.sort((a, b) => a.length - b.length);

Array.sort mutates names. The names array is only read once before this point (for findCommonPrefix), so there's no actual bug, but it's a subtle footgun. [...names].sort(...) would be safer, as would names.toSorted(...) where available (an ES2023 method, typed from TypeScript 5.2 with the es2023 lib and supported from Node 20).
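The difference is easy to demonstrate in isolation. The snippet below uses made-up sample names and is independent of the PR's code; it shows that sorting a copy leaves the source array alone, while `sort` on the array itself reorders it and returns the same reference.

```typescript
const byLength = (a: string, b: string): number => a.length - b.length;

const original = ['processRequest', 'handleEvent', 'dispatch'];

// Sorting a shallow copy leaves `original` in its insertion order.
const sortedCopy = [...original].sort(byLength);
const originalUntouched = original[0] === 'processRequest';

// Array.prototype.sort sorts in place and returns the same array instance.
const alias = original.sort(byLength);
const sameArray = alias === original;
```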

Minor Observations

The PR description's labeling cascade describes grandparent as a standalone step 2, but in the code it's blended into the same folderCounts map as step 1 (just with 0.5 weight). This means grandparent can win over parent if multiple nodes share the same grandparent. The behavior is correct but the documentation implies a stricter priority order than what's implemented.
No new tests were added for the three new strategies. The PR cites 1900 passed (likely pre-existing tests) — it would be good to have at least one test covering the grandparent fallback and dominant-type paths.
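For contrast with the blended scoring noted in the first observation, a strictly ordered variant is sketched below. It is hypothetical and not what the PR implements: `pickFolderLabelStrict` is an invented name, and the generic set is passed in as a parameter. Here the grandparent level is consulted only when no parent folder qualifies.

```typescript
/**
 * Hypothetical strict cascade: try parent folders first and look at
 * grandparents only if every parent was generic or missing.
 */
function pickFolderLabelStrict(
  filePaths: string[],
  generic: ReadonlySet<string>,
): string | null {
  // depth 1 = parent directory, depth 2 = grandparent directory
  const topAtDepth = (depth: number): string | null => {
    const counts = new Map<string, number>();
    for (const p of filePaths) {
      const parts = p.split('/').filter(Boolean);
      const folder = parts[parts.length - 1 - depth];
      if (!folder || generic.has(folder.toLowerCase())) continue;
      counts.set(folder, (counts.get(folder) ?? 0) + 1);
    }
    let best: string | null = null;
    let bestCount = 0;
    for (const [folder, count] of Array.from(counts)) {
      if (count > bestCount) {
        best = folder;
        bestCount = count;
      }
    }
    return best;
  };

  const folder = topAtDepth(1) ?? topAtDepth(2);
  return folder === null ? null : folder.charAt(0).toUpperCase() + folder.slice(1);
}
```

Take three members under `src/billing/invoices/`, `src/billing/refunds/`, and `src/billing/taxes/`. The strict variant returns a parent (`Invoices`, the first seen on a tie). Under blended weighting, if every member contributes 0.5 to its grandparent, `billing` would reach 1.5 and outrank each parent at 1.0.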

Verdict

The implementation delivers what the summary promises and the logic is correct. The main actionable issue is moving GENERIC_FOLDERS outside the loop — it's a straightforward fix that eliminates unnecessary allocations per node in every cluster labeling call.

Copilot

Pull request overview

Improves community (cluster) heuristic labeling to reduce generic Cluster_N names by extracting stronger signals from member file paths and symbol metadata, addressing issue #413.

Changes:

Expands the list of “generic” folder names that should not be used as labels.
Adds a grandparent-directory fallback (weighted) for deeply nested paths when the parent is generic.
Adds a final fallback that combines the dominant node type with a short symbol name when no folder/prefix label is found.


Copilot · 2026-03-22T13:50:58Z

+      const GENERIC_FOLDERS = new Set([
+        'src', 'lib', 'core', 'utils', 'common', 'shared', 'helpers',
+        'internal', 'pkg', 'app', 'modules', 'components', 'services',
+        'controllers', 'models', 'views', 'types', 'interfaces',
+        'test', 'tests', '__tests__', 'spec', 'specs',
+        'dist', 'build', 'out', 'bin', 'scripts',
+      ]);


GENERIC_FOLDERS is re-created for every memberIds.forEach iteration, which adds avoidable overhead and makes the set harder to maintain. Define it once (module-level constant, or at least once per generateHeuristicLabel call) and reuse it inside the loop.

Copilot · 2026-03-22T13:50:58Z

+    const labels = graph.getNodeAttribute(nodeId, 'labels');
+    const nodeType = Array.isArray(labels) ? labels[0] : (typeof labels === 'string' ? labels : '');


The dominant-type labeling logic reads graph.getNodeAttribute(nodeId, 'labels'), but nodes added in buildGraphologyGraph use the attribute key type (and name/filePath). As written, labels will be undefined and dominantType will never be selected, so this fallback won’t work. Use the existing type attribute (or ensure labels is actually set on nodes) when counting types.

Suggested change

-    const labels = graph.getNodeAttribute(nodeId, 'labels');
-    const nodeType = Array.isArray(labels) ? labels[0] : (typeof labels === 'string' ? labels : '');
+    // Prefer explicit `type` attribute; fall back to `labels` if present
+    const typeAttr = graph.getNodeAttribute(nodeId, 'type');
+    const labelsAttr = typeAttr ? undefined : graph.getNodeAttribute(nodeId, 'labels');
+    const nodeType = typeof typeAttr === 'string'
+      ? typeAttr
+      : (Array.isArray(labelsAttr)
+        ? labelsAttr[0]
+        : (typeof labelsAttr === 'string' ? labelsAttr : ''));

Copilot · 2026-03-22T13:50:58Z

+  // Use dominant type + most-connected symbol name
+  if (dominantType && names.length > 0) {
+    const sortedNames = names.sort((a, b) => a.length - b.length);
+    const shortestName = sortedNames[0];
+    return `${dominantType}_${shortestName}`;


The comment says "Use dominant type + most-connected symbol name", but the implementation sorts by string length and picks the shortest name. Please update the comment to match the actual behavior (or adjust the implementation if “most-connected” was intended).

Resolves the critical issue flagged in review: the GENERIC_FOLDERS Set (28 entries) was being recreated inside memberIds.forEach on every node iteration. For a cluster with N members this allocates N Set objects unnecessarily. Moved to a module-level const so it is created once at load time and shared across all calls to generateHeuristicLabel.

Spread to a new array before sorting to avoid mutating the original names array, as noted in review feedback.

ShunsukeHayashi · 2026-03-22T14:08:12Z

Both issues from the review have been addressed:

GENERIC_FOLDERS module scope (commit 908b264): Moved from inside the memberIds.forEach loop to a module-level constant — resolves the N Set allocations per community.
names.sort() mutation (commit b6ff67f): Changed to [...names].sort() to avoid mutating the original array.

Thanks for the thorough review!

magyargergo · 2026-03-22T15:58:32Z

@claude run a final review of whether what's in the PR summary is delivered. Are we clustering properly?

github-actions · 2026-03-22T15:58:44Z

Claude finished @magyargergo's task in 1m 15s

Final Review: Cluster Labeling Implementation

Gather context
Read changed file (community-processor.ts)
Verify all 3 PR summary strategies are delivered
Check for the labels vs type attribute bug flagged by Copilot
Assess overall clustering correctness
Post final review

Summary Verification

Strategy	Status	Notes
1. Expanded generic folder skip list	✅ Delivered	28 entries at module scope (fixed from inner loop in follow-up commits)
2. Grandparent directory fallback (0.5 weight)	✅ Delivered	Lines 327–332, blended into same `folderCounts` map
3. Dominant node type + shortest symbol name	❌ Broken — dead code	See critical bug below

Critical Bug: Strategy 3 Is Dead Code

The dominant-type fallback at line 370 reads the labels attribute:

// community-processor.ts line 370
const labels = graph.getNodeAttribute(nodeId, 'labels');
const nodeType = Array.isArray(labels) ? labels[0] : (typeof labels === 'string' ? labels : '');

But nodes are added to the graphology graph in buildGraphologyGraph with the attribute key type, not labels:

// community-processor.ts line 213–217
graph.addNode(node.id, {
  name: node.properties.name,
  filePath: node.properties.filePath,
  type: node.label,   // ← key is 'type'
});

graph.getNodeAttribute(nodeId, 'labels') will always return undefined. This means nodeType is always '', typeCounts is always empty, and dominantType is never set. The entire Strategy 3 fallback never fires — the code falls straight through to Cluster_N, the exact outcome this PR was meant to reduce.

The fix is a one-liner:

// Change:
const labels = graph.getNodeAttribute(nodeId, 'labels');
const nodeType = Array.isArray(labels) ? labels[0] : (typeof labels === 'string' ? labels : '');

// To:
const nodeType = graph.getNodeAttribute(nodeId, 'type') ?? '';
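A wrong attribute key like this fails silently because the key is an untyped string. One way to surface it at compile time is to constrain keys to a known attribute shape. The sketch below uses a hypothetical stand-in (`TypedNodeStore`, `NodeAttrs`), not graphology itself. Whether the project's graph can be typed this way is not shown in this thread.

```typescript
// Attribute shape mirroring what buildGraphologyGraph stores per node.
interface NodeAttrs {
  name: string;
  filePath: string;
  type: string;
}

/** Hypothetical stand-in for a graph's node-attribute store. */
class TypedNodeStore {
  private readonly attrs = new Map<string, NodeAttrs>();

  addNode(id: string, attrs: NodeAttrs): void {
    this.attrs.set(id, attrs);
  }

  // The key must be one of 'name' | 'filePath' | 'type'.
  getNodeAttribute<K extends keyof NodeAttrs>(id: string, key: K): NodeAttrs[K] | undefined {
    const node = this.attrs.get(id);
    return node ? node[key] : undefined;
  }
}

const store = new TypedNodeStore();
store.addNode('n1', {
  name: 'validateUser',
  filePath: 'src/utils/validateUser.ts',
  type: 'Function',
});

const nodeType = store.getNodeAttribute('n1', 'type') ?? '';
// store.getNodeAttribute('n1', 'labels');  // rejected by the compiler: not a key of NodeAttrs
```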


Minor Issues

Comment mismatch (line 384): The comment says "Use dominant type + most-connected symbol name", but the code sorts by string length and picks the shortest name, not most-connected. Either update the comment or implement degree-based selection (most-connected by graph edges is arguably better for label quality). Low priority, but misleading.

Cascade is weighted, not ordered: The PR description implies a strict priority order (grandparent only considered if parent fails), but the actual implementation blends parent and grandparent into the same folderCounts map with weights. A grandparent shared by many nodes can outrank a parent. The behavior is reasonable but differs from the documented cascade — worth noting in the PR description.

names.length > 2 threshold: The prefix check requires more than 2 names. This means a 2-member cluster skips the prefix step entirely. Likely intentional (prefix of 2 strings is noisy), but undocumented.
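If degree-based selection from the first item were pursued, it could look roughly like this. The sketch is hypothetical: `NamedNode`, `Edge`, and `mostConnectedName` are invented for illustration, edges are plain id pairs rather than graph relationships, and only edges with both endpoints inside the cluster are counted. Ties fall back to the shorter name.

```typescript
// Hypothetical shapes; the PR works on graph nodes and edges instead.
interface NamedNode {
  id: string;
  name: string;
}
type Edge = [string, string];

/** Name of the member with the most intra-cluster edges; shorter name wins ties. */
function mostConnectedName(nodes: NamedNode[], edges: Edge[]): string | null {
  const degree = new Map<string, number>();
  for (const n of nodes) degree.set(n.id, 0);

  for (const [src, tgt] of edges) {
    // Ignore edges that leave the cluster.
    if (degree.has(src) && degree.has(tgt)) {
      degree.set(src, (degree.get(src) ?? 0) + 1);
      degree.set(tgt, (degree.get(tgt) ?? 0) + 1);
    }
  }

  let best: NamedNode | null = null;
  let bestDegree = -1;
  for (const n of nodes) {
    const d = degree.get(n.id) ?? 0;
    const shorterOnTie = d === bestDegree && best !== null && n.name.length < best.name.length;
    if (d > bestDegree || shorterOnTie) {
      best = n;
      bestDegree = d;
    }
  }
  return best === null ? null : best.name;
}
```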

Verdict

Strategies 1 and 2 are correctly implemented. Strategy 3 is broken due to a wrong attribute key ('labels' vs 'type'). The PR will continue producing Cluster_N names for any community that has no clear folder signal or name prefix, because the dominant-type fallback silently no-ops. This needs to be fixed before merge.

magyargergo · 2026-03-23T07:46:40Z

@ShunsukeHayashi Could you please look at these findings? 🙏

The dominant node type fallback was reading `graph.getNodeAttribute(nodeId, 'labels')` but nodes are stored with key `type` (set in buildGraphologyGraph as `type: node.label`). This caused `labels` to always be undefined, `typeCounts` to always be empty, and Strategy 3 to silently no-op, falling straight through to `Cluster_N`.

Fixes:
- Use `graph.getNodeAttribute(nodeId, 'type')` instead of `labels`
- Fix comment: 'most-connected symbol name' → 'shortest symbol name'
- Use `[...names].sort()` to avoid mutating the original array

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ShunsukeHayashi · 2026-03-23T07:50:23Z

Thanks for the review @magyargergo! The findings are correct — Strategy 3 was indeed dead code.

Fixes applied in 6117f09:

Critical bug fixed: Changed graph.getNodeAttribute(nodeId, 'labels') → graph.getNodeAttribute(nodeId, 'type') ?? ''
- Nodes are stored with key type in buildGraphologyGraph (type: node.label), so reading 'labels' always returned undefined
- Strategy 3 will now correctly fire and produce labels like Function_validateUser
Comment mismatch fixed: // Use dominant type + most-connected symbol name → // Use dominant type + shortest symbol name
Non-mutating sort: names.sort(...) → [...names].sort(...) (the remote branch already had this from b6ff67f, now confirmed in the rebased commit)

All three strategies in the cascade are now functional.

magyargergo · 2026-03-23T08:00:16Z

@claude could you propose a good e2e test for this to showcase the strength of this algorithm? Also think about negative cases.

github-actions · 2026-03-23T08:00:28Z

Claude finished @magyargergo's task in 4m 13s

E2E Test Proposal for Cluster Labeling

Gather context
Read implementation and existing tests
Propose e2e tests (positive and negative cases)

Proposed test file: `test/unit/cluster-labeling.test.ts`

The existing cohesion-consistency.test.ts already establishes the right pattern: exercise generateHeuristicLabel indirectly through processCommunities using isolated fully-connected cliques that Leiden will deterministically keep together. The proposed tests follow that pattern.

/**
 * Unit tests for the heuristic cluster labeling strategies in generateHeuristicLabel.
 *
 * Since generateHeuristicLabel is not exported, tests exercise it indirectly through
 * processCommunities. Each test builds an isolated fully-connected clique so that
 * Leiden assigns all nodes to a single community deterministically.
 *
 * Cascade under test:
 *   1. Most-common non-generic parent folder  → e.g. "Auth"
 *   2. Grandparent directory (0.5 weight)     → e.g. "Auth" when parent is generic
 *   3. Common name prefix (> 2 members, > 2 chars) → e.g. "Validate"
 *   4. Dominant type + shortest symbol name   → e.g. "Function_dispatch"
 *   5. Fallback                               → "Cluster_N"
 */
import { describe, it, expect } from 'vitest';
import { createKnowledgeGraph } from '../../src/core/graph/graph.js';
import type { GraphNode, GraphRelationship } from '../../src/core/graph/types.js';
import { processCommunities } from '../../src/core/ingestion/community-processor.js';

// ============================================================================
// HELPERS
// ============================================================================

function makeNode(id: string, name: string, label: GraphNode['label'], filePath: string): GraphNode {
  return { id, label, properties: { name, filePath, startLine: 1, endLine: 10, isExported: false } };
}

function makeRel(id: string, src: string, tgt: string): GraphRelationship {
  return { id, sourceId: src, targetId: tgt, type: 'CALLS', confidence: 1.0, reason: '' };
}

/**
 * Build a fully-connected clique and return the heuristic label of the resulting community.
 * All nodes default to 'Function' unless overridden.
 */
async function getLabelForNodes(
  nodes: Array<{ id: string; name: string; filePath: string; label?: GraphNode['label'] }>,
): Promise<string> {
  const graph = createKnowledgeGraph();
  for (const n of nodes) {
    graph.addNode(makeNode(n.id, n.name, n.label ?? 'Function', n.filePath));
  }
  let relIdx = 0;
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      graph.addRelationship(makeRel(`r${relIdx++}`, nodes[i].id, nodes[j].id));
    }
  }
  const result = await processCommunities(graph);
  // A single isolated clique should produce exactly one community
  expect(result.communities).toHaveLength(1);
  return result.communities[0].heuristicLabel;
}

// ============================================================================
// STRATEGY 1: Parent folder label
// ============================================================================

describe('generateHeuristicLabel — cluster labeling', () => {
  describe('Strategy 1: parent folder label', () => {
    it('uses a non-generic parent folder as the label', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'login',   filePath: 'src/auth/login.ts' },
        { id: 'n2', name: 'logout',  filePath: 'src/auth/logout.ts' },
        { id: 'n3', name: 'refresh', filePath: 'src/auth/refresh.ts' },
      ]);
      expect(label).toBe('Auth');
    });

    it('capitalizes the first letter of the folder name', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'charge', filePath: 'src/payment/charge.ts' },
        { id: 'n2', name: 'refund', filePath: 'src/payment/refund.ts' },
        { id: 'n3', name: 'verify', filePath: 'src/payment/verify.ts' },
      ]);
      expect(label).toBe('Payment');
    });

    it('picks the most-common folder when members span multiple non-generic folders', async () => {
      // 3 nodes in "auth", 1 node in "billing" → "Auth" wins
      const label = await getLabelForNodes([
        { id: 'n1', name: 'login',   filePath: 'src/auth/login.ts' },
        { id: 'n2', name: 'logout',  filePath: 'src/auth/logout.ts' },
        { id: 'n3', name: 'refresh', filePath: 'src/auth/refresh.ts' },
        { id: 'n4', name: 'charge',  filePath: 'src/billing/charge.ts' },
      ]);
      expect(label).toBe('Auth');
    });

    // ── NEGATIVE ──────────────────────────────────────────────────────────────
    it('does NOT produce a label from generic parent folders (utils, src, lib, …)', async () => {
      const genericFolders = [
        'src', 'lib', 'core', 'utils', 'common', 'shared', 'helpers',
        'components', 'services', 'controllers', 'models',
      ];
      for (const folder of genericFolders) {
        const label = await getLabelForNodes([
          { id: `${folder}_n1`, name: 'fn1', filePath: `${folder}/action1.ts` },
          { id: `${folder}_n2`, name: 'fn2', filePath: `${folder}/action2.ts` },
          { id: `${folder}_n3`, name: 'fn3', filePath: `${folder}/action3.ts` },
        ]);
        const capitalized = folder.charAt(0).toUpperCase() + folder.slice(1);
        expect(label, `folder "${folder}" should be skipped`).not.toBe(capitalized);
      }
    });
  });

  // ============================================================================
  // STRATEGY 2: Grandparent directory fallback
  // ============================================================================

  describe('Strategy 2: grandparent directory fallback', () => {
    it('falls back to grandparent when immediate parent is generic (e.g. services)', async () => {
      // src/auth/services/validateToken.ts → parent="services" (generic), grandparent="auth".
      // `middleware` is not in the 28-entry GENERIC_FOLDERS set, so it would win as a parent label.
      const label = await getLabelForNodes([
        { id: 'n1', name: 'validateToken',    filePath: 'src/auth/services/validateToken.ts' },
        { id: 'n2', name: 'sanitizeInput',    filePath: 'src/auth/services/sanitizeInput.ts' },
        { id: 'n3', name: 'checkPermissions', filePath: 'src/auth/services/checkPermissions.ts' },
      ]);
      expect(label).toBe('Auth');
    });

    it('strong parent folder (count=3) beats a shared grandparent (count=1.0)', async () => {
      // payment parent direct × 3  vs  auth grandparent × 2 nodes (each 0.5 = 1.0)
      const label = await getLabelForNodes([
        { id: 'n1', name: 'charge',  filePath: 'src/payment/charge.ts' },
        { id: 'n2', name: 'refund',  filePath: 'src/payment/refund.ts' },
        { id: 'n3', name: 'verify',  filePath: 'src/payment/verify.ts' },
        { id: 'n4', name: 'login',   filePath: 'src/auth/services/login.ts' },  // grandparent=auth +0.5
        { id: 'n5', name: 'logout',  filePath: 'src/auth/services/logout.ts' }, // grandparent=auth +0.5 = total 1.0
      ]);
      // payment(3.0) > auth(1.0) → "Payment"
      expect(label).toBe('Payment');
    });

    // ── NEGATIVE ──────────────────────────────────────────────────────────────
    it('does NOT use grandparent when it is also generic', async () => {
      // src/lib/utils/validate.ts → parent="utils" (generic), grandparent="lib" (generic)
      const label = await getLabelForNodes([
        { id: 'n1', name: 'validateEmail', filePath: 'src/lib/utils/validateEmail.ts' },
        { id: 'n2', name: 'validatePhone', filePath: 'src/lib/utils/validatePhone.ts' },
        { id: 'n3', name: 'validateDate',  filePath: 'src/lib/utils/validateDate.ts' },
      ]);
      expect(label).not.toBe('Utils');
      expect(label).not.toBe('Lib');
    });
  });

  // ============================================================================
  // STRATEGY 3: Common name prefix
  // ============================================================================

  describe('Strategy 3: common name prefix', () => {
    it('extracts common prefix from 3+ symbol names in a generic folder', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'validateUser',    filePath: 'src/utils/validateUser.ts' },
        { id: 'n2', name: 'validateOrder',   filePath: 'src/utils/validateOrder.ts' },
        { id: 'n3', name: 'validatePayment', filePath: 'src/utils/validatePayment.ts' },
      ]);
      expect(label).toBe('Validate');
    });

    // ── NEGATIVE ──────────────────────────────────────────────────────────────
    it('does NOT use a prefix of 2 characters or fewer', async () => {
      // Common prefix "fn" — only 2 chars, should be skipped
      const label = await getLabelForNodes([
        { id: 'n1', name: 'fnApply',    filePath: 'src/utils/fnApply.ts' },
        { id: 'n2', name: 'fnCompose',  filePath: 'src/utils/fnCompose.ts' },
        { id: 'n3', name: 'fnPipe',     filePath: 'src/utils/fnPipe.ts' },
      ]);
      expect(label).not.toBe('Fn');
    });

    // ── NEGATIVE ──────────────────────────────────────────────────────────────
    it('skips prefix strategy when the cluster has only 2 members (names.length <= 2 check)', async () => {
      // Exactly 2 nodes — the `names.length > 2` guard blocks prefix extraction
      const label = await getLabelForNodes([
        { id: 'n1', name: 'validateUser',  filePath: 'src/utils/validateUser.ts' },
        { id: 'n2', name: 'validateOrder', filePath: 'src/utils/validateOrder.ts' },
      ]);
      // Falls through to dominant type strategy
      expect(label).toMatch(/^Function_/);
    });
  });

  // ============================================================================
  // STRATEGY 4: Dominant type + shortest symbol name
  // ============================================================================

  describe('Strategy 4: dominant type + shortest symbol name', () => {
    it('produces Function_<shortest> when no folder or prefix signal exists', async () => {
      // All generic folders, names have no common prefix
      const label = await getLabelForNodes([
        { id: 'n1', name: 'processRequest', filePath: 'src/utils/processRequest.ts' },
        { id: 'n2', name: 'handleEvent',    filePath: 'lib/helpers/handleEvent.ts' },
        { id: 'n3', name: 'dispatch',       filePath: 'src/core/dispatch.ts' },
      ]);
      // Sorted by length: dispatch(8) < handleEvent(11) < processRequest(14)
      expect(label).toBe('Function_dispatch');
    });

    it('uses the dominant Class type when the majority of nodes are Classes', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'UserService', label: 'Class',    filePath: 'src/services/UserService.ts' },
        { id: 'n2', name: 'AuthService', label: 'Class',    filePath: 'src/services/AuthService.ts' },
        { id: 'n3', name: 'configure',   label: 'Function', filePath: 'src/services/configure.ts' },
      ]);
      // 2 Class, 1 Function — dominant=Class, shortest name = "configure" (9 chars)
      expect(label).toBe('Class_configure');
    });

    // ── NEGATIVE ──────────────────────────────────────────────────────────────
    it('picks the shortest name, not the alphabetically first name', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'aaaVeryLongFunctionName', filePath: 'src/utils/a.ts' },
        { id: 'n2', name: 'zzz',                     filePath: 'lib/helpers/b.ts' },
        { id: 'n3', name: 'mmmmmmm',                 filePath: 'src/core/c.ts' },
      ]);
      // "zzz" is shortest (3 chars) even though it sorts last alphabetically
      expect(label).toBe('Function_zzz');
    });
  });

  // ============================================================================
  // CASCADE PRIORITY
  // ============================================================================

  describe('Cascade priority', () => {
    it('folder label takes precedence over a common name prefix', async () => {
      // Both signals present: nodes in "auth/" AND names share "validate" prefix
      const label = await getLabelForNodes([
        { id: 'n1', name: 'validateToken',   filePath: 'src/auth/validateToken.ts' },
        { id: 'n2', name: 'validateSession', filePath: 'src/auth/validateSession.ts' },
        { id: 'n3', name: 'validateCookie',  filePath: 'src/auth/validateCookie.ts' },
      ]);
      expect(label).toBe('Auth');   // not "Validate"
    });

    it('name prefix takes precedence over dominant-type label', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'parseJSON', filePath: 'src/utils/parseJSON.ts' },
        { id: 'n2', name: 'parseXML',  filePath: 'src/utils/parseXML.ts' },
        { id: 'n3', name: 'parseCSV',  filePath: 'src/utils/parseCSV.ts' },
      ]);
      // "parse" (5 chars > 2) wins over the Function_<shortest> fallback
      expect(label).toBe('Parse');
    });
  });

  // ============================================================================
  // EDGE CASES
  // ============================================================================

  describe('Edge cases', () => {
    it('singleton nodes are excluded from community results', async () => {
      const graph = createKnowledgeGraph();
      graph.addNode(makeNode('lone', 'loneFunction', 'Function', 'src/auth/lone.ts'));
      // No edges — will be a singleton and filtered out
      const result = await processCommunities(graph);
      expect(result.communities).toHaveLength(0);
    });

    it('nodes with empty filePath fall back gracefully to type-based label', async () => {
      const label = await getLabelForNodes([
        { id: 'n1', name: 'anonymousFn', filePath: '' },
        { id: 'n2', name: 'anotherFn',   filePath: '' },
        { id: 'n3', name: 'thirdFn',     filePath: '' },
      ]);
      // No path signal, no common prefix → type-based label
      expect(label).toMatch(/^(Function_|Cluster_)/);
    });

    it('two isolated cliques in one graph get distinct, descriptive labels', async () => {
      const graph = createKnowledgeGraph();

      // Clique 1: auth
      for (let i = 0; i < 3; i++) {
        graph.addNode(makeNode(`a${i}`, `authFn${i}`, 'Function', `src/auth/f${i}.ts`));
      }
      graph.addRelationship(makeRel('r1', 'a0', 'a1'));
      graph.addRelationship(makeRel('r2', 'a1', 'a2'));
      graph.addRelationship(makeRel('r3', 'a0', 'a2'));

      // Clique 2: payment (isolated — no cross edges)
      for (let i = 0; i < 3; i++) {
        graph.addNode(makeNode(`p${i}`, `paymentFn${i}`, 'Function', `src/payment/f${i}.ts`));
      }
      graph.addRelationship(makeRel('r4', 'p0', 'p1'));
      graph.addRelationship(makeRel('r5', 'p1', 'p2'));
      graph.addRelationship(makeRel('r6', 'p0', 'p2'));

      const result = await processCommunities(graph);
      expect(result.communities).toHaveLength(2);

      const labels = result.communities.map(c => c.heuristicLabel);
      expect(labels).toContain('Auth');
      expect(labels).toContain('Payment');
    });
  });
});

Coverage summary

Test	Strategy	Type
non-generic parent folder used as label	1	✅ positive
folder name is capitalized	1	✅ positive
most-common folder wins across mixed paths	1	✅ positive
generic folders (`src`, `utils`, `lib`, …) are never used	1	❌ negative
grandparent used when parent is generic	2	✅ positive
strong parent (×3) beats shared grandparent (×1.0)	2	✅ positive
generic grandparent also skipped	2	❌ negative
common prefix extracted from ≥3 names in generic folder	3	✅ positive
prefix of ≤2 chars skipped	3	❌ negative
prefix skipped when cluster has exactly 2 members	3	❌ negative
`Function_<shortest>` when no folder/prefix signal	4	✅ positive
`Class_<shortest>` when majority are Classes	4	✅ positive
shortest name wins over alphabetically-first name	4	❌ negative
folder wins over name prefix (cascade order)	cascade	✅ positive
name prefix wins over dominant type (cascade order)	cascade	✅ positive
singleton nodes excluded from results	edge	✅ positive
empty `filePath` handled gracefully	edge	✅ positive
two isolated cliques → two distinct descriptive labels	edge	✅ positive
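
The strategy-3 rows above (prefix taken from at least 3 names, prefixes of 2 characters or fewer skipped) can be sketched roughly as follows. This is a minimal illustration, not the code in community-processor.ts: the helper name `commonPrefixLabel` is hypothetical, and the character-by-character prefix computation is an assumption about how the prefix is derived.

```typescript
// Hypothetical helper; thresholds mirror the coverage table, not the real source.
function commonPrefixLabel(names: string[]): string | null {
  // Guard described in the notes: a prefix is only trusted with more than 2 names.
  if (names.length <= 2) return null;

  // Shrink the candidate prefix until it matches the start of every name.
  let prefix = names[0];
  for (const name of names.slice(1)) {
    let i = 0;
    while (i < prefix.length && i < name.length && prefix[i] === name[i]) i++;
    prefix = prefix.slice(0, i);
    if (prefix.length === 0) break;
  }

  // Prefixes of 2 characters or fewer are treated as noise.
  if (prefix.length <= 2) return null;

  // Capitalize, matching the 'Parse' expectation in the test file.
  return prefix.charAt(0).toUpperCase() + prefix.slice(1);
}
```

Under these assumptions, `commonPrefixLabel(['parseJSON', 'parseXML', 'parseCSV'])` yields `'Parse'`, while a two-member cluster or a two-character prefix yields `null` and lets the cascade continue.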

Notes

Cluster_N is now unreachable — all community members have a valid graphology type attribute set by buildGraphologyGraph, so dominantType is always populated. The Cluster_N fallback at line 394 is dead code. It might be worth adding an explicit test that confirms this and/or removing the dead branch.
names.length > 2 threshold is undocumented — the test for exactly-2-member clusters makes this guard explicit and prevents regressions if someone changes it.
Grandparent blending vs. strict cascade: grandparent votes are blended into the same tally as parent votes at 0.5 weight, so a grandparent directory can win outright over a parent if enough nodes share it. The test 'strong parent (×3) beats shared grandparent (×1.0)' documents this weighted behaviour so it doesn't get "fixed" away.
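
The weighted blending in the last note can be illustrated with a small sketch. It assumes every path contributes a full vote for its parent directory and a half vote for its grandparent, as the PR summary describes; the helper name `folderLabel` and the short `GENERIC_FOLDERS` list are hypothetical stand-ins, not the real implementation.

```typescript
// Illustrative subset only; the PR's real skip list has more entries.
const GENERIC_FOLDERS = new Set<string>(['src', 'lib', 'utils', 'middleware', 'services']);

// Hypothetical helper: scores parent folders at 1.0 and grandparents at 0.5.
function folderLabel(filePaths: string[]): string | null {
  const scores = new Map<string, number>();

  const vote = (dir: string | undefined, weight: number): void => {
    if (!dir || GENERIC_FOLDERS.has(dir.toLowerCase())) return;
    scores.set(dir, (scores.get(dir) ?? 0) + weight);
  };

  for (const filePath of filePaths) {
    const parts = filePath.split('/').filter(Boolean);
    vote(parts[parts.length - 2], 1.0); // immediate parent directory
    vote(parts[parts.length - 3], 0.5); // grandparent directory
  }

  // Pick the highest-scoring folder; ties keep the first one seen.
  let best: string | null = null;
  let bestScore = 0;
  for (const [dir, score] of Array.from(scores)) {
    if (score > bestScore) {
      best = dir;
      bestScore = score;
    }
  }
  return best === null ? null : best.charAt(0).toUpperCase() + best.slice(1);
}
```

In this sketch, three files under `core/payment/` score `payment` at 3.0 against `core` at 1.5, so the parent wins. Three files spread across `core/a1/`, `core/b1/` and `core/c1/` score each parent at 1.0 and `core` at 1.5, so the shared grandparent wins outright.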

magyargergo · 2026-03-23T15:47:49Z

@ShunsukeHayashi can you please look into this? 🙏

magyargergo · 2026-03-28T14:43:40Z

⚠️ Upcoming Prettier formatting — rebase instructions

PR #563 adds Prettier as the code formatter for the repo. When it merges, the bulk format commit will touch ~350 files (style-only: whitespace, quotes, trailing commas). Your branch will likely conflict.

After #563 merges, rebase your branch:

git fetch origin
git checkout <your-branch>
git rebase origin/main

# Conflicts will be formatting-only. During a rebase, --theirs means your branch's commits, so this keeps your version:
git checkout --theirs .
git add .
git rebase --continue

# Then re-format your branch to match the new style:
npx prettier --write .
git add -A
git commit -m "style: apply prettier formatting"
git push --force-with-lease

New setup step: Run npm install at the repo root (not just in gitnexus/) to get prettier + activate the pre-commit hook. The hook auto-formats staged files on every commit going forward.

magyargergo requested a review from Copilot March 22, 2026 13:48

Copilot started reviewing on behalf of magyargergo March 22, 2026 13:48

Copilot AI reviewed Mar 22, 2026

ShunsukeHayashi added 2 commits March 22, 2026 23:02

style: avoid in-place sort mutation on names array

b6ff67f

Spread to a new array before sorting to avoid mutating the original names array, as noted in review feedback.
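
A minimal sketch of the pattern this commit message describes, assuming the sort in question is the one that picks the shortest symbol name for the type-based label. The helper name `shortestName` is hypothetical.

```typescript
// Hypothetical helper; the real logic lives in community-processor.ts.
function shortestName(names: readonly string[]): string | undefined {
  // Array.prototype.sort reorders the array it is called on,
  // so copy first to leave the caller's array untouched.
  const sorted = [...names].sort((a, b) => a.length - b.length);
  return sorted[0];
}

const names = ['validateUserSession', 'validateUser', 'checkToken'];
const shortest = shortestName(names);
// `names` keeps its original order because only the copy was sorted.
```

Typing the parameter as `readonly string[]` is one way to have the compiler reject an accidental in-place `names.sort(...)` inside the helper.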

ShunsukeHayashi closed this Apr 7, 2026

-    const labels = graph.getNodeAttribute(nodeId, 'labels');
-    const nodeType = Array.isArray(labels) ? labels[0] : (typeof labels === 'string' ? labels : '');
+    // Prefer explicit `type` attribute; fall back to `labels` if present
+    const typeAttr = graph.getNodeAttribute(nodeId, 'type');
+    const labelsAttr = typeAttr ? undefined : graph.getNodeAttribute(nodeId, 'labels');
+    const nodeType = typeof typeAttr === 'string'
+      ? typeAttr
+      : (Array.isArray(labelsAttr)
+          ? labelsAttr[0]
+          : (typeof labelsAttr === 'string' ? labelsAttr : ''));
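
The suggestion above can also be written as a standalone helper, which makes the fallback order easier to test in isolation. This is only a sketch: `AttributeReader` is a hypothetical stand-in that models the single `getNodeAttribute` call used in the diff, and `resolveNodeType` is not a function in the PR.

```typescript
// Hypothetical minimal interface; only the one method used in the diff is modelled.
interface AttributeReader {
  getNodeAttribute(nodeId: string, name: string): unknown;
}

// Prefer an explicit string `type`; otherwise fall back to `labels`.
function resolveNodeType(graph: AttributeReader, nodeId: string): string {
  const typeAttr = graph.getNodeAttribute(nodeId, 'type');
  if (typeof typeAttr === 'string' && typeAttr !== '') return typeAttr;

  const labels = graph.getNodeAttribute(nodeId, 'labels');
  if (Array.isArray(labels)) return typeof labels[0] === 'string' ? labels[0] : '';
  return typeof labels === 'string' ? labels : '';
}

// In-memory stand-in for a graph, used purely for illustration.
const attrs: Record<string, Record<string, unknown>> = {
  a: { type: 'Function', labels: ['Class'] },
  b: { labels: ['Class', 'Extra'] },
  c: { labels: 'Method' },
  d: {},
};
const fakeGraph: AttributeReader = {
  getNodeAttribute: (nodeId, name) => attrs[nodeId][name],
};
```

With this stand-in, node `a` resolves to `'Function'` (the explicit type wins), `b` to `'Class'`, `c` to `'Method'`, and `d` to the empty string.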
