
feat: add markdown file indexing (headings + cross-links) #399

Merged
abhigyanpatwari merged 1 commit into main from feat/markdown-indexing
Mar 20, 2026


@abhigyanpatwari

Owner

Summary

Supersedes #286 — all review feedback from that PR has been incorporated (endLine spans, dedup, level property, .mdx support, unused code removed).

What it does

  • Headings (# h1 through ###### h6) become Section nodes with hierarchy via CONTAINS edges (File→Section, Section→Section for nested headings)
  • Links ([text](relative/path.md)) that resolve to files within the repo become IMPORTS edges between File nodes
  • External URLs, anchors, and mailto links are ignored
  • Zero new dependencies — uses simple regex patterns
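
The behaviour listed above can be sketched roughly as follows. This is not the code from markdown-processor.ts: the `Section` shape, the function names (`extractSections`, `resolveLinks`, `resolveRelative`), the `file#Lline` id scheme, and both regexes are assumptions made for illustration. Skipping fenced code blocks and treating a leading `/` as repo-root-relative are extra guesses that the PR description does not confirm.

```typescript
// Hypothetical node shape; the real types live in src/core/graph/types.ts.
interface Section {
  id: string;
  name: string;
  level: number;     // 1 for "#", 6 for "######"
  startLine: number; // 1-based line of the heading
  endLine: number;   // last line before the next heading of the same or a higher rank
  parentId: string;  // source of the CONTAINS edge: the file id or the enclosing section id
}

const HEADING_RE = /^(#{1,6})\s+(.*\S)\s*$/;

function extractSections(fileId: string, content: string): Section[] {
  const lines = content.split(/\r?\n/);
  const sections: Section[] = [];
  const open: Section[] = []; // stack of sections that have not been closed yet
  let inFence = false;
  lines.forEach((line, i) => {
    // Assumption: headings inside fenced code blocks are not real headings.
    if (/^\s*(`{3,}|~{3,})/.test(line)) { inFence = !inFence; return; }
    if (inFence) return;
    const m = HEADING_RE.exec(line);
    if (!m) return;
    const level = m[1].length;
    // A heading closes every open section at the same or a deeper level.
    while (open.length > 0 && open[open.length - 1].level >= level) {
      open.pop()!.endLine = i; // i is the 1-based number of the previous line
    }
    const parent = open[open.length - 1];
    const section: Section = {
      id: `${fileId}#L${i + 1}`,
      name: m[2],
      level,
      startLine: i + 1,
      endLine: lines.length, // provisional: runs to end of file until closed
      parentId: parent ? parent.id : fileId,
    };
    sections.push(section);
    open.push(section);
  });
  return sections;
}

// Resolve a relative link against the linking file's directory.
// Returns null when the link escapes the repository root.
function resolveRelative(fromPath: string, link: string): string | null {
  const parts = link.startsWith("/") ? [] : fromPath.split("/").slice(0, -1);
  for (const seg of link.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (parts.length === 0) return null;
      parts.pop();
    } else {
      parts.push(seg);
    }
  }
  return parts.join("/");
}

// Returns the deduplicated repo files that `fromPath` links to (IMPORTS targets).
function resolveLinks(fromPath: string, content: string, repoFiles: Set<string>): string[] {
  const linkRe = /\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  const targets = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(content)) !== null) {
    const raw = m[1];
    // Ignore external URLs and mailto (anything with a scheme) and pure anchors.
    if (/^[a-z][a-z0-9+.-]*:/i.test(raw) || raw.startsWith("#") || raw.startsWith("//")) continue;
    const pathOnly = raw.split("#")[0].split("?")[0];
    const resolved = resolveRelative(fromPath, pathOnly);
    if (resolved !== null && resolved !== fromPath && repoFiles.has(resolved)) {
      targets.add(resolved);
    }
  }
  return Array.from(targets);
}
```

In this sketch each section's `parentId` yields one CONTAINS edge (File→Section for top-level headings, Section→Section otherwise), and each path returned by `resolveLinks` yields one File→File IMPORTS edge. The stack keeps the `endLine` computation to a single pass over the file.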

Files changed (5 modified, 1 new)

  • src/core/graph/types.ts — Add Section to NodeLabel union + level property
  • src/core/lbug/schema.ts — Add Section node table + relation entries
  • src/core/lbug/csv-generator.ts — Add Section to CSV writers
  • src/core/ingestion/pipeline.ts — Insert markdown processing step
  • src/core/ingestion/markdown-processor.ts — New file, ~157 lines

Test plan

  • Build succeeds (tsc clean) ✅ verified locally
  • Indexing a code-only repo works unchanged
  • Indexing a markdown-heavy repo produces correct Section nodes
  • MCP tools (query, cypher) can find Section nodes

🤖 Generated with Claude Code

Co-Authored-By: Dennis Palatov <dp-web4@users.noreply.github.com>

Parse .md/.mdx files using regex (no tree-sitter dependency) to extract:
- Section nodes from headings (h1-h6) with hierarchy via CONTAINS edges
- Cross-file IMPORTS edges from markdown links to other repo files

Ported from #286 to resolve conflicts with kuzu→lbug rename.

Co-Authored-By: Dennis Palatov <dp-web4@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Mar 20, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| gitnexus | Ready | Preview, Comment | Mar 20, 2026 9:48pm |


@abhigyanpatwari abhigyanpatwari merged commit 00758b1 into main Mar 20, 2026
3 of 6 checks passed
@github-actions

Contributor

CI Report

Some checks failed · 4746675

Pipeline

| Stage | Status | Ubuntu | Windows | macOS |
| --- | --- | --- | --- | --- |
| Typecheck | success | | | |
| Tests | failure | | | |

Coverage

| Metric | Coverage | Covered | Base (main) | Delta |
| --- | --- | --- | --- | --- |
| Statements | 0% | 0/12764 | 68.16% | 📉 -68.2% |
| Branches | 0% | 0/9978 | 59.66% | 📉 -59.7% |
| Functions | 0% | 0/1073 | 70.13% | 📉 -70.1% |
| Lines | 0% | 0/11031 | 70.49% | 📉 -70.5% |

📋 Full run · Coverage from Ubuntu · Generated by CI

dp-web4 added a commit to dp-web4/GitNexus that referenced this pull request Mar 20, 2026
…→lbug migration

The kuzu→lbug migration (abhigyanpatwari#275) didn't carry forward three pieces from
the markdown indexing PR (abhigyanpatwari#399):

1. 'Section' missing from NODE_TABLES constant — LadybugDB type system
   doesn't recognize Section as a valid node type
2. SECTION_SCHEMA missing from NODE_SCHEMA_QUERIES — Section table never
   created in the database (already fixed in abhigyanpatwari#399 merge, confirming)
3. getCopyQuery falls through to 7-column multi-lang default for Section,
   but Section CSV has 8 columns (includes 'level'). Causes:
   "Binder exception: Number of columns mismatch. Expected 7 but got 8"

Reproduces on any repo with .md files. Tested fix against a 2K+ markdown
file repo (40K nodes, 37K edges) — indexes in 38s with no crashes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abhigyanpatwari pushed a commit that referenced this pull request Mar 21, 2026
Release includes markdown indexing (#399) and the Section table
registration fix (#401), making PR #403 unnecessary.

https://claude.ai/code/session_015WxcTDYrGi4sWY8iY8gDZP
icodebuster pushed a commit to icodebuster/GitNexus that referenced this pull request Mar 22, 2026
* main: (67 commits)
  fix(server): allow private/LAN network origins in CORS (abhigyanpatwari#390)
  fix(ingestion): calculate confidence per resolution tier for heritage/MRO edges (abhigyanpatwari#412)
  fix(lbug): retry on DB lock with session-safe cleanup (abhigyanpatwari#325)
  fix(analyze): address review — rename --no-git to --skip-git, fix stale cache
  fix(analyze): address Copilot review — ESM import, CLI option, .gitignore guard
  docs(schema): add Community and Process node properties to cypher tool description (abhigyanpatwari#411)
  fix(analyze): allow indexing folders without a .git directory (abhigyanpatwari#384)
  token trunking
  updated mahalanobis threshold to be multi-dim aware
  fix: clarify that ORT CUDA binaries are linux/x64 only
  updating workflow verbage
  fixed prop cutoff issue for pr/issue filtering
  fix: update symbol and relationship counts in documentation
  fix: address PR abhigyanpatwari#409 review findings (P0-P3) and simplify import resolution API
  refactor: unify language dispatch with compile-time exhaustive tables
  feat: implement cross-file binding propagation for multiple languages
  fix: hydrate worker DB in server mode + fix LadybugDB getAll API mismatch (abhigyanpatwari#398) (abhigyanpatwari#404)
  docs: add gitnexus-stable-ops to community integrations
  fix: register Section in NODE_TABLES and NODE_SCHEMA_QUERIES (abhigyanpatwari#401)
  feat: add markdown file indexing (headings + cross-links) (abhigyanpatwari#399)
  ...
motolese pushed a commit to motolese/datamoto-gitnexus that referenced this pull request Apr 23, 2026
…twari#399)

feat: add markdown file indexing (headings + cross-links)

Ports abhigyanpatwari#286 by @dp-web4 onto current main, resolving conflicts from kuzu→lbug rename.
Closes abhigyanpatwari#286

Co-Authored-By: Dennis Palatov <dp-web4@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet


1 participant