feat: source-sink security scanning with BFS path discovery by marxo126 · Pull Request #578 · abhigyanpatwari/GitNexus

marxo126 · 2026-03-28T19:51:04Z

Summary

New source_sink MCP tool for security-relevant data path scanning
Identifies source functions (user input, env vars, file reads) and sink functions (SQL queries, command exec, file writes) using configurable pattern catalogs
BFS over CALLS edges to discover reachable paths from sources to sinks
OWASP category filtering (A03-injection, A07-xss, A10-ssrf)
User-defined catalog extensions via .gitnexus/security.json
Risk-ranked findings (critical/high/medium)

Split from #561 — this PR contains only the source-sink scanning feature. Parameter data flow tracking (PASSES_TO, DATA_FLOWS_TO) will be addressed separately as part of the PDG subsystem (#567).

Test plan

30 unit tests for source-sink scanner and catalogs
Smoke-tested on real project (fluentiagrant-app): 1,417 sources, 2,536 sinks, 3,913 paths found
All 2,484 existing unit tests pass
TypeScript compiles clean

🤖 Generated with Claude Code

vercel · 2026-03-28T19:51:08Z

@TESTPERSONAL is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

github-actions · 2026-03-28T19:59:19Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
✅ E2E	`success`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
6639	6542	0	97	252s

✅ All 6542 tests passed

97 test(s) skipped — expand for details

Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
Swift MethodExtractor > visibility > extracts public method
Swift MethodExtractor > visibility > extracts private method
Swift MethodExtractor > visibility > defaults to internal when no modifier
Swift MethodExtractor > protocol methods > marks protocol method as abstract
Swift MethodExtractor > static and class methods > detects static func as isStatic
Swift MethodExtractor > static and class methods > detects class func as isStatic
Swift MethodExtractor > parameters > extracts parameters with types and default values
Swift MethodExtractor > return type > extracts return type from -> annotation
Swift MethodExtractor > annotations > extracts @objc attribute
Swift MethodExtractor > isFinal > detects final func
Swift MethodExtractor > isFinal > is false when not final
Swift MethodExtractor > isAsync > detects async func
Swift MethodExtractor > isOverride > detects override method
buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
Swift self resolution > detects User and Repo classes, each with a save function
Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
Swift parent resolution > emits EXTENDS edge: User → BaseModel
Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
Swift return type inference > detects User class and getUser function
Swift return type inference > detects save function on User (Swift class methods are Function nodes)
Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
Swift extension deduplication > detects Product class
Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
Swift export visibility (internal vs private) > resolves PublicService() constructor across files
Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
Swift if let / guard let binding resolution > detects User and Repo classes
Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
Swift for-in loop element type inference > detects User and Repo classes
Swift for-in loop element type inference > creates implicit import edges between files
Swift field-type resolution > detects classes and their properties
Swift field-type resolution > emits HAS_PROPERTY edges from class to field
Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
Swift field-type resolution > emits ACCESSES edges for field reads in chains
Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
Swift call-result binding > resolves call-result-bound method call user.save() → User#save
Swift call-result binding > getUser() is present as a defined function
Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
Swift method enrichment > detects Animal protocol and Dog class
Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
Swift method enrichment > emits HAS_METHOD edges for Dog methods
Swift method enrichment > marks protocol Animal.speak as isAbstract
Swift method enrichment > marks Dog.speak as NOT isAbstract
Swift method enrichment > marks breathe as isFinal
Swift method enrichment > marks classify as isStatic
Swift method enrichment > captures @objc annotation on breathe
Swift method enrichment > populates parameterTypes for classify(_ name: String)
Swift method enrichment > records parameterCount for classify
Swift method enrichment > records returnType for speak
Swift method enrichment > resolves dog.speak() CALLS edge
Swift method enrichment > resolves Dog.classify("dog") CALLS edge
Swift abstract dispatch > detects Repository protocol and SqlRepository class
Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
Swift abstract dispatch > marks base Repository.find as isAbstract
Swift abstract dispatch > marks base Repository.save as isAbstract
Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
Swift abstract dispatch > populates parameterTypes for Repository.find
Swift abstract dispatch > populates parameterTypes for Repository.save
Swift abstract dispatch > records returnType for SqlRepository.find
Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	73.09%	17909/24502	73.22%	📉 -0.1	🔴 ██████████████░░░░░░
Branches	62.12%	11370/18301	62.24%	📉 -0.1	🔴 ████████████░░░░░░░░
Functions	77.67%	1705/2195	77.82%	📉 -0.1	🔴 ███████████████░░░░░
Lines	75.63%	16232/21462	75.78%	📉 -0.2	🔴 ███████████████░░░░░

_{📋 View full run · Generated by CI}

xkonjin

I found one correctness issue in the new scanner.

sourceSinkScan() never loads or passes a node language, and patternMatches() only applies the language filter when a language value is present:

if (entry.languages && language && !entry.languages.includes(language)) return false;

With language === undefined, every language-specific pattern is treated as eligible. In practice that means Java, Go, Ruby, Rust, PHP, etc. source/sink signatures can all match any function body that happens to contain the same token text, which will inflate findings quite a bit.

Example: a TypeScript function containing params[ or system( can be classified using the Rails-only rules, and a JS file containing Command::new in a string/comment could be treated as a Rust sink.

I think this needs one of these fixes before relying on the results:

return false for language-scoped entries when the node language is unknown, or
fetch/store the language on Function/Method nodes and pass it into getMatchingSources() / getMatchingSinks().

I'd also add a regression test that proves a language-scoped pattern does not match when scanning content from a different language.

magyargergo · 2026-03-29T16:42:14Z

@claude is it ready to be merged?

github-actions · 2026-03-29T16:42:26Z

Claude finished @magyargergo's task in 1m 18s —— View job

Merge Readiness Assessment

Gather context
Read relevant source files
Assess merge readiness

Short answer: Not yet — there is one open correctness bug that needs to be fixed first.

Blocking Issue: Language Filter is Never Applied

The reviewer xkonjin identified a real bug, and reading the code confirms it.

In local-backend.ts the Cypher query fetches id, name, filePath, and content, but not language:

// local-backend.ts:2971-2981
MATCH (n:Function)
WHERE n.id IS NOT NULL
RETURN n.id AS id, n.name AS name, n.filePath AS filePath, n.content AS content

As a result, getMatchingSources and getMatchingSinks are always called with undefined as the language parameter:

// local-backend.ts:3022, 3037
const matchedSources = getMatchingSources(content, undefined, compiledSources);
const matchedSinks   = getMatchingSinks(content, undefined, compiledSinks);

And the guard in catalogs.ts:577 short-circuits the language check when language is falsy:

// catalogs.ts:577
if (entry.languages && language && !entry.languages.includes(language)) return false;
//                     ^^^^^^^^ — always undefined → filter never fires

This means every language-scoped pattern applies to every function, regardless of the actual file language. A TypeScript file containing params[ could be flagged by the Rails-only Ruby rule; a JS comment containing Command::new would match the Rust sink rule. This inflates findings and makes results unreliable — especially problematic for a security tool.

The fix is straightforward (add n.language to the query and thread it through), but it needs a regression test proving a language-scoped pattern does not match content from a different language. Fix this →

Everything Else Looks Good

Check	Status
CI (typecheck + tests + E2E)	✅ All 4700 tests pass
TypeScript	✅ Compiles clean
Test coverage	⚠️ Minor drop (~0.2%) — not blocking
BFS path discovery logic	✅ Correct
Risk ranking	✅ Reasonable
User-defined catalog extension	✅ Clean design
OWASP filtering	✅ Works once language bug is fixed

The coverage drop is minor and expected given the new code paths. The architecture (BFS over CALLS, catalog-based matching, risk ranking) is solid. The language filter bug is the only thing standing between this and a merge.

marxo126 · 2026-03-31T10:12:24Z

Fixed the language filter bug — good catch.

What changed:

Import getLanguageFromFilename from gitnexus-shared in local-backend.ts
Derive language from filePath and pass it to getMatchingSources/getMatchingSinks
Added 3 regression tests proving language-scoped patterns don't cross-match (Python→TS, PHP→JS)

Ready for merge.

Add source_sink MCP tool that scans for security-relevant data paths. Identifies source functions (user input, env vars, file reads) and sink functions (SQL queries, command exec, file writes) using configurable pattern catalogs, then uses BFS over CALLS edges to find reachable paths. Supports OWASP category filtering, user-defined catalog extensions via .gitnexus/security.json, and risk-ranked findings (critical/high/medium). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Derive language from filePath via getLanguageFromFilename and pass it to getMatchingSources/getMatchingSinks. Without this, language-scoped catalog patterns (e.g. Python-only request.data, PHP-only $_GET) matched all files regardless of language, inflating findings. Adds 3 regression tests proving language-scoped patterns don't cross-match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

magyargergo · 2026-05-11T07:04:49Z

Please submit a new PR if this is still relevant

This was referenced Mar 28, 2026

feat: source-sink security scanning + Parameter data flow #561

Closed

RFC: Program Dependence Graph (PDG) Subsystem Design #567

Closed

feat: guard clause and guarded call detection #559

Closed

xkonjin reviewed Mar 28, 2026

View reviewed changes

Test and others added 3 commits April 18, 2026 13:03

fix: prettier formatting + remove unused import

e1fc420

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

marxo126 force-pushed the feat/source-sink-scanner branch from 47198bd to e6a295d Compare April 18, 2026 11:05

magyargergo closed this May 11, 2026

marxo126 deleted the feat/source-sink-scanner branch May 11, 2026 09:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: source-sink security scanning with BFS path discovery#578

feat: source-sink security scanning with BFS path discovery#578
marxo126 wants to merge 3 commits into
abhigyanpatwari:mainfrom
marxo126:feat/source-sink-scanner

marxo126 commented Mar 28, 2026

Uh oh!

vercel Bot commented Mar 28, 2026

Uh oh!

github-actions Bot commented Mar 28, 2026 •

edited

Loading

Uh oh!

xkonjin left a comment

Uh oh!

magyargergo commented Mar 29, 2026

Uh oh!

github-actions Bot commented Mar 29, 2026 •

edited

Loading

Uh oh!

marxo126 commented Mar 31, 2026

Uh oh!

magyargergo commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

marxo126 commented Mar 28, 2026

Summary

Test plan

Uh oh!

vercel Bot commented Mar 28, 2026

Uh oh!

github-actions Bot commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Report

Pipeline Status

Test Results

Code Coverage

Tests

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

magyargergo commented Mar 29, 2026

Uh oh!

github-actions Bot commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge Readiness Assessment

Blocking Issue: Language Filter is Never Applied

Everything Else Looks Good

Uh oh!

marxo126 commented Mar 31, 2026

Uh oh!

magyargergo commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Mar 28, 2026 •

edited

Loading

github-actions Bot commented Mar 29, 2026 •

edited

Loading