Skip to content

feat: source-sink security scanning with BFS path discovery#578

Closed
marxo126 wants to merge 3 commits into
abhigyanpatwari:mainfrom
marxo126:feat/source-sink-scanner
Closed

feat: source-sink security scanning with BFS path discovery#578
marxo126 wants to merge 3 commits into
abhigyanpatwari:mainfrom
marxo126:feat/source-sink-scanner

Conversation

@marxo126

Copy link
Copy Markdown
Contributor

Summary

  • New source_sink MCP tool for security-relevant data path scanning
  • Identifies source functions (user input, env vars, file reads) and sink functions (SQL queries, command exec, file writes) using configurable pattern catalogs
  • BFS over CALLS edges to discover reachable paths from sources to sinks
  • OWASP category filtering (A03-injection, A07-xss, A10-ssrf)
  • User-defined catalog extensions via .gitnexus/security.json
  • Risk-ranked findings (critical/high/medium)

Split from #561 — this PR contains only the source-sink scanning feature. Parameter data flow tracking (PASSES_TO, DATA_FLOWS_TO) will be addressed separately as part of the PDG subsystem (#567).

Test plan

  • 30 unit tests for source-sink scanner and catalogs
  • Smoke-tested on real project (fluentiagrant-app): 1,417 sources, 2,536 sinks, 3,913 paths found
  • All 2,484 existing unit tests pass
  • TypeScript compiles clean

🤖 Generated with Claude Code

@vercel

vercel Bot commented Mar 28, 2026

Copy link
Copy Markdown

@TESTPERSONAL is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Mar 28, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
6639 6542 0 97 252s

✅ All 6542 tests passed

97 test(s) skipped — expand for details
  • Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
  • Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
  • Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
  • Swift MethodExtractor > visibility > extracts public method
  • Swift MethodExtractor > visibility > extracts private method
  • Swift MethodExtractor > visibility > defaults to internal when no modifier
  • Swift MethodExtractor > protocol methods > marks protocol method as abstract
  • Swift MethodExtractor > static and class methods > detects static func as isStatic
  • Swift MethodExtractor > static and class methods > detects class func as isStatic
  • Swift MethodExtractor > parameters > extracts parameters with types and default values
  • Swift MethodExtractor > return type > extracts return type from -> annotation
  • Swift MethodExtractor > annotations > extracts @objc attribute
  • Swift MethodExtractor > isFinal > detects final func
  • Swift MethodExtractor > isFinal > is false when not final
  • Swift MethodExtractor > isAsync > detects async func
  • Swift MethodExtractor > isOverride > detects override method
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
  • Swift method enrichment > detects Animal protocol and Dog class
  • Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
  • Swift method enrichment > emits HAS_METHOD edges for Dog methods
  • Swift method enrichment > marks protocol Animal.speak as isAbstract
  • Swift method enrichment > marks Dog.speak as NOT isAbstract
  • Swift method enrichment > marks breathe as isFinal
  • Swift method enrichment > marks classify as isStatic
  • Swift method enrichment > captures @objc annotation on breathe
  • Swift method enrichment > populates parameterTypes for classify(_ name: String)
  • Swift method enrichment > records parameterCount for classify
  • Swift method enrichment > records returnType for speak
  • Swift method enrichment > resolves dog.speak() CALLS edge
  • Swift method enrichment > resolves Dog.classify("dog") CALLS edge
  • Swift abstract dispatch > detects Repository protocol and SqlRepository class
  • Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
  • Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
  • Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
  • Swift abstract dispatch > marks base Repository.find as isAbstract
  • Swift abstract dispatch > marks base Repository.save as isAbstract
  • Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
  • Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
  • Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
  • Swift abstract dispatch > populates parameterTypes for Repository.find
  • Swift abstract dispatch > populates parameterTypes for Repository.save
  • Swift abstract dispatch > records returnType for SqlRepository.find
  • Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
  • Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
  • Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
  • Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
  • Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 73.09% 17909/24502 73.22% 📉 -0.1 🔴 ██████████████░░░░░░
Branches 62.12% 11370/18301 62.24% 📉 -0.1 🔴 ████████████░░░░░░░░
Functions 77.67% 1705/2195 77.82% 📉 -0.1 🔴 ███████████████░░░░░
Lines 75.63% 16232/21462 75.78% 📉 -0.2 🔴 ███████████████░░░░░

📋 View full run · Generated by CI

@xkonjin xkonjin left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one correctness issue in the new scanner.

sourceSinkScan() never loads or passes a node language, and patternMatches() only applies the language filter when a language value is present:

if (entry.languages && language && !entry.languages.includes(language)) return false;

With language === undefined, every language-specific pattern is treated as eligible. In practice that means Java, Go, Ruby, Rust, PHP, etc. source/sink signatures can all match any function body that happens to contain the same token text, which will inflate findings quite a bit.

Example: a TypeScript function containing params[ or system( can be classified using the Rails-only rules, and a JS file containing Command::new in a string/comment could be treated as a Rust sink.

I think this needs one of these fixes before relying on the results:

  1. return false for language-scoped entries when the node language is unknown, or
  2. fetch/store the language on Function/Method nodes and pass it into getMatchingSources() / getMatchingSinks().

I'd also add a regression test that proves a language-scoped pattern does not match when scanning content from a different language.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude is it ready to be merged?

@github-actions

github-actions Bot commented Mar 29, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 1m 18s —— View job


Merge Readiness Assessment

  • Gather context
  • Read relevant source files
  • Assess merge readiness

Short answer: Not yet — there is one open correctness bug that needs to be fixed first.


Blocking Issue: Language Filter is Never Applied

The reviewer xkonjin identified a real bug, and reading the code confirms it.

In local-backend.ts the Cypher query fetches id, name, filePath, and content, but not language:

// local-backend.ts:2971-2981
MATCH (n:Function)
WHERE n.id IS NOT NULL
RETURN n.id AS id, n.name AS name, n.filePath AS filePath, n.content AS content

As a result, getMatchingSources and getMatchingSinks are always called with undefined as the language parameter:

// local-backend.ts:3022, 3037
const matchedSources = getMatchingSources(content, undefined, compiledSources);
const matchedSinks   = getMatchingSinks(content, undefined, compiledSinks);

And the guard in catalogs.ts:577 short-circuits the language check when language is falsy:

// catalogs.ts:577
if (entry.languages && language && !entry.languages.includes(language)) return false;
//                     ^^^^^^^^ — always undefined → filter never fires

This means every language-scoped pattern applies to every function, regardless of the actual file language. A TypeScript file containing params[ could be flagged by the Rails-only Ruby rule; a JS comment containing Command::new would match the Rust sink rule. This inflates findings and makes results unreliable — especially problematic for a security tool.

The fix is straightforward (add n.language to the query and thread it through), but it needs a regression test proving a language-scoped pattern does not match content from a different language. Fix this →


Everything Else Looks Good

Check Status
CI (typecheck + tests + E2E) ✅ All 4700 tests pass
TypeScript ✅ Compiles clean
Test coverage ⚠️ Minor drop (~0.2%) — not blocking
BFS path discovery logic ✅ Correct
Risk ranking ✅ Reasonable
User-defined catalog extension ✅ Clean design
OWASP filtering ✅ Works once language bug is fixed

The coverage drop is minor and expected given the new code paths. The architecture (BFS over CALLS, catalog-based matching, risk ranking) is solid. The language filter bug is the only thing standing between this and a merge.


@marxo126

Copy link
Copy Markdown
Contributor Author

Fixed the language filter bug — good catch.

What changed:

  • Import getLanguageFromFilename from gitnexus-shared in local-backend.ts
  • Derive language from filePath and pass it to getMatchingSources/getMatchingSinks
  • Added 3 regression tests proving language-scoped patterns don't cross-match (Python→TS, PHP→JS)

Ready for merge.

Test and others added 3 commits April 18, 2026 13:03
Add source_sink MCP tool that scans for security-relevant data paths.
Identifies source functions (user input, env vars, file reads) and sink
functions (SQL queries, command exec, file writes) using configurable
pattern catalogs, then uses BFS over CALLS edges to find reachable paths.

Supports OWASP category filtering, user-defined catalog extensions via
.gitnexus/security.json, and risk-ranked findings (critical/high/medium).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive language from filePath via getLanguageFromFilename and pass it to
getMatchingSources/getMatchingSinks. Without this, language-scoped catalog
patterns (e.g. Python-only request.data, PHP-only $_GET) matched all files
regardless of language, inflating findings.

Adds 3 regression tests proving language-scoped patterns don't cross-match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@marxo126 marxo126 force-pushed the feat/source-sink-scanner branch from 47198bd to e6a295d Compare April 18, 2026 11:05
@magyargergo

Copy link
Copy Markdown
Collaborator

Please submit a new PR if this is still relevant

@marxo126 marxo126 deleted the feat/source-sink-scanner branch May 11, 2026 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants