feat: source-sink security scanning + Parameter data flow #561

Closed
marxo126 wants to merge 1 commit into abhigyanpatwari:main from marxo126:feat/security-parameter-flow-v2

@marxo126

Contributor

Summary

Two capabilities aligned with the architecture assessment's security analysis roadmap:

  1. Source-Sink Structural Scanning (Phase A) — New source_sink MCP tool. BFS over existing CALLS graph from source-adjacent to sink-adjacent functions. No new node types needed.
  2. Parameter-Level Data Flow (Phase B) — New Parameter node type with PASSES_TO edges mapping call-site arguments to callee parameters.
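As a rough illustration of the Phase A idea, here is a minimal breadth-first search over a call graph. The `CallGraph` shape, the function names, and the `maxDepth` cutoff are assumptions made for this sketch; the PR's actual scanner in src/security/source-sink-scanner.ts is not shown in this conversation and may differ.

```typescript
// Hypothetical, simplified graph shape: caller id -> callee ids (CALLS edges).
type CallGraph = Map<string, string[]>;

// BFS from one source-adjacent function. Returns the shortest call path to
// the first sink-adjacent function reached, or null if none is reachable.
function shortestPathToSink(
  calls: CallGraph,
  source: string,
  sinks: Set<string>,
  maxDepth: number,
): string[] | null {
  const parent = new Map<string, string | null>([[source, null]]);
  let frontier: string[] = [source];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const fn of frontier) {
      if (sinks.has(fn)) {
        // Walk parent links back to the source to rebuild the path.
        const path: string[] = [];
        let cur: string | null = fn;
        while (cur !== null) {
          path.unshift(cur);
          cur = parent.get(cur) ?? null;
        }
        return path;
      }
      for (const callee of calls.get(fn) ?? []) {
        if (!parent.has(callee)) {
          parent.set(callee, fn);
          next.push(callee);
        }
      }
    }
    frontier = next;
  }
  return null;
}

// Run the search once per source and keep only the sources that reach a sink.
function findSourceSinkPaths(
  calls: CallGraph,
  sources: string[],
  sinks: Set<string>,
  maxDepth = 10,
): string[][] {
  const findings: string[][] = [];
  for (const source of sources) {
    const path = shortestPathToSink(calls, source, sinks, maxDepth);
    if (path !== null) findings.push(path);
  }
  return findings;
}

// Toy graph: handler -> validate, handler -> service -> runQuery.
const exampleCalls: CallGraph = new Map([
  ["handler", ["validate", "service"]],
  ["service", ["runQuery"]],
]);
const examplePaths = findSourceSinkPaths(exampleCalls, ["handler"], new Set(["runQuery"]));
```

A path found this way only shows that a call chain exists between the two functions. It does not show that user input actually travels along it.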

Source-Sink Scanning

Coverage (13 frameworks)

| Framework | Source patterns | Sink patterns |
| --- | --- | --- |
| Express/Next.js | req.body, request.json, req.query, req.params | eval, exec, innerHTML |
| Django/Flask | request.GET/POST, request.data/form/args | subprocess.run, os.system |
| PHP | $_GET, $_POST, $_REQUEST | shell_exec |
| Go | r.Body, r.FormValue, r.URL.Query() | os.exec, sql.Query, template.HTML |
| Rust/Actix | web::Json, web::Query, web::Path | Command::new, sqlx::query |
| Spring | @RequestBody, @RequestParam, @PathVariable | jdbcTemplate.query, Runtime.exec |
| Rails | params[], request.body | system(), AR::Base.connection.execute |
| Ktor | call.receive, call.parameters | |

User-extensible via .gitnexus/security.json

{
  "sources": [{ "pattern": "myInput", "category": "user_input", "description": "Custom source" }],
  "sinks": [{ "pattern": "dangerousOp", "owasp": "A03-injection", "severity": "high", "description": "Custom sink" }]
}
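One way such a file could be read and combined with built-in catalogs is sketched below. The field names come from the JSON above; the interface names, the validation rule (drop entries missing a required string field), and the "user entry overrides a built-in entry with the same pattern" merge policy are assumptions of this sketch, not behavior confirmed by the PR.

```typescript
interface SourcePattern { pattern: string; category: string; description: string }
interface SinkPattern { pattern: string; owasp: string; severity: string; description: string }
interface SecurityCatalog { sources: SourcePattern[]; sinks: SinkPattern[] }

// True when `value` is an object whose listed keys are all strings.
function hasStringFields(value: unknown, keys: string[]): boolean {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return keys.every((key) => typeof record[key] === "string");
}

// Parse the text of .gitnexus/security.json, silently dropping malformed entries.
function parseUserCatalog(jsonText: string): SecurityCatalog {
  const raw = (JSON.parse(jsonText) ?? {}) as { sources?: unknown; sinks?: unknown };
  const sources = Array.isArray(raw.sources) ? raw.sources : [];
  const sinks = Array.isArray(raw.sinks) ? raw.sinks : [];
  return {
    sources: sources.filter((s) =>
      hasStringFields(s, ["pattern", "category", "description"])) as SourcePattern[],
    sinks: sinks.filter((s) =>
      hasStringFields(s, ["pattern", "owasp", "severity", "description"])) as SinkPattern[],
  };
}

// Assumed merge policy: later entries win when two entries share a pattern.
function mergeByPattern<T extends { pattern: string }>(builtin: T[], user: T[]): T[] {
  const merged = new Map<string, T>();
  builtin.concat(user).forEach((entry) => merged.set(entry.pattern, entry));
  return Array.from(merged.values());
}

const userCatalog = parseUserCatalog(`{
  "sources": [{ "pattern": "myInput", "category": "user_input", "description": "Custom source" }],
  "sinks": [
    { "pattern": "dangerousOp", "owasp": "A03-injection", "severity": "high", "description": "Custom sink" },
    { "pattern": "missingFields" }
  ]
}`);

// Hypothetical built-in sink list, for illustration only.
const builtinSinks: SinkPattern[] = [
  { pattern: "eval", owasp: "A03-injection", severity: "high", description: "Dynamic code execution" },
];
const allSinks = mergeByPattern(builtinSinks, userCatalog.sinks);
```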

Parameter Data Flow

  • Parameter node type — each function parameter as a graph node (name, index, type, isRest)
  • PASSES_TO edges — maps call-site arg positions to callee parameter positions
  • Tree-sitter extraction for all 13 supported languages
  • Handles: simple, typed, destructured, rest, default parameters

Schema changes

  • New Parameter node table: id, name, filePath, paramIndex INT32, declaredType, isRest BOOL
  • PASSES_TO and DATA_FLOWS_TO added to REL_TYPES
  • FROM/TO pairs for Function/Method/Constructor -> Parameter, Parameter -> Parameter/Community/Process
  • New source_sink MCP tool (tools.ts, local-backend.ts, server.ts)
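To make the schema concrete, the sketch below models a Parameter node with the columns listed above and builds PASSES_TO edges by matching argument positions to parameter positions. The `PassesToEdge` shape, the id format, and the rule that surplus arguments map to a rest parameter are assumptions for illustration; the PR text does not spell these details out.

```typescript
// Mirrors the Parameter node table columns described in the PR.
interface ParameterNode {
  id: string;
  name: string;
  filePath: string;
  paramIndex: number;
  declaredType: string | null; // stored as text, not resolved to a class node
  isRest: boolean;
}

// Hypothetical edge shape for this sketch.
interface PassesToEdge {
  type: "PASSES_TO";
  fromId: string; // caller Function/Method/Constructor id
  toId: string;   // callee Parameter id
  argIndex: number;
}

// Positional matching: argument i flows to the parameter with paramIndex i.
// Assumption: arguments past the rest parameter's index also flow into it.
function linkCallSite(
  callerId: string,
  argCount: number,
  calleeParams: ParameterNode[],
): PassesToEdge[] {
  const byIndex = new Map(calleeParams.map((p) => [p.paramIndex, p] as const));
  const rest = calleeParams.find((p) => p.isRest);
  const edges: PassesToEdge[] = [];
  for (let argIndex = 0; argIndex < argCount; argIndex++) {
    let target = byIndex.get(argIndex);
    if (!target && rest && argIndex > rest.paramIndex) target = rest;
    if (target) {
      edges.push({ type: "PASSES_TO", fromId: callerId, toId: target.id, argIndex });
    }
  }
  return edges;
}

// Callee: function save(user: User, ...tags: string[])
const saveParams: ParameterNode[] = [
  { id: "save#user", name: "user", filePath: "src/save.ts", paramIndex: 0, declaredType: "User", isRest: false },
  { id: "save#tags", name: "tags", filePath: "src/save.ts", paramIndex: 1, declaredType: "string[]", isRest: true },
];
// Call site inside handler: save(u, "a", "b")
const saveEdges = linkCallSite("handler", 3, saveParams);
```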

Real-world validation (Next.js + Prisma, 30K nodes)

| Metric | Count |
| --- | --- |
| Source-adjacent route files | 341 / 431 (79%) |
| Sink-adjacent route files | 392 / 431 (91%) |
| Parameter nodes | 8,604 |
| PASSES_TO edges | 416 |

Known limitations

Source-Sink

  1. Structural reachability, not taint tracking. BFS finds paths but doesn't verify data actually flows through them. Sanitizers not detected.
  2. High sink rate by design. In ORM-heavy apps, most routes are sink-adjacent. Results need manual review.
  3. Catalog coverage uneven. Strong for Express/Next.js/Django/Go/Rust. Weaker for Spring/Rails/FastAPI.
  4. No sanitizer awareness. Future work: sanitizer catalogs.

Parameter Data Flow

  1. Types are text strings, not resolved. param: User stores "User" as text, not linked to the User class node.
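  2. Positional matching only. Named/keyword arguments (Python foo(key=value)) are matched by position, so keyword arguments passed in a different order than the declaration map to the wrong parameter.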
  3. No inter-procedural chaining. PASSES_TO shows one hop, not multi-hop chains. Full taint tracking requires function summaries.
  4. Destructured params stored as text. ({ a, b }: Props) creates one node, not individual nodes.
  5. DATA_FLOWS_TO reserved but not emitted. Schema placeholder for future intra-function assignment tracking.
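The positional-matching limitation can be shown with a small model of a Python-style call. Everything in this sketch is hypothetical: the `CallArgument` and `CalleeParameter` shapes are invented for illustration, and `matchByKeyword` is a possible refinement, not something the PR implements.

```typescript
interface CallArgument { position: number; keyword?: string }
interface CalleeParameter { name: string; paramIndex: number }

// What the PR describes: argument position i maps to parameter index i.
function matchByPosition(
  args: CallArgument[],
  params: CalleeParameter[],
): Array<[number, string]> {
  const pairs: Array<[number, string]> = [];
  for (const arg of args) {
    const param = params.find((p) => p.paramIndex === arg.position);
    if (param) pairs.push([arg.position, param.name]);
  }
  return pairs;
}

// Hypothetical refinement: prefer the keyword when the argument has one.
function matchByKeyword(
  args: CallArgument[],
  params: CalleeParameter[],
): Array<[number, string]> {
  const pairs: Array<[number, string]> = [];
  for (const arg of args) {
    const param = arg.keyword !== undefined
      ? params.find((p) => p.name === arg.keyword)
      : params.find((p) => p.paramIndex === arg.position);
    if (param) pairs.push([arg.position, param.name]);
  }
  return pairs;
}

// Python callee: def connect(host, port)
const connectParams: CalleeParameter[] = [
  { name: "host", paramIndex: 0 },
  { name: "port", paramIndex: 1 },
];
// Python call: connect(port=5432, host="db")
const connectArgs: CallArgument[] = [
  { position: 0, keyword: "port" },
  { position: 1, keyword: "host" },
];
const positional = matchByPosition(connectArgs, connectParams);
const byKeyword = matchByKeyword(connectArgs, connectParams);
```

With positional matching the first argument (the port value) is linked to `host`; the keyword-aware variant links it to `port`.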

Test plan

  • 32 source-sink tests (catalog matching, BFS, integration)
  • 23 parameter tests (schema, extraction, processor, integration)
  • Schema count assertions updated
  • Full build passes (tsc clean)
  • Real-repo validation: 341 sources, 392 sinks, 8604 parameters, 416 PASSES_TO

🤖 Generated with Claude Code

Two complementary features for security and data flow analysis:

1. Source-Sink Scanner (MCP tool: source_sink)
   - BFS reachability from user-input sources to dangerous sinks
   - OWASP A03/A07/A10 coverage across 10+ languages
   - User-extensible catalogs via .gitnexus/security.json
   - Risk-ranked findings with path visualization

2. Parameter Data Flow (pipeline Phase 3.6b)
   - Extract function/method parameters from AST (tree-sitter)
   - Parameter nodes with type annotations and position
   - PASSES_TO edges mapping call-site arguments to callee parameters
   - Foundation for future taint tracking

New node type: Parameter. New edge types: PASSES_TO, DATA_FLOWS_TO.
New security module: src/security/ (catalogs.ts, source-sink-scanner.ts).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Mar 28, 2026


@marxo126 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

Contributor

CI Report

Some checks failed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ❌ Tests | failure | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 4507 | 4460 | 1 | 46 | 160s |

1 failed / 4460 passed

46 test(s) skipped:
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call

Code Coverage

Tests

| Metric | Coverage | Covered | Base | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| Statements | 70.57% | 12658/17935 | 70.85% | 📉 -0.3 | 🔴 ██████████████░░░░░░ |
| Branches | 59.9% | 8343/13926 | 60.32% | 📉 -0.4 | 🔴 ███████████░░░░░░░░░ |
| Functions | 74.79% | 1116/1492 | 75.03% | 📉 -0.2 | 🔴 ██████████████░░░░░░ |
| Lines | 73.03% | 11349/15539 | 73.34% | 📉 -0.3 | 🔴 ██████████████░░░░░░ |

📋 View full run · Generated by CI

@magyargergo

Collaborator

Please make sure to include your changes in the gitnexus-shared folder 🙏 I moved types over there so we can share them between cli and web.

@magyargergo

Collaborator

⚠️ Upcoming Prettier formatting — rebase instructions

PR #563 adds Prettier as the code formatter for the repo. When it merges, the bulk format commit will touch ~350 files (style-only: whitespace, quotes, trailing commas). Your branch will likely conflict.

After #563 merges, rebase your branch:

git fetch origin
git checkout <your-branch>
git rebase origin/main

# Conflicts will be formatting-only. Accept your version
# (during a rebase, --theirs is the commit being replayed, i.e. your branch):
git checkout --theirs .
git add .
git rebase --continue

# Then re-format your branch to match the new style:
npx prettier --write .
git add -A
git commit -m "style: apply prettier formatting"
git push --force-with-lease

New setup step: Run npm install at the repo root (not just in gitnexus/) to get prettier + activate the pre-commit hook. The hook auto-formats staged files on every commit going forward.

@marxo126

Contributor Author

Holding this PR from merge — needs improvement before it's ready.

After reviewing the Ferrante et al. 1987 PDG paper, the parameter-level data flow here (PASSES_TO edges by positional argument matching) is a simplified approximation of data dependence. It tracks function boundary crossings but not intra-function definition-use chains, which is what a proper Program Dependence Graph provides.

What this PR does well:

  • Source-sink structural scanning with extensible catalogs (13 frameworks)
  • Parameter as first-class node type (aligns with Graph DB Priority 3)
  • PASSES_TO edges for cross-function argument tracking
  • User-extensible .gitnexus/security.json

What needs to happen before merge:

  • Align with a formal PDG design — PASSES_TO should be compatible with or feed into proper data dependence edges (definition-use chains) when a CFG subsystem is built
  • The source-sink BFS (Phase A) is independent of PDG and could merge separately, but the Parameter + PASSES_TO (Phase B) should be designed to extend into full data dependence tracking
  • Consider whether DATA_FLOWS_TO (currently reserved but not emitted) should follow the PDG's definition-use model rather than a custom scheme
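For contrast with PASSES_TO, the sketch below shows what intra-function definition-use chains look like in the simplest case: straight-line code with no branches or loops. The `Statement` and `DefUseEdge` shapes are hypothetical. A real PDG would compute reaching definitions over a CFG, which this sketch deliberately avoids.

```typescript
// Hypothetical statement summary: which variables it writes and reads.
interface Statement { id: string; defines: string[]; uses: string[] }
interface DefUseEdge { variable: string; defAt: string; useAt: string }

// For straight-line code only: each use is linked to the most recent
// definition of the same variable. Variables with no prior definition
// (for example function parameters) produce no edge here.
function defUseChains(statements: Statement[]): DefUseEdge[] {
  const lastDef = new Map<string, string>();
  const edges: DefUseEdge[] = [];
  for (const stmt of statements) {
    // Resolve uses before recording this statement's own definitions,
    // so that `x = f(x)` links to the previous definition of x.
    for (const variable of stmt.uses) {
      const defAt = lastDef.get(variable);
      if (defAt !== undefined) edges.push({ variable, defAt, useAt: stmt.id });
    }
    for (const variable of stmt.defines) lastDef.set(variable, stmt.id);
  }
  return edges;
}

// s1: const body = req.body
// s2: const id = body.id
// s3: db.query(id)
const handlerBody: Statement[] = [
  { id: "s1", defines: ["body"], uses: ["req"] },
  { id: "s2", defines: ["id"], uses: ["body"] },
  { id: "s3", defines: [], uses: ["db", "id"] },
];
const handlerChains = defUseChains(handlerBody);
```

Here the chain s1 → s2 → s3 connects `req.body` to the query call inside a single function, which is the part that positional PASSES_TO edges at function boundaries cannot express.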

Will revisit after we formalize the PDG roadmap.

@marxo126

Contributor Author

Superseded by #578 (source-sink scanning only) — parameter data flow tracking will be part of the PDG subsystem (#567).

@marxo126 marxo126 closed this Mar 28, 2026
@marxo126 marxo126 deleted the feat/security-parameter-flow-v2 branch March 28, 2026 19:51