Adding json + xpath headless extractors by Mzack9999 · Pull Request #6559 · projectdiscovery/nuclei

Mzack9999 · 2025-10-28T10:19:58Z

Proposed changes

Closes #6359

Checklist

Pull request is created against the dev branch
All checks passed (lint, unit/integration/regression tests etc.) with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Summary by CodeRabbit

New Features
- Added XPath support for extracting data elements from HTML content.
- Added JSON support for extracting data using JSONPath-like query syntax.
Tests
- Added extensive unit tests for data extraction and matching operations.
- Includes tests for complex nested HTML and JSON structures.
- Tests cover multiple extraction methods and data source types.

coderabbitai · 2025-10-28T10:20:18Z

Walkthrough

Added support for XPathExtractor and JSONExtractor types in the headless protocol's Extract method by routing them to ExtractXPath() and ExtractJSON() functions respectively. Comprehensive test coverage validates extraction, matching, and handling of complex nested structures.

Changes

Cohort / File(s)	Summary
Headless Operators Implementation `pkg/protocols/headless/operators.go`	Added two new cases in the Extract method's switch statement: XPathExtractor routing to ExtractXPath(itemStr) and JSONExtractor routing to ExtractJSON(itemStr), enabling XPath and JSON extraction capabilities in headless mode.
Headless Operators Tests `pkg/protocols/headless/operators_test.go`	Added comprehensive test suite covering HTML extraction via XPath (text content, attributes, multiple items, non-existent paths), JSON extraction via JSONPath (IDs, names, nested values, emails, invalid JSON), XPath/JSON-based matching across data parts (default, named parts), and complex nested structures (JSON APIs, HTML commerce scenarios).
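
The routing described in the Changes table can be pictured with a minimal, self-contained Go sketch. Everything here is a stand-in: the `Extractor` struct, the `ExtractorType` constants, and the stub bodies of `ExtractXPath` and `ExtractJSON` are assumptions made for illustration and are not copied from `pkg/protocols/headless/operators.go`. Only the identifier names come from the walkthrough.

```go
package main

import (
	"fmt"
	"strings"
)

// ExtractorType is a hypothetical stand-in for the extractor type enum
// named in the walkthrough; the names mirror the PR, the values do not.
type ExtractorType int

const (
	RegexExtractor ExtractorType = iota
	XPathExtractor
	JSONExtractor
)

// Extractor is a hypothetical, minimal extractor that only carries its type.
type Extractor struct {
	Type ExtractorType
}

// ExtractXPath is a stub: it tags the trimmed input instead of evaluating XPath.
func (e *Extractor) ExtractXPath(corpus string) map[string]struct{} {
	return map[string]struct{}{"xpath:" + strings.TrimSpace(corpus): {}}
}

// ExtractJSON is a stub: it tags the trimmed input instead of querying JSON.
func (e *Extractor) ExtractJSON(corpus string) map[string]struct{} {
	return map[string]struct{}{"json:" + strings.TrimSpace(corpus): {}}
}

// Extract routes the page content to the method matching the extractor type.
// Types without a case fall through and yield nil.
func Extract(e *Extractor, itemStr string) map[string]struct{} {
	switch e.Type {
	case XPathExtractor:
		return e.ExtractXPath(itemStr)
	case JSONExtractor:
		return e.ExtractJSON(itemStr)
	}
	return nil
}

func main() {
	for _, e := range []*Extractor{{Type: XPathExtractor}, {Type: JSONExtractor}, {Type: RegexExtractor}} {
		fmt.Println(e.Type, len(Extract(e, "<html></html>")))
	}
}
```

In this sketch an extractor type without a matching case returns nil, which is one plausible way a type could be silently ignored before its case is added to the switch.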

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

The operator implementation change is minimal—two new switch cases routing to pre-existing extraction functions with no novel logic.
Test coverage is extensive but follows consistent, repetitive patterns across similar scenarios.
Primary review focus: verifying test scenarios comprehensively exercise both extraction types and match the feature requirements.

Poem

🐰 Headless hops with XPath and JSON so fine,
No HTML left unturned, no data out of line,
Extractors now aligned across all the modes,
Through nested paths and complex loads,
The rabbit's toolkit grows—what joy it bodes! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes Check	⚠️ Warning	The PR includes implementation of JSONExtractor support alongside XPathExtractor, but issue #6359 specifically requests only XPath extractor support for headless mode. While JSONExtractor is closely related to the main feature and serves a complementary purpose within the same operators file, it represents an out-of-scope addition since the linked issue makes no mention of JSON extraction requirements. The JSON extractor feature, though adjacent and useful, extends beyond the explicitly stated requirements for the PR.	Consider either limiting the PR scope to XPath extractor support only (as requested in issue #6359) and deferring JSON extractor implementation to a separate feature request, or alternatively, create a corresponding issue for the JSON extractor enhancement to bring it formally into scope. This would ensure clear alignment between the PR changes and linked issues.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "Adding json + xpath headless extractors" is clear, concise, and accurately reflects the main changes in the pull request. It specifically identifies the two new extractor types being added (JSON and XPath) and their context (headless mode), which directly aligns with the PR's objective to enhance headless protocol extraction capabilities. The title is descriptive enough for developers scanning history to understand the primary change.
Linked Issues Check	✅ Passed	The PR successfully addresses the primary objective from linked issue #6359, which requests enabling the XPath extractor in headless mode. The code changes in operators.go add support for both XPathExtractor and JSONExtractor, with the former directly satisfying the requirement. Comprehensive unit tests in operators_test.go demonstrate that XPath extraction and matching work correctly in headless mode across various scenarios (text content, attributes, nested structures). The main requirement has been met, allowing users to employ XPath extraction in headless mode the same way as in HTTP mode.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82144e5 and e535e01.

📒 Files selected for processing (2)

pkg/protocols/headless/operators.go (1 hunks)
pkg/protocols/headless/operators_test.go (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Format Go code using go fmt
Run static analysis with go vet

Files:

pkg/protocols/headless/operators.go
pkg/protocols/headless/operators_test.go

pkg/protocols/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Each protocol implementation must provide a Request interface with methods Compile(), ExecuteWithResults(), Match(), and Extract()

Files:

pkg/protocols/headless/operators.go
pkg/protocols/headless/operators_test.go

🧬 Code graph analysis (2)

- pkg/protocols/headless/operators.go (1)
  - pkg/operators/extractors/extractor_types.go (2)
    - XPathExtractor (21-21)
    - JSONExtractor (23-23)
- pkg/protocols/headless/operators_test.go (4)
  - pkg/operators/extractors/extractors.go (1)
    - Extractor (11-116)
  - pkg/operators/extractors/extractor_types.go (4)
    - ExtractorTypeHolder (71-73)
    - ExtractorType (12-12)
    - XPathExtractor (21-21)
    - JSONExtractor (23-23)
  - pkg/operators/matchers/matchers.go (1)
    - Matcher (10-138)
  - pkg/operators/matchers/matchers_types.go (3)
    - MatcherTypeHolder (77-79)
    - MatcherType (12-12)
    - XPathMatcher (29-29)

🔇 Additional comments (2)

pkg/protocols/headless/operators.go (1)

79-82: LGTM! Clean implementation of XPath and JSON extractors.

The implementation correctly adds support for XPathExtractor and JSONExtractor types, following the established pattern of existing extractors. Both cases appropriately route to their respective extraction methods and maintain consistency with the codebase.

pkg/protocols/headless/operators_test.go (1)

1-566: Excellent test coverage for XPath and JSON extractors!

The test suite is comprehensive and well-structured, covering:

- Basic extraction scenarios for both XPath and JSON
- Attribute extraction and multiple item handling
- Edge cases (invalid JSON, non-existent paths)
- Different response parts (data, header, history)
- Complex nested structures (API responses, e-commerce HTML)
- XPath matching functionality

All tests follow Go testing conventions and provide thorough validation of the new extractor functionality in headless mode.
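
As a rough illustration of the edge cases listed above (nested values, non-existent paths, invalid JSON), the sketch below uses only the Go standard library. The `lookup` helper and its dotted-path syntax are hypothetical; they stand in for the real JSON extractor, whose query engine is not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup is a hypothetical helper: it walks a dotted path such as
// "user.profile.email" through decoded JSON objects. It reports false
// when the input is not valid JSON or when any path segment is missing.
func lookup(raw, path string) (string, bool) {
	var node any
	if err := json.Unmarshal([]byte(raw), &node); err != nil {
		return "", false // invalid JSON yields no result
	}
	for _, key := range strings.Split(path, ".") {
		obj, ok := node.(map[string]any)
		if !ok {
			return "", false // current value is not an object
		}
		if node, ok = obj[key]; !ok {
			return "", false // path segment does not exist
		}
	}
	return fmt.Sprint(node), true
}

func main() {
	body := `{"user":{"id":42,"profile":{"email":"a@example.com"}}}`
	cases := []struct{ raw, path string }{
		{body, "user.profile.email"}, // nested value
		{body, "user.id"},            // numeric value
		{body, "user.missing"},       // non-existent path
		{`{not json`, "user.id"},     // invalid JSON
	}
	for _, c := range cases {
		value, ok := lookup(c.raw, c.path)
		fmt.Printf("%q found=%t value=%q\n", c.path, ok, value)
	}
}
```

Returning a boolean alongside the value keeps "nothing extracted" distinct from "extracted an empty string", which is the distinction the invalid-JSON and missing-path cases are meant to exercise.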

Mzack9999 added 2 commits October 28, 2025 14:19

adding xpath + json extractors

9d59fd0

adding tests

e535e01

Mzack9999 added the Type: Enhancement label Oct 28, 2025

Mzack9999 marked this pull request as ready for review October 28, 2025 19:50

auto-assign bot requested a review from dogancanbakir October 28, 2025 19:50

dogancanbakir approved these changes Oct 29, 2025

dogancanbakir merged commit 3be27b9 into dev Oct 29, 2025
20 checks passed

dogancanbakir deleted the feat-6359-json-xpath-headless-extractor branch October 29, 2025 11:56

BrewTestBot mentioned this pull request Nov 14, 2025

nuclei 3.5.0 Homebrew/homebrew-core#254453

Merged
