
fix(headless): merge extra headers #6376

Merged
ehsandeep merged 2 commits into projectdiscovery:dev from
ysokolovsky:fix/headless-headers-merge
Aug 15, 2025

Conversation

@ysokolovsky
Contributor

@ysokolovsky ysokolovsky commented Aug 11, 2025

Proposed changes

  • New field: Browser.defaultHeaders map[string]string.
  • CLI header parsing: non-User-Agent headers are copied into defaultHeaders; User-Agent stays in customAgent.
  • New helper: applyDefaultHeaders(*rod.Page) error builds an alternating []string{k, v, ...} slice and calls SetExtraHeaders once.
  • Helper invocation: at the start of Instance.Run(...), right after the page is created and its timeout is set.

Fixes #6352.

Steps to reproduce

  1. Minimal headless template that checks for X-Demo in the request headers echoed back by httpbin
id: headless-header-propagation
info:
  name: Headless header propagation check (httpbin)
  author: author
  severity: info

headless:
  - steps:
      - action: navigate
        args:
          url: "{{BaseURL}}/headers"
      - action: waitload

    matchers:
      - type: word
        part: body
        words:
          - '"X-Demo": "via-cli"'

    extractors:
      - type: regex
        part: body
        group: 1
        regex:
          - '"X-Demo"\s*:\s*"([^"]+)"'
  2. Before the fix
➜  nuclei git:(fix/headless-headers-merge) nuclei -u https://httpbin.org \
  -t headless-header-propagation.yaml \
  -H 'X-Demo: via-cli' \
  -headless -vv -jsonl

[INF] Current nuclei version: v3.4.7 (latest)
[INF] Current nuclei-templates version: v10.2.7 (latest)
[WRN] Scan results upload to cloud is disabled.
[INF] New templates added in latest release: 55
[INF] Templates loaded for current scan: 1
[WRN] Loading 1 unsigned templates for scan. Use with caution.
[INF] Targets loaded for current scan: 1
[headless-header-propagation] Headless header propagation check (httpbin) (@ysokolovsky) [info]
[INF] Scan completed in 7.443080769s. No results found.
  3. After the fix
➜  nuclei git:(fix/headless-headers-merge) nuclei -u https://httpbin.org -t headless-header-propagation.yaml -H 'X-Demo: via-cli' -headless -vv -jsonl \
  | grep '^{\"' \
  | jq -r '.response | gsub("\\u003c";"<") | gsub("\\u003e";">") | sub("(?s)^.*<pre>";"") | sub("</pre>.*$";"") | fromjson' \
  | jq

[INF] Current nuclei version: v3.4.7 (latest)
[INF] Current nuclei-templates version: v10.2.7 (latest)
[INF] New templates added in latest release: 55
[INF] Templates loaded for current scan: 1
[INF] Targets loaded for current scan: 1
[INF] Scan completed in 9.19056686s. 1 matches found.
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en, en-GB, en-us;",
    "Host": "httpbin.org",
    "Priority": "u=0, i",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Kubuntu; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-6899f125-4234cdcc52943bf75168a95e",
    "X-Demo": "via-cli"
  }
}
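The template's extractor can be exercised outside nuclei to see why the before and after runs differ. A minimal Go sketch using the standard regexp package (RE2 accepts the template's pattern unchanged); the sample bodies are trimmed-down versions of the httpbin response above, and extractDemo is an illustrative name, not part of nuclei.

```go
package main

import (
	"fmt"
	"regexp"
)

// demoExtractor is the template's regex extractor pattern.
var demoExtractor = regexp.MustCompile(`"X-Demo"\s*:\s*"([^"]+)"`)

// extractDemo returns capture group 1 (the template's `group: 1`), or ""
// and false when the header is absent, as in the pre-fix run.
func extractDemo(body string) (string, bool) {
	m := demoExtractor.FindStringSubmatch(body)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	after := `{"headers": {"Host": "httpbin.org", "X-Demo": "via-cli"}}`
	before := `{"headers": {"Host": "httpbin.org"}}`
	fmt.Println(extractDemo(after)) // via-cli true
	fmt.Println(extractDemo(before))
}
```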

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • New Features

    • Added support for setting default HTTP headers per browser session.
    • Default headers (including Accept-Language) are now applied automatically to new pages.
    • Custom User-Agent behavior continues to be supported.
  • Refactor

    • Consolidated header handling into a single mechanism for consistent application across pages.
    • Removed redundant per-page Accept-Language injection to reduce duplication and potential conflicts.

@auto-assign auto-assign bot requested a review from dogancanbakir August 11, 2025 13:04
@coderabbitai
Contributor

coderabbitai bot commented Aug 11, 2025

Walkthrough

Adds per-browser default headers to Browser, initializes them from options.CustomHeaders (excluding User-Agent), and applies them to pages via a new Browser.applyDefaultHeaders method. The page creation flow now calls applyDefaultHeaders and removes the previous hardcoded Accept-Language SetExtraHeaders call.

Changes

Cohort: Headless engine headers handling
File(s): pkg/protocols/headless/engine/engine.go, pkg/protocols/headless/engine/page.go
Summary of Changes:
  • Added Browser.defaultHeaders map[string]string and wired it into the constructor.
  • Parsed options.CustomHeaders into the defaults (User-Agent is handled separately).
  • Added Browser.applyDefaultHeaders(p *rod.Page) error, which sets Accept-Language plus the defaults via SetExtraHeaders.
  • Updated page creation to call applyDefaultHeaders and removed the previous direct SetExtraHeaders Accept-Language injection.

Sequence Diagram(s)

sequenceDiagram
  actor Client
  participant Engine
  participant Browser
  participant Page

  Client->>Engine: request new page
  Engine->>Browser: ensure Browser created with defaultHeaders
  Browser->>Page: create rod.Page
  Browser->>Page: applyDefaultHeaders(Page)\n(SetExtraHeaders: Accept-Language + defaultHeaders)
  Browser->>Page: apply User-Agent if provided
  Client->>Page: proceed with cookies/navigation

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

I twitch my ears at headers new,
I stash each default, neat and true.
Accept-Language hops in place,
Custom keys find their space.
A rabbit sorts the web's embrace. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fc2cc2 and ccc0e01.

📒 Files selected for processing (1)
  • pkg/protocols/headless/engine/engine.go (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
pkg/protocols/headless/engine/engine.go (2)
pkg/protocols/headless/engine/page.go (2)
  • Options (50-54)
  • Page (26-41)
pkg/types/types.go (1)
  • Options (32-464)
🔇 Additional comments (2)
pkg/protocols/headless/engine/engine.go (2)

22-34: LGTM - Clean struct extension.

The addition of the defaultHeaders field is well-positioned and properly typed to support the PR's objective of applying extra headers to browser pages.


99-116: LGTM - Robust header parsing logic.

The header parsing implementation correctly:

  • Separates User-Agent from other headers as intended
  • Uses case-insensitive User-Agent detection
  • Validates and trims header keys/values
  • Prevents malformed headers from being stored

@ysokolovsky ysokolovsky force-pushed the fix/headless-headers-merge branch from 38843c3 to 3fc2cc2 Compare August 11, 2025 13:11
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (2)
pkg/protocols/headless/engine/engine.go (2)

101-116: Trim User-Agent value when parsing CLI headers

Avoid leading/trailing whitespace in UA; currently it may retain a leading space after the colon. Trim it like other headers.

-        if strings.EqualFold(parts[0], "User-Agent") {
-            customAgent = parts[1]
+        if strings.EqualFold(parts[0], "User-Agent") {
+            customAgent = strings.TrimSpace(parts[1])
         } else {
             k := strings.TrimSpace(parts[0])
             v := strings.TrimSpace(parts[1])
             if k == "" || v == "" {
                 continue
             }
             defaultHeaders[k] = v
         }

23-34: getHTTPClient can return (nil, nil) after first failure due to sync.Once capturing error in a local var

If newHttpClient fails the first time, Once prevents retry, the local err is lost on subsequent calls, and callers may receive a nil client with a nil error. Persist the initialization error on the Browser struct and return it consistently.

 type Browser struct {
     customAgent    string
     defaultHeaders map[string]string
     tempDir        string
     previousPIDs   map[int32]struct{} // track already running PIDs
     engine         *rod.Browser
     options        *types.Options
     launcher       *launcher.Launcher
 
     // use getHTTPClient to get the http client
-    httpClient     *http.Client
-    httpClientOnce *sync.Once
+    httpClient     *http.Client
+    httpClientErr  error
+    httpClientOnce *sync.Once
 }
 func (b *Browser) getHTTPClient() (*http.Client, error) {
-    var err error
-    b.httpClientOnce.Do(func() {
-        b.httpClient, err = newHttpClient(b.options)
-    })
-    return b.httpClient, err
+    b.httpClientOnce.Do(func() {
+        b.httpClient, b.httpClientErr = newHttpClient(b.options)
+    })
+    return b.httpClient, b.httpClientErr
 }
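The failure mode described above can be reproduced with the standard library alone. In this sketch, newClient is a hypothetical constructor standing in for newHttpClient(b.options) that fails only on its first call; buggy mirrors the original local-variable pattern and fixed mirrors the suggested diff that stores the error on the struct.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
)

var attempts int

// newClient (hypothetical) fails on its first call only.
func newClient() (*http.Client, error) {
	attempts++
	if attempts == 1 {
		return nil, errors.New("proxy unreachable")
	}
	return &http.Client{}, nil
}

// buggy: err is local, so only the call that runs the Once body sees it.
type buggy struct {
	once   sync.Once
	client *http.Client
}

func (b *buggy) get() (*http.Client, error) {
	var err error
	b.once.Do(func() { b.client, err = newClient() })
	return b.client, err
}

// fixed: the error is persisted next to the client.
type fixed struct {
	once   sync.Once
	client *http.Client
	err    error
}

func (f *fixed) get() (*http.Client, error) {
	f.once.Do(func() { f.client, f.err = newClient() })
	return f.client, f.err
}

func main() {
	b := &buggy{}
	_, err1 := b.get()
	c2, err2 := b.get()
	fmt.Println(err1, c2 == nil, err2 == nil) // proxy unreachable true true

	attempts = 0
	f := &fixed{}
	_, e1 := f.get()
	_, e2 := f.get()
	fmt.Println(e1, e2) // proxy unreachable proxy unreachable
}
```

The second call on buggy returns a nil client with a nil error, which is exactly the (nil, nil) case the review warns about; fixed keeps reporting the original error instead.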

Alternative approach: remove sync.Once and retry construction on subsequent calls until success using a mutex; but the above is the minimal, safe change.

Also applies to: 161-167

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ef3662 and 3fc2cc2.

📒 Files selected for processing (2)
  • pkg/protocols/headless/engine/engine.go (4 hunks)
  • pkg/protocols/headless/engine/page.go (1 hunks)
🔇 Additional comments (1)
pkg/protocols/headless/engine/page.go (1)

64-67: Apply default headers early — good; ensure Accept-Language behavior remains intentional

Applying headers once per page before UA override is correct. With the engine-side fix to always set Accept-Language unless overridden, this preserves previous behavior while allowing user-specified headers.

If you keep the current engine implementation, Accept-Language will not be set when no custom headers are provided. Confirm intended behavior or adopt the proposed engine change.
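One way to get the behavior the review asks for, sketched under assumptions: merge a fallback Accept-Language into the header map unless the user already supplied one (compared case-insensitively), so the single SetExtraHeaders call always carries it even when no custom headers are given. mergeWithAcceptLanguage is a hypothetical helper, and the fallback value is a parameter; nuclei's actual default string is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeWithAcceptLanguage returns a copy of user that always contains an
// Accept-Language entry: the user's own value wins (matched
// case-insensitively), otherwise fallback is added.
func mergeWithAcceptLanguage(user map[string]string, fallback string) map[string]string {
	merged := make(map[string]string, len(user)+1)
	hasAL := false
	for k, v := range user {
		if strings.EqualFold(k, "Accept-Language") {
			hasAL = true
		}
		merged[k] = v
	}
	if !hasAL {
		merged["Accept-Language"] = fallback
	}
	return merged
}

func main() {
	fmt.Println(mergeWithAcceptLanguage(nil, "en"))                                          // map[Accept-Language:en]
	fmt.Println(mergeWithAcceptLanguage(map[string]string{"accept-language": "de"}, "en")) // map[accept-language:de]
}
```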

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (1)
pkg/protocols/headless/engine/engine.go (1)

101-115: Normalize and harden header parsing; trim UA; handle “User-Agent” detection after trimming.

  • Trim key/value before checking for “User-Agent” so “ User-Agent” is recognized.
  • Trim the UA value to avoid leading spaces.
  • Basic hardening: skip headers containing CR/LF to prevent header injection into CDP.
  • Optional: use strings.Cut for simpler split.
-	for _, option := range options.CustomHeaders {
-		parts := strings.SplitN(option, ":", 2)
-		if len(parts) != 2 {
-			continue
-		}
-		if strings.EqualFold(parts[0], "User-Agent") {
-			customAgent = parts[1]
-		} else {
-			k := strings.TrimSpace(parts[0])
-			v := strings.TrimSpace(parts[1])
-			if k == "" || v == "" {
-				continue
-			}
-			defaultHeaders[k] = v
-		}
-	}
+	for _, h := range options.CustomHeaders {
+		parts := strings.SplitN(h, ":", 2)
+		if len(parts) != 2 {
+			continue
+		}
+		k := strings.TrimSpace(parts[0])
+		v := strings.TrimSpace(parts[1])
+		if k == "" || v == "" {
+			continue
+		}
+		// basic hardening against CR/LF in header k/v
+		if strings.ContainsAny(k, "\r\n") || strings.ContainsAny(v, "\r\n") {
+			continue
+		}
+		if strings.EqualFold(k, "User-Agent") {
+			customAgent = v
+			continue
+		}
+		defaultHeaders[k] = v
+	}

Alternative (optional):

- parts := strings.SplitN(h, ":", 2)
- if len(parts) != 2 {
-   continue
- }
- k := strings.TrimSpace(parts[0])
- v := strings.TrimSpace(parts[1])
+ key, val, ok := strings.Cut(h, ":")
+ if !ok {
+   continue
+ }
+ k := strings.TrimSpace(key)
+ v := strings.TrimSpace(val)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ef3662 and 3fc2cc2.

📒 Files selected for processing (2)
  • pkg/protocols/headless/engine/engine.go (4 hunks)
  • pkg/protocols/headless/engine/page.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
pkg/protocols/headless/engine/engine.go (2)
pkg/protocols/headless/engine/page.go (2)
  • Options (50-54)
  • Page (26-41)
pkg/types/types.go (1)
  • Options (32-464)

Comment on lines +64 to +67
if err = i.browser.applyDefaultHeaders(page); err != nil {
return nil, nil, err
}

Contributor


💡 Verification agent

🧩 Analysis chain

Good placement; ensure Accept-Language still applies with no custom headers.

Calling applyDefaultHeaders right after setting Timeout is correct. With the current engine implementation, Accept-Language won’t be set if defaultHeaders is empty (regression). Please merge the engine.go fix to always apply Accept-Language unless the user provided one.

To confirm coverage and avoid missed call sites:


🏁 Script executed:

#!/bin/bash
# Find all Page creations and check that default headers are applied.
rg -n 'Page\s*\(\s*proto\.TargetCreateTarget' -A2
rg -n 'SetExtraHeaders\s*\(' -A2

Length of output: 487


All page creation sites covered; merge Accept-Language fix

The placement of applyDefaultHeaders immediately after setting Timeout in page.go is correct and covers all page creation calls. However, until the engine patch is merged, Accept-Language won’t be applied when defaultHeaders is empty.

Key locations checked:

  • Page creation in pkg/protocols/headless/engine/page.go:58
  • SetExtraHeaders call in pkg/protocols/headless/engine/engine.go:157–159

Action required:

  • Merge the fix in engine.go to ensure Accept-Language is always set unless explicitly overridden by the user.
🤖 Prompt for AI Agents
In pkg/protocols/headless/engine/page.go around lines 64 to 67,
applyDefaultHeaders is correctly placed but currently Accept-Language is not
applied when defaultHeaders is empty; merge the fix from
pkg/protocols/headless/engine/engine.go (around lines 157–159) into the codebase
so that SetExtraHeaders ensures Accept-Language is always added unless the user
explicitly sets it—update SetExtraHeaders logic to insert Accept-Language into
headers when missing (or when defaultHeaders is empty) before it’s passed to
applyDefaultHeaders and ensure the merged change is used by page creation paths.

@ehsandeep
Member

@coderabbitai create and link github issue with this PR

@coderabbitai
Contributor

coderabbitai bot commented Aug 11, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue linked to PR #6376
gh issue create \
  --title "Track headless browser header merging implementation" \
  --body "This issue tracks the implementation of extra headers support for headless browser pages.

## Context
PR #6376 implements the ability to merge and apply extra headers (provided via CLI) to headless browser pages, fixing issue #6352.

## Related Links
- Pull Request: https://github.com/projectdiscovery/nuclei/pull/6376
- Comment: https://github.com/projectdiscovery/nuclei/pull/6376#issuecomment-2609879476
- Original Issue: #6352

## Summary
The implementation adds a \`defaultHeaders\` field to the Browser struct and ensures that CLI-provided headers (except User-Agent) are properly propagated to headless browser page requests.

Requested by: @ehsandeep" \
  --assignee ehsandeep

Length of output: 823


@ehsandeep I've successfully created GitHub issue #6377 to track this PR's implementation of headless browser header merging. The issue has been assigned to you and includes all relevant links.

Issue URL: #6377

@ehsandeep ehsandeep linked an issue Aug 11, 2025 that may be closed by this pull request
@dwisiswant0 dwisiswant0 linked an issue Aug 12, 2025 that may be closed by this pull request
@dwisiswant0 dwisiswant0 changed the title from "headless: merge extra headers" to "fix(headless): merge extra headers" Aug 12, 2025
@ehsandeep ehsandeep merged commit d569cfe into projectdiscovery:dev Aug 15, 2025
18 of 19 checks passed

Development

Successfully merging this pull request may close these issues.

  • Track headless browser header merging implementation
  • [BUG] Headless does not take specific headers

4 participants