Conversation
Walkthrough
Adds optional preflight TCP port scanning, per-host HTTP client pooling, sharded HTTP clients, per-host rate limiting, connection-reuse and HTTP→HTTPS port trackers, new CLI flags and Options toggles, and supporting protocolstate and pool implementations.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Runner
participant Preflight as Preflight Scanner
participant Resolver as DNS Resolver
participant PortScanner as TCP Port Scanner
participant InputProv as Input Provider
participant Stats as Stats Reporter
Note over Runner,Preflight: Enabled when --preflight-portscan is set
Runner->>Preflight: start preflight (templates, ports)
Preflight->>Resolver: resolve hostnames -> IPs (parallel)
Resolver-->>Preflight: resolved IP lists
loop per input target
Preflight->>PortScanner: scan candidate ports on resolved IPs (batched/parallel)
alt open port found
PortScanner-->>Preflight: record open port, mark kept
else none open
PortScanner-->>Preflight: mark filtered
end
end
Preflight->>InputProv: wrap provider to expose only kept targets
Preflight->>Stats: emit summary (total, kept, filtered, per-port counts)
Runner->>InputProv: proceed with filtered inputs
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 4
♻️ Duplicate comments (2)
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go (1)
211-261: Host normalization expects URLs/host:port, not full HTTP requests
normalizeHostForRateLimit (and extractHostPortFromString) correctly handle URLs, bare hosts, and host:port strings, but will mis-normalize if given a full HTTP request line (e.g., "GET http://example.com/ HTTP/1.1\r\n..."). This leads to odd keys such as "GET:80" and breaks per-host grouping. The pool itself is fine; just ensure call sites pass URLs or host:port (not Request.String()), as suggested in the request.go comment.
Also applies to: 263-307
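To make the expected input shape concrete, here is a minimal, hypothetical sketch (hostPortKey is an illustrative name, not a function in this PR) of the kind of normalization the pool expects; it accepts URLs and bare host[:port] strings and rejects raw request lines outright:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostPortKey is a hypothetical helper illustrating the safe input shape:
// it accepts a URL or bare host[:port] string (never a full request line)
// and returns a normalized host:port key for per-host grouping.
func hostPortKey(s string) string {
	// A raw HTTP request line contains spaces/CRLF; reject it up front.
	if strings.ContainsAny(s, " \r\n") {
		return ""
	}
	if !strings.Contains(s, "://") {
		s = "http://" + s // assume a bare host[:port]
	}
	u, err := url.Parse(s)
	if err != nil || u.Hostname() == "" {
		return ""
	}
	port := u.Port()
	if port == "" {
		if u.Scheme == "https" {
			port = "443"
		} else {
			port = "80"
		}
	}
	return u.Hostname() + ":" + port
}

func main() {
	fmt.Println(hostPortKey("https://example.com/path")) // example.com:443
	fmt.Println(hostPortKey("example.com:8080"))         // example.com:8080
	// A full request line yields "" instead of a bogus "GET:80" key.
	fmt.Println(hostPortKey("GET http://example.com/ HTTP/1.1\r\nHost: x"))
}
```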
pkg/protocols/http/httpclientpool/sharded_pool.go (1)
276-281: TLS security configuration: InsecureSkipVerify and MinVersion: TLS 1.0. These settings were already flagged by CodeQL in past reviews. While InsecureSkipVerify: true is typical for security scanning tools that need to test any target, and TLS 1.0 support may be intentional for compatibility, this is worth acknowledging. For a security scanning tool like Nuclei, these settings are likely intentional to maximize target compatibility during vulnerability assessment.
🧹 Nitpick comments (19)
pkg/protocols/common/protocolstate/memguardian_test.go (1)
103-110: Consider adding the expirable LRU exclusion for consistency.
TestMemGuardianReset has its own goleak.VerifyNone call but doesn't include the github.com/hashicorp/golang-lru/v2/expirable exclusion that was added to TestMemGuardianGoroutineLeak. If this test can trigger code paths that use the expirable LRU cache, it may become flaky.
🔎 Proposed fix

 func TestMemGuardianReset(t *testing.T) {
 	defer goleak.VerifyNone(t,
 		goleak.IgnoreAnyContainingPkg("go.opencensus.io/stats/view"),
 		goleak.IgnoreAnyContainingPkg("github.com/syndtr/goleveldb"),
 		goleak.IgnoreAnyContainingPkg("github.com/go-rod/rod"),
 		goleak.IgnoreAnyContainingPkg("github.com/projectdiscovery/interactsh/pkg/server"),
 		goleak.IgnoreAnyContainingPkg("github.com/projectdiscovery/ratelimit"),
+		// expirable LRU cache creates a background goroutine for TTL expiration that persists
+		// see: https://github.com/hashicorp/golang-lru/blob/770151e9c8cdfae1797826b7b74c33d6f103fbd8/expirable/expirable_lru.go#L79
+		goleak.IgnoreAnyContainingPkg("github.com/hashicorp/golang-lru/v2/expirable"),
 	)

pkg/protocols/common/protocolstate/state.go (1)
212-214: Minor: Redundant initialization.
Setting dialersInstance.InputCount = 0 is redundant since Go zero-initializes struct fields. The comment explains intent, but the explicit assignment adds no value.
🔎 Proposed simplification

-	// Set input count for sharding calculation (will be updated later when input provider is ready)
-	dialersInstance.InputCount = 0
-
+	// InputCount will be updated later when input provider is ready (via SetInputCount)
 	return nil

pkg/protocols/http/request_fuzz.go (1)
184-187: Prefer using generated request URL for per-host rate limiting
Here you use input.MetaInput.Input as the hostname, which may not reflect the final fuzzed request (e.g. when the rule changes host/port or when Input is a raw HTTP request string). Since gr.Request is already the concrete HTTP request, consider deriving the URL/host from it (e.g. its URL) and passing that into rateLimitTake for more accurate per-host limiting, falling back to input.MetaInput.Input only if needed.
internal/runner/runner.go (2)
688-694: Preflight failures currently abort the scan
preflightResolveAndPortScan is treated as mandatory when PreflightPortScan is enabled: any error (e.g., dialers not initialized, DNS resolution setup issues) will fail RunEnumeration. If preflight is meant to be a best-effort optimization rather than a hard requirement, you may want to log and continue instead of returning an error here, or at least document that enabling the flag can cause the whole scan to abort on preflight errors.
717-733: Dialer stats printing could reuse a single lookup and optionally lock
At the end of the run you call protocolstate.GetDialersWithId multiple times and then read PerHostHTTPPool, PerHostRateLimitPool, ConnectionReuseTracker, ShardedHTTPPool, and HTTPToHTTPSPortTracker without taking the embedded mutex. In practice this runs after enumeration and is unlikely to race, but to keep it robust:
- Fetch dialers once into a local variable.
- Consider taking dialers.Lock() around reads of the pool/tracker fields.
This avoids potential data races if any background worker still touches dialers.
Also applies to: 770-813
pkg/protocols/common/protocolstate/dialers.go (1)
18-23: New dialer fields are fine, but consider concrete types long-term
The added fields (PerHostHTTPPool, PerHostRateLimitPool, ConnectionReuseTracker, HTTPToHTTPSPortTracker, ShardedHTTPPool) are all any, which is consistent with avoiding import cycles but pushes type-safety to call sites via type assertions. If you later introduce a thin local interface for each (e.g. exposing just PrintStats/GetOrCreate), you can keep protocolstate decoupled while still getting compile-time guarantees.
36-66: Filtering input provider assumes a fixed post-preflight target set
filteringInputProvider wraps the base provider and exposes Count() as a fixed allowCnt while delegating Iterate to the underlying provider and filtering by allowed. This is fine as long as the base provider is not mutated after preflight. If you ever add inputs dynamically later, Count() and the actual number of iterated targets could diverge; documenting that assumption (or computing Count() from allowed on demand) would make the behavior clearer.
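A minimal sketch of the on-demand alternative (filteringProvider and its string targets are simplified stand-ins, not the actual InputProvider interface): Count() is derived from the allowed set each time, so it cannot drift from what Iterate yields even if allowed is later rebuilt:

```go
package main

import "fmt"

// filteringProvider is a simplified, hypothetical stand-in for the wrapper in
// preflight_portscan.go: base mimics the underlying input provider, allowed is
// the post-preflight keep set. Count() is computed from allowed on demand.
type filteringProvider struct {
	base    []string
	allowed map[string]struct{}
}

func (p *filteringProvider) Count() int64 { return int64(len(p.allowed)) }

func (p *filteringProvider) Iterate(fn func(target string) bool) {
	for _, t := range p.base {
		if _, ok := p.allowed[t]; !ok {
			continue // filtered out by preflight
		}
		if !fn(t) {
			return
		}
	}
}

func main() {
	p := &filteringProvider{
		base:    []string{"a:80", "b:443", "c:8080"},
		allowed: map[string]struct{}{"a:80": {}, "c:8080": {}},
	}
	n := 0
	p.Iterate(func(string) bool { n++; return true })
	fmt.Println(n, p.Count()) // 2 2
}
```

Note this still assumes every key in allowed exists in the base provider; the broader point is simply that one source of truth avoids the Count/Iterate divergence.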
71-91: Preflight resolver/scan flow is solid; double-check Fastdialer initialization and unused param
The preflight workflow (template-driven port set, resolve with Fastdialer, then short TCP dials with early stop) is well-structured and bounded. A couple of small points:
- dialers := protocolstate.GetDialersWithId(r.options.ExecutionId) is checked for nil, but dialers.Fastdialer is assumed non-nil; if preflight can run before dialers are fully initialized, this will panic rather than fail gracefully.
- preflightOneResolved receives dialers *protocolstate.Dialers but doesn't use it; you can drop that parameter to simplify the signature unless you plan to use it later.
If Fastdialer is guaranteed to be initialized before preflightResolveAndPortScan runs, consider adding a brief comment to that effect.
Also applies to: 151-188, 335-444
pkg/protocols/http/httpclientpool/http_to_https_tracker.go (2)
55-76: Side effect in read operation: RequiresHTTPS increments counter on every call.
RequiresHTTPS is semantically a query/read operation, but it increments totalCorrections each time it's called and returns true. This means:
- Multiple calls for the same host:port will inflate the corrections count
- The counter tracks "checks that returned true" rather than actual corrections applied
If the intent is to track actual corrections made (e.g., when a URL is rewritten), consider moving the counter increment to the caller where the correction is applied.
🔎 Proposed refactor: separate query from recording
 // RequiresHTTPS checks if a host:port requires HTTPS
 func (t *HTTPToHTTPSPortTracker) RequiresHTTPS(hostPort string) bool {
 	if hostPort == "" {
 		return false
 	}
 	normalizedHostPort := normalizeHostPortForTracker(hostPort)
 	if normalizedHostPort == "" {
 		return false
 	}
 	requiresHTTPS, ok := t.ports.Get(normalizedHostPort)
 	if !ok {
 		return false
 	}
-	if requiresHTTPS {
-		t.totalCorrections.Add(1)
-	}
-
 	return requiresHTTPS
 }
+
+// RecordCorrection records that a correction was applied
+func (t *HTTPToHTTPSPortTracker) RecordCorrection() {
+	t.totalCorrections.Add(1)
+}
108-203: Code duplication: normalizeHostPortForTracker and extractHostPortFromStringForHTTPS are nearly identical to functions in connection_reuse_tracker.go.
The functions normalizeHostPortForTracker/extractHostPortFromStringForHTTPS here and normalizeHostForConnectionReuse/extractHostPortFromStringForReuse in connection_reuse_tracker.go share almost identical logic. Consider extracting a shared helper to reduce duplication and maintenance burden.
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go (3)
179-214: Double-checked locking uses Peek instead of Get inside the lock.
In getOrCreateEntry, the second check at line 188 uses cache.Peek() while the first check at line 180 uses cache.Get(). This is intentional and correct for LRU caches: Get updates access time (touching the entry), while Peek does not. However, the explicit Store(0) calls at lines 196-205 are redundant since Go zero-initializes atomic values.
🔎 Remove redundant zero-initialization

 entry := &connectionReuseEntry{
 	host:      normalizedHost,
 	createdAt: time.Now(),
 }
-entry.totalConnections.Store(0)
-entry.totalReused.Store(0)
-entry.totalNewConnections.Store(0)
-entry.accessCount.Store(0)
-entry.totalHTTPConnections.Store(0)
-entry.totalHTTPSConnections.Store(0)
-entry.totalHTTPReused.Store(0)
-entry.totalHTTPSReused.Store(0)
-entry.totalHTTPNewConnections.Store(0)
-entry.totalHTTPSNewConnections.Store(0)
207-211: Unused variable evicted is assigned but only used in a no-op statement.
The return value from cache.Add is assigned to evicted but then discarded with _ = evicted. This is dead code. The Add method returns a boolean indicating if an entry was evicted, which could be logged or tracked.
🔎 Simplify or use the eviction indicator

-evicted := t.cache.Add(normalizedHost, entry)
-if evicted {
-	_ = evicted
-	// Entry was evicted, but we still return the new entry
-}
+_ = t.cache.Add(normalizedHost, entry)
272-377: PrintPerHostStats holds the mutex while performing potentially slow logging operations.
The method acquires t.mu.Lock() at line 277 and holds it throughout the entire iteration and logging (lines 295-376). This blocks other operations like RecordConnection that also need the mutex via getOrCreateEntry. Consider collecting data under the lock and releasing it before logging.
🔎 Proposed refactor to reduce lock contention

 func (t *ConnectionReuseTracker) PrintPerHostStats() {
 	if t.Size() == 0 {
 		return
 	}
+	// Collect stats under lock
 	t.mu.Lock()
-	defer t.mu.Unlock()
-
 	hostStats := []struct {
 		// ... fields ...
 	}{}
 	for _, key := range t.cache.Keys() {
 		// ... collect stats ...
 	}
+	t.mu.Unlock()
 	if len(hostStats) == 0 {
 		return
 	}
+	// Log outside the lock
 	gologger.Info().Msgf("[connection-reuse-tracker] Per-host connection reuse:")
 	for _, stat := range hostStats {
 		// ... logging ...
 	}
 }
41-46: Trackers initialized but return values discarded.
These calls ensure trackers are created early, but discarding the return values means any initialization errors would be silently ignored. Consider logging or returning an error if initialization fails.
pkg/protocols/http/httpclientpool/perhost_pool.go (3)
137-170: Inconsistent normalization: this function returns scheme://host:port while others return host:port.
The normalizeHost function here returns a full URL format (e.g., https://example.com:443), while normalizeHostForConnectionReuse and normalizeHostPortForTracker in other files return just host:port format. This could cause confusion when correlating data across pools and trackers.
Additionally, lines 162-167 are unreachable: if port != "" at line 158, line 159 returns, so lines 162-167 checking port == "" will always be true (redundant conditions).
🔎 Simplify unreachable code

 port := parsed.Port()
 if port != "" {
 	return fmt.Sprintf("%s://%s:%s", scheme, parsed.Hostname(), port)
 }
-if scheme == "https" && port == "" {
+if scheme == "https" {
 	return fmt.Sprintf("%s://%s:443", scheme, parsed.Hostname())
 }
-if scheme == "http" && port == "" {
-	return fmt.Sprintf("%s://%s:80", scheme, parsed.Hostname())
-}
-
-return fmt.Sprintf("%s://%s", scheme, host)
+return fmt.Sprintf("%s://%s:80", scheme, parsed.Hostname())
237-246: Hit rate calculation has an off-by-one error in the denominator.
At line 244, the hit rate is calculated as Hits * 100 / (Hits + Misses + 1). The +1 prevents division by zero but artificially lowers the hit rate, especially for small sample sizes. A 100% hit rate becomes ~99.9% for 1000 hits.
🔎 Fix hit rate calculation

 func (p *PerHostClientPool) PrintStats() {
 	stats := p.Stats()
 	if stats.Size == 0 {
 		return
 	}
+	hitRate := float64(0)
+	total := stats.Hits + stats.Misses
+	if total > 0 {
+		hitRate = float64(stats.Hits) * 100 / float64(total)
+	}
 	gologger.Verbose().Msgf("[perhost-pool] Connection reuse stats: Hits=%d Misses=%d HitRate=%.1f%% Hosts=%d",
-		stats.Hits, stats.Misses,
-		float64(stats.Hits)*100/float64(stats.Hits+stats.Misses+1),
-		stats.Size)
+		stats.Hits, stats.Misses, hitRate, stats.Size)
 }
248-249: Empty PrintTransportStats method.
This method has no implementation. If it's intended as a placeholder for future functionality, consider adding a TODO comment or removing it if not needed.
pkg/protocols/http/httpclientpool/sharded_pool.go (2)
98-103: Unused conditional: baseMaxIdleConnsPerHost is set to 500 in both branches.
Lines 99-103 check if baseConfig.Threads == 0 but set baseMaxIdleConnsPerHost = 500 in both cases, making the condition pointless.
🔎 Simplify redundant conditional

-// Base max idle conns per host (from existing logic: 500 when threading enabled)
-baseMaxIdleConnsPerHost := 500
-if baseConfig.Threads == 0 {
-	// If no threading, we still want some pooling for sharding
-	baseMaxIdleConnsPerHost = 500
-}
+// Base max idle conns per host for sharding (consistent regardless of threading)
+baseMaxIdleConnsPerHost := 500
225-391: Significant code duplication: wrappedGetWithCustomMaxIdle duplicates ~150 lines from wrappedGet in clientpool.go.
This function replicates most of the client creation logic from wrappedGet. Changes to TLS configuration, proxy handling, or transport settings would need to be made in both places. Consider refactoring to share the common logic.
🔎 Potential refactoring approach
Extract the common transport and client creation logic into a shared helper function that accepts maxIdleConnsPerHost as a parameter:

func createHTTPClientWithConfig(
	options *types.Options,
	configuration *Configuration,
	maxIdleConnsPerHost int,
	enableCookieJar bool,
) (*retryablehttp.Client, error) {
	// ... shared logic ...
}

Then both wrappedGet and wrappedGetWithCustomMaxIdle can call this helper with their respective parameters.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
cmd/nuclei/main.go
go.mod
internal/runner/preflight_portscan.go
internal/runner/runner.go
lib/tests/sdk_test.go
pkg/protocols/common/protocolstate/dialers.go
pkg/protocols/common/protocolstate/memguardian_test.go
pkg/protocols/common/protocolstate/state.go
pkg/protocols/http/build_request.go
pkg/protocols/http/http.go
pkg/protocols/http/httpclientpool/clientpool.go
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go
pkg/protocols/http/httpclientpool/http_to_https_tracker.go
pkg/protocols/http/httpclientpool/perhost_pool.go
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go
pkg/protocols/http/httpclientpool/sharded_pool.go
pkg/protocols/http/request.go
pkg/protocols/http/request_fuzz.go
pkg/protocols/http/request_test.go
pkg/protocols/utils/http/requtils.go
pkg/types/types.go
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.go: Format Go code using go fmt ./...
Run static analysis using go vet ./... on Go code
Files:
pkg/protocols/utils/http/requtils.go
pkg/protocols/common/protocolstate/memguardian_test.go
pkg/protocols/http/build_request.go
pkg/protocols/common/protocolstate/state.go
pkg/protocols/http/request_fuzz.go
pkg/protocols/http/request_test.go
cmd/nuclei/main.go
pkg/protocols/common/protocolstate/dialers.go
internal/runner/runner.go
lib/tests/sdk_test.go
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go
pkg/protocols/http/httpclientpool/clientpool.go
pkg/protocols/http/httpclientpool/perhost_pool.go
pkg/protocols/http/request.go
pkg/protocols/http/httpclientpool/http_to_https_tracker.go
pkg/protocols/http/httpclientpool/sharded_pool.go
pkg/protocols/http/http.go
internal/runner/preflight_portscan.go
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go
pkg/types/types.go
pkg/protocols/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
pkg/protocols/**/*.go: Each protocol implementation should implement the Request interface with Compile(), ExecuteWithResults(), Match(), and Extract() methods
Protocol implementations should embed Operators for matching/extraction functionality
Files:
pkg/protocols/utils/http/requtils.go
pkg/protocols/common/protocolstate/memguardian_test.go
pkg/protocols/http/build_request.go
pkg/protocols/common/protocolstate/state.go
pkg/protocols/http/request_fuzz.go
pkg/protocols/http/request_test.go
pkg/protocols/common/protocolstate/dialers.go
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go
pkg/protocols/http/httpclientpool/clientpool.go
pkg/protocols/http/httpclientpool/perhost_pool.go
pkg/protocols/http/request.go
pkg/protocols/http/httpclientpool/http_to_https_tracker.go
pkg/protocols/http/httpclientpool/sharded_pool.go
pkg/protocols/http/http.go
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go
cmd/nuclei/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Main CLI entry point with flag parsing and configuration should be located in cmd/nuclei
Files:
cmd/nuclei/main.go
internal/runner/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Core scanning orchestration logic should be implemented in internal/runner
Files:
internal/runner/runner.gointernal/runner/preflight_portscan.go
lib/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
SDK for embedding nuclei as a library should be implemented in lib/
Files:
lib/tests/sdk_test.go
🧬 Code graph analysis (12)
pkg/protocols/http/build_request.go (1)
pkg/protocols/http/http.go (2)
ReuseUnsafe (573-573), ReuseSafe (571-571)
pkg/protocols/http/request_fuzz.go (1)
pkg/protocols/common/contextargs/metainput.go (1)
MetaInput(18-30)
cmd/nuclei/main.go (1)
pkg/protocols/http/httpclientpool/perhost_pool.go (1)
PerHostClientPool(15-23)
pkg/protocols/common/protocolstate/dialers.go (3)
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go (1)
PerHostRateLimitPool (20-30)
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go (1)
ConnectionReuseTracker (16-32)
pkg/protocols/http/httpclientpool/http_to_https_tracker.go (1)
HTTPToHTTPSPortTracker (17-23)
internal/runner/runner.go (5)
pkg/utils/utils.go (1)
GetRateLimiter (80-85)
pkg/protocols/common/protocolstate/state.go (2)
SetInputCount (219-229), GetDialersWithId (40-46)
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go (1)
PerHostRateLimitPool (20-30)
pkg/protocols/http/httpclientpool/clientpool.go (1)
GetConnectionReuseTracker (514-536)
pkg/protocols/http/httpclientpool/http_to_https_tracker.go (1)
HTTPToHTTPSPortTracker (17-23)
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go (1)
pkg/protocols/utils/variables.go (4)
Scheme (50-50), Hostname (44-44), Host (45-45), Port (46-46)
pkg/protocols/http/httpclientpool/clientpool.go (4)
pkg/protocols/http/httpclientpool/perhost_pool.go (2)
PerHostClientPool (15-23), NewPerHostClientPool (31-60)
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go (2)
PerHostRateLimitPool (20-30), NewPerHostRateLimitPool (43-81)
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go (2)
ConnectionReuseTracker (16-32), NewConnectionReuseTracker (51-82)
pkg/protocols/http/httpclientpool/http_to_https_tracker.go (2)
HTTPToHTTPSPortTracker (17-23), NewHTTPToHTTPSPortTracker (26-30)
pkg/protocols/http/httpclientpool/http_to_https_tracker.go (2)
pkg/protocols/http/httpclientpool/clientpool.go (1)
Get (177-187)
pkg/protocols/utils/variables.go (4)
Scheme (50-50), Hostname (44-44), Host (45-45), Port (46-46)
pkg/protocols/http/http.go (2)
pkg/protocols/utils/http/requtils.go (1)
ShouldDisableKeepAlive (49-52)
pkg/protocols/http/httpclientpool/clientpool.go (2)
Configuration (83-98), ConnectionConfiguration (51-59)
internal/runner/preflight_portscan.go (5)
pkg/input/provider/interface.go (1)
InputProvider (60-75)
pkg/input/types/probe.go (1)
InputLivenessProbe (4-9)
pkg/protocols/common/contextargs/metainput.go (1)
MetaInput (18-30)
pkg/protocols/common/protocolstate/state.go (1)
GetDialersWithId (40-46)
pkg/protocols/common/protocolstate/dialers.go (1)
Dialers (13-29)
pkg/protocols/http/httpclientpool/perhost_ratelimit_pool.go (2)
pkg/utils/utils.go (1)
GetRateLimiter (80-85)
pkg/protocols/utils/variables.go (4)
Scheme (50-50), Hostname (44-44), Host (45-45), Port (46-46)
pkg/types/types.go (1)
pkg/protocols/http/httpclientpool/perhost_pool.go (1)
PerHostClientPool(15-23)
🪛 GitHub Check: CodeQL
pkg/protocols/http/httpclientpool/sharded_pool.go
[failure] 279-279: Disabled TLS certificate check
InsecureSkipVerify should not be used in production code.
[failure] 280-280: Insecure TLS configuration
Using insecure TLS version VersionTLS10 for MinVersion.
🔇 Additional comments (24)
lib/tests/sdk_test.go (1)
22-24: LGTM! The leak suppression for the expirable LRU cache background goroutine is correctly added with a helpful reference to the source code. This aligns with the same pattern used in other test files in this PR.
pkg/protocols/common/protocolstate/memguardian_test.go (1)
21-23: LGTM! The leak suppression is correctly added for the expirable LRU background goroutine with the same documentation pattern used elsewhere.
pkg/protocols/common/protocolstate/state.go (1)
218-229: LGTM with minor observation.
The function is thread-safe with proper locking. The silent return when dialers == nil is acceptable for graceful handling, though consider logging a warning if this condition is unexpected during normal operation.
pkg/protocols/http/request_test.go (1)
391-393: LGTM! The expirable LRU leak suppression is consistently applied here as in other test files, with the same documentation reference.
cmd/nuclei/main.go (2)
409-409: LGTM! The per-host-rate-limit flag is appropriately placed in the Rate-Limit group with a clear description.
438-440: LGTM. The new optimization flags are well-documented with sensible defaults. The http-client-shards description accurately reflects the implementation: the max 256 limit is enforced via the MaxShardCount constant in the sharded client pool, which caps both automatically calculated and explicitly provided shard counts.
pkg/protocols/http/build_request.go (2)
453-474: Well-structured policy-driven connection reuse logic.
The switch statement cleanly handles all three cases:
- ReuseUnsafe: Forces connection close for safety
- ReuseSafe: Enables connection pooling by removing close headers
- Default: Preserves legacy behavior for backward compatibility
The implementation correctly differentiates between setting req.Close = true (which affects the transport layer) and the Connection header (which affects the HTTP protocol layer).
462-468: The connectionReusePolicy field is properly initialized during request compilation in the Compile() method (lines 312-313 of pkg/protocols/http/http.go), where AnalyzeConnectionReuse() is called and its result is assigned to the field before build_request.go is executed. The ReuseSafe case is reachable and will execute correctly when appropriate.
go.mod (1)
83-83: LGTM - Dependency promoted to direct.
Moving golang-lru/v2 from indirect to direct dependency is appropriate since the codebase now explicitly uses the expirable LRU cache features for per-host pools and trackers. Version v2.0.7 is the latest stable release.
internal/runner/runner.go (1)
395-400: Global vs per-host rate-limiter wiring looks consistent
Making the global limiter unlimited when PerHostRateLimit is enabled and falling back to the standard limiter otherwise keeps existing behavior while letting the per-host pool enforce limits. This is a good separation of concerns.
pkg/protocols/http/request.go (2)
74-88: rateLimitTake helper cleanly centralizes rate limiting
Centralizing per-host vs global rate-limiter logic in rateLimitTake keeps call sites simpler and makes it easy to evolve the strategy later. The fallback to the global limiter when the per-host limiter lookup fails is also a good defensive choice.
1058-1073: HTTP→HTTPS 400-body detection is a useful heuristic
Leveraging the specific 400 response body text ("The plain HTTP request was sent to HTTPS port") to record HTTP→HTTPS port mismatches is a nice enhancement and should help the tracker build a better view of misconfigured targets. Once the URL source is switched to the actual URL (see previous comment), this logic should give accurate stats without impacting normal execution.
pkg/protocols/http/http.go (1)
161-164: Connection reuse policy analysis and wiring into Compile looks sound
The new ConnectionReusePolicy plus AnalyzeConnectionReuse() gives you a clear, extensible way to decide when to disable keep-alive:
- Explicit Connection: close (raw or headers), the time_delay analyzer, and future requiresConnectionClosure cases map to ReuseUnsafe.
- Everything else defaults to ReuseSafe, enabling pooling/sharding, while ReuseUnknown falls back to the old ShouldDisableKeepAlive behavior.
Storing the policy on the Request and using it to set ConnectionConfiguration.DisableKeepAlive in Compile() keeps this logic centralized and makes it easy to evolve as more patterns are identified.
Also applies to: 311-329, 564-574, 590-633
pkg/protocols/http/httpclientpool/http_to_https_tracker.go (2)
14-30: LGTM! Clean tracker implementation.
The HTTPToHTTPSPortTracker struct is well-designed with atomic counters for statistics and a thread-safe map for port tracking.
78-95: The suggested refactor is incorrect and would not compile.
SyncLockMap from projectdiscovery/utils does not have a Len() method. The existing code comment at line 80 is accurate; this is not misleading. While the "approximate" language could be clarified, using t.ports.Len() as suggested would fail to compile. If retrieving the exact count from SyncLockMap were necessary, alternatives like len(t.ports.GetAll()) would be required, though the current approach of using totalDetections is reasonable given it provides an accurate unique count (duplicates are rejected at lines 44-45).
Likely an incorrect or invalid review comment.
pkg/protocols/http/httpclientpool/connection_reuse_tracker.go (1)
51-82: LGTM! Well-structured tracker with sensible defaults.The constructor properly handles zero values with reasonable defaults (24-hour TTL for scan-duration tracking) and the TTL selection logic is correct.
pkg/protocols/http/httpclientpool/clientpool.go (4)
189-199: LGTM! Cookie jar pointer hash ensures unique clients for multi-threaded cookie scenarios.
The hashWithCookieJar function correctly uses the pointer address of the cookie jar to create unique client keys when multi-threading with cookies is detected, preventing unintended client sharing.
408-434: Potential race: pool is used after lock release without re-checking.
At lines 410-414, dialers.PerHostHTTPPool is assigned under lock, then immediately unlocked. The subsequent type assertion and usage at lines 416-433 happen outside the lock. If another goroutine modifies dialers.PerHostHTTPPool between unlock and line 416, the behavior is undefined.
However, since the pool is only ever set once (nil → new pool) and never reset, this is likely safe in practice. Consider using a sync.Once pattern for clearer thread-safety guarantees.
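A sketch of the suggested sync.Once pattern (lazyPool and its string payload are illustrative stand-ins for the real pool types, not the actual clientpool.go code): the constructor runs exactly once, and no reader can observe a partially built pool:

```go
package main

import (
	"fmt"
	"sync"
)

// lazyPool illustrates sync.Once-based one-time initialization: the pool is
// set exactly once, and every subsequent get returns the same value.
type lazyPool struct {
	once sync.Once
	pool *string // stand-in for e.g. *PerHostClientPool
}

func (l *lazyPool) get(newPool func() *string) *string {
	l.once.Do(func() {
		l.pool = newPool()
	})
	return l.pool
}

func main() {
	var l lazyPool
	created := 0
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.get(func() *string {
				created++ // safe: sync.Once serializes the constructor
				s := "pool"
				return &s
			})
		}()
	}
	wg.Wait()
	fmt.Println(created) // 1: the constructor ran exactly once
}
```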
436-458: Sharded pool creation error handling returns early without unlocking on the happy path.
At line 446, if pool creation fails, the code correctly unlocks and returns. But on success at lines 449-451, it unlocks after setting the pool. The structure is correct, but the error from GetClientForHost at line 455 is ignored (assigned to _). While GetClientForHost in sharded_pool.go only returns the shard index (not an error), the signature suggests it could. This is fine but worth noting.
513-536: LGTM! Thread-safe lazy initialization of ConnectionReuseTracker.
The function correctly uses locking to ensure single initialization of the tracker. The pattern of creating the tracker under lock and then doing a type assertion is appropriate.
pkg/protocols/http/httpclientpool/perhost_pool.go (2)
pkg/protocols/http/httpclientpool/perhost_pool.go (2)
31-60: LGTM! Well-structured pool with proper TTL handling and eviction logging.The constructor handles zero values with sensible defaults, correctly selects the minimum of idle time and lifetime for TTL, and logs evictions for debugging.
62-102: LGTM! Double-checked locking pattern is correctly implemented.The fast-path check at line 68 avoids lock acquisition for cache hits. The slow path at line 77 uses
Peekto avoid updating access time during the double-check, which is correct for LRU semantics. Hit/miss counting is accurate.pkg/protocols/http/httpclientpool/sharded_pool.go (2)
58-81: LGTM! Shard count calculation is well-designed.
The formula sqrt(inputSize) * 2 clamped between 4 and 256 provides reasonable scaling. For example: 100 inputs → 20 shards, 10000 inputs → 200 shards.
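The formula can be sketched standalone (shardCountFor is an illustrative re-implementation of the described logic, not the actual code in sharded_pool.go, which may differ in rounding details):

```go
package main

import (
	"fmt"
	"math"
)

// shardCountFor mirrors the formula described in the review:
// sqrt(inputSize) * 2, clamped to the range [4, 256].
func shardCountFor(inputSize int) int {
	n := int(math.Sqrt(float64(inputSize))) * 2
	if n < 4 {
		n = 4
	}
	if n > 256 {
		n = 256
	}
	return n
}

func main() {
	fmt.Println(shardCountFor(100))     // 20
	fmt.Println(shardCountFor(10000))   // 200
	fmt.Println(shardCountFor(1))       // 4 (lower clamp)
	fmt.Println(shardCountFor(1000000)) // 256 (upper clamp)
}
```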
460-464: LGTM! Appropriate no-op Close method.
The comment correctly explains that clients are managed by the main HTTPClientPool, so no cleanup is needed here. This maintains interface compatibility.
// GetClientForHost returns the HTTP client for the given host based on consistent hashing
// Returns the client and the shard index
func (p *ShardedClientPool) GetClientForHost(host string) (*retryablehttp.Client, int) {
	shardIndex := p.getShardIndex(host)
	shard := p.shards[shardIndex]

	p.shardRequests[shardIndex].Add(1)
	p.totalRequests.Add(1)
	shard.requestCount.Add(1)
	shard.lastAccess.Store(time.Now())

	return shard.client, shardIndex
}
hostCount is never incremented.
The ShardEntry struct has a hostCount field (line 51) intended to track unique hosts per shard, but GetClientForHost only increments requestCount. The hostCount remains zero, making the HostCount stat meaningless.
🔎 Consider tracking unique hosts per shard
Tracking unique hosts would require a set per shard, which adds memory overhead. If not needed, consider removing hostCount from ShardEntry and ShardStat. If needed, consider using a concurrent set or accepting approximate counts.
🤖 Prompt for AI Agents
In pkg/protocols/http/httpclientpool/sharded_pool.go around lines 136 to 148,
GetClientForHost never updates the ShardEntry.hostCount so the HostCount stat is
always zero; either remove hostCount from ShardEntry/ShardStat if you don't need
unique-host tracking, or implement concurrent tracking: add a per-shard
concurrent set/map (e.g., map[string]struct{} protected by shard-level mutex or
sync.Map) and on each GetClientForHost check if host is already present, if not
add it and increment hostCount (atomic.AddInt64) while holding the lock (or
relying on sync.Map semantics), ensuring thread-safety and avoiding
double-counting.
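If unique-host tracking is kept, the concurrent-set approach suggested above could look roughly like this (field and method names are illustrative, not the actual `ShardEntry` layout):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shardHosts sketches per-shard unique-host tracking: a concurrent set
// ensures hostCount is incremented exactly once per distinct host,
// even under concurrent access.
type shardHosts struct {
	seen      sync.Map // host -> struct{}
	hostCount atomic.Int64
}

func (s *shardHosts) record(host string) {
	// LoadOrStore is atomic: only the first caller for a host sees loaded == false.
	if _, loaded := s.seen.LoadOrStore(host, struct{}{}); !loaded {
		s.hostCount.Add(1)
	}
}

func main() {
	var s shardHosts
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.record("example.com") // duplicate host, counted once
			s.record(fmt.Sprintf("host-%d.example.com", i%10))
		}(i)
	}
	wg.Wait()
	fmt.Println(s.hostCount.Load()) // 11: "example.com" + 10 distinct hosts
}
```

The tradeoff is the memory cost of one set entry per unique host per shard, which is the overhead the review note mentions.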
```go
	hostname := t.updatedInput.MetaInput.Input
	if t.req != nil && t.req.URL() != "" {
		hostname = t.req.URL()
	} else if t.req != nil && t.req.request != nil && t.req.request.URL != nil {
		// Extract from request URL if available
		hostname = t.req.request.String()
	}
	request.rateLimitTake(hostname)
	select {
```
Avoid passing full HTTP request strings into per-host and tracking helpers
Several new call sites derive the "hostname" or URL from `retryablehttp.Request.String()` (full HTTP request line + headers) and pass it into helpers that expect a URL or host:port:

- `executeParallelHTTP`: `hostname = t.req.request.String()` when `t.req.URL()` is empty.
- `executeRequest`: `targetURL` prefers `generatedRequest.request.String()`.
- `hostnameForReuse` prefers `generatedRequest.request.String()`.
- HTTP-to-HTTPS correction and mismatch recording use `generatedRequest.request.String()` as `requestURL`.
Helpers like the per-host rate limit pool (normalizeHostForRateLimit), connection reuse tracker, and HTTP-to-HTTPS port tracker generally parse URLs/hostnames. Feeding them a full HTTP request string ("GET http://host/path HTTP/1.1\r\n...") will produce incorrect normalization (e.g., keys like GET:80), which breaks per-host grouping and skews reuse/port-mismatch stats, even though the scan still works.
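To make the failure mode concrete, here is a hypothetical stand-in for a normalizer like `normalizeHostForRateLimit` (the helper name and fallback behavior are assumptions, not nuclei's actual code):

```go
package main

import (
	"fmt"
	"net/url"
)

// normalizeHostKey reduces its input to a host:port grouping key.
// A full HTTP request line is not a parseable URL with a host, so it
// degenerates into a one-off key and defeats per-host grouping.
func normalizeHostKey(raw string) string {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		// Unparseable or host-less input falls through unchanged.
		return raw
	}
	port := u.Port()
	if port == "" {
		if u.Scheme == "https" {
			port = "443"
		} else {
			port = "80"
		}
	}
	return u.Hostname() + ":" + port
}

func main() {
	fmt.Println(normalizeHostKey("http://example.com/path")) // example.com:80
	fmt.Println(normalizeHostKey("https://example.com/"))    // example.com:443
	// A full request line is not a URL; the key degenerates:
	fmt.Println(normalizeHostKey("GET http://example.com/path HTTP/1.1"))
}
```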
Better to consistently use the actual URL:

- `generatedRequest.request.URL.String()` when `request.URL` is non-nil.
- `generatedRequest.URL()` (your wrapper) where available.
- Fall back to `input.MetaInput.Input` only if no URL is available.
Proposed diff to use URL strings instead of full request strings
```diff
@@ func (request *Request) executeParallelHTTP(…) {
-	hostname := t.updatedInput.MetaInput.Input
-	if t.req != nil && t.req.URL() != "" {
-		hostname = t.req.URL()
-	} else if t.req != nil && t.req.request != nil && t.req.request.URL != nil {
-		// Extract from request URL if available
-		hostname = t.req.request.String()
-	}
+	hostname := t.updatedInput.MetaInput.Input
+	if t.req != nil && t.req.URL() != "" {
+		hostname = t.req.URL()
+	} else if t.req != nil && t.req.request != nil && t.req.request.URL != nil {
+		// Extract from request URL if available
+		hostname = t.req.request.URL.String()
+	}
 	request.rateLimitTake(hostname)
@@ func (request *Request) executeRequest(…) (err error) {
-	targetURL := input.MetaInput.Input
-	if generatedRequest.request != nil && generatedRequest.request.URL != nil {
-		targetURL = generatedRequest.request.String()
-	} else if generatedRequest.request != nil {
-		targetURL = generatedRequest.request.String()
-	}
+	targetURL := input.MetaInput.Input
+	if generatedRequest.request != nil && generatedRequest.request.URL != nil {
+		targetURL = generatedRequest.request.URL.String()
+	}
@@
-	if generatedRequest.request != nil && generatedRequest.request.URL != nil {
-		tracker := httpclientpool.GetHTTPToHTTPSPortTracker(request.options.Options)
-		if tracker != nil {
-			requestURL := generatedRequest.request.String()
+	if generatedRequest.request != nil && generatedRequest.request.URL != nil {
+		tracker := httpclientpool.GetHTTPToHTTPSPortTracker(request.options.Options)
+		if tracker != nil {
+			requestURL := generatedRequest.request.URL.String()
 			if tracker.RequiresHTTPS(requestURL) {
@@
-	hostnameForReuse := input.MetaInput.Input
-	if generatedRequest.request.URL != nil {
-		// Use the actual request URL - normalization will extract host:port correctly
-		hostnameForReuse = generatedRequest.request.String()
-	} else if generatedRequest.URL() != "" {
-		// Fallback to generated request URL method
-		hostnameForReuse = generatedRequest.URL()
-	} else if targetURL != "" {
-		hostnameForReuse = targetURL
-	}
+	hostnameForReuse := input.MetaInput.Input
+	if generatedRequest.request.URL != nil {
+		hostnameForReuse = generatedRequest.request.URL.String()
+	} else if generatedRequest.URL() != "" {
+		hostnameForReuse = generatedRequest.URL()
+	} else if targetURL != "" {
+		hostnameForReuse = targetURL
+	}
@@
-	var requestURL string
-	if generatedRequest.request != nil && generatedRequest.request.URL != nil {
-		requestURL = generatedRequest.request.String()
+	var requestURL string
+	if generatedRequest.request != nil && generatedRequest.request.URL != nil {
+		requestURL = generatedRequest.request.URL.String()
 	} else if generatedRequest.rawRequest != nil && generatedRequest.rawRequest.FullURL != "" {
 		requestURL = generatedRequest.rawRequest.FullURL
 	} else if respChain.Request() != nil && respChain.Request().URL != nil {
 		requestURL = respChain.Request().URL.String()
 	}
```

Also applies to: 561-568, 839-915, 1058-1073
```diff
 // ShouldDisableKeepAlive depending on scan strategy
 func ShouldDisableKeepAlive(options *types.Options) bool {
 	// with host-spray strategy keep-alive must be enabled
-	return options.ScanStrategy != scanstrategy.HostSpray.String()
+	return options.TemplateThreads == 0 && options.ScanStrategy != scanstrategy.HostSpray.String()
 }
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, check how ShouldDisableKeepAlive is used in the codebase
rg -n "ShouldDisableKeepAlive" --type go -C 3
```

Repository: projectdiscovery/nuclei
Length of output: 190
🏁 Script executed:
```shell
# Find build_request.go file
find . -name "build_request.go" -type f
```

Repository: projectdiscovery/nuclei
Length of output: 104
🏁 Script executed:
```shell
# Also check for connectionReusePolicy references
rg -n "connectionReusePolicy" --type go -C 2
```

Repository: projectdiscovery/nuclei
Length of output: 1550
🏁 Script executed:
```shell
# Search for ShouldDisableKeepAlive usage without git context
rg -n "ShouldDisableKeepAlive" --type go
```

Repository: projectdiscovery/nuclei
Length of output: 496
🏁 Script executed:
```shell
# Get the full context around the connectionReusePolicy switch in build_request.go
sed -n '450,480p' ./pkg/protocols/http/build_request.go
```

Repository: projectdiscovery/nuclei
Length of output: 1227
🏁 Script executed:
```shell
# Check the Compile method in http.go to see how connectionReusePolicy and keep-alive interact
sed -n '310,330p' ./pkg/protocols/http/http.go
```

Repository: projectdiscovery/nuclei
Length of output: 865
🏁 Script executed:
```shell
# Look for AnalyzeConnectionReuse method definition
rg -n "AnalyzeConnectionReuse" --type go -A 10
```

Repository: projectdiscovery/nuclei
Length of output: 2095
🏁 Script executed:
```shell
# Check the Request struct definition to understand Threads vs TemplateThreads
rg -n "type Request struct" ./pkg/protocols/http/http.go -A 50 | head -80
```

Repository: projectdiscovery/nuclei
Length of output: 3558
🏁 Script executed:
```shell
# Search for Threads field in http.go
rg -n "^\s+Threads\s" ./pkg/protocols/http/http.go -B 2 -A 2
```

Repository: projectdiscovery/nuclei
Length of output: 471
🏁 Script executed:
```shell
# Check if Threads and TemplateThreads are the same or related
rg -n "TemplateThreads|\.Threads\s*=" ./pkg/protocols/http/http.go
```

Repository: projectdiscovery/nuclei
Length of output: 272
Address field mismatch in thread count checking.
The condition change uses options.TemplateThreads, but the default case in build_request.go (line 469) checks r.request.Threads instead. These are different fields—template-level vs. per-request threads—and can produce inconsistent keep-alive behavior. When the connection reuse policy is ReuseUnknown, ShouldDisableKeepAlive() decides keep-alive based on template threads during Compile(), but build_request.go may override this decision based on per-request threads. Ensure both checks consistently reference the same thread field.
🤖 Prompt for AI Agents
In pkg/protocols/utils/http/requtils.go around lines 48 to 52, the keep-alive
decision is using options.TemplateThreads which mismatches the per-request
thread check in build_request.go; change the condition to use the same thread
field used elsewhere (options.Threads) so both Compile() and build_request.go
make consistent keep-alive decisions — update the boolean expression to check
options.Threads == 0 (keeping the existing HostSpray exception) and ensure any
related comments reflect that this is a per-request thread check.
```go
	// PreflightPortScan enables a preflight resolve + TCP portscan and filters targets
	// before running templates. Disabled by default.
	PreflightPortScan bool
	// PerHostClientPool enables per-host HTTP client pooling for better connection reuse.
	// When enabled, each host gets its own client instance keyed by (host, configuration).
	// Disabled by default.
	PerHostClientPool bool
	// HTTPClientShards enables HTTP client sharding for connection pooling.
	// When enabled, hosts are distributed across a fixed number of HTTP client shards (auto-calculated, max 256).
	// This provides a balance between connection reuse and memory efficiency.
	// Disabled by default.
	HTTPClientShards bool
	// PerHostRateLimit enables per-host rate limiting for HTTP requests.
	// When enabled, each host gets its own rate limiter and global rate limit becomes unlimited.
	// Disabled by default.
	PerHostRateLimit bool
```
Copy the new optimization flags in Options.Copy()
The new booleans (PreflightPortScan, PerHostClientPool, HTTPClientShards, PerHostRateLimit) are added to Options but are not copied in Options.Copy(). Any code using a copied options instance will silently lose these settings, which can break preflight and per‑host behavior in sub-executors (e.g., auth provider flows).
I recommend propagating them in Options.Copy().
Proposed diff for Options.Copy()
```diff
 func (options *Options) Copy() *Options {
 	optCopy := &Options{
@@
-		DisableHTTPProbe:  options.DisableHTTPProbe,
-		LeaveDefaultPorts: options.LeaveDefaultPorts,
+		DisableHTTPProbe: options.DisableHTTPProbe,
+		// Optimization flags
+		PreflightPortScan: options.PreflightPortScan,
+		PerHostClientPool: options.PerHostClientPool,
+		HTTPClientShards:  options.HTTPClientShards,
+		PerHostRateLimit:  options.PerHostRateLimit,
+		LeaveDefaultPorts: options.LeaveDefaultPorts,
 		AutomaticScan: options.AutomaticScan,
```

Also applies to: 491-573
🤖 Prompt for AI Agents
In pkg/types/types.go around lines 204 to 219 (and similarly in the
Options.Copy() implementation region around lines 491–573), the new boolean
fields PreflightPortScan, PerHostClientPool, HTTPClientShards, and
PerHostRateLimit are not propagated when duplicating an Options instance; update
Options.Copy() to set these four fields on the returned copy from the receiver
so copied options preserve the new flags (ensure you assign
copy.PreflightPortScan = o.PreflightPortScan, copy.PerHostClientPool =
o.PerHostClientPool, copy.HTTPClientShards = o.HTTPClientShards, and
copy.PerHostRateLimit = o.PerHostRateLimit).
… feat-6702-http-client
dwisiswant0
left a comment
There's only one concern for correctness; everything else is just nits (perf, bottlenecks view).
```go
	// Disable cookies for sharded clients to avoid concurrent map writes
	// cookiejar.Jar is not thread-safe and sharded clients are shared across goroutines
	// If cookies are needed, use per-host pooling instead
	cfg.DisableCookie = true
```
I think we could use httpclientpool.PerHostClientPool here for session-dependent scans, since it keeps clients & cookie jars isolated (read: concurrent-safe) per target. Disabling cookies in here can break multi-step templates that rely on cookie-reuse (ex. login -> auth check).
tl;dr: session state gets lost once sharding is enabled.
Thoughts?
```go
func (p *PerHostRateLimitPool) GetOrCreate(
	host string,
) (*ratelimit.Limiter, error) {
```
Heavy read op here. (GetForTarget -> GetOrCreate)
Maybe we should switch from LRU to adaptive W-TinyLFU (w/ github.com/maypok86/otter/v2 - already introduced in #6630) to get lock-free access?
Thus, no more leaks & growing goleak.Ignore* list.
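For the lock-free read path, a minimal per-host limiter pool might look like this; it is a sketch with a hand-rolled token bucket and `sync.Map`, not the actual `ratelimit.Limiter` or otter API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hostLimiter is a minimal token bucket: at most rps allows per second.
type hostLimiter struct {
	mu     sync.Mutex
	tokens float64
	last   time.Time
	rps    float64
}

func (l *hostLimiter) allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	l.tokens += now.Sub(l.last).Seconds() * l.rps // refill since last call
	if l.tokens > l.rps {
		l.tokens = l.rps // cap the burst at one second's budget
	}
	l.last = now
	if l.tokens >= 1 {
		l.tokens--
		return true
	}
	return false
}

// limiterPool hands out one limiter per host. The hot path is a plain
// sync.Map Load; LoadOrStore only runs on first sight of a host.
type limiterPool struct {
	limiters sync.Map // host -> *hostLimiter
	rps      float64
}

func (p *limiterPool) get(host string) *hostLimiter {
	if v, ok := p.limiters.Load(host); ok {
		return v.(*hostLimiter)
	}
	l := &hostLimiter{tokens: p.rps, last: time.Now(), rps: p.rps}
	v, _ := p.limiters.LoadOrStore(host, l)
	return v.(*hostLimiter)
}

func main() {
	pool := &limiterPool{rps: 2}
	fmt.Println(pool.get("a.example").allow()) // true
	fmt.Println(pool.get("a.example").allow()) // true
	fmt.Println(pool.get("a.example").allow()) // false: bucket drained
	fmt.Println(pool.get("b.example").allow()) // true: independent bucket
}
```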
```go
	if len(entry.requestTimestamps) > 100 {
		// Keep only last 100 timestamps
		entry.requestTimestamps = entry.requestTimestamps[len(entry.requestTimestamps)-100:]
	}
```
Implicit re-allocs (sliding window) here.
Since this is hardcapped at 100, maybe a fixed-size circular buffer would be a better fit than the re-sliced slice for requestTimestamps?
Unless it's configurable, but I get that 100 is a heuristic balance for high-precision PPS calc, not just a magic number.
```go
// This is used to automatically detect and correct cases where HTTP requests
// are sent to HTTPS ports (detected via 400 error with specific message)
type HTTPToHTTPSPortTracker struct {
	ports *mapsutil.SyncLockMap[string, bool]
```
Heavy read op here.
Maybe we can switch to sync.Map (Get + Set vs. LoadOrStore) here to make it atomic (since this is a grow-only discovery cache) and lock-free?
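A sketch of the `LoadOrStore` approach for a grow-only discovery cache (names are illustrative, not the actual tracker API, which wraps `mapsutil.SyncLockMap`):

```go
package main

import (
	"fmt"
	"sync"
)

// httpsPorts records host:port pairs discovered to require HTTPS.
// Entries are only ever added, so sync.Map's lock-free reads fit well.
type httpsPorts struct {
	m sync.Map // "host:port" -> true
}

// markRequiresHTTPS records the discovery atomically; it returns true
// only for the first caller, so stats can be emitted exactly once.
func (t *httpsPorts) markRequiresHTTPS(hostPort string) bool {
	_, loaded := t.m.LoadOrStore(hostPort, true)
	return !loaded
}

func (t *httpsPorts) requiresHTTPS(hostPort string) bool {
	_, ok := t.m.Load(hostPort)
	return ok
}

func main() {
	var t httpsPorts
	fmt.Println(t.markRequiresHTTPS("example.com:8443")) // true (first sighting)
	fmt.Println(t.markRequiresHTTPS("example.com:8443")) // false (already known)
	fmt.Println(t.requiresHTTPS("example.com:8443"))     // true
	fmt.Println(t.requiresHTTPS("example.com:80"))       // false
}
```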
Proposed changes
Checklist
Summary by CodeRabbit