Skip to content

perf(loader): eliminate double parsing#6503

Closed
dwisiswant0 wants to merge 2 commits intodevfrom
dwisiswant0/perf/loader/eliminate-double-parsing
Closed

perf(loader): eliminate double parsing#6503
dwisiswant0 wants to merge 2 commits intodevfrom
dwisiswant0/perf/loader/eliminate-double-parsing

Conversation

@dwisiswant0
Copy link
Member

@dwisiswant0 dwisiswant0 commented Sep 27, 2025

Proposed changes

Previously, each template was parsed twice during
validation:

  1. First in LoadTemplate() via Parser.ParseTemplate().
  2. Then again in templates.Parse() for validation checks.

This caused significant overhead when validating
10,000+ templates in nuclei-templates repo.

Proof

$ hyperfine --warmup=5 --prepare='sleep 5' './bin/nuclei-dev -validate -duc' './bin/nuclei -validate -duc'
Benchmark 1: ./bin/nuclei-dev -validate -duc
  Time (mean ± σ):     93.662 s ± 41.013 s    [User: 24.995 s, System: 2.811 s]
  Range (min … max):   51.213 s … 169.758 s    10 runs
 
Benchmark 2: ./bin/nuclei -validate -duc
  Time (mean ± σ):      2.490 s ±  0.025 s    [User: 3.163 s, System: 0.301 s]
  Range (min … max):    2.456 s …  2.517 s    10 runs
 
Summary
  ./bin/nuclei -validate -duc ran
   37.62 ± 16.48 times faster than ./bin/nuclei-dev -validate -duc

Note

Key Performance Metrics:

  • 37.62x faster validation (3,762% improvement)
  • Reduced from ~93.7s to ~2.5s (1.5 minutes down to 2.5 seconds)

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • New Features
    • None
  • Performance
    • Reduced template parsing by using an optional shared parsed-cache, speeding catalog/workflow initialization.
  • Bug Fixes
    • More resilient handling when cached entries are absent, falling back to parsing without causing false negatives.
    • Preserves prior behavior for intentionally empty templates and existing validation/duplicate detection.

Previously, each template was parsed twice during
validation:
1. First in `LoadTemplate()` via
   `Parser.ParseTemplate()`.
2. Then again in `templates.Parse()` for
   validation checks.

This caused significant overhead when validating
10,000+ templates in nuclei-templates repo.

Closes #6502.

Signed-off-by: Dwi Siswanto <git@dw1.io>
@auto-assign auto-assign bot requested a review from Mzack9999 September 27, 2025 20:55
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 27, 2025

Walkthrough

Adds optional parsed-template cache usage to loader validation: asserts a templates.Parser, tries parsedCache.Has to retrieve cached parsed templates before calling templates.Parse, falls back to parsing on cache miss, and preserves existing error, nil-template, duplicate-ID, and workflow checks.

Changes

Cohort / File(s) Summary of Changes
Catalog loader validation/cache integration
pkg/catalog/loader/loader.go
Assert templates.Parser from config; obtain its parsed cache when available; perform cache-first lookup (parsedCache.Has) for template paths and use cached parsed templates if present; fall back to templates.Parse on cache miss; keep existing parse error handling, nil-template flow, duplicate template.ID tracking, and workflow validation; log and return false on parser assertion failure.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant L as Loader.areWorkflowOrTemplatesValid
  participant C as Config
  participant P as templates.Parser
  participant Cache as ParsedCache
  participant Parse as templates.Parse

  L->>C: get Parser
  alt Parser implements templates.Parser
    L->>P: access parsedCache
    loop per templatePath
      L->>Cache: Has(templatePath)
      alt cache hit (parsed != nil)
        Cache-->>L: return parsed template
        note right of L: use cached parsed template
      else cache miss or no cache
        L->>Parse: Parse(templatePath)
        Parse-->>L: parsed template or error
        alt parse error
          note right of L: mark invalid, continue
        end
      end
      alt parsed template is nil
        note right of L: continue to next path
      else parsed template non-nil
        L->>L: track duplicate IDs & validate workflows
      end
    end
  else parser assertion fails
    L-->>L: log error & return false
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

I sniffed the cache, a carrot stash,
No need to parse with every dash.
If miss, I nibble code anew,
If hit, I hop right through the queue.
IDs tracked, workflows tight—
A tidy warren, running light. 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “perf(loader): eliminate double parsing” concisely and accurately summarizes the primary change of removing redundant template parsing in the loader to improve performance, using a conventional commit-style prefix and clear wording that immediately conveys the focus on performance optimization without extraneous details.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dwisiswant0/perf/loader/eliminate-double-parsing

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b78658 and db123c6.

📒 Files selected for processing (1)
  • pkg/catalog/loader/loader.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/catalog/loader/loader.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests (macOS-latest)
  • GitHub Check: Tests (windows-latest)
  • GitHub Check: Tests (ubuntu-latest)

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/catalog/loader/loader.go (1)

321-343: Potential nil-pointer dereference in LoadTemplatesOnlyMetadata.

template may be nil; len(template.RequestsHeadless) dereferences it.

 for templatePath := range validPaths {
-		template, _, _ := templatesCache.Has(templatePath)
-
-		if len(template.RequestsHeadless) > 0 && !store.config.ExecutorOptions.Options.Headless {
+		template, _, _ := templatesCache.Has(templatePath)
+		if template == nil {
+			// e.g., not cached yet or global matchers stub; skip safely
+			continue
+		}
+		if len(template.RequestsHeadless) > 0 && !store.config.ExecutorOptions.Options.Headless {
 			continue
 		}
 
-		if len(template.RequestsCode) > 0 && !store.config.ExecutorOptions.Options.EnableCodeTemplates {
+		if len(template.RequestsCode) > 0 && !store.config.ExecutorOptions.Options.EnableCodeTemplates {
 			continue
 		}
 
-		if template.IsFuzzing() && !store.config.ExecutorOptions.Options.DAST {
+		if template.IsFuzzing() && !store.config.ExecutorOptions.Options.DAST {
 			continue
 		}
 
-		if template.SelfContained && !store.config.ExecutorOptions.Options.EnableSelfContainedTemplates {
+		if template.SelfContained && !store.config.ExecutorOptions.Options.EnableSelfContainedTemplates {
 			continue
 		}
 
-		if template.HasFileProtocol() && !store.config.ExecutorOptions.Options.EnableFileTemplates {
+		if template.HasFileProtocol() && !store.config.ExecutorOptions.Options.EnableFileTemplates {
 			continue
 		}
🧹 Nitpick comments (1)
pkg/catalog/loader/loader.go (1)

421-435: Make “global matchers” an explicit sentinel error (future refactor).

Current continue-on-nil works; consider introducing ErrGlobalMatchersTemplate and using errors.Is for clarity and testability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8f89bb and 5b78658.

📒 Files selected for processing (1)
  • pkg/catalog/loader/loader.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Format Go code using go fmt
Run static analysis with go vet

Files:

  • pkg/catalog/loader/loader.go
🧠 Learnings (1)
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
PR: projectdiscovery/nuclei#6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: To make the template caching mechanism in pkg/templates/compile.go production-ready, DSLs need to be updated to use runtime options instead of cached variables, rather than restoring the Compile() calls on each request.

Applied to files:

  • pkg/catalog/loader/loader.go
🧬 Code graph analysis (1)
pkg/catalog/loader/loader.go (4)
pkg/protocols/protocols.go (1)
  • ExecutorOptions (61-141)
pkg/templates/parser.go (1)
  • Parser (21-30)
pkg/templates/cache.go (1)
  • Cache (9-11)
pkg/templates/compile.go (1)
  • Parse (52-230)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Lint

Signed-off-by: Dwi Siswanto <git@dw1.io>
@dwisiswant0
Copy link
Member Author

brb

@dwisiswant0 dwisiswant0 deleted the dwisiswant0/perf/loader/eliminate-double-parsing branch September 28, 2025 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant