Skip to content

dont load templates with the same ID#6465

Merged
Mzack9999 merged 3 commits intodevfrom
4690_dont_load_dup_templates
Sep 12, 2025
Merged

dont load templates with the same ID#6465
Mzack9999 merged 3 commits intodevfrom
4690_dont_load_dup_templates

Conversation

@dogancanbakir
Copy link
Member

@dogancanbakir dogancanbakir commented Sep 10, 2025

Proposed changes

closes #4690

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • Bug Fixes

    • Prevents duplicate templates from appearing when loaded from multiple sources, including concurrent loads, by deduplicating on template ID rather than file path.
  • Improvements

    • Adds concurrency-safe deduplication to ensure consistent results across concurrent and non-concurrent loading paths.
    • Skips duplicates to reduce processing and memory use.
    • Emits clearer debug logs when duplicates are detected.
  • Chores

    • No changes to public APIs.

@dogancanbakir dogancanbakir self-assigned this Sep 10, 2025
@auto-assign auto-assign bot requested a review from dwisiswant0 September 10, 2025 13:44
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Replaces manual map+mutex deduplication with concurrency-safe structures from github.com/projectdiscovery/utils/maps; both LoadTemplatesOnlyMetadata and LoadTemplatesWithTags now detect duplicates by template.ID and skip them (with debug logs); no public function signatures changed.

Changes

Cohort / File(s) Summary
Loader deduplication (concurrency-safe)
pkg/catalog/loader/loader.go
Replaces manual map+mutex with mapsutil.SyncLockMap and uses mapsutil.SyncSlice for concurrent collects; introduces loadedTemplateIDs and loadedTemplates; both LoadTemplatesOnlyMetadata and concurrent LoadTemplatesWithTags skip templates with duplicate template.ID and log debug messages; import github.com/projectdiscovery/utils/maps added; no exported signatures changed.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Caller
  participant L as Loader
  participant SM as SyncLockMap (loaded IDs)
  participant SS as SyncSlice (loaded templates)
  participant T as loadTemplate()

  Note over SM,SS: Concurrency-safe deduplication structures

  C->>L: LoadTemplatesOnlyMetadata(paths)
  loop per template (sequential)
    L->>SM: Check template.ID (atomic)
    alt ID present
      L-->>C: Skip (debug)
    else
      L->>SM: Mark ID (atomic)
      L-->>C: Append metadata
    end
  end

  C->>L: LoadTemplatesWithTags(paths)
  par concurrent per template
    L->>T: loadTemplate(path)
    T->>SM: Check template.ID (atomic)
    alt ID present
      T-->>L: Skip (debug)
    else
      T->>SM: Mark ID (atomic)
      T->>SS: Append template (thread-safe)
      T-->>L: Return template
    end
  end
  L-->>C: Return non-duplicate templates
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (5 passed)

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “dont load templates with the same ID” succinctly summarizes the primary change of preventing loading duplicate templates by ID, aligning clearly with the pull request’s main objective.
Linked Issues Check ✅ Passed The changes introduce concurrency‐safe deduplication based on template.ID in both metadata‐only and tagged loading paths, effectively implementing the mechanism requested in issue #4690 to prevent scanning the same template more than once.
Out of Scope Changes Check ✅ Passed All modifications are confined to the loader implementation for deduplicating templates by ID and do not introduce any unrelated or out‐of‐scope alterations beyond the objectives of issue #4690.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Poem

I hop through folders, nose alert and quick,
Marking IDs with a clever, sync-safe trick.
If I’ve seen it once, I won’t take two bites—
One carrot per template, tidy and right.
Hooray for clean runs and deduped nights! 🥕🐇

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b05359b and 608159b.

📒 Files selected for processing (1)
  • pkg/catalog/loader/loader.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/catalog/loader/loader.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests (macOS-latest)
  • GitHub Check: Tests (windows-latest)
  • GitHub Check: Tests (ubuntu-latest)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 4690_dont_load_dup_templates

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/catalog/loader/loader.go (1)

321-352: Nil-pointer risk before duplicate-ID check; also prefer set semantics for IDs.

templatesCache.Has(templatePath) can return nil; template is dereferenced (fields/methods) before the nil check at Line 344, which can panic. Guard early. While here, use map[string]struct{} and existence checks to model a set and avoid bool allocations.

Apply:

-	loadedTemplateIDs := make(map[string]bool)
+	loadedTemplateIDs := make(map[string]struct{}, len(validPaths))

 	for templatePath := range validPaths {
 		template, _, _ := templatesCache.Has(templatePath)
+		if template == nil {
+			continue
+		}
 
-		if len(template.RequestsHeadless) > 0 && !store.config.ExecutorOptions.Options.Headless {
+		if len(template.RequestsHeadless) > 0 && !store.config.ExecutorOptions.Options.Headless {
 			continue
 		}
@@
-		if template != nil {
-			if loadedTemplateIDs[template.ID] {
+		if template != nil {
+			if _, exists := loadedTemplateIDs[template.ID]; exists {
 				store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", template.ID, templatePath)
 				continue
 			}
 
-			loadedTemplateIDs[template.ID] = true
+			loadedTemplateIDs[template.ID] = struct{}{}
 			template.Path = templatePath
 			store.templates = append(store.templates, template)
 		}
🧹 Nitpick comments (1)
pkg/catalog/loader/loader.go (1)

504-517: Concurrent dedup is fine; tighten set modeling and existence check.

Use map[string]struct{} and the exists idiom to reflect set semantics and shave a few bytes per entry.

-	loadedTemplateIDs := make(map[string]bool)
+	loadedTemplateIDs := make(map[string]struct{})
 	var loadedTemplateIDsMutex sync.Mutex

 	loadTemplate := func(tmpl *templates.Template) {
 		loadedTemplateIDsMutex.Lock()
-		if loadedTemplateIDs[tmpl.ID] {
+		if _, exists := loadedTemplateIDs[tmpl.ID]; exists {
 			loadedTemplateIDsMutex.Unlock()
 			store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", tmpl.ID, tmpl.Path)
 			return
 		}
 
-		loadedTemplateIDs[tmpl.ID] = true
+		loadedTemplateIDs[tmpl.ID] = struct{}{}
 		loadedTemplateIDsMutex.Unlock()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff5734b and 4916cf3.

📒 Files selected for processing (1)
  • pkg/catalog/loader/loader.go (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Format Go code using go fmt
Run static analysis with go vet

Files:

  • pkg/catalog/loader/loader.go
🧬 Code graph analysis (1)
pkg/catalog/loader/loader.go (1)
pkg/templates/templates.go (1)
  • Template (35-164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests (windows-latest)
  • GitHub Check: Tests (ubuntu-latest)
  • GitHub Check: Tests (macOS-latest)
🔇 Additional comments (1)
pkg/catalog/loader/loader.go (1)

10-10: Import looks correct.

sync is required for the mutex usage below.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/catalog/loader/loader.go (1)

1-683: Address remaining lint and typecheck errors

  • pkg/catalog/loader/loader.go still contains non-atomic loadedTemplateIDs.Set and Has calls in the concurrent loading path (lines 350, 507, 512); refactor to use thread-safe operations or guard with a mutex.
  • pkg/output/stats/waf/waf.go’s //go:embed regexes.json fails typecheck—ensure regexes.json is present at build time or update the embed directive.
    Re-run golangci-lint after these fixes to confirm all issues are resolved.
🧹 Nitpick comments (2)
pkg/catalog/loader/loader.go (2)

319-321: Using SyncLockMap here is OK (single-threaded path).

LoadTemplatesOnlyMetadata iterates serially, so SyncLockMap is not strictly required; a plain map[string]struct{} would suffice. Keeping it for consistency is fine.


345-351: Optional: include “first seen at” path in the debug to aid troubleshooting.

If helpful, store the first path as the map value so the duplicate log can include both paths.

Example (conceptual):

// use string value to store first seen path
var loadedTemplateIDs sync.Map
if prev, loaded := loadedTemplateIDs.LoadOrStore(tmpl.ID, tmpl.Path); loaded {
    store.logger.Debug().Msgf("Skipping duplicate template ID '%s' at '%s' (first seen at '%s')", tmpl.ID, tmpl.Path, prev.(string))
    return
}

Also applies to: 507-513

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4916cf3 and b05359b.

📒 Files selected for processing (1)
  • pkg/catalog/loader/loader.go (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Format Go code using go fmt
Run static analysis with go vet

Files:

  • pkg/catalog/loader/loader.go
🧬 Code graph analysis (1)
pkg/catalog/loader/loader.go (1)
pkg/templates/templates.go (1)
  • Template (35-164)
🪛 GitHub Check: Lint
pkg/catalog/loader/loader.go

[failure] 350-350:
Error return value of loadedTemplateIDs.Set is not checked (errcheck)


[failure] 512-512:
Error return value of loadedTemplateIDs.Set is not checked (errcheck)

🪛 GitHub Actions: 🔨 Tests
pkg/catalog/loader/loader.go

[error] 350-350: golangci-lint: Error return value of loadedTemplateIDs.Set is not checked (errcheck)

🔇 Additional comments (2)
pkg/catalog/loader/loader.go (2)

28-28: New dependency import looks fine.

mapsutil is used below for SyncLockMap/Has/Set; no issues with the import itself. Ensure go mod is updated and formatted.


630-635: Sorting after concurrent load remains correct.

Stable sort by Path preserves deterministic output after atomic dedup; no change needed.

Comment on lines 345 to 351
if loadedTemplateIDs.Has(template.ID) {
store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", template.ID, templatePath)
continue
}

loadedTemplateIDs.Set(template.ID, struct{}{})
template.Path = templatePath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix errcheck failure and handle Set error.

golangci-lint fails because the error return of SyncLockMap.Set is ignored. Set can return ErrReadOnly; handle it and keep behavior unchanged. (pkg.go.dev)

Apply:

-            if loadedTemplateIDs.Has(template.ID) {
+            if loadedTemplateIDs.Has(template.ID) {
                 store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", template.ID, templatePath)
                 continue
             }
-
-            loadedTemplateIDs.Set(template.ID, struct{}{})
+            if err := loadedTemplateIDs.Set(template.ID, struct{}{}); err != nil {
+                store.logger.Debug().Msgf("could not record template ID '%s' for path '%s': %v", template.ID, templatePath, err)
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if loadedTemplateIDs.Has(template.ID) {
store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", template.ID, templatePath)
continue
}
loadedTemplateIDs.Set(template.ID, struct{}{})
template.Path = templatePath
if loadedTemplateIDs.Has(template.ID) {
store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", template.ID, templatePath)
continue
}
if err := loadedTemplateIDs.Set(template.ID, struct{}{}); err != nil {
store.logger.Debug().Msgf("could not record template ID '%s' for path '%s': %v", template.ID, templatePath, err)
}
template.Path = templatePath
🧰 Tools
🪛 GitHub Check: Lint

[failure] 350-350:
Error return value of loadedTemplateIDs.Set is not checked (errcheck)

🪛 GitHub Actions: 🔨 Tests

[error] 350-350: golangci-lint: Error return value of loadedTemplateIDs.Set is not checked (errcheck)

🤖 Prompt for AI Agents
In pkg/catalog/loader/loader.go around lines 345 to 351, handle the error
returned by loadedTemplateIDs.Set to fix the errcheck failure: capture the
returned error, ignore it when it equals maps.ErrReadOnly (to keep current
behavior), and for any other non-nil error log it
(store.logger.Debug().Err(err).Msgf(...)) and skip/continue as appropriate;
ensure you import the errors package and reference maps.ErrReadOnly when
comparing.

Comment on lines 507 to 513
if loadedTemplateIDs.Has(tmpl.ID) {
store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", tmpl.ID, tmpl.Path)
return
}

loadedTemplateIDs.Set(tmpl.ID, struct{}{})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make the duplicate-ID guard atomic; remove errcheck issue.

Switch to LoadOrStore so only the first goroutine proceeds; others return early. This also removes the errcheck warning on Set.

Apply:

-        if loadedTemplateIDs.Has(tmpl.ID) {
-            store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", tmpl.ID, tmpl.Path)
-            return
-        }
-
-        loadedTemplateIDs.Set(tmpl.ID, struct{}{})
+        if _, loaded := loadedTemplateIDs.LoadOrStore(tmpl.ID, struct{}{}); loaded {
+            store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", tmpl.ID, tmpl.Path)
+            return
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if loadedTemplateIDs.Has(tmpl.ID) {
store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", tmpl.ID, tmpl.Path)
return
}
loadedTemplateIDs.Set(tmpl.ID, struct{}{})
// Only the first goroutine storing this tmpl.ID will proceed; others skip.
if _, loaded := loadedTemplateIDs.LoadOrStore(tmpl.ID, struct{}{}); loaded {
store.logger.Debug().Msgf("Skipping duplicate template ID '%s' from path '%s'", tmpl.ID, tmpl.Path)
return
}
🧰 Tools
🪛 GitHub Check: Lint

[failure] 512-512:
Error return value of loadedTemplateIDs.Set is not checked (errcheck)

🤖 Prompt for AI Agents
In pkg/catalog/loader/loader.go around lines 507 to 513, replace the non-atomic
Has/Set pair with a single LoadOrStore call so only the first goroutine for a
given tmpl.ID proceeds and others return early; specifically call
loadedTemplateIDs.LoadOrStore(tmpl.ID, struct{}{}) and if it reports the ID was
already present, log the duplicate and return, otherwise continue (no separate
Set call, which removes the errcheck warning).

@Mzack9999 Mzack9999 merged commit 1acd40f into dev Sep 12, 2025
23 checks passed
@Mzack9999 Mzack9999 deleted the 4690_dont_load_dup_templates branch September 12, 2025 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prevent scans from scanning dupe templates

2 participants