Skip to content

feat: memory and performance improvement#6418

Closed
knakul853 wants to merge 4 commits intodevfrom
memory-improvement
Closed

feat: memory and performance improvement#6418
knakul853 wants to merge 4 commits intodevfrom
memory-improvement

Conversation

@knakul853
Copy link
Contributor

@knakul853 knakul853 commented Aug 22, 2025

Proposed changes

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • Performance Improvements

    • Faster expression and regex handling via caching, reducing repeated compilations during scans.
    • Lower memory and CPU usage in template parsing and caching through pooling and streamlined cache hits.
    • Improved concurrency and scalability with bounded parsing and thread-safe caches.
  • Bug Fixes

    • Prevented potential crash in YAML include preprocessing when no preceding newline exists.
    • More robust YAML handling with stricter decoding and safer alias/mapping processing.
  • Chores

    • Upgraded YAML processing to v3 across the app for better compatibility and consistency.

- Replace gopkg.in/yaml.v2 with gopkg.in/yaml.v3 across the codebase for better compatibility and features.
- Enhance YAML unmarshalling in various components to support new parsing capabilities.
- Introduce a new InsertionOrderedStringMap type for maintaining order in YAML mappings.
- Add tests for YAML preprocessing and decoding to ensure robustness and correctness.
@auto-assign auto-assign bot requested a review from dogancanbakir August 22, 2025 18:01
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR upgrades YAML handling to yaml.v3 across the codebase, introduces caching for DSL expressions and regexes, refactors template caching/parsing with pooling and stricter YAML decoding, adds new YAML helpers for string slices, updates map unmarshalling to yaml.Node, and expands tests for caches, YAML preprocessing, decoding, and ordered maps.

Changes

Cohort / File(s) Summary
YAML v2→v3 import updates
cmd/integration-test/http.go, pkg/catalog/config/ignorefile.go, pkg/model/model_test.go, pkg/model/types/severity/severity_test.go, pkg/protocols/common/automaticscan/automaticscan.go, pkg/protocols/common/generators/generators_test.go, pkg/protocols/common/variables/variables_test.go, pkg/reporting/reporting.go, pkg/templates/compile.go, pkg/templates/templates.go, pkg/templates/templates_test.go, pkg/utils/yaml/yaml_decode_wrapper.go, go.mod
Switch imports from gopkg.in/yaml.v2 to v3; reclassify yaml.v2 as indirect in go.mod; no functional logic changes in these files.
StringSlice YAML helper + migration
pkg/model/types/stringslice/stringslice.go, pkg/model/types/stringslice/stringslice_raw.go, pkg/model/types/stringslice/yamlhelper.go, pkg/model/types/stringslice/yamlhelper_test.go
Add UnmarshalYAMLNode helper with StringNormalizer; migrate UnmarshalYAML methods to yaml.Node; support scalars/sequences/aliases; add comprehensive tests and benchmarks.
Fuzz YAML migration
pkg/fuzz/type.go
Migrate to yaml.v3 Node-based UnmarshalYAML; preserve ordered mappings; handle aliases; update method signature.
Insertion-ordered map YAML migration + tests
pkg/utils/insertion_ordered_map.go, pkg/utils/insertion_ordered_map_test.go
Change UnmarshalYAML to yaml.Node mapping parsing with validation; extensive new tests and benchmarks for YAML/JSON, ordering, and edge cases.
Expression cache (new) + consumers
pkg/protocols/common/exprcache/exprcache.go, pkg/protocols/common/exprcache/exprcache_test.go, pkg/operators/extractors/compile.go, pkg/operators/matchers/compile.go, pkg/operators/matchers/match.go, pkg/protocols/common/expressions/expressions.go, pkg/protocols/common/expressions/variables.go, pkg/templates/tag_filter.go
Add thread-safe govaluate expression caches (default/DSL) with helpers; replace direct compilation sites with cache-backed getters; add tests and benchmarks for cache; no public API changes in consumers.
Regex cache (new) + consumers
pkg/operators/common/regexcache/cache.go, pkg/operators/extractors/compile.go, pkg/operators/matchers/compile.go
Add thread-safe compiled regex cache and helpers; integrate into extractor/matcher compile paths.
Templates cache refactor
pkg/templates/cache.go
Add sync.Pool for parsedTemplate; adjust Has behavior (nil vs empty slice); add StoreWithoutRaw and Get; refactor Store to use pool.
Parser refactor: pooling/strict YAML
pkg/templates/parser.go
Add pools for buffers/readers/templates; cap concurrency via semaphore; move to yaml.v3 with strict KnownFields; preprocess YAML via pooled buffers; use StoreWithoutRaw; new helpers for decoding paths.
YAML preprocess safety + tests
pkg/utils/yaml/preprocess.go, pkg/utils/yaml/preprocess_test.go
Safeguard padBytes computation and padding application; add comprehensive tests and benchmarks for include processing and strict mode.
YAML decode wrapper tests
pkg/utils/yaml/yaml_decode_wrapper_test.go
Add extensive tests and benchmarks for DecodeAndValidate across validations, readers, concurrency, and sizes.

Sequence Diagram(s)

sequenceDiagram
  participant C as Caller
  participant EC as exprcache (DSLCache)
  participant GV as govaluate

  C->>EC: GetCompiledDSLExpression(expr)
  alt found in cache
    EC-->>C: *EvaluableExpression
  else compile
    EC->>GV: NewEvaluableExpressionWithFunctions(expr, dslFns)
    GV-->>EC: *EvaluableExpression | error
    alt success
      EC->>EC: write-lock + store
      EC-->>C: *EvaluableExpression
    else error
      EC-->>C: error
    end
  end
  C->>GV: Evaluate(values)
  GV-->>C: result | error
Loading
sequenceDiagram
  participant E as Extractor/Matcher
  participant RC as regexcache
  participant RE as regexp

  E->>RC: GetCompiledRegex(pattern)
  alt cached
    RC-->>E: *Regexp
  else compile
    RC->>RE: Compile(pattern)
    RE-->>RC: *Regexp | error
    alt success
      RC->>RC: write-lock + store
      RC-->>E: *Regexp
    else error
      RC-->>E: error
    end
  end
Loading
sequenceDiagram
  participant P as Parser
  participant TP as templatePool
  participant YU as yamlutil.PreProcess
  participant YD as yaml.v3 Decoder
  participant TC as Templates Cache

  P->>TP: Get *Template
  note over P: Acquire parsing semaphore
  P->>YU: PreProcess(reader)
  alt preprocessed data
    P->>YD: decodeYAMLFromData(data, tmpl)
  else direct read
    P->>YD: decodeYAMLFromReader(reader, tmpl)
  end
  alt success
    P->>TC: StoreWithoutRaw(id, tmpl, err=nil)
    P-->>TP: Return tmpl to pool on next reuse
    P-->>Caller: *Template
  else error
    P->>TP: Reset + Put tmpl
    P-->>Caller: error
  end
  note over P: Release semaphore
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • Mzack9999
  • dwisiswant0
  • tarunKoyalwar

Poem

A nibble of YAML, a cache full of cheer,
I hop through nodes where mappings appear.
Regex and DSL, now swift as the breeze,
Templates pooled gently, reusing with ease.
With whiskers a-twitch and tests in a row,
This bunny ships bytes—onward we go! 🐇✨

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch memory-improvement

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@knakul853 knakul853 marked this pull request as draft August 22, 2025 18:07
@ehsandeep ehsandeep removed the request for review from dogancanbakir August 22, 2025 18:07
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
pkg/protocols/common/generators/generators_test.go (1)

131-135: Fix panic-prone type assertion in TestParsePayloadsWithAggression after yaml.v3 migration

The YAML decoder from gopkg.in/yaml.v3 now produces nested maps of type map[string]interface{}, so the current assertion to map[interface{}]interface{} will always panic at runtime in this test.

Please apply the following changes:

• In pkg/protocols/common/generators/generators_test.go (around lines 131–135 in TestParsePayloadsWithAggression):

-    parsed, err := parsePayloadsWithAggression(k, v.(map[interface{}]interface{}), aggression)
-    require.Nil(t, err, "could not parse payloads with aggression")
+    inner, ok := v.(map[string]interface{})
+    require.True(t, ok, "unexpected payload type for %q: %T", k, v)
+    parsed, err := parsePayloadsWithAggression(k, inner, aggression)
+    require.NoError(t, err, "could not parse payloads with aggression")

• In pkg/protocols/common/generators/generators.go (line 86): update the function signature to accept string-keyed maps:

- func parsePayloadsWithAggression(name string, v map[interface{}]interface{}, aggression string) (map[string]interface{}, error) {
+ func parsePayloadsWithAggression(name string, v map[string]interface{}, aggression string) (map[string]interface{}, error) {

and adjust its internals (or introduce a small adapter) to work with map[string]interface{} instead of map[interface{}]interface{}.

These fixes will align both the tests and implementation with the behavior of yaml.v3, preventing the panic and ensuring correct parsing.

pkg/operators/matchers/match.go (1)

185-189: Prevent unbounded memory growth in DSLCache for per‐request expressions

The global DSLCache used by GetCompiledDSLExpression is backed by a simple map[string]*govaluate.EvaluableExpression with no size limit or eviction policy (see pkg/protocols/common/exprcache/exprcache.go:16–19). Calling it on every unique resolvedExpression—which can vary per target or request—will steadily bloat memory over long‐running scans.

Locations to address:

  • pkg/operators/matchers/match.go (around lines 185–189):
    			expression, err = exprcache.GetCompiledDSLExpression(resolvedExpression)
    			if err != nil {
    				logExpressionEvaluationFailure(matcher.Name, err)
    				return false
    			}

Recommended refactors (pick one):

  1. Compile without global cache
    Add a “no-cache” helper in exprcache, e.g.:

    func CompileDSLExpressionNoCache(expr string) (*govaluate.EvaluableExpression, error) {
        return govaluate.NewEvaluableExpressionWithFunctions(expr, HelperFunctions)
    }

    Then update the call site:

    - expression, err = exprcache.GetCompiledDSLExpression(resolvedExpression)
    + expression, err = exprcache.CompileDSLExpressionNoCache(resolvedExpression)
  2. Bounded in‐function LRU for the scan
    If you still want caching across this matcher, embed a small LRU (e.g. 128 entries) scoped to the scan:

    // at start of scan:
    lru := lru.New(128)
    …
    // replace compile call:
    expression, err = lru.GetOrAdd(resolvedExpression, func() (*govaluate.EvaluableExpression, error) {
        return exprcache.GetCompiledDSLExpression(resolvedExpression)
    })

Addressing this will prevent the cache from growing without bound. Let me know which approach you’d like to adopt.

pkg/templates/tag_filter.go (1)

432-437: Bug: ExcludeIds population checks the wrong map (uses block tags map).

This mistakenly gates excludeIds insertion on filter.block membership rather than filter.excludeIds, causing unexpected coupling between tag blocking and id exclusion.

Apply this diff:

-       for _, val := range splitCommaTrim(id) {
-           if _, ok := filter.block[val]; !ok {
-               filter.excludeIds[val] = struct{}{}
-           }
-       }
+       for _, val := range splitCommaTrim(id) {
+           if _, ok := filter.excludeIds[val]; !ok {
+               filter.excludeIds[val] = struct{}{}
+           }
+       }
pkg/templates/compile.go (1)

409-425: Use srcOptions.TemplatePath for format detection and unify YAML decoding with KnownFields(strict).

The switch currently inspects template.Path, which is unset at this stage. While YAML v3 can parse JSON, using the actual source path is more correct and avoids edge cases. Also, to align with the stricter yaml.v3 migration used in parser.go, consider decoding via yaml.Decoder with KnownFields toggled by srcOptions.Options.Validate (or a similar strictness flag).

- switch config.GetTemplateFormatFromExt(template.Path) {
+ switch config.GetTemplateFormatFromExt(srcOptions.TemplatePath) {
 case config.JSON:
   err = json.Unmarshal(data, template)
 case config.YAML:
-  err = yaml.Unmarshal(data, template)
+  {
+    dec := yaml.NewDecoder(bytes.NewReader(data))
+    // Honor validation/strictness if enabled
+    if srcOptions != nil && srcOptions.Options != nil && srcOptions.Options.Validate {
+      dec.KnownFields(true)
+    }
+    err = dec.Decode(template)
+  }
 default:
   // assume its yaml
-  if err = yaml.Unmarshal(data, template); err != nil {
+  {
+    dec := yaml.NewDecoder(bytes.NewReader(data))
+    if srcOptions != nil && srcOptions.Options != nil && srcOptions.Options.Validate {
+      dec.KnownFields(true)
+    }
+    if err = dec.Decode(template); err != nil {
       return nil, err
-  }
+    }
+  }
 }

This keeps memory usage unchanged while aligning behavior with the new yaml.v3 strict decoding path.

🧹 Nitpick comments (52)
pkg/protocols/common/generators/generators_test.go (3)

108-111: Enable strict YAML decoding to catch unknown fields early.

Since the PR aims for stricter YAML handling, toggle KnownFields on the v3 decoder in the test as well.

-  err := yaml.NewDecoder(strings.NewReader(testPayload)).Decode(&payloads)
-  require.Nil(t, err, "could not unmarshal yaml")
+  dec := yaml.NewDecoder(strings.NewReader(testPayload))
+  dec.KnownFields(true)
+  err := dec.Decode(&payloads)
+  require.NoError(t, err, "could not unmarshal yaml")

19-19: Use require.NoError for error checks (clearer intent).

Switch from require.Nil(err) to require.NoError(err) for readability and better failure messages.

- require.Nil(t, err, "could not create generator")
+ require.NoError(t, err, "could not create generator")

Apply similarly at Lines 39, 61, 110, and 134.

Also applies to: 39-39, 61-61, 110-110, 134-134


136-139: Strengthen assertions: verify content, not just length.

Length-only checks can miss ordering/content regressions. Also prefer require.Len for clarity.

-      gotValues := parsed[k].([]interface{})
-      require.Equal(t, len(values), len(gotValues), "could not get correct number of values")
+      gotValues := parsed[k].([]interface{})
+      require.Len(t, gotValues, len(values), "could not get correct number of values")
+      // Optional: verify actual values (cast to []string first)
+      strVals := make([]string, 0, len(gotValues))
+      for _, iv := range gotValues {
+        strVals = append(strVals, iv.(string))
+      }
+      for _, exp := range values {
+        require.Contains(t, strVals, exp, "missing expected value for aggression=%s", aggression)
+      }
pkg/model/types/severity/severity_test.go (1)

19-19: Prefer require.NoError for error checks in tests

Use require.NoError for clarity and better failure messages in tests.

-	require.Nil(t, err, "could not marshal yaml")
+	require.NoError(t, err, "could not marshal yaml")
pkg/model/model_test.go (3)

31-36: Use NoError to assert on error values

Nit: prefer require.NoError for readability.

-	require.Nil(t, err)
+	require.NoError(t, err)

55-77: Brittle YAML string equality in tests

Asserting exact YAML bytes can be brittle across emitter versions/whitespace changes. Consider structural comparison by unmarshalling both expected and actual and comparing maps/structs.

Example refactor:

-	result, err := yaml.Marshal(&info)
-	require.Nil(t, err)
-
-	expected := `name: Test Template Name
-... (omitted for brevity) ...
-  string_key: string_value
-`
-	require.Equal(t, expected, string(result))
+	result, err := yaml.Marshal(&info)
+	require.NoError(t, err)
+
+	var got map[string]any
+	require.NoError(t, yaml.Unmarshal(result, &got))
+
+	const expectedYAML = `name: Test Template Name
+author:
+- forgedhallpass
+- ice3man
+tags:
+- cve
+- misc
+description: Test description
+reference: Reference1
+severity: high
+metadata:
+  array_key:
+  - array_value1
+  - array_value2
+  map_key:
+    key1: val1
+  string_key: string_value
+`
+	var expected map[string]any
+	require.NoError(t, yaml.Unmarshal([]byte(expectedYAML), &expected))
+	require.Equal(t, expected, got)

55-56: Also switch to NoError here

Mirror the earlier nit at this call site.

-	require.Nil(t, err)
+	require.NoError(t, err)
pkg/utils/yaml/yaml_decode_wrapper.go (1)

9-10: Enable KnownFields on yaml.v3 decoder to catch unknown config keys.

With yaml.v3 we can reject unknown fields to avoid silent misconfigurations. This is low risk, and you can make it toggleable if needed.

 import (
   "io"
   "strings"

   "github.com/go-playground/validator/v10"
   "github.com/pkg/errors"
-  "gopkg.in/yaml.v3"
+  "gopkg.in/yaml.v3"
 )
@@
 func DecodeAndValidate(r io.Reader, v interface{}) error {
-  if err := yaml.NewDecoder(r).Decode(v); err != nil {
+  dec := yaml.NewDecoder(r)
+  dec.KnownFields(true)
+  if err := dec.Decode(v); err != nil {
     return err
   }

Also applies to: 16-17

pkg/protocols/common/automaticscan/automaticscan.go (1)

37-37: yaml.v3 import: consider strict decoding and log parse errors for the mapping file.

We currently ignore decode errors for the Wappalyzer mapping. Enabling KnownFields helps catch typos; logging the error aids diagnosis and doesn’t change control flow.

-  if file, err := os.Open(mappingFile); err == nil {
-    _ = yaml.NewDecoder(file).Decode(&mappingData)
-    _ = file.Close()
-  }
+  if file, err := os.Open(mappingFile); err == nil {
+    dec := yaml.NewDecoder(file)
+    dec.KnownFields(true)
+    if derr := dec.Decode(&mappingData); derr != nil {
+      gologger.Verbose().Msgf("Failed to parse %s: %v", mappingFile, derr)
+    }
+    _ = file.Close()
+  }

Also applies to: 75-81

pkg/catalog/config/ignorefile.go (1)

8-9: Harden YAML parsing: KnownFields and graceful handling of empty files (EOF).

Unknown keys in nuclei-ignore can be silently ignored today. Also, empty files yield io.EOF which we treat as a parse error. Recommend strict fields and treating EOF as empty config.

 import (
   "os"
   "runtime/debug"

   "github.com/projectdiscovery/gologger"
   "gopkg.in/yaml.v3"
+  "io"
+  "errors"
 )
@@
   ignore := IgnoreFile{}
-  if err := yaml.NewDecoder(file).Decode(&ignore); err != nil {
-    gologger.Error().Msgf("Could not parse nuclei-ignore file: %s\n", err)
-    return IgnoreFile{}
-  }
+  dec := yaml.NewDecoder(file)
+  dec.KnownFields(true)
+  if err := dec.Decode(&ignore); err != nil {
+    if errors.Is(err, io.EOF) {
+      return IgnoreFile{}
+    }
+    gologger.Error().Msgf("Could not parse nuclei-ignore file: %s\n", err)
+    return IgnoreFile{}
+  }

Also applies to: 28-33

pkg/utils/yaml/preprocess.go (2)

46-52: Good fix: avoid negative slice; but capture only indentation bytes.

The guard prevents panics when no prior newline exists—nice. However, padBytes now includes the newline itself (starting at lastNewLineIndex), which is intentional for subsequent replacement, but it also mixes concerns (newline + indent). Consider splitting into (newline) and (indent) to make downstream logic clearer and to enable clean first-line padding.

-    var padBytes []byte
-    if lastNewLineIndex >= 0 {
-      padBytes = data[lastNewLineIndex:matchIndex]
-    } else {
-      padBytes = data[:matchIndex]
-    }
+    var padBytes []byte
+    if lastNewLineIndex >= 0 {
+      // keep only indentation (exclude the newline)
+      padBytes = data[lastNewLineIndex+1 : matchIndex]
+    } else {
+      padBytes = data[:matchIndex]
+    }

70-72: Also indent the first line of the included content (optional).

Current logic indents lines after the first by replacing “\n” with “\n”. The first line remains unindented, which can break YAML blocks when “# !include:” appears under indentation. Prefix the first line with the indent as well.

-      if len(padBytes) > 0 && !bytes.Equal(padBytes, []byte("\n")) {
-        includeFileContent = bytes.ReplaceAll(includeFileContent, []byte("\n"), padBytes)
-      }
+      if len(padBytes) > 0 {
+        // indent subsequent lines by replacing newline with newline+indent
+        includeFileContent = bytes.ReplaceAll(includeFileContent, []byte("\n"), append([]byte("\n"), padBytes...))
+        // and indent the first line too
+        includeFileContent = append(padBytes, includeFileContent...)
+      }

If CRLF normalization matters, consider normalizing to LF before replacement to avoid mixed line endings.

pkg/model/types/stringslice/yamlhelper.go (4)

29-38: Default-trim items to avoid surprising whitespace; normalize after trimming.

Without a normalizer, a scalar like "a, b ,c" yields ["a", " b ", "c"]. Trim by default, then apply the optional normalizer for consistent behavior across scalar and sequence forms.

-		result := strings.Split(v, ",")
-		if normalizer != nil {
-			for i, value := range result {
-				result[i] = normalizer.Normalize(value)
-			}
-		}
+		result := strings.Split(v, ",")
+		for i := range result {
+			token := strings.TrimSpace(result[i])
+			if normalizer != nil {
+				token = normalizer.Normalize(token)
+			}
+			result[i] = token
+		}
 		return result, nil
@@
-			if normalizer != nil {
-				v = normalizer.Normalize(v)
-			}
+			v = strings.TrimSpace(v)
+			if normalizer != nil {
+				v = normalizer.Normalize(v)
+			}
 			out = append(out, v)

Also applies to: 41-52


15-18: Defensive checks for nil nodes and alias targets.

Node should be non-nil, but a guard prevents panics if callers misuse this helper. Also ensure alias targets are non-nil.

 func UnmarshalYAMLNode(node *yaml.Node, normalizer StringNormalizer) ([]string, error) {
+	if node == nil {
+		return nil, fmt.Errorf("stringslice: nil YAML node")
+	}
 	if node.Kind == yaml.DocumentNode && len(node.Content) > 0 {
 		node = node.Content[0]
 	}
@@
 	case yaml.AliasNode:
-		return UnmarshalYAMLNode(node.Alias, normalizer)
+		if node.Alias == nil {
+			return nil, fmt.Errorf("stringslice: invalid alias with nil target")
+		}
+		return UnmarshalYAMLNode(node.Alias, normalizer)

Also applies to: 54-55


60-62: More helpful error details for unsupported kinds.

Printing Kind as an int is opaque. Include kind and tag to aid debugging.

-		return nil, fmt.Errorf("stringslice: expected string or sequence, got %v", node.Kind)
+		return nil, fmt.Errorf("stringslice: expected scalar or sequence, got kind=%d tag=%q", node.Kind, node.Tag)

22-24: Confirm null semantics (error vs. empty slice).

You error out on !!null (scalar) and also when Kind==0. Confirm callers rely on hard-fail rather than treating null as empty slice for backward compatibility. I can align all string-slice unmarshallers accordingly if needed.

Also applies to: 57-58

pkg/operators/common/regexcache/cache.go (2)

15-33: Avoid duplicate compilations under contention; consider singleflight or a double-check.

Two goroutines can compile the same pattern concurrently, wasting CPU. Add a second check under the write lock, or integrate singleflight for correctness + throughput.

 func GetCompiledRegex(pattern string) (*regexp.Regexp, error) {
 	regexCacheLock.RLock()
 	if compiled, exists := regexCache[pattern]; exists {
 		regexCacheLock.RUnlock()
 		return compiled, nil
 	}
 	regexCacheLock.RUnlock()

-	compiled, err := regexp.Compile(pattern)
-	if err != nil {
-		return nil, err
-	}
-
-	regexCacheLock.Lock()
-	regexCache[pattern] = compiled
-	regexCacheLock.Unlock()
+	compiled, err := regexp.Compile(pattern)
+	if err != nil {
+		return nil, err
+	}
+	regexCacheLock.Lock()
+	if existing, ok := regexCache[pattern]; ok {
+		regexCacheLock.Unlock()
+		return existing, nil
+	}
+	regexCache[pattern] = compiled
+	regexCacheLock.Unlock()

 	return compiled, nil
 }

If you prefer zero duplicate compile without holding the lock during compilation, I can wire golang.org/x/sync/singleflight.


35-47: Unbounded cache growth; add eviction and observability.

Patterns are user-controlled in some contexts. Consider an LRU with a sane cap and simple metrics (hit/miss, size) to prevent memory bloat and to validate perf wins.

I can drop in a tiny LRU using hashicorp/golang-lru or a lock-free sync.Map+size cap depending on your constraints.

pkg/utils/insertion_ordered_map.go (1)

121-134: Set/ForEach behavior depends on keys/values coherence.

With the reset fix applied, Set/ForEach remain correct. Without it, ForEach may fetch nil values for phantom keys. Please add a focused test covering reuse of InsertionOrderedStringMap to lock this invariant.

I can add a unit test demonstrating the bug and verifying the fix.

pkg/utils/yaml/yaml_decode_wrapper_test.go (1)

183-188: Adjust the “invalid validation” test to actually trigger InvalidValidationError.

Currently YAML decode fails before validation because target is a non-pointer string. Decode into a string pointer so YAML succeeds and validator sees a non-struct.

-		{
-			name:        "invalid validation error test",
-			yamlInput:   `name: "test"`,
-			target:      "invalid_target", // This should cause InvalidValidationError
-			expectError: true,
-			errorType:   "invalid_validation",
-		},
+		{
+			name:        "invalid validation error test",
+			yamlInput:   `"hello"`,
+			target:      new(string), // YAML decodes fine; validator gets non-struct
+			expectError: true,
+			errorType:   "invalid_validation",
+		},
pkg/utils/yaml/preprocess_test.go (3)

262-271: Restore StrictSyntax after mutation in this test, too.

Unlike TestPreProcess, this test doesn’t save/restore StrictSyntax; although tests are not parallel, restoring state is safer and self-contained.

@@
-      StrictSyntax = false
-      result, err := PreProcess([]byte(input))
+      original := StrictSyntax
+      StrictSyntax = false
+      defer func() { StrictSyntax = original }()
+      result, err := PreProcess([]byte(input))

194-207: Rename “large recursive includes” to “large include file” (not recursive).

This test creates a single large file and includes it once; the name suggests recursion that isn’t present.

-      name: "large recursive includes",
+      name: "large include file",

381-400: Benchmarks: add b.ReportAllocs and document StrictSyntax state.

Report allocations to quantify memory improvements, and set StrictSyntax explicitly to avoid accidental perf skew from global changes.

@@ func BenchmarkPreProcess_NoIncludes(b *testing.B) {
-  b.ResetTimer()
+  StrictSyntax = false
+  b.ReportAllocs()
+  b.ResetTimer()
@@ func BenchmarkPreProcess_WithIncludes(b *testing.B) {
-  b.ResetTimer()
+  StrictSyntax = false
+  b.ReportAllocs()
+  b.ResetTimer()
@@ func BenchmarkPreProcess_MultipleIncludes(b *testing.B) {
-  b.ResetTimer()
+  StrictSyntax = false
+  b.ReportAllocs()
+  b.ResetTimer()
@@ func BenchmarkPreProcess_NestedIncludes(b *testing.B) {
-  b.ResetTimer()
+  StrictSyntax = false
+  b.ReportAllocs()
+  b.ResetTimer()

Also applies to: 423-430, 454-461, 508-515

pkg/utils/insertion_ordered_map_test.go (5)

14-68: YAML mixed-type expectations are tightly coupled to stringification semantics.

The test asserts that non-string scalars become strings (e.g., 42 → "42"). If the implementation evolves to preserve native types under yaml.v3, these assertions will fail. Consider clarifying the contract in InsertionOrderedStringMap’s UnmarshalYAML doc or add a separate test that explicitly validates the intended “all scalars become strings” rule.


136-176: JSON tests: expectedKeys is declared but never asserted.

Either assert iteration order (if your UnmarshalJSON preserves order) or drop expectedKeys from the JSON-table to avoid confusion.

@@
-   expectedKeys   []string
    expectedValues map[string]interface{}
@@
-      expectedKeys:   []string{"b", "a", "c"}, // JSON doesn't preserve order, so we get map iteration order
       expectedValues: map[string]interface{}{"a": "first", "b": "second", "c": "third"},

Also applies to: 188-198


262-282: float32 stringification can be flaky; prefer stable formatting in tests.

fmt.Sprint(float32(3.14)) is usually "3.14", but binary rounding can surface on some platforms. To avoid flakes, compare numerically or format with %.2f if you need exact decimals.

-    {"float32", float32(3.14), "3.14"},
+    {"float32", float32(3.14), "3.14"}, // consider numeric compare if flakes show up

If you want to harden now:

-    {"float32", float32(3.14), "3.14"},
+    {"float32", float32(3.14), fmt.Sprintf("%.2f", float32(3.14))},

(then adapt toString test to use fmt.Sprintf similarly)


440-449: Test name/comment mismatch: this YAML isn’t malformed.

The case named “yaml with malformed structure” uses a valid simple mapping. Rename to avoid confusion.

- t.Run("yaml with malformed structure", func(t *testing.T) {
+ t.Run("yaml simple mapping (parser forgivingness)", func(t *testing.T) {
@@
-   assert.NoError(t, err) // YAML parser is forgiving
+   assert.NoError(t, err)

370-379: Benchmarks: add b.ReportAllocs to capture memory impact.

Given the PR’s performance/memory goals, tracking allocs here is useful.

@@ func BenchmarkInsertionOrderedStringMap_Set(b *testing.B) {
-  b.ResetTimer()
+  b.ReportAllocs()
+  b.ResetTimer()
@@ func BenchmarkInsertionOrderedStringMap_UnmarshalJSON(b *testing.B) {
-  b.ResetTimer()
+  b.ReportAllocs()
+  b.ResetTimer()

Also applies to: 427-438

pkg/protocols/common/exprcache/exprcache.go (3)

69-77: Unbounded cache can grow without limit; consider a size cap or LRU.

If expressions are template/user-driven, the map can grow indefinitely. Consider an LRU (e.g., simple ring or groupcache/lru) or an optional MaxSize with random/oldest eviction to bound memory.


16-27: Minor: document constructors and clarify functions binding immutability.

Add doc comments describing thread-safety, typical usage, and that the functions map is captured at construction time.


57-66: Clear() recreates the map—good. Consider returning previous size for observability.

A return value can help callers log/measure effect when clearing hot caches.

-func (ec *ExpressionCache) Clear() {
+func (ec *ExpressionCache) Clear() int {
   ec.cacheLock.Lock()
-  ec.cache = make(map[string]*govaluate.EvaluableExpression)
+  prev := len(ec.cache)
+  ec.cache = make(map[string]*govaluate.EvaluableExpression)
   ec.cacheLock.Unlock()
+  return prev
}

(Adjust call sites/tests accordingly.)

pkg/protocols/common/exprcache/exprcache_test.go (3)

108-129: Concurrency test doesn’t detect duplicate compilations.

The current test ensures safety and single entry, but it can’t detect wasted duplicate compilations during races. After applying double-checked locking, consider instrumenting compilation counts by injecting a compile hook (e.g., an optional compileFn in ExpressionCache) and assert that only one compilation occurred under parallel Get calls.

I can propose a minimal refactor to add a test-only compile hook if you want to validate the deduction.


165-176: Benchmarks: include allocation metrics.

Add b.ReportAllocs() to quantify memory impact of caching.

@@ func BenchmarkExpressionCache_Get(b *testing.B) {
-  b.ResetTimer()
+  b.ReportAllocs()
+  b.ResetTimer()
@@ func BenchmarkExpressionCache_GetParallel(b *testing.B) {
-  b.ResetTimer()
+  b.ReportAllocs()
+  b.ResetTimer()

Also applies to: 178-191


131-151: Global caches smoke tests look good. Consider a DSL function assertion if stable.

If dsl.HelperFunctions contains stable built-ins, add one small assertion using a known DSL function to ensure functions binding works (optional; only if the function set is stable across versions).

pkg/templates/templates_test.go (1)

12-36: Optional: run this test in parallel.

This test is read-only and doesn’t mutate globals; you can add t.Parallel() to speed up the suite.

 func TestTemplateStruct(t *testing.T) {
+  t.Parallel()
pkg/operators/matchers/compile.go (1)

46-49: Preserve the underlying compile error for diagnosability

Returning only the pattern loses the root cause. Wrap the error so operators get actionable details.

Apply this diff:

-        compiled, err := regexcache.GetCompiledRegex(regex)
-        if err != nil {
-            return fmt.Errorf("could not compile regex: %s", regex)
-        }
+        compiled, err := regexcache.GetCompiledRegex(regex)
+        if err != nil {
+            return fmt.Errorf("could not compile regex %q: %w", regex, err)
+        }
pkg/protocols/common/expressions/variables.go (1)

118-126: Edge-case: treat nil compiled expressions as non-literal

If GetCompiledDSLExpression returns (nil, nil), the current logic returns true, potentially skipping unresolved-variable detection incorrectly. Safer default is false.

Apply this diff:

-    expr, err := exprcache.GetCompiledDSLExpression(data)
-    if err != nil {
-        return false
-    }
-    if expr != nil {
-        _, err = expr.Evaluate(nil)
-        return err == nil
-    }
-    return true
+    expr, err := exprcache.GetCompiledDSLExpression(data)
+    if err != nil || expr == nil {
+        return false
+    }
+    _, err = expr.Evaluate(nil)
+    return err == nil

If other callers depend on the previous “nil means true” behavior, let’s document and codify it instead. Otherwise, this change reduces false negatives.

pkg/model/types/stringslice/yamlhelper_test.go (1)

1-200: Solid, thorough coverage of YAML node-based unmarshalling

Tests exercise scalar, sequence, aliasing, normalization, and error cases. Benchmarks are a nice touch for tracking perf of the new helpers.

Two small gaps you might consider:

  • Add an alias case with normalization enabled to ensure alias targets go through the normalizer.
  • Add a case with trailing/leading commas in scalar input (e.g., ",one,two,"), asserting trimming/normalization behavior is as intended.
pkg/templates/tag_filter.go (3)

448-453: Wrap include-condition compilation error with context.

Returning the raw error makes it harder to pinpoint which condition failed. Wrap with the offending condition and preserve the cause.

Apply this diff within the function and add the import:

@@
-       compiled, err := exprcache.GetCompiledDSLExpression(includeCondition)
-       if err != nil {
-           return nil, err
-       }
+       compiled, err := exprcache.GetCompiledDSLExpression(includeCondition)
+       if err != nil {
+           return nil, fmt.Errorf("invalid include-condition %q: %w", includeCondition, err)
+       }
        filter.includeConditions[includeCondition] = compiled

Outside the selected range, add:

 import (
+    "fmt"
     "bufio"
     "errors"
     "io"

330-335: Comment vs. behavior mismatch on condition-evaluation errors.

The comment says “skip” on evaluation error, but the code returns false, rejecting the template. Either fix the comment or change logic to continue as intended.

If “skip” is correct, use:

-       if err != nil {
-           // Using debug as the failure here might be legitimate (eg. template not having optional metadata fields => missing required fields)
-           gologger.Debug().Msgf("The expression condition couldn't be evaluated correctly for template \"%s\": %s\n", template.ID, err)
-           return false
-       }
+       if err != nil {
+           // Using debug as the failure here might be legitimate (e.g., optional metadata missing)
+           gologger.Debug().Msgf("The expression condition couldn't be evaluated correctly for template %q: %v", template.ID, err)
+           continue
+       }

If rejecting is desired, update the comment to reflect that includeConditions must all evaluate successfully.


231-235: Close raw HTTP request bodies after reading.

Minor resource hygiene: ensure Body is closed after io.ReadAll to avoid leaks in non-trivial readers.

-                   body, _ := io.ReadAll(rawHttpReq.Body)
+                   body, _ := io.ReadAll(rawHttpReq.Body)
+                   _ = rawHttpReq.Body.Close()
                    bodies = append(bodies, string(body))
pkg/model/types/stringslice/stringslice_raw.go (1)

17-23: Node-based UnmarshalYAML is clean; confirm null handling compatibility.

The new UnmarshalYAMLNode returns an error on nulls (“stringslice: null values not supported”). Previous behavior may have accepted null/empty as empty slice. Please confirm this won’t break existing templates and update docs/tests accordingly if this is an intentional tightening.

If you want to preserve prior semantics (treat null as empty), consider:

 func (rawStringSlice *RawStringSlice) UnmarshalYAML(node *yaml.Node) error {
-    result, err := UnmarshalYAMLNode(node, rawStringSlice)
+    if node == nil || node.Kind == 0 || node.Tag == "!!null" {
+        rawStringSlice.Value = []string{}
+        return nil
+    }
+    result, err := UnmarshalYAMLNode(node, rawStringSlice)
     if err != nil {
         return err
     }
     rawStringSlice.Value = result
     return nil
 }
pkg/operators/extractors/compile.go (1)

23-28: Preserve underlying regex compilation error for diagnosability.

Wrap the original error; current message omits the cause.

-        compiled, err := regexcache.GetCompiledRegex(regex)
-        if err != nil {
-            return fmt.Errorf("could not compile regex: %s", regex)
-        }
+        compiled, err := regexcache.GetCompiledRegex(regex)
+        if err != nil {
+            return fmt.Errorf("could not compile regex %q: %w", regex, err)
+        }
pkg/model/types/stringslice/stringslice.go (1)

115-139: Remove unused helper marshalStringToSlice

This function is only defined in pkg/model/types/stringslice/stringslice.go (lines 115–139) and is never invoked elsewhere in the repo. You can safely delete it to eliminate dead code.

• File: pkg/model/types/stringslice/stringslice.go
– Delete the marshalStringToSlice function (lines 115–139)

pkg/protocols/common/expressions/expressions.go (2)

13-16: Remove commented-out helper.

Leftover commented code should be dropped to keep the file clean.

-// func GetCompiledDSLExpression(expression string) (*govaluate.EvaluableExpression, error) {
-//  return exprcache.GetCompiledDSLExpression(expression)
-// }

133-143: isExpression may pollute caches during detection; consider a non-caching parse.

The detection path attempts compilation and may cache many transient strings during scanning, increasing memory pressure. If feasible, consider a “peek” parse that doesn’t populate the cache (or a short-lived/size-bounded cache) for detection.

pkg/templates/compile.go (3)

66-71: Prefer current filePath for TemplatePath on cache-hit.

You restore TemplatePath from the cached value. For safety and clearer logging (symlinks, relative vs absolute paths), set it to the current filePath.

-      template.Options.TemplatePath = value.Options.TemplatePath
+      template.Options.TemplatePath = filePath

If needed, also ensure template.Path is consistent when returning the cached instance.


106-106: Remove leftover commented log.

Dead/commented code adds noise.

-      // options.Logger.Error().Msgf("returning cached template %s after recompiling %d requests", tplCopy.Options.TemplateID, tplCopy.Requests())

175-186: Nil-safety for Global Matchers options.

Minor: isGlobalMatchersEnabled dereferences template.Options.Options without a nil check. Realistically, Options should be set, but if a future call path changes this assumption, it would panic.

-  if !template.Options.Options.EnableGlobalMatchersTemplates {
+  if template.Options == nil || template.Options.Options == nil || !template.Options.Options.EnableGlobalMatchersTemplates {
     return false
   }
pkg/templates/cache.go (2)

52-67: Unsafe zero-copy conversion on pooled buffers can corrupt cached raw.

conversion.String(raw) is zero-copy. If raw comes from a reusable buffer (e.g., sync.Pool), later writes can mutate the cached string. You currently use StoreWithoutRaw in the parser path, which avoids this, but please document this contract or defensively copy when the source is from a pool.

- if raw != nil {
-   entry.raw = conversion.String(raw)
+ if raw != nil {
+   // If raw originates from a pooled/temporary buffer, copy it to avoid aliasing.
+   // entry.raw = conversion.String(raw) // zero-copy; use only with immutable backing
+   entry.raw = string(append([]byte(nil), raw...))

Alternatively, keep zero-copy but add a comment stating the precondition: raw must not be reused/mutated.


89-93: Update comment: Purge doesn’t “return all objects to pool.”

Purge clears the map; pooled items are handled by GC. Fix the doc to avoid confusion.

-// Purge the cache and return all objects to pool
+// Purge clears the cache; pooled objects are not explicitly reclaimed.
pkg/templates/parser.go (1)

171-179: Bounded parsing concurrency is a sensible default.

The parsingSemaphore guard (size 10) is a pragmatic limit. Consider making it configurable in the future if workloads vary widely, but this is fine for now.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 00f4595 and 9402d78.

📒 Files selected for processing (34)
  • cmd/integration-test/http.go (1 hunks)
  • go.mod (1 hunks)
  • pkg/catalog/config/ignorefile.go (1 hunks)
  • pkg/fuzz/type.go (2 hunks)
  • pkg/model/model_test.go (1 hunks)
  • pkg/model/types/severity/severity_test.go (1 hunks)
  • pkg/model/types/stringslice/stringslice.go (2 hunks)
  • pkg/model/types/stringslice/stringslice_raw.go (2 hunks)
  • pkg/model/types/stringslice/yamlhelper.go (1 hunks)
  • pkg/model/types/stringslice/yamlhelper_test.go (1 hunks)
  • pkg/operators/common/regexcache/cache.go (1 hunks)
  • pkg/operators/extractors/compile.go (3 hunks)
  • pkg/operators/matchers/compile.go (3 hunks)
  • pkg/operators/matchers/match.go (2 hunks)
  • pkg/protocols/common/automaticscan/automaticscan.go (1 hunks)
  • pkg/protocols/common/exprcache/exprcache.go (1 hunks)
  • pkg/protocols/common/exprcache/exprcache_test.go (1 hunks)
  • pkg/protocols/common/expressions/expressions.go (3 hunks)
  • pkg/protocols/common/expressions/variables.go (2 hunks)
  • pkg/protocols/common/generators/generators_test.go (1 hunks)
  • pkg/protocols/common/variables/variables_test.go (1 hunks)
  • pkg/reporting/reporting.go (1 hunks)
  • pkg/templates/cache.go (2 hunks)
  • pkg/templates/compile.go (2 hunks)
  • pkg/templates/parser.go (6 hunks)
  • pkg/templates/tag_filter.go (2 hunks)
  • pkg/templates/templates.go (2 hunks)
  • pkg/templates/templates_test.go (1 hunks)
  • pkg/utils/insertion_ordered_map.go (2 hunks)
  • pkg/utils/insertion_ordered_map_test.go (1 hunks)
  • pkg/utils/yaml/preprocess.go (2 hunks)
  • pkg/utils/yaml/preprocess_test.go (1 hunks)
  • pkg/utils/yaml/yaml_decode_wrapper.go (1 hunks)
  • pkg/utils/yaml/yaml_decode_wrapper_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
PR: projectdiscovery/nuclei#6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: To make the template caching mechanism in pkg/templates/compile.go production-ready, DSLs need to be updated to use runtime options instead of cached variables, rather than restoring the Compile() calls on each request.

Applied to files:

  • pkg/operators/matchers/match.go
  • pkg/protocols/common/expressions/expressions.go
  • pkg/operators/matchers/compile.go
  • pkg/templates/tag_filter.go
  • pkg/protocols/common/expressions/variables.go
  • pkg/operators/extractors/compile.go
  • pkg/templates/compile.go
  • pkg/templates/parser.go
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
PR: projectdiscovery/nuclei#6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: In pkg/templates/compile.go, the template caching mechanism intentionally skips calling Compile() on copied requests to achieve performance benefits. This is the intended design, not a bug. The current implementation isn't production-ready but represents the desired direction.

Applied to files:

  • pkg/operators/matchers/compile.go
  • pkg/operators/extractors/compile.go
  • pkg/templates/compile.go
📚 Learning: 2025-07-16T21:28:08.073Z
Learnt from: hdm
PR: projectdiscovery/nuclei#6322
File: pkg/templates/compile.go:59-78
Timestamp: 2025-07-16T21:28:08.073Z
Learning: The shallow copy behavior (`tplCopy := *value`) in pkg/templates/compile.go is intentional design for the template caching mechanism. The partial-copy approach is part of the performance optimization strategy, not a bug requiring deep copying.

Applied to files:

  • pkg/templates/compile.go
🧬 Code graph analysis (18)
pkg/model/types/stringslice/yamlhelper_test.go (3)
pkg/model/types/stringslice/yamlhelper.go (2)
  • StringNormalizer (11-13)
  • UnmarshalYAMLNode (15-63)
pkg/model/types/stringslice/stringslice.go (1)
  • StringSlice (30-32)
pkg/model/types/stringslice/stringslice_raw.go (1)
  • RawStringSlice (5-7)
pkg/operators/matchers/match.go (1)
pkg/protocols/common/exprcache/exprcache.go (1)
  • GetCompiledDSLExpression (83-85)
pkg/model/types/stringslice/stringslice_raw.go (1)
pkg/model/types/stringslice/yamlhelper.go (1)
  • UnmarshalYAMLNode (15-63)
pkg/protocols/common/expressions/expressions.go (2)
pkg/protocols/common/exprcache/exprcache.go (2)
  • GetCompiledDSLExpression (83-85)
  • GetCompiledExpression (79-81)
pkg/operators/common/dsl/dsl.go (1)
  • FunctionNames (19-19)
pkg/utils/yaml/preprocess_test.go (1)
pkg/utils/yaml/preprocess.go (2)
  • StrictSyntax (18-18)
  • PreProcess (21-82)
pkg/operators/matchers/compile.go (2)
pkg/operators/common/regexcache/cache.go (1)
  • GetCompiledRegex (15-33)
pkg/protocols/common/exprcache/exprcache.go (1)
  • GetCompiledDSLExpression (83-85)
pkg/model/types/stringslice/yamlhelper.go (1)
pkg/utils/utils.go (1)
  • IsBlank (17-19)
pkg/templates/templates.go (1)
pkg/protocols/common/automaticscan/automaticscan.go (1)
  • New (69-119)
pkg/protocols/common/exprcache/exprcache_test.go (1)
pkg/protocols/common/exprcache/exprcache.go (5)
  • ExpressionCache (10-14)
  • NewWithFunctions (22-27)
  • New (16-20)
  • GetCompiledExpression (79-81)
  • GetCompiledDSLExpression (83-85)
pkg/templates/tag_filter.go (1)
pkg/protocols/common/exprcache/exprcache.go (1)
  • GetCompiledDSLExpression (83-85)
pkg/protocols/common/exprcache/exprcache.go (1)
pkg/operators/common/dsl/dsl.go (1)
  • HelperFunctions (18-18)
pkg/protocols/common/expressions/variables.go (1)
pkg/protocols/common/exprcache/exprcache.go (1)
  • GetCompiledDSLExpression (83-85)
pkg/operators/extractors/compile.go (2)
pkg/operators/common/regexcache/cache.go (1)
  • GetCompiledRegex (15-33)
pkg/protocols/common/exprcache/exprcache.go (1)
  • GetCompiledDSLExpression (83-85)
pkg/utils/yaml/yaml_decode_wrapper_test.go (1)
pkg/utils/yaml/yaml_decode_wrapper.go (1)
  • DecodeAndValidate (15-34)
pkg/utils/insertion_ordered_map_test.go (1)
pkg/utils/insertion_ordered_map.go (3)
  • InsertionOrderedStringMap (11-14)
  • NewEmptyInsertionOrderedStringMap (16-21)
  • NewInsertionOrderedStringMap (23-30)
pkg/model/types/stringslice/stringslice.go (1)
pkg/model/types/stringslice/yamlhelper.go (1)
  • UnmarshalYAMLNode (15-63)
pkg/templates/compile.go (2)
pkg/templates/templates.go (1)
  • Template (36-165)
pkg/protocols/protocols.go (1)
  • ExecutorOptions (61-141)
pkg/templates/parser.go (3)
pkg/loader/parser/parser.go (1)
  • Parser (7-11)
pkg/templates/cache.go (2)
  • NewCache (19-28)
  • Cache (11-16)
pkg/utils/yaml/preprocess.go (1)
  • PreProcess (21-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Lint
🔇 Additional comments (29)
pkg/protocols/common/generators/generators_test.go (1)

8-8: YAML v3 import LGTM; watch for downstream type changes.

The import bump to gopkg.in/yaml.v3 is correct. Note that v3 commonly yields map[string]interface{} when decoding into interface{}-typed values; make sure tests (and parse helpers) aren’t still asserting map[interface{}]interface{}.

Run the script in my other comment to confirm the signature and expected map type of parsePayloadsWithAggression.

pkg/model/types/severity/severity_test.go (1)

6-6: YAML v3 migration import looks good

The switch to gopkg.in/yaml.v3 is correct. No behavior change expected for yaml.Marshal/yaml.Unmarshal used here.

pkg/model/model_test.go (1)

11-11: YAML v3 import LGTM

The import upgrade to gopkg.in/yaml.v3 is consistent with the repo-wide migration.

pkg/protocols/common/variables/variables_test.go (1)

9-9: YAML v3 import LGTM

The import switch to gopkg.in/yaml.v3 aligns with the migration; usage via yaml.Unmarshal remains compatible.

go.mod (1)

343-343: yaml.v2 marked indirect: no remaining v2 imports or outdated UnmarshalYAML signatures

All checks passed—there are no direct yaml.v2 references or old-style UnmarshalYAML implementations in the codebase.

• No lingering gopkg.in/yaml.v2 imports found in .go files.
• Every YAML import points to gopkg.in/yaml.v3.
• No func (…) UnmarshalYAML(func(...) error) signatures detected.

cmd/integration-test/http.go (1)

17-17: YAML v3 import change is correct

The migration to gopkg.in/yaml.v3 matches the rest of the codebase; the yaml.Unmarshal call sites in this file remain compatible.

pkg/fuzz/type.go (2)

127-129: Alias handling looks good.

Recursive resolution is correct for yaml.v3 nodes.


9-9: No legacy UnmarshalYAML(callback) signatures detected

All UnmarshalYAML implementations now use the node-based signature (func (… ) UnmarshalYAML(node *yaml.Node) error), and no v2-style callback overloads (func(interface{}) error) remain.

pkg/utils/yaml/yaml_decode_wrapper_test.go (1)

1-424: Solid coverage and useful benchmarks.

Great breadth: valid/invalid cases, nested, slices, multiple reader types, concurrency, and perf. After fixing validator init, the concurrency test should also pass under -race.

pkg/utils/yaml/preprocess_test.go (1)

40-72: Great coverage of include expansion, indentation, and strict-mode behaviors.

The table-driven tests validate core scenarios thoroughly, including recursive preprocessing and indentation padding. This will guard regressions around preprocess.go’s padBytes logic.

pkg/templates/templates_test.go (1)

9-10: yaml.v3 migration looks correct.

Import switch only; usage remains the same. Good alignment with the broader repo migration.

pkg/operators/matchers/compile.go (2)

9-11: Good move: centralizing compilation via caches

Replacing direct regexp/govaluate compilation with regexcache/exprcache aligns with the PR’s memory/performance goals and reduces duplicate work across templates.


62-69: LGTM: DSL compilation via exprcache

Using exprcache here ensures per-template expressions are compiled once and reused, which should cut CPU and allocations on hot paths. The error wrapping with dsl.CompilationError is retained.

pkg/operators/matchers/match.go (1)

12-12: Import swap to exprcache is consistent with the new caching layer

This keeps DSL compile paths uniform across the codebase.

pkg/templates/templates.go (1)

31-31: YAML v3 migration acknowledged

Import swap to gopkg.in/yaml.v3 is expected given the broader repo move.

pkg/protocols/common/expressions/variables.go (1)

8-9: Import change to exprcache is consistent

Centralizing DSL compilation here removes direct govaluate/dsl coupling from this package.

pkg/templates/tag_filter.go (1)

16-16: Good move: centralize DSL compilation via exprcache.

This aligns with our prior learning to avoid per-request Compile() and use cached expressions. Nice step toward the caching-based design.

pkg/model/types/stringslice/stringslice_raw.go (1)

3-4: YAML v3 migration looks correct.

Switching to gopkg.in/yaml.v3 is consistent with the repo-wide move.

pkg/operators/extractors/compile.go (2)

9-11: Imports updated to use caches — aligns with perf goals.

Using regexcache and exprcache is consistent with the memory/perf objectives and prior guidance to centralize compilation.


46-51: DSL compilation via exprcache is correct, with structured error.

Using dsl.CompilationError keeps error context while benefiting from caching.

pkg/model/types/stringslice/stringslice.go (2)

10-10: YAML v3 import — consistent with the migration.


64-71: Nice consolidation: delegate YAML parsing and normalization to UnmarshalYAMLNode.

This reduces duplication and keeps normalization consistent across Raw and normalized types.

pkg/protocols/common/expressions/expressions.go (3)

7-7: Switch to exprcache import — good for centralized caching.


19-24: Eval path: cached compilation is correct.

This preserves behavior while eliminating redundant compilations.


60-61: Evaluation path updated to cached compilation — OK.

Keeps evaluation semantics, improves perf via caching.

pkg/templates/compile.go (1)

58-75: Shallow-copy cache reuse is OK (matches intended design).

The shallow-copy of Template on cache-hit is intentional and aligns with the direction captured in prior learnings; avoiding deep copies is the point. The follow-up fixes above ensure safe option rebinding without re-compilation.

pkg/templates/cache.go (2)

10-16: Nice: lightweight parsedTemplate pooling and copy-on-set.

Using a pooled parsedTemplate and storing by value avoids extra heap churn while keeping map contention low. Looks good.


44-48: No action needed — template cache callers already handle nil raw (resolved).

Quick summary: I audited usages of Cache.Has / parsedTemplatesCache.Has. Call sites either ignore the raw bytes or already check raw != nil — none rely on len(raw)==0.

Locations reviewed:

  • pkg/templates/compile.go — uses "_, raw, err := parser.parsedTemplatesCache.Has(filePath)" and checks "err == nil && raw != nil".
  • pkg/templates/parser.go — uses "value, _, err := p.parsedTemplatesCache.Has(templatePath)" and only uses the returned template (ignores raw).
  • pkg/catalog/loader/loader.go — "template, _, _ := templatesCache.Has(templatePath)" (raw ignored).
  • pkg/templates/cache_test.go — tests call "data, _, err := templates.Has(...)" and don't inspect raw.
  • pkg/templates/cache.go — current implementation returns nil for absent raw (snippet below).

Relevant snippet (pkg/templates/cache.go):

	if value.raw != "" {
		return value.template, []byte(value.raw), value.err
	}
	return value.template, nil, value.err
}

Conclusion: No callers found that must be changed to raw != nil checks. Marking this review comment resolved.

pkg/templates/parser.go (1)

156-170: Good use of pooled buffers/readers for preprocessing.

Buffered io.Copy + yamlutil.PreProcess with sync.Pool is clean and should reduce GC pressure for large templates. LGTM.

Comment on lines 93 to 98
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (v *SliceOrMapSlice) UnmarshalYAML(callback func(interface{}) error) error {
// try to unmarshal it as a string and fallback to map
if err := callback(&v.Value); err == nil {
return nil
}
func (v *SliceOrMapSlice) UnmarshalYAML(node *yaml.Node) error {
switch node.Kind {
case yaml.ScalarNode, yaml.SequenceNode:
return node.Decode(&v.Value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Zero the alternate field to avoid stale state on reuse.

When decoding into Value, ensure KV is cleared. Otherwise pooled/reused structs can carry stale KV and produce ambiguous state.

 case yaml.ScalarNode, yaml.SequenceNode:
-		return node.Decode(&v.Value)
+		v.KV = nil
+		return node.Decode(&v.Value)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (v *SliceOrMapSlice) UnmarshalYAML(callback func(interface{}) error) error {
// try to unmarshal it as a string and fallback to map
if err := callback(&v.Value); err == nil {
return nil
}
func (v *SliceOrMapSlice) UnmarshalYAML(node *yaml.Node) error {
switch node.Kind {
case yaml.ScalarNode, yaml.SequenceNode:
return node.Decode(&v.Value)
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (v *SliceOrMapSlice) UnmarshalYAML(node *yaml.Node) error {
switch node.Kind {
case yaml.ScalarNode, yaml.SequenceNode:
// Clear any previously-set KV data to avoid stale state
v.KV = nil
return node.Decode(&v.Value)
🤖 Prompt for AI Agents
In pkg/fuzz/type.go around lines 93 to 98, the UnmarshalYAML method currently
decodes into v.Value without clearing the alternate field, which can leave stale
KV data when structs are reused; update the method so that before decoding into
Value you set v.KV = nil (and likewise, before decoding into KV clear v.Value)
to zero the alternate field and avoid stale state from pooled/reused structs,
then proceed with the node.Decode call.

Comment on lines +99 to +126
case yaml.MappingNode:
// Handle as ordered map to preserve order
tmpx := mapsutil.NewOrderedMap[string, string]()
// preserve order
for _, v := range node {
tmpx.Set(v.Key.(string), v.Value.(string))

// Process key-value pairs in order
for i := 0; i < len(node.Content); i += 2 {
if i+1 >= len(node.Content) {
break
}

keyNode := node.Content[i]
valueNode := node.Content[i+1]

var key string
if err := keyNode.Decode(&key); err != nil {
continue
}

var value string
if err := valueNode.Decode(&value); err != nil {
continue
}

tmpx.Set(key, value)
}
v.KV = &tmpx
return nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Initialize/reset fields and capacity; consider surfacing decode errors instead of silent continue.

  • Reset Value when decoding a mapping to avoid stale data.
  • Pre-size the map for fewer allocations.
  • Swallowing key/value decode errors hides malformed YAML; consider failing fast or collecting errors.
-	case yaml.MappingNode:
+	case yaml.MappingNode:
 		// Handle as ordered map to preserve order
-		tmpx := mapsutil.NewOrderedMap[string, string]()
+		tmpx := mapsutil.NewOrderedMap[string, string]()
+		v.Value = nil
@@
-			if err := keyNode.Decode(&key); err != nil {
-				continue
-			}
+			if err := keyNode.Decode(&key); err != nil {
+				return fmt.Errorf("invalid mapping key: %w", err)
+			}
@@
-			if err := valueNode.Decode(&value); err != nil {
-				continue
-			}
+			if err := valueNode.Decode(&value); err != nil {
+				return fmt.Errorf("invalid mapping value for key %q: %w", key, err)
+			}

If you want to keep a permissive path, I can wire an option flag to control “continue vs. error”.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case yaml.MappingNode:
// Handle as ordered map to preserve order
tmpx := mapsutil.NewOrderedMap[string, string]()
// preserve order
for _, v := range node {
tmpx.Set(v.Key.(string), v.Value.(string))
// Process key-value pairs in order
for i := 0; i < len(node.Content); i += 2 {
if i+1 >= len(node.Content) {
break
}
keyNode := node.Content[i]
valueNode := node.Content[i+1]
var key string
if err := keyNode.Decode(&key); err != nil {
continue
}
var value string
if err := valueNode.Decode(&value); err != nil {
continue
}
tmpx.Set(key, value)
}
v.KV = &tmpx
return nil
case yaml.MappingNode:
// Handle as ordered map to preserve order
tmpx := mapsutil.NewOrderedMap[string, string]()
v.Value = nil
// Process key-value pairs in order
for i := 0; i < len(node.Content); i += 2 {
if i+1 >= len(node.Content) {
break
}
keyNode := node.Content[i]
valueNode := node.Content[i+1]
var key string
if err := keyNode.Decode(&key); err != nil {
return fmt.Errorf("invalid mapping key: %w", err)
}
var value string
if err := valueNode.Decode(&value); err != nil {
return fmt.Errorf("invalid mapping value for key %q: %w", key, err)
}
tmpx.Set(key, value)
}
v.KV = &tmpx
return nil
🤖 Prompt for AI Agents
In pkg/fuzz/type.go around lines 99 to 126, when decoding a yaml.MappingNode you
must reset v.Value (or relevant field) before populating KV, pre-size the
ordered map to node.Content length/2 to avoid extra allocations, and stop
silently swallowing decode errors: return or accumulate and return the first
decode error (or wire a permissive flag to choose continue vs fail-fast).
Specifically, clear/reset the target value(s) at the start of the mapping
branch, create the ordered map with capacity = len(node.Content)/2, replace the
continue on key/value Decode failures with error propagation (or call a
passed-in option to allow continuing), and set v.KV to the populated map only
after successful processing.

Comment on lines +29 to +55
func (ec *ExpressionCache) Get(expression string) (*govaluate.EvaluableExpression, error) {
ec.cacheLock.RLock()
if compiled, exists := ec.cache[expression]; exists {
ec.cacheLock.RUnlock()
return compiled, nil
}
ec.cacheLock.RUnlock()

var compiled *govaluate.EvaluableExpression
var err error

if ec.functions != nil {
compiled, err = govaluate.NewEvaluableExpressionWithFunctions(expression, ec.functions)
} else {
compiled, err = govaluate.NewEvaluableExpression(expression)
}

if err != nil {
return nil, err
}

ec.cacheLock.Lock()
ec.cache[expression] = compiled
ec.cacheLock.Unlock()

return compiled, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid duplicate compilation under contention (double-checked locking).

Two goroutines can race to compile the same expression and one result is discarded. It’s safe but wastes CPU. Add a second check under the write lock.

 func (ec *ExpressionCache) Get(expression string) (*govaluate.EvaluableExpression, error) {
-  ec.cacheLock.RLock()
-  if compiled, exists := ec.cache[expression]; exists {
-    ec.cacheLock.RUnlock()
-    return compiled, nil
-  }
-  ec.cacheLock.RUnlock()
-
-  var compiled *govaluate.EvaluableExpression
-  var err error
-
-  if ec.functions != nil {
-    compiled, err = govaluate.NewEvaluableExpressionWithFunctions(expression, ec.functions)
-  } else {
-    compiled, err = govaluate.NewEvaluableExpression(expression)
-  }
-  if err != nil {
-    return nil, err
-  }
-
-  ec.cacheLock.Lock()
-  ec.cache[expression] = compiled
-  ec.cacheLock.Unlock()
-  return compiled, nil
+  ec.cacheLock.RLock()
+  if compiled, ok := ec.cache[expression]; ok {
+    ec.cacheLock.RUnlock()
+    return compiled, nil
+  }
+  ec.cacheLock.RUnlock()
+
+  // Compile without holding locks
+  var compiledNew *govaluate.EvaluableExpression
+  var err error
+  if ec.functions != nil {
+    compiledNew, err = govaluate.NewEvaluableExpressionWithFunctions(expression, ec.functions)
+  } else {
+    compiledNew, err = govaluate.NewEvaluableExpression(expression)
+  }
+  if err != nil {
+    return nil, err
+  }
+
+  // Second-check under write lock to avoid duplicate inserts
+  ec.cacheLock.Lock()
+  if compiledExisting, ok := ec.cache[expression]; ok {
+    ec.cacheLock.Unlock()
+    return compiledExisting, nil
+  }
+  ec.cache[expression] = compiledNew
+  ec.cacheLock.Unlock()
+  return compiledNew, nil
 }
🤖 Prompt for AI Agents
In pkg/protocols/common/exprcache/exprcache.go around lines 29 to 55, the
current Get method can compile the same expression twice under contention; after
acquiring the write lock to store a newly compiled expression, add a second
check (double-checked locking) to see if the expression was populated by another
goroutine while this one was compiling; if it exists, discard the local compiled
value and return the cached one, otherwise store and return the compiled
expression. Ensure the read lock/unlock and write lock/unlock semantics remain
correct and that errors from compilation are still returned unchanged.


"go.uber.org/multierr"
"gopkg.in/yaml.v2"
"gopkg.in/yaml.v3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

yaml.v3 migration looks good; close the Encoder and set indentation.

yaml.v3’s Encoder buffers output until Close is called. Here we’re not closing it, which can lead to truncated/partial writes on some writers. Also, explicitly setting indent keeps the file formatting stable across versions.

Apply this diff inside CreateConfigIfNotExists():

-	err = yaml.NewEncoder(reportingFile).Encode(options)
-	return err
+	enc := yaml.NewEncoder(reportingFile)
+	enc.SetIndent(2)
+	if err = enc.Encode(options); err != nil {
+		return err
+	}
+	if cerr := enc.Close(); cerr != nil {
+		return errkit.Append(errkit.New("could not finalize yaml encoding"), cerr)
+	}
+	return nil

Run to spot other yaml.NewEncoder usages missing Close:

Also applies to: 236-237


🏁 Script executed:

#!/bin/bash
rg -n --type=go -C2 '\byaml\.NewEncoder\('

Length of output: 533


Ensure YAML encoders are closed and indentation is explicitly set

YAML v3’s Encoder buffers output until Close is called. Omitting Close can lead to truncated or partial writes. Explicitly setting the indent also stabilizes formatting across versions. Please apply the following mandatory refactors:

• pkg/reporting/reporting.go (CreateConfigIfNotExists)
– Replace the one-liner encoder with a named encoder.
– Call SetIndent(2) before encoding.
– Handle Encode errors, then call Close() and propagate any Close error (e.g. via errkit).

--- pkg/reporting/reporting.go
@@ -234,4 +234,13 @@ func CreateConfigIfNotExists(...) error {
-   err = yaml.NewEncoder(reportingFile).Encode(options)
-   return err
+   enc := yaml.NewEncoder(reportingFile)
+   enc.SetIndent(2)
+   if err = enc.Encode(options); err != nil {
+       return err
+   }
+   if cerr := enc.Close(); cerr != nil {
+       return errkit.Append(errkit.New("could not finalize yaml encoding"), cerr)
+   }
+   return nil
}

• cmd/tmc/main.go (around lines 386–388)
– indent is already set via yamlIndentSpaces, but Close() is never called.
– After Encode(infoBlock), invoke yamlEncoder.Close() and handle any returned error (wrapping or appending as per existing conventions).

--- cmd/tmc/main.go
@@ -386,3 +386,7 @@ func updateInfoBlock(...) error {
-   yamlEncoder := yaml.NewEncoder(&newInfoBlock)
-   yamlEncoder.SetIndent(yamlIndentSpaces)
-   err = yamlEncoder.Encode(infoBlock)
+   yamlEncoder := yaml.NewEncoder(&newInfoBlock)
+   yamlEncoder.SetIndent(yamlIndentSpaces)
+   if err = yamlEncoder.Encode(infoBlock); err != nil {
+       return fmt.Errorf("yaml encode infoBlock: %w", err)
+   }
+   if cerr := yamlEncoder.Close(); cerr != nil {
+       return fmt.Errorf("error closing yaml encoder: %w", cerr)
+   }

These changes ensure that all buffered data is flushed and formatting remains consistent.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pkg/reporting/reporting.go around line 17 (CreateConfigIfNotExists), replace
the one-liner yaml.NewEncoder with a named encoder variable, call
yamlEncoder.SetIndent(2) before encoding, check and return any error from
yamlEncoder.Encode(...), then call yamlEncoder.Close() and propagate any Close()
error (use errkit wrapping/propagation consistent with project conventions); and
in cmd/tmc/main.go around lines 386–388, after calling Encode(infoBlock) call
yamlEncoder.Close() and handle/return or append any Close() error consistent
with surrounding error handling.

Comment on lines +74 to 82
template.Options.ApplyNewEngineOptions(options)
if template.CompiledWorkflow != nil {
template.CompiledWorkflow.Options.ApplyNewEngineOptions(options)
for _, w := range template.CompiledWorkflow.Workflows {
for _, ex := range w.Executers {
ex.Options.ApplyNewEngineOptions(options)
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Cache-hit path misses options rebind for non-workflow executer (stale engine options at runtime).

On cache-hit, options are propagated to CompiledWorkflow executers, but the main template.Executer (non-workflow path) is not updated. This can cause reused executers to run with stale Output/RateLimiter/Logger/etc., leading to cross-scan leakage or incorrect behavior.

This does not require deep copies (which we want to avoid per prior design); we only need to rebind the engine options on the executer if it supports it.

Proposed minimal, non-invasive fix (duck-typing; no recompile, no deep copy):

       template.Options.ApplyNewEngineOptions(options)
       if template.CompiledWorkflow != nil {
         template.CompiledWorkflow.Options.ApplyNewEngineOptions(options)
         for _, w := range template.CompiledWorkflow.Workflows {
           for _, ex := range w.Executers {
             ex.Options.ApplyNewEngineOptions(options)
           }
         }
       }
+      // Rebind engine options for non-workflow executer (if present)
+      if template.Executer != nil {
+        // Prefer an ApplyNewEngineOptions method if exposed by the executer
+        if applier, ok := any(template.Executer).(interface {
+          ApplyNewEngineOptions(*protocols.ExecutorOptions)
+        }); ok {
+          applier.ApplyNewEngineOptions(template.Options)
+        } else if setter, ok := any(template.Executer).(interface {
+          SetOptions(*protocols.ExecutorOptions)
+        }); ok {
+          setter.SetOptions(template.Options)
+        }
+      }

To be safe, please verify whether the underlying executer type(s) expose either ApplyNewEngineOptions or SetOptions; if not, I can add a small adapter to tmplexec to support rebinding without recompilation.
Run to confirm availability:


🏁 Script executed:

#!/bin/bash
# Inspect executer types for option-rebinding hooks
rg -nP --type=go -C3 '\b(ApplyNewEngineOptions|SetOptions)\s*\('

Length of output: 7741


🏁 Script executed:

#!/bin/bash
rg -nP 'type \w*Template\s+struct' -n pkg/templates

Length of output: 240


🏁 Script executed:

#!/bin/bash
# Locate and display the Template struct definition
start_line=$(grep -n "type Template struct" -m1 pkg/templates/templates.go | cut -d: -f1)
if [ -z "$start_line" ]; then
  echo "Template struct not found"
  exit 1
fi
# Show 50 lines from the struct definition
end_line=$((start_line + 50))
sed -n "${start_line},${end_line}p" pkg/templates/templates.go

# Search specifically for an Executer field in the Template struct
rg -n "Executer" -C2 pkg/templates/templates.go

Length of output: 3170


🏁 Script executed:

#!/bin/bash
# Display the Executer interface definition in protocols.go
rg -nP 'type Executer\s+interface' -n pkg/protocols/protocols.go -C5

Length of output: 537


Add explicit option-rebinding support for the main template.Executer

We’ve verified that the protocols.Executer held in Template.Executer does not currently expose any method named ApplyNewEngineOptions or SetOptions (so the proposed duck-typing check will never succeed). To ensure the non-workflow executor gets its engine options rebound on cache-hit, please make the following mandatory changes:

• Define a small interface for rebinding, e.g.:

// In pkg/protocols/protocols.go (or a new file under pkg/templates):
type OptionRebinder interface {
    UpdateOptions(*ExecutorOptions)
}

• Implement it on the types used as Template.Executer (e.g. in pkg/templates/tmplexec.go):

func (e *executor) UpdateOptions(opts *protocols.ExecutorOptions) {
    // rebind the same way workflow executers do
    e.Options.ApplyNewEngineOptions(opts)
}

• In pkg/templates/compile.go, immediately after applying to CompiledWorkflow, add:

       // existing workflow-path rebinding...
       if template.CompiledWorkflow != nil {
         … 
       }
+      // Rebind engine options on the main executer (cache-hit path)
+      if rebinder, ok := template.Executer.(OptionRebinder); ok {
+        rebinder.UpdateOptions(options)
+      }

These changes ensure that, on a cache hit, both the workflow executers and the main template.Executer are updated with the latest ExecutorOptions.

🤖 Prompt for AI Agents
In pkg/templates/compile.go around lines 74 to 82, the code updates engine
options for CompiledWorkflow and its workflow executers but never rebinding the
main Template.Executer; add explicit option-rebinding support by 1) defining a
small interface (e.g. OptionRebinder with
UpdateOptions(*protocols.ExecutorOptions)) in pkg/protocols (or a new file under
pkg/protocols), 2) implement UpdateOptions on the concrete types used as
Template.Executer (e.g. in pkg/templates/tmplexec.go) to call their internal
Options.ApplyNewEngineOptions(opts), and 3) after applying options to
CompiledWorkflow in compile.go, check if template.Executer implements the
OptionRebinder interface and call UpdateOptions with the same options so the
top-level executor is updated on cache-hit.

Comment on lines +36 to 65
func (insertionOrderedStringMap *InsertionOrderedStringMap) UnmarshalYAML(node *yaml.Node) error {
if node.Kind != yaml.MappingNode {
return fmt.Errorf("insertion_ordered_map: expected mapping, got %v", node.Kind)
}

insertionOrderedStringMap.values = make(map[string]interface{})
for _, v := range data {
if v.Key == nil {

// Process key-value pairs in order
for i := 0; i < len(node.Content); i += 2 {
if i+1 >= len(node.Content) {
break
}

keyNode := node.Content[i]
valueNode := node.Content[i+1]

var key string
if err := keyNode.Decode(&key); err != nil {
continue
}
insertionOrderedStringMap.Set(v.Key.(string), toString(v.Value))

var value interface{}
if err := valueNode.Decode(&value); err != nil {
continue
}

insertionOrderedStringMap.Set(key, toString(value))
}
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Stale keys bug: keys slice is not reset during YAML unmarshal.

You reinitialize the values map but not the keys slice. Reused instances will retain old keys; ForEach will emit keys absent from values with nil values. This causes subtle corruption.

 func (insertionOrderedStringMap *InsertionOrderedStringMap) UnmarshalYAML(node *yaml.Node) error {
-	if node.Kind != yaml.MappingNode {
+	if node.Kind == yaml.AliasNode {
+		return insertionOrderedStringMap.UnmarshalYAML(node.Alias)
+	}
+	if node.Kind != yaml.MappingNode {
 		return fmt.Errorf("insertion_ordered_map: expected mapping, got %v", node.Kind)
 	}
 
-	insertionOrderedStringMap.values = make(map[string]interface{})
+	// Reset internal state; pre-size for fewer allocations
+	capacity := len(node.Content) / 2
+	insertionOrderedStringMap.keys = make([]string, 0, capacity)
+	insertionOrderedStringMap.values = make(map[string]interface{}, capacity)
@@
-		if err := keyNode.Decode(&key); err != nil {
-			continue
-		}
+		if err := keyNode.Decode(&key); err != nil {
+			return fmt.Errorf("invalid key: %w", err)
+		}
@@
-		if err := valueNode.Decode(&value); err != nil {
-			continue
-		}
+		if err := valueNode.Decode(&value); err != nil {
+			return fmt.Errorf("invalid value for key %q: %w", key, err)
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (insertionOrderedStringMap *InsertionOrderedStringMap) UnmarshalYAML(node *yaml.Node) error {
if node.Kind != yaml.MappingNode {
return fmt.Errorf("insertion_ordered_map: expected mapping, got %v", node.Kind)
}
insertionOrderedStringMap.values = make(map[string]interface{})
for _, v := range data {
if v.Key == nil {
// Process key-value pairs in order
for i := 0; i < len(node.Content); i += 2 {
if i+1 >= len(node.Content) {
break
}
keyNode := node.Content[i]
valueNode := node.Content[i+1]
var key string
if err := keyNode.Decode(&key); err != nil {
continue
}
insertionOrderedStringMap.Set(v.Key.(string), toString(v.Value))
var value interface{}
if err := valueNode.Decode(&value); err != nil {
continue
}
insertionOrderedStringMap.Set(key, toString(value))
}
return nil
}
func (insertionOrderedStringMap *InsertionOrderedStringMap) UnmarshalYAML(node *yaml.Node) error {
// Handle YAML aliases (anchors)
if node.Kind == yaml.AliasNode {
return insertionOrderedStringMap.UnmarshalYAML(node.Alias)
}
if node.Kind != yaml.MappingNode {
return fmt.Errorf("insertion_ordered_map: expected mapping, got %v", node.Kind)
}
// Reset internal state; pre-size for fewer allocations
capacity := len(node.Content) / 2
insertionOrderedStringMap.keys = make([]string, 0, capacity)
insertionOrderedStringMap.values = make(map[string]interface{}, capacity)
// Process key-value pairs in order
for i := 0; i < len(node.Content); i += 2 {
if i+1 >= len(node.Content) {
break
}
keyNode := node.Content[i]
valueNode := node.Content[i+1]
var key string
if err := keyNode.Decode(&key); err != nil {
return fmt.Errorf("invalid key: %w", err)
}
var value interface{}
if err := valueNode.Decode(&value); err != nil {
return fmt.Errorf("invalid value for key %q: %w", key, err)
}
insertionOrderedStringMap.Set(key, toString(value))
}
return nil
}
🤖 Prompt for AI Agents
In pkg/utils/insertion_ordered_map.go around lines 36 to 65, the UnmarshalYAML
reinitializes the values map but doesn’t reset the keys slice so reused
instances retain stale keys; before populating from the YAML node, clear or
reassign insertionOrderedStringMap.keys to an empty slice (e.g. nil or
make([]string,0)) so the keys list matches the new values, then proceed to
decode and call Set as before; ensure any early-continue on decode errors does
not leave partial/inconsistent state by only appending keys when a value is
successfully decoded.

Comment on lines +131 to +142
name: "multiple includes with different indentation",
input: fmt.Sprintf(`level1:
# !include:%s
level2:
# !include:%s`, includeFile1, includeFile2),
expected: fmt.Sprintf(`level1:
%s
level2:
%s`,
strings.ReplaceAll(includeContent1, "\n", "\n "),
strings.ReplaceAll(includeContent2, "\n", "\n ")),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add a case for “same file included at different indentation levels” (exposes a padBytes/positioning bug).

PreProcess computes padBytes using bytes.Index on the first occurrence of the matched substring. If the same include directive appears multiple times with different indentation, padding for later occurrences can be derived from the first match’s indentation, producing mis-indented expansions. Add a test for identical include paths at different depths to lock this down.

Apply this diff to extend the tests:

@@
 		{
 			name: "multiple includes with different indentation",
@@
 				strings.ReplaceAll(includeContent2, "\n", "\n    ")),
 		},
+		{
+			name: "same file included at different indentation",
+			input: fmt.Sprintf(`# top level
+# !include:%[1]s
+parent:
+  child:
+    # !include:%[1]s`, includeFile1),
+			expected: fmt.Sprintf(`top level
+%[1]s
+parent:
+  child:
+    %[2]s`,
+				includeContent1,
+				strings.ReplaceAll(includeContent1, "\n", "\n    ")),
+		},

Follow-up (in production code pkg/utils/yaml/preprocess.go): switch to FindAllSubmatchIndex (or equivalent) and compute padBytes from each match’s concrete start index instead of bytes.Index on the full buffer. For example:

@@
- importMatches := reImportsPattern.FindAllSubmatch(data, -1)
+ importMatches := reImportsPattern.FindAllSubmatchIndex(data, -1)
@@
- for _, match := range importMatches {
-   matchBytes := match[0]
-   matchString = string(matchBytes)
-   if len(match) > 0 {
-     includeFileName = string(match[1])
-   }
-   matchIndex := bytes.Index(data, matchBytes)
+ for _, loc := range importMatches {
+   // loc: [fullStart, fullEnd, submatch1Start, submatch1End, ...]
+   fullStart, fullEnd := loc[0], loc[1]
+   incStart, incEnd := loc[2], loc[3]
+   matchIndex := fullStart
+   matchBytes := data[fullStart:fullEnd]
+   matchString = string(matchBytes)
+   includeFileName = string(data[incStart:incEnd])
@@
-   lastNewLineIndex := bytes.LastIndex(data[:matchIndex], []byte("\n"))
+   lastNewLineIndex := bytes.LastIndex(data[:matchIndex], []byte("\n"))

This avoids indentation bleed between identical includes at different offsets. I can send a focused PR patch if helpful.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: "multiple includes with different indentation",
input: fmt.Sprintf(`level1:
# !include:%s
level2:
# !include:%s`, includeFile1, includeFile2),
expected: fmt.Sprintf(`level1:
%s
level2:
%s`,
strings.ReplaceAll(includeContent1, "\n", "\n "),
strings.ReplaceAll(includeContent2, "\n", "\n ")),
},
{
name: "multiple includes with different indentation",
input: fmt.Sprintf(`level1:
# !include:%s
level2:
# !include:%s`, includeFile1, includeFile2),
expected: fmt.Sprintf(`level1:
%s
level2:
%s`,
strings.ReplaceAll(includeContent1, "\n", "\n "),
strings.ReplaceAll(includeContent2, "\n", "\n ")),
},
{
name: "same file included at different indentation",
input: fmt.Sprintf(`# top level
# !include:%[1]s
parent:
child:
# !include:%[1]s`, includeFile1),
expected: fmt.Sprintf(`top level
%[1]s
parent:
child:
%[2]s`,
includeContent1,
strings.ReplaceAll(includeContent1, "\n", "\n ")),
},

Comment on lines +241 to +253
if os.Getuid() == 0 { // Skip if running as root
t.Skip("Skipping permission test when running as root")
}

restrictedFile := filepath.Join(tempDir, "restricted.yaml")
content := "restricted: content"
err := os.WriteFile(restrictedFile, []byte(content), 0000)
if err != nil {
return "", err
}
return fmt.Sprintf("# !include:%s", restrictedFile), nil
},
expectError: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix: os.Getuid is unix-only; the test won’t build on Windows.

os.Getuid is not available on Windows. Gate the test by OS or use a portable approach. Also restore StrictSyntax after mutation.

Apply this diff to make the test portable:

@@
-      if os.Getuid() == 0 { // Skip if running as root
-        t.Skip("Skipping permission test when running as root")
-      }
+      // Skip on non-Unix OSes and when running as root on Unix
+      if runtime.GOOS == "windows" || runtime.GOOS == "plan9" {
+        t.Skip("Skipping permission test on non-Unix OS")
+      }
+      if os.Geteuid() == 0 {
+        t.Skip("Skipping permission test when running as root")
+      }
@@
-      restrictedFile := filepath.Join(tempDir, "restricted.yaml")
+      restrictedFile := filepath.Join(tempDir, "restricted.yaml")
       content := "restricted: content"
       err := os.WriteFile(restrictedFile, []byte(content), 0000)

And at the top of this file:

@@
-import (
+import (
   "fmt"
   "os"
   "path/filepath"
   "strings"
   "testing"
+  "runtime"

Note: os.Geteuid is available on Unix. If you need a stricter cross-platform behavior, consider skipping this case entirely on non-Unix.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if os.Getuid() == 0 { // Skip if running as root
t.Skip("Skipping permission test when running as root")
}
restrictedFile := filepath.Join(tempDir, "restricted.yaml")
content := "restricted: content"
err := os.WriteFile(restrictedFile, []byte(content), 0000)
if err != nil {
return "", err
}
return fmt.Sprintf("# !include:%s", restrictedFile), nil
},
expectError: true,
diff --git a/pkg/utils/yaml/preprocess_test.go b/pkg/utils/yaml/preprocess_test.go
index abcdef0..1234567 100644
++ b/pkg/utils/yaml/preprocess_test.go
@@ -1,6 +1,7 @@
import (
"fmt"
"os"
"path/filepath"
"strings"
"testing"
"runtime"
)
func TestPreprocess(t *testing.T) {
@@ -241,10 +242,14 @@ func TestPreprocess(t *testing.T) {
// Mutate the input to point to a file with no permissions
func() (string, error) {
- if os.Getuid() == 0 { // Skip if running as root
- t.Skip("Skipping permission test when running as root")
// Skip on non-Unix OSes and when running as root on Unix
if runtime.GOOS == "windows" || runtime.GOOS == "plan9" {
t.Skip("Skipping permission test on non-Unix OS")
}
if os.Geteuid() == 0 {
t.Skip("Skipping permission test when running as root")
}
restrictedFile := filepath.Join(tempDir, "restricted.yaml")
content := "restricted: content"
err := os.WriteFile(restrictedFile, []byte(content), 0000)
if err != nil {
return "", err
}
return fmt.Sprintf("# !include:%s", restrictedFile), nil
},
expectError: true,
🤖 Prompt for AI Agents
In pkg/utils/yaml/preprocess_test.go around lines 241 to 253, the test uses
os.Getuid() which is Unix-only and prevents building on Windows; change the test
to skip this permission check on non-Unix platforms (e.g., guard with a
runtime.GOOS check or build tags) or replace with a portable approach (skip when
runtime.GOOS == "windows"); also ensure that after mutating StrictSyntax in the
test you restore its original value before returning so subsequent tests are
unaffected.

Comment on lines +293 to +316
func TestDecodeAndValidate_ConcurrentAccess(t *testing.T) {
yamlData := `name: "John Doe"
email: "john@example.com"
age: 30
username: "johndoe"`

// Test concurrent access to ensure validator is thread-safe
done := make(chan bool, 10)
for i := 0; i < 10; i++ {
go func() {
defer func() { done <- true }()
for j := 0; j < 100; j++ {
var target ValidStruct
reader := strings.NewReader(yamlData)
err := DecodeAndValidate(reader, &target)
assert.NoError(t, err)
}
}()
}

for i := 0; i < 10; i++ {
<-done
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Concurrent access test may race on lazy validator init; fix production code with sync.Once.

DecodeAndValidate initializes a package-level validator without synchronization. Under -race this will flag a data race. Make initialization once-only.

Proposed change in pkg/utils/yaml/yaml_decode_wrapper.go (outside this diff):

package yaml

import "sync"

// ...
var (
    validate     *validator.Validate
    validateOnce sync.Once
)

func DecodeAndValidate(r io.Reader, v interface{}) error {
    if err := yaml.NewDecoder(r).Decode(v); err != nil {
        return err
    }
    validateOnce.Do(func() { validate = validator.New() })
    // ...
}
🤖 Prompt for AI Agents
In pkg/utils/yaml/yaml_decode_wrapper_test.go around lines 293 to 316, the test
exposes a data race because the package-level validator is lazily initialized
without synchronization; fix the production file
pkg/utils/yaml/yaml_decode_wrapper.go by adding a sync.Once-controlled
initialization: declare a package-level validateOnce sync.Once alongside the
validate *validator.Validate variable, ensure the file imports "sync", and call
validateOnce.Do(func(){ validate = validator.New() }) inside DecodeAndValidate
before using validate so the validator is created exactly once and is safe under
concurrent calls.

"gopkg.in/yaml.v3"
)

var validate *validator.Validate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Data race risk on package-global validator initialization.

validate is lazily initialized without synchronization. Concurrent calls can race. Use sync.Once.

 import (
   "io"
   "strings"

   "github.com/go-playground/validator/v10"
   "github.com/pkg/errors"
   "gopkg.in/yaml.v3"
+  "sync"
 )
 
 var validate *validator.Validate
+var validateOnce sync.Once
@@
-  if validate == nil {
-    validate = validator.New()
-  }
+  validateOnce.Do(func() {
+    validate = validator.New()
+  })

Also applies to: 19-21

🤖 Prompt for AI Agents
In pkg/utils/yaml/yaml_decode_wrapper.go around lines 12 and 19-21, the
package-global variable "validate" is lazily initialized without synchronization
which creates a data race under concurrent calls; fix by introducing a sync.Once
(e.g., validateOnce) and perform the initialization of the validator inside
validateOnce.Do(...) so the validator is created exactly once before use, ensure
all code paths use the initialized validator and remove any unsynchronized nil
checks; update imports to include sync and verify no other unsynchronized
package globals remain.

@knakul853 knakul853 closed this Aug 22, 2025
@knakul853 knakul853 deleted the memory-improvement branch August 22, 2025 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant