[Enhancement] PowerShell - Optimize Entropy Calculation, Add Normalized Entropy, Add Pipeline Benchmark #16707

w0rk3r · 2025-12-26T22:21:17Z

Proposed commit message

windows: refine PowerShell script entropy pipeline

Replace code-point HashMap counting with a fixed 65k UTF-16 char histogram
and skip truncated signature fragments before entropy is computed. Add a
normalized entropy field scaled by script length (0–1).

Summary

Related issue:

https://github.com/elastic/ia-trade-team/issues/745

This PR:

Replaces code‑point HashMap counting with a fixed 65k UTF‑16 char histogram for script entropy, reducing script processor time and improving throughput (2924 → 4873 eps in the warm run).
Skips truncated signature fragments before entropy is computed.
Adds powershell.file.script_block_entropy_normalized = entropy_bits / log2(script_block_length) (0–1).
Adds benchmark fixtures to track performance regressions during our research.
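As a rough illustration of the first and third points, here is a minimal sketch in plain Java rather than the Painless script the pipeline actually uses. The class and method names (`ScriptEntropy`, `entropyBits`, `normalizedEntropy`) are hypothetical. The sketch assumes "UTF‑16 char histogram" means one counter per 16-bit code unit, so a supplementary character is counted as two surrogates. The truncated-signature check is not shown.

```java
final class ScriptEntropy {
    // Precomputed 1/ln(2), so that ln(x) * INV_LOG2 == log2(x).
    private static final double INV_LOG2 = 1.0 / Math.log(2.0);

    // Shannon entropy in bits per UTF-16 code unit.
    // A fixed int[65536] replaces a per-code-point HashMap: every char value
    // (0..65535) indexes the array directly, with no hashing or boxing.
    static double entropyBits(String s) {
        int length = s.length();
        if (length == 0) {
            return 0.0;
        }
        int[] counts = new int[65536];
        for (int i = 0; i < length; i++) {
            counts[s.charAt(i)]++;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue;
            }
            double p = (double) c / length;
            entropy -= p * Math.log(p) * INV_LOG2;
        }
        return entropy;
    }

    // entropy_bits / log2(length), clamped to [0, 1].
    static double normalizedEntropy(String s) {
        int length = s.length();
        if (length <= 1) {
            return 0.0; // log2(1) == 0, so there is nothing to normalize by
        }
        double maxEntropy = Math.log((double) length) * INV_LOG2; // max bits if every character is unique
        double normalized = entropyBits(s) / maxEntropy;
        if (normalized < 0.0) {
            return 0.0;
        }
        if (normalized > 1.0) {
            return 1.0;
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(entropyBits("aabb"));
        System.out.println(normalizedEntropy("abcd"));
        System.out.println(normalizedEntropy("aaaa"));
    }
}
```

In this sketch "aabb" comes out at about 1 bit, "abcd" normalizes to about 1.0 (every character unique), and "aaaa" normalizes to 0.0.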

Old pipeline:

Improved pipeline:

Complete benchmark output

Old:

PS C:\Users\Jonhnathan\Documents\Github\integrations\packages\windows> .\..\..\elastic-package.exe benchmark pipeline --data-streams powershell_operational --use-test-samples=false
Run pipeline benchmarks for the package
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   1.10s │
│ eps             │ 2278.94 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ script @ default.yml:322      │ 47.49% │
│ gsub @ default.yml:305        │ 30.36% │
│ fingerprint @ default.yml:311 │  3.19% │
│ set @ default.yml:60          │  2.10% │
│ script @ default.yml:13       │  1.82% │
│ gsub @ default.yml:316        │  1.09% │
│ script @ default.yml:30       │  1.00% │
│ remove @ default.yml:575      │  0.55% │
│ rename @ default.yml:290      │  0.18% │
│ trim @ default.yml:302        │  0.18% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ script @ default.yml:322      │ 208.4µs │
│ gsub @ default.yml:305        │ 133.2µs │
│ fingerprint @ default.yml:311 │    14µs │
│ set @ default.yml:60          │   9.2µs │
│ script @ default.yml:13       │     8µs │
│ gsub @ default.yml:316        │   4.8µs │
│ script @ default.yml:30       │   4.4µs │
│ remove @ default.yml:575      │   2.4µs │
│ rename @ default.yml:290      │   800ns │
│ trim @ default.yml:302        │   800ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   0.85s │
│ eps             │ 2923.98 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ script @ default.yml:322      │ 50.53% │
│ gsub @ default.yml:305        │ 34.15% │
│ fingerprint @ default.yml:311 │  2.57% │
│ gsub @ default.yml:316        │  1.17% │
│ script @ default.yml:13       │  0.70% │
│ set @ default.yml:60          │  0.58% │
│ remove @ default.yml:575      │  0.35% │
│ script @ default.yml:30       │  0.35% │
│ rename @ default.yml:290      │  0.12% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ script @ default.yml:322      │ 172.8µs │
│ gsub @ default.yml:305        │ 116.8µs │
│ fingerprint @ default.yml:311 │   8.8µs │
│ gsub @ default.yml:316        │     4µs │
│ script @ default.yml:13       │   2.4µs │
│ set @ default.yml:60          │     2µs │
│ remove @ default.yml:575      │   1.2µs │
│ script @ default.yml:30       │   1.2µs │
│ rename @ default.yml:290      │   400ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done

Improved:

PS C:\Users\Jonhnathan\Documents\Github\integrations\packages\windows> .\..\..\elastic-package.exe benchmark pipeline --data-streams powershell_operational --use-test-samples=false
Run pipeline benchmarks for the package
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   0.51s │
│ eps             │ 4892.37 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ gsub @ default.yml:305        │ 55.19% │
│ script @ default.yml:322      │ 28.18% │
│ fingerprint @ default.yml:311 │  4.11% │
│ gsub @ default.yml:316        │  1.96% │
│ script @ default.yml:13       │  0.59% │
│ remove @ default.yml:657      │  0.39% │
│ rename @ default.yml:290      │  0.20% │
│ set @ default.yml:60          │  0.20% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ gsub @ default.yml:305        │ 112.8µs │
│ script @ default.yml:322      │  57.6µs │
│ fingerprint @ default.yml:311 │   8.4µs │
│ gsub @ default.yml:316        │     4µs │
│ script @ default.yml:13       │   1.2µs │
│ remove @ default.yml:657      │   800ns │
│ rename @ default.yml:290      │   400ns │
│ set @ default.yml:60          │   400ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   0.51s │
│ eps             │ 4873.29 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ gsub @ default.yml:305        │ 57.89% │
│ script @ default.yml:322      │ 25.93% │
│ fingerprint @ default.yml:311 │  3.51% │
│ gsub @ default.yml:316        │  1.95% │
│ script @ default.yml:13       │  0.78% │
│ remove @ default.yml:657      │  0.39% │
│ set @ default.yml:60          │  0.19% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ gsub @ default.yml:305        │ 118.8µs │
│ script @ default.yml:322      │  53.2µs │
│ fingerprint @ default.yml:311 │   7.2µs │
│ gsub @ default.yml:316        │     4µs │
│ script @ default.yml:13       │   1.6µs │
│ remove @ default.yml:657      │   800ns │
│ set @ default.yml:60          │   400ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done
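
Taking the second run of each benchmark (the pair of eps figures quoted in the summary as the warm run), the ratios work out roughly as follows. All inputs are copied from the tables above; only the divisions are new.

```latex
\begin{align*}
\text{pipeline throughput:} \quad & \frac{4873.29\ \text{eps}}{2923.98\ \text{eps}} \approx 1.67\times \\
\text{script @ default.yml:322, per doc:} \quad & \frac{172.8\ \mu\text{s}}{53.2\ \mu\text{s}} \approx 3.25\times \\
\text{gsub @ default.yml:305, per doc:} \quad & 116.8\ \mu\text{s} \rightarrow 118.8\ \mu\text{s}\ \text{(essentially unchanged)}
\end{align*}
```

With the script processor about three times cheaper per document, the gsub at default.yml:305 becomes the largest share of pipeline time (57.89% in the second improved run).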

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
I have verified that any added dashboard complies with Kibana's Dashboard good practices

elasticmachine · 2025-12-26T22:21:21Z

Pinging @elastic/sec-windows-platform (Team:Security-Windows Platform)

elasticmachine · 2026-01-04T12:13:45Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

packages/windows/changelog.yml

mauri870

LGTM, but I'm not very proficient with PowerShell. The code looks fine, but it needs a deeper look from the Windows team.

eric-forte-elastic · 2026-01-16T21:24:59Z

packages/windows/data_stream/powershell_operational/elasticsearch/ingest_pipeline/default.yml

+
+        double normalizedEntropy = 0.0;
+        if (length > 1) {
+            double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique


I think the normalized entropy calculation looks good 👍

Few notes for posterity:

For the line double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique, I think it makes sense to use length here. Typical normalized entropy calculations (like that for R/Posterior ref) would use something akin to seenCount instead of length. However, that approach expects the input to behave like categories, where one a and another a are equivalent regardless of their position in the script block. In our case, I think we want the position to matter as well, so each value is by definition unique, making length the correct number to use here (as is correctly done in the code).

I think the pre-output check else if (normalizedEntropy > 1.0) normalizedEntropy = 1.0; is technically not necessary, as this should not occur. However, we should keep it, since it could catch floating point rounding issues without affecting the integrity of the result (code is correct as is).
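
To make the difference between the two denominators concrete, here is a small sketch in plain Java, not the pipeline's Painless code. The names `NormalizationComparison`, `byLength`, and `byDistinct` are hypothetical, and `byDistinct` is only an assumption about what a seenCount-style normalization would look like.

```java
final class NormalizationComparison {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Shannon entropy in bits per UTF-16 code unit.
    static double entropyBits(String s) {
        int[] counts = new int[65536];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / s.length();
                h -= p * log2(p);
            }
        }
        return h;
    }

    // Number of distinct UTF-16 code units (the "seenCount"-style denominator input).
    static int distinctCount(String s) {
        return (int) s.chars().distinct().count();
    }

    // Normalize by log2(length), as the reviewed code does.
    static double byLength(String s) {
        int n = s.length();
        return n > 1 ? Math.min(1.0, entropyBits(s) / log2(n)) : 0.0;
    }

    // Normalize by log2(number of distinct symbols), the category-style alternative.
    static double byDistinct(String s) {
        int k = distinctCount(s);
        return k > 1 ? Math.min(1.0, entropyBits(s) / log2(k)) : 0.0;
    }

    public static void main(String[] args) {
        String s = "abababab";
        System.out.println("byLength   = " + byLength(s));
        System.out.println("byDistinct = " + byDistinct(s));
    }
}
```

For "abababab" the entropy is 1 bit. Dividing by log2(8) = 3 gives about 0.33, while dividing by log2(2) = 1 gives 1.0. The distinct-count denominator rates any evenly used alphabet as maximal, however small that alphabet is; the length denominator only reaches 1.0 when every character in the block is unique. The `Math.min(1.0, ...)` call plays the same role as the clamp discussed above.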

elastic-vault-github-plugin-prod · 2026-01-23T14:48:01Z

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

elasticmachine · 2026-01-23T14:48:06Z

💚 Build Succeeded

Buildkite Build
Commit: da41d41

History

💔 Build #36784 failed eeae9ce
💔 Build #35908 failed a2f9ae4

cc @w0rk3r

w0rk3r added 4 commits December 26, 2025 18:30

[Enhancement] PowerShell - Optimize Entropy Calculation, Add Normalized Entropy, Add Pipeline Benchmark

c502016

Merge branch 'main' into posh_entropy_2

ee3c619

Update test-powershell-operational-events.json-expected.json

370c5d6

Update changelog.yml

a2f9ae4

w0rk3r self-assigned this Dec 26, 2025

w0rk3r requested review from a team as code owners December 26, 2025 22:21

w0rk3r added enhancement New feature or request Integration:windows Windows Team:Security-Windows Platform Security Windows Platform team [elastic/sec-windows-platform] labels Dec 26, 2025

w0rk3r requested review from faec and mauri870 December 26, 2025 22:21

pierrehilbert added the Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] label Jan 4, 2026

mauri870 reviewed Jan 5, 2026

packages/windows/changelog.yml

mauri870 self-requested a review January 5, 2026 12:15

mauri870 approved these changes Jan 5, 2026

andrewkroh added the documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. label Jan 8, 2026

eric-forte-elastic reviewed Jan 16, 2026

Merge branch 'main' into posh_entropy_2

eeae9ce

nfritts approved these changes Jan 23, 2026

rename benchmark file

da41d41

gogochan approved these changes Jan 23, 2026

9 participants
