[Enhancement] PowerShell - Optimize Entropy Calculation, Add Normalized Entropy, Add Pipeline Benchmark #16707
Conversation
Pinging @elastic/sec-windows-platform (Team:Security-Windows Platform)
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
mauri870 left a comment
LGTM, but I'm not very proficient with PowerShell. The code looks fine, but it needs a deeper look from the Windows team.
```
double normalizedEntropy = 0.0;
if (length > 1) {
    double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique
```
I think the normalized entropy calculation looks good 👍
A few notes for posterity:
- For the line `double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique` I think it makes sense to use `length` here. Typical normalized entropy calculations (like that for R/Posterior ref) would use something akin to `seenCount` instead of `length`. However, that treats the input as categories, where `a` and `a` are equivalent regardless of their position in the script block. In our case, I think we want the position to matter as well, so each value is by definition unique, making `length` the correct number to use here (as is correctly done in the code).
- The pre-output check `else if (normalizedEntropy > 1.0) normalizedEntropy = 1.0;` is I think technically not necessary, as this case should not occur. However, I think we should keep it, since it could catch floating-point rounding issues without impacting the integrity of the data result (the code is correct as is).
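For context, here is a minimal, self-contained Java sketch of the calculation being discussed above. It is not the pipeline's actual Painless source: names such as `entropy`, `counts`, and `entropyBits` are illustrative, and only `length`, `invLog2`, `maxEntropy`, and `normalizedEntropy` appear in the quoted diff.

```java
import java.util.HashMap;
import java.util.Map;

public class EntropySketch {
    // Shannon entropy in bits over character frequencies, normalized by
    // log2(length) and clamped to [0, 1], mirroring the discussion above.
    static double[] entropy(String scriptBlock) {
        int length = scriptBlock.length();
        if (length == 0) return new double[] {0.0, 0.0};

        // Count occurrences of each character.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < length; i++) {
            counts.merge(scriptBlock.charAt(i), 1, Integer::sum);
        }

        double invLog2 = 1.0 / Math.log(2.0); // convert natural log to log base 2
        double entropyBits = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / length;
            entropyBits -= p * Math.log(p) * invLog2;
        }

        double normalizedEntropy = 0.0;
        if (length > 1) {
            double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique
            normalizedEntropy = entropyBits / maxEntropy;
            if (normalizedEntropy < 0.0) normalizedEntropy = 0.0;
            else if (normalizedEntropy > 1.0) normalizedEntropy = 1.0; // guard against float rounding
        }
        return new double[] {entropyBits, normalizedEntropy};
    }

    public static void main(String[] args) {
        double[] r = entropy("Invoke-Expression $payload");
        System.out.printf("entropy=%.4f bits, normalized=%.4f%n", r[0], r[1]);
    }
}
```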
💚 Build Succeeded
cc @w0rk3r
Proposed commit message
Summary
Related issue:
This PR:
- `powershell.file.script_block_entropy_normalized` = entropy_bits / log2(script_block_length) (0–1); see the sketch below.
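As a hedged restatement of that field definition (the symbols below are mine, not taken from the pipeline): with character probabilities $p_i$ over a script block of length $L$,

$$
H = -\sum_i p_i \log_2 p_i, \qquad H_{\text{norm}} = \frac{H}{\log_2 L} \in [0, 1] \quad (L > 1).
$$

For example, a 16-character block made of one repeated character gives $H = 0$ and $H_{\text{norm}} = 0$, while 16 distinct characters give $H = 4$ bits and $H_{\text{norm}} = 4 / \log_2 16 = 1$.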
Old pipeline:
Improved pipeline:
Complete benchmark output
Old:
Improved:
Checklist
`changelog.yml` file.