Extend file.pe Fieldset #1071
Merged
Changes from 24 commits
Commits
57 commits
7080358
Merge pull request #1 from elastic/master
peasead 314f9ab
Merge pull request #2 from elastic/master
peasead 408816b
initial commit
peasead 342cf00
update module
peasead 4c5f266
further clarification
peasead 2213d49
updates
peasead a8f1a80
'make' and 'make test'
peasead 6582bde
changelog
peasead 51fd471
added PR
peasead 41a5bf0
added PR
peasead 7744bd1
reorganized and fixed orig pe.yml
peasead 205eeac
updatd SMEs
peasead 7fa6ae5
reran make to reset files
peasead a80581e
removed changelog entry
peasead ec587e4
removed existing fields
peasead 6e86d31
Update rfcs/text/0000-extend-file-pe.md
peasead 24baef4
Update rfcs/text/0000-extend-file-pe.md
peasead c091452
Update rfcs/text/0000-extend-file-pe.md
peasead c1ac596
Update rfcs/text/0000-extend-file-pe.md
peasead 70e68b0
Update rfcs/text/0000-extend-file-pe.md
peasead b9b2686
Update rfcs/text/pe/pe.yml
peasead 80641f0
remove stage headers
peasead a0f193f
add examples and references
peasead b3e72da
removed vt module blob for now
peasead b6f9dfc
adjustments to entry_point as keyword
peasead 6f6cd39
move rich_headers into its own fields
peasead b12961b
extended compiler to include name and version
peasead 0063d23
adjusted dhash description
peasead f0e7d61
Update icon fields
peasead 0bab0ec
duplicate fields
peasead 4d2d65b
removed unnecessary hashing algos
peasead 7cc04f8
moving overlay to file.*
peasead f5a0533
removing resource_languages in favor of resource_details
peasead 4356939
removed packers, not part of peinfo
peasead 53467c6
moved debug to nested fields
peasead f7c1af7
moved sections to nested
peasead 958c646
updated imports name and change type to flattened
peasead 5bbc6f5
resources rename
peasead 4fa79d5
added "s" to types
peasead 13d264f
remove resources.types aggregation
peasead 550b038
removed plurality of resources.type
peasead 74628f9
add nested resources fields to table
peasead a8d954e
update entry_point desc.
peasead ccbed5d
Update rfcs/text/pe/pe.yml
peasead cb7631a
Merge branch 'file.pe-extend' of github.com:peasead/ecs into file.pe-…
peasead 4a68601
update/add pe.packers
peasead 1f88931
fixed compiler type
peasead 1bd64de
added pe.icon to table
peasead ad156fd
removed file. from names
peasead 58fca91
Update rfcs/text/0000-extend-file-pe.md
peasead d54ad8c
Update pe.yml
peasead a63a716
Update pe.yml
peasead 49d069b
combined debug.type and debut.type_str
peasead 9e2d59d
field definition housekeeping
ebeahan 9464876
adjust markdown comments to align with updated proposal stages
ebeahan 3f816cb
assigning rfc number and set advance date
ebeahan aa000c2
rename using assigned rfc number
ebeahan
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,139 @@ | ||
| # 0000: Extend the PE field set | ||
|
|
||
| - Stage: **1 (proposal)** | ||
| - Date: **TBD** | ||
|
|
||
| The Portable Executable (PE) sub-fields of the `file` top-level fieldset can be extended to include more file attributes to aid in file analysis. This additional document metadata can be used for malware research, as well as coding and other application development efforts. | ||
|
|
||
| ## Fields | ||
|
|
||
| This RFC is to create 25 additional sub-fields within the `file.pe` fieldset. | ||
|
|
||
| | Name | Type | Description | | ||
| | ---- | ---- | ----------- | | ||
| | file.pe.authentihash | keyword | Authentihash of the PE file. | | ||
|
|
||
| | file.pe.compile_timestamp | date | Compile timestamp of the PE file. | | ||
| | file.pe.compiler | wildcard | Name and version of the compiler. | | ||
| | file.pe.creation_date | date | Extracted when possible from the file's metadata. Indicates when it was built or compiled. It can also be faked by malware creators. | | ||
| | file.pe.entry_point | long | Entry point of the PE file. | | ||
| | file.pe.exports | keyword | List of symbols exported by PE | | ||
| | file.pe.debug | flattened | Debug information, if present | | ||
| | file.pe.import_list | flattened | List of all imported functions | | ||
|
|
||
| | file.pe.sections | flattened | Data about sections of compiled binary PE | | ||
|
|
||
| | file.pe.resource_details | flattened | If the PE contains resources, some info about them | | ||
| | file.pe.resource_languages | flattened | Digest of languages found in resources. Key is language (as string) and value is how many resources there are having that language (as integer) | | ||
| | file.pe.resource_types | flattened | Digest of resource types. Key is resource type (as string) and value is how many resources there are of that specific type (as integer) | | ||
| | file.pe.packers | flattened | Identifies packers used on Windows PE files by several tools and AVs. Keys are tool names and values are identified packers, both strings. See `file.pe.packers` for merged list of packers from all tools. | | ||
|
|
||
| | file.pe.machine_type | keyword | Machine type of the PE file. | | ||
| | file.pe.main_icon.hash.dhash | keyword | Difference Hash for a given PE file. | | ||
| | file.pe.main_icon.hash.md5 | keyword | MD5 hash of raw icon data | | ||
| | file.pe.overlay.chi2 | float | Chi2 information of the PE file. | | ||
| | file.pe.overlay.entropy | float | Entropy information of the PE file. | | ||
|
|
||
| | file.pe.overlay.filetype | keyword | Filetype of the PE file. | | ||
| | file.pe.overlay.md5 | keyword | Overlay MD5 hash of the PE file. | | ||
| | file.pe.overlay.offset | long | Offset of the overlay information of the PE file. | | ||
| | file.pe.overlay.size | long | Size of the PE file. | | ||
| | file.pe.overlay.rich_pe_header_hash | keyword | Hash of the header for the PE file. | | ||
| | file.pe.packers | keyword | Merged list of all detected packers by all tools used. | | ||
|
|
||
| | file.pe.rich_pe_header_hash | keyword | Hash of the PE header. | | ||
|
|
||
| [New `pe.yml` fields](pe/pe.yml) | ||
|
|
||
| <!-- | ||
| Stage 3: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting. | ||
| --> | ||
|
|
||
| ## Usage | ||
|
|
||
| In performing file analysis, specifically for malware research, understanding file similarities can be used to chain together malware samples and families to identify campaigns and possibly attribution. Additionally, understanding how malware components are re-used is useful in understanding malware telemetry, especially in understanding the impact being made through the introduction of defensive countermeasures. | ||
|
|
||
| As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we start observing changes to the headers, import tables, packers, etc. of that malware family, we can reasonably assume that the new malware model is making an impact against that malware family. | ||
|
|
||
| As another example, tracking file metadata for specific families is useful in predicting new campaigns when similar file metadata appears in new samples. One [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/) is the Maze ransomware family shutting down and re-purposing as Egregor. | ||
|
|
||
| ## Source data | ||
|
|
||
| This type of data can be provided by logs from VirusTotal, Reversing Labs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms. | ||
|
|
||
| * [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815) | ||
| * [VirusTotal API](https://developers.virustotal.com/v3.0/reference) | ||
| * [Emerson FSF](https://github.com/EmersonElectricCo/fsf) | ||
| * [Target Strelka](https://github.com/target/strelka) | ||
| * [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss) | ||
|
|
||
| <!-- | ||
| Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list. | ||
| --> | ||
|
|
||
| <!-- | ||
| Stage 2: Include a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting. | ||
| --> | ||
|
|
||
| <!-- | ||
| Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2. | ||
| --> | ||
|
|
||
| ## Scope of impact | ||
|
|
||
| There should be no breaking changes, deprecation strategies, or significant refactoring, as this extends the existing fieldset. | ||
|
|
||
| While likely not a large-scale ECS project, there would be documentation updates needed to explain the new fields. | ||
|
|
||
| <!-- | ||
| Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into: | ||
| * Ingestion mechanisms (e.g. beats/logstash) | ||
| * Usage mechanisms (e.g. Kibana applications, detections) | ||
| * ECS project (e.g. docs, tooling) | ||
| The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each. | ||
| --> | ||
|
|
||
| ## Concerns | ||
|
|
||
| <!-- | ||
| Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake. | ||
| --> | ||
|
|
||
| <!-- | ||
| Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns. | ||
| --> | ||
|
|
||
| <!-- | ||
| Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns. | ||
| --> | ||
|
|
||
| <!-- | ||
| Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed. | ||
| --> | ||
|
|
||
| ## Real-world implementations | ||
|
|
||
| <!-- | ||
| Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana. | ||
| --> | ||
|
|
||
| ## People | ||
|
|
||
| The following are the people that consulted on the contents of this RFC. | ||
|
|
||
| * @peasead | author | ||
| * @devonakerr | sponsor | ||
| * @dcode, @peasead | subject matter expert | ||
|
|
||
| ## References | ||
|
|
||
|
|
||
| * [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815) | ||
| * [VirusTotal API](https://developers.virustotal.com/v3.0/reference) | ||
| * [Emerson FSF](https://github.com/EmersonElectricCo/fsf) | ||
| * [Target Strelka](https://github.com/target/strelka) | ||
| * [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss) | ||
|
|
||
| ### RFC Pull Requests | ||
|
|
||
| <!-- An RFC should link to the PRs for each of its stage advancements. --> | ||
|
|
||
| * Stage 1: https://github.com/elastic/ecs/pull/1071 | ||
|
|
||
| <!-- | ||
| * Stage 1: https://github.com/elastic/ecs/pull/NNN | ||
| ... | ||
| --> | ||
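The similarity chaining described in the Usage section can be sketched as grouping analysed samples by a shared PE attribute. This is a minimal Python sketch and not part of the RFC: the `get_path` and `group_by_field` helpers and the sample documents are hypothetical, and the hash values are placeholders. Only the field name `file.pe.rich_pe_header_hash` is taken from the proposed field table.

```python
from collections import defaultdict


def get_path(doc, dotted):
    """Walk a nested dict along a dotted field name; return None if absent."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def group_by_field(docs, dotted):
    """Group sample names by the value of one field, skipping docs without it."""
    groups = defaultdict(list)
    for doc in docs:
        value = get_path(doc, dotted)
        if value is not None:
            groups[value].append(get_path(doc, "file.name"))
    return dict(groups)


# Hypothetical documents; the hashes are placeholders, not real samples.
samples = [
    {"file": {"name": "a.exe", "pe": {"rich_pe_header_hash": "aaaa"}}},
    {"file": {"name": "b.exe", "pe": {"rich_pe_header_hash": "aaaa"}}},
    {"file": {"name": "c.exe", "pe": {"rich_pe_header_hash": "bbbb"}}},
    {"file": {"name": "d.exe", "pe": {}}},
]

clusters = group_by_field(samples, "file.pe.rich_pe_header_hash")
```

Samples that land in the same group share the chosen attribute and are candidates for closer comparison. A shared value alone does not establish that two samples belong to the same family.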
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,193 @@ | ||
| --- | ||
| - name: pe | ||
| type: group | ||
| fields: | ||
| - name: main_icon | ||
|
|
||
| level: extended | ||
| type: object | ||
| description: > | ||
| Hashes of embedded program icon. | ||
| fields: | ||
| - name: dhash | ||
| level: extended | ||
| type: keyword | ||
| description: > | ||
| Difference Hash (dhash) for a given PE file. | ||
|
|
||
| example: b806e17c8e330d82 | ||
|
|
||
| - name: md5 | ||
|
|
||
| level: extended | ||
| type: keyword | ||
| description: > | ||
| MD5 hash of raw icon data. | ||
| example: 6d1cae6272afbb88876ed6476b990d8c | ||
|
|
||
| - name: debug | ||
|
|
||
| level: extended | ||
| type: keyword | ||
| description: Debug information, if present | ||
| example: { "offset" : 1296336, "size" : 816, "type_str" : "IMAGE_DEBUG_TYPE_POGO", "type" : 13, "timestamp" : "Wed Oct 21 09:01:33 2020" } | ||
|
|
||
| - name: import_list | ||
| level: extended | ||
| type: keyword | ||
| description: List of all imported functions | ||
| example: { "library_name" : "mscoree.dll", "imported_functions" : "GetFileVersionInfoSizeA" } | ||
|
|
||
|
|
||
| - name: sections | ||
| level: extended | ||
| description: > | ||
| Data about sections of compiled binary PE | ||
| type: keyword | ||
| example: { "chi2" : 3027194, "virtual_address": 8192, "entropy": 6.24, "flags": "rx", "name": ".text", "raw_size": 198144 } | ||
|
|
||
| - name: resource_details | ||
| level: extended | ||
| type: keyword | ||
| description: > | ||
| If the PE contains resources, some info about them | ||
| example: { "chi2": -1, "filetype": "English text", "entropy": 0, "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", "lang": "CHINESE SIMPLIFIED" } | ||
|
|
||
|
|
||
| - name: resource_languages | ||
| level: extended | ||
| type: keyword | ||
| short: List of resource languages. | ||
| description: > | ||
| Digest of languages found in resources. Key is language (as string) and | ||
| value is how many resources there are having that language (as integer) | ||
| example: { "ENGLISH US": 1, "CHINESE SIMPLIFIED": 760 } | ||
|
|
||
|
|
||
| - name: resource_types | ||
|
|
||
| level: extended | ||
| type: keyword | ||
| short: List of resource types. | ||
| description: > | ||
| Digest of resource types. Key is resource type (as string) and value | ||
| is how many resources there are of that specific type (as integer) | ||
| example: { "RT_VERSION": 1, "RT_MANIFEST": 1 } | ||
|
|
||
| - name: packers | ||
| level: extended | ||
| type: keyword | ||
| short: Identifies packers used. | ||
| description: > | ||
| Identifies packers used on Windows PE files by several tools and AVs. | ||
| Keys are tool names and values are identified packers, both strings. | ||
| see `file.pe.packers` for merged list of packers from all tools. | ||
| example: { "tool_name": "PEiD", "name": ".NET executable" } | ||
|
|
||
| - name: exports | ||
| level: extended | ||
| type: keyword | ||
| description: > | ||
| List of symbols exported by PE | ||
| example: DllInstall, DllRegisterServer, DllUnregisterServer | ||
|
|
||
|
|
||
| - name: creation_date | ||
| level: extended | ||
| short: Build or compile date. | ||
| description: > | ||
| Extracted when possible from the file's metadata. Indicates when it was | ||
| built or compiled. It can also be faked by malware creators. | ||
| type: date | ||
|
|
||
| example: "2020-11-05T17:25:47.000Z" | ||
|
|
||
| - name: authentihash | ||
| level: extended | ||
| description: > | ||
| Authentihash of the PE file. | ||
| type: keyword | ||
| example: ac9555d914bbb112ecc5f15bb9887ca8371f493ab0941344e976bb8410c8aa78 | ||
|
|
||
| - name: compile_timestamp | ||
| level: extended | ||
| description: > | ||
| Compile timestamp of the PE file. | ||
| type: date | ||
| example: "2020-11-05T17:25:47.000Z" | ||
|
|
||
| - name: compiler_product_versions | ||
|
|
||
| level: extended | ||
| type: keyword | ||
| description: > | ||
| Version of the compiler. | ||
| example: VS98 (6.0) build 8168 | ||
|
|
||
| - name: rich_pe_header_hash | ||
|
|
||
| level: extended | ||
| type: keyword | ||
| description: > | ||
| Hash of the PE header. | ||
| example: 5aa1aa0f2b4be70397a1e9e2b87627cd | ||
|
|
||
| - name: entry_point | ||
| level: extended | ||
| description: > | ||
| Entry point of the PE file. | ||
|
|
||
| format: string | ||
| type: long | ||
|
|
||
| example: 25856 | ||
|
|
||
| - name: machine_type | ||
| level: extended | ||
| description: > | ||
| Machine type of the PE file. | ||
| type: keyword | ||
| example: "Intel 386 or later, and compatibles", "AMD AMD64", 332, 34404 | ||
|
|
||
| - name: overlay | ||
|
|
||
| level: extended | ||
| description: > | ||
| Overlay information of the PE file. | ||
| type: object | ||
| fields: | ||
| - name: chi2 | ||
| level: extended | ||
| description: > | ||
| Chi2 information of the PE file. | ||
| type: float | ||
| format: bytes | ||
| example: 6047 | ||
|
|
||
| - name: entropy | ||
| level: extended | ||
| description: > | ||
| Entropy information of the PE file. | ||
| type: float | ||
| example: 5.221 | ||
|
|
||
| - name: filetype | ||
| level: extended | ||
| description: > | ||
| Filetype of the PE file. | ||
| type: keyword | ||
| example: Data, "ASCII text" | ||
|
|
||
| - name: md5 | ||
| level: extended | ||
| description: > | ||
| Overlay MD5 hash of the PE file. | ||
| type: keyword | ||
| example: 9ac2c4965776e2483ffd11718d653a77 | ||
|
|
||
| - name: offset | ||
| level: extended | ||
| description: > | ||
| Offset of the overlay information of the PE file. | ||
| type: long | ||
| example: 32256 | ||
|
|
||
| - name: size | ||
| level: extended | ||
| description: > | ||
| Size of the PE file. | ||
| format: bytes | ||
| type: long | ||
| example: 512, 7168 | ||
|
|
||
| - name: rich_pe_header_hash | ||
| level: extended | ||
| description: > | ||
| Hash of the header for the PE file. | ||
| type: keyword | ||
| example: 5aa1aa0f2b4be70397a1e9e2b87627cd | ||
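The `overlay.entropy` and `overlay.chi2` definitions above do not say how the values are computed. The Python sketch below shows one common interpretation and is an assumption, not something the RFC or any source platform confirms: Shannon entropy in bits per byte, and a Pearson chi-square statistic against a uniform byte distribution. The function names and the sample inputs are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def chi_square(data: bytes) -> float:
    """Pearson chi-square statistic against a uniform byte distribution."""
    if not data:
        return 0.0
    expected = len(data) / 256  # expected count per byte value if uniform
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


# Hypothetical inputs standing in for overlay bytes (assumption for illustration).
uniform = bytes(range(256)) * 4  # every byte value equally frequent
constant = b"\x00" * 1024        # a single repeated byte
```

Under this interpretation, packed or encrypted data tends toward high entropy and a low chi-square value, while padding or plain text tends toward the opposite.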