Merged
Changes from 24 commits
Commits
57 commits
7080358
Merge pull request #1 from elastic/master
peasead Oct 20, 2020
314f9ab
Merge pull request #2 from elastic/master
peasead Nov 3, 2020
408816b
initial commit
peasead Nov 3, 2020
342cf00
update module
peasead Nov 3, 2020
4c5f266
further clarification
peasead Nov 3, 2020
2213d49
updates
peasead Nov 3, 2020
a8f1a80
'make' and 'make test'
peasead Nov 3, 2020
6582bde
changelog
peasead Nov 3, 2020
51fd471
added PR
peasead Nov 3, 2020
41a5bf0
added PR
peasead Nov 3, 2020
7744bd1
reorganized and fixed orig pe.yml
peasead Nov 5, 2020
205eeac
updatd SMEs
peasead Nov 5, 2020
7fa6ae5
reran make to reset files
peasead Nov 5, 2020
a80581e
removed changelog entry
peasead Nov 5, 2020
ec587e4
removed existing fields
peasead Nov 5, 2020
6e86d31
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 5, 2020
24baef4
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
c091452
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
c1ac596
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
70e68b0
Update rfcs/text/0000-extend-file-pe.md
peasead Nov 6, 2020
b9b2686
Update rfcs/text/pe/pe.yml
peasead Nov 6, 2020
80641f0
remove stage headers
peasead Nov 6, 2020
a0f193f
add examples and references
peasead Nov 6, 2020
b3e72da
removed vt module blob for now
peasead Nov 6, 2020
b6f9dfc
adjustments to entry_point as keyword
peasead Nov 10, 2020
6f6cd39
move rich_headers into its own fields
peasead Nov 10, 2020
b12961b
extended compiler to include name and version
peasead Nov 10, 2020
0063d23
adjusted dhash description
peasead Nov 10, 2020
f0e7d61
Update icon fields
peasead Nov 10, 2020
0bab0ec
duplicate fields
peasead Nov 10, 2020
4d2d65b
removed unnecessary hashing algos
peasead Nov 10, 2020
7cc04f8
moving overlay to file.*
peasead Nov 11, 2020
f5a0533
removing resource_languages in favor of resource_details
peasead Nov 11, 2020
4356939
removed packers, not part of peinfo
peasead Nov 11, 2020
53467c6
moved debug to nested fields
peasead Nov 11, 2020
f7c1af7
moved sections to nested
peasead Nov 11, 2020
958c646
updated imports name and change type to flattened
peasead Nov 19, 2020
5bbc6f5
resources rename
peasead Nov 19, 2020
4fa79d5
added "s" to types
peasead Nov 19, 2020
13d264f
remove resources.types aggregation
peasead Dec 23, 2020
550b038
removed plurality of resources.type
peasead Dec 23, 2020
74628f9
add nested resources fields to table
peasead Dec 23, 2020
a8d954e
update entry_point desc.
peasead Dec 23, 2020
ccbed5d
Update rfcs/text/pe/pe.yml
peasead Dec 23, 2020
cb7631a
Merge branch 'file.pe-extend' of github.com:peasead/ecs into file.pe-…
peasead Dec 23, 2020
4a68601
update/add pe.packers
peasead Dec 23, 2020
1f88931
fixed compiler type
peasead Dec 23, 2020
1bd64de
added pe.icon to table
peasead Dec 23, 2020
ad156fd
removed file. from names
peasead Dec 23, 2020
58fca91
Update rfcs/text/0000-extend-file-pe.md
peasead Jan 13, 2021
d54ad8c
Update pe.yml
peasead Jan 13, 2021
a63a716
Update pe.yml
peasead Jan 13, 2021
49d069b
combined debug.type and debut.type_str
peasead Feb 1, 2021
9e2d59d
field definition housekeeping
ebeahan Feb 5, 2021
9464876
adjust markdown comments to align with updated proposal stages
ebeahan Feb 8, 2021
3f816cb
assigning rfc number and set advance date
ebeahan Feb 8, 2021
aa000c2
rename using assigned rfc number
ebeahan Feb 8, 2021
139 changes: 139 additions & 0 deletions rfcs/text/0000-extend-file-pe.md
@@ -0,0 +1,139 @@
# 0000: Extend the PE field set

- Stage: **1 (proposal)**
- Date: **TBD**

The Portable Executable (PE) sub-fields of the top-level `file` fieldset can be extended with additional file attributes to aid file analysis. This additional metadata can be used for malware research, as well as for coding and other application development efforts.

## Fields

This RFC proposes 25 additional sub-fields within the `file.pe` fieldset.

| Name | Type | Description |
| ---- | ---- | ----------- |
| file.pe.authentihash | keyword | Authentihash of the PE file. |
| file.pe.compile_timestamp | date | Compile timestamp of the PE file. |
| file.pe.compiler | wildcard | Name and version of the compiler. |
| file.pe.creation_date | date | Extracted when possible from the file's metadata. Indicates when it was built or compiled. It can also be faked by malware creators. |
| file.pe.entry_point | long | Entry point of the PE file. |
| file.pe.exports | keyword | List of symbols exported by the PE file. |
| file.pe.debug | flattened | Debug information, if present. |
| file.pe.import_list | flattened | List of all imported functions. |
| file.pe.sections | flattened | Data about the sections of the compiled PE binary. |
| file.pe.resource_details | flattened | Information about the resources contained in the PE file, if any. |
| file.pe.resource_languages | flattened | Digest of languages found in resources. Each key is a language (string) and each value is the number of resources in that language (integer). |
| file.pe.resource_types | flattened | Digest of resource types. Each key is a resource type (string) and each value is the number of resources of that type (integer). |
| file.pe.packers | flattened | Identifies packers used on Windows PE files by several tools and AVs. Keys are tool names and values are identified packers, both strings. See `file.pe.packers` for merged list of packers from all tools. |
| file.pe.machine_type | keyword | Machine type of the PE file. |
| file.pe.main_icon.hash.dhash | keyword | Difference Hash for a given PE file. |
| file.pe.main_icon.hash.md5 | keyword | MD5 hash of raw icon data. |
| file.pe.overlay.chi2 | float | Chi2 information of the PE file. |
| file.pe.overlay.entropy | float | Entropy information of the PE file. |
| file.pe.overlay.filetype | keyword | Filetype of the PE file. |
| file.pe.overlay.md5 | keyword | Overlay MD5 hash of the PE file. |
| file.pe.overlay.offset | long | Offset of the overlay information of the PE file. |
| file.pe.overlay.size | long | Size of the PE file. |
| file.pe.overlay.rich_pe_header_hash | keyword | Hash of the header for the PE file. |
| file.pe.packers | keyword | Merged list of all detected packers by all tools used. |
| file.pe.rich_pe_header_hash | keyword | Hash of the PE header. |

[New `pe.yml` fields](pe/pe.yml)
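
The RFC does not state how `file.pe.overlay.entropy` is computed. As a minimal sketch, assuming it is the Shannon entropy of the overlay bytes measured in bits per byte (a common convention, and consistent with example values such as 5.221), the calculation could look like this. The function name `shannon_entropy` is illustrative and not part of the proposal.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0).

    Assumption: this is the measure behind `file.pe.overlay.entropy`;
    the RFC itself does not specify the formula.
    """
    if not data:
        # Empty input carries no information.
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this definition, values close to 8 indicate data that looks random, which is typical of compressed or encrypted content, while values close to 0 indicate highly repetitive data.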

<!--
Stage 3: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting.
-->

## Usage

In file analysis, and malware research in particular, file similarities can be used to link malware samples and families, identify campaigns, and possibly support attribution. Understanding how malware components are reused also helps in interpreting malware telemetry, especially when assessing the impact of newly introduced defensive countermeasures.
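
As a rough illustration of the similarity analysis described above, the sketch below groups samples that share a `file.pe.rich_pe_header_hash` value and compares icon dhashes by Hamming distance. The sample records, file names, and helper names (`group_by_field`, `hash_distance`) are hypothetical. Two hash values are taken from the examples in `pe.yml`; the others are made up for illustration.

```python
from collections import defaultdict


def group_by_field(samples, field):
    """Group sample names by the value of one field; keep only shared values."""
    groups = defaultdict(list)
    for sample in samples:
        value = sample.get(field)
        if value is not None:
            groups[value].append(sample["name"])
    # A value seen in a single sample says nothing about relationships.
    return {value: names for value, names in groups.items() if len(names) > 1}


def hash_distance(a: str, b: str) -> int:
    """Hamming distance (differing bits) between two equal-length hex hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return bin(int(a, 16) ^ int(b, 16)).count("1")


# Hypothetical documents, flattened to dotted field names.
samples = [
    {"name": "sample_a.exe",
     "file.pe.rich_pe_header_hash": "5aa1aa0f2b4be70397a1e9e2b87627cd",
     "file.pe.main_icon.hash.dhash": "b806e17c8e330d82"},
    {"name": "sample_b.exe",
     "file.pe.rich_pe_header_hash": "5aa1aa0f2b4be70397a1e9e2b87627cd",
     "file.pe.main_icon.hash.dhash": "b806e17c8e330d83"},
    {"name": "sample_c.exe",
     "file.pe.rich_pe_header_hash": "00000000000000000000000000000000",
     "file.pe.main_icon.hash.dhash": "0000000000000000"},
]

related = group_by_field(samples, "file.pe.rich_pe_header_hash")
```

Exact-match grouping suits fields such as `rich_pe_header_hash`, while a small `hash_distance` between two dhash values suggests visually similar icons. Any distance threshold would need tuning against real data.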

As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we then observe changes in the headers, import tables, packers, etc. of that malware family, we can infer that the new model is having an impact on that family.

As another example, tracking file metadata for specific families is useful in predicting new campaigns when similar file metadata appears in new samples. One [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/) is the Maze ransomware family shutting down and re-purposing as Egregor.

## Source data

This type of data can be provided by logs from VirusTotal, ReversingLabs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.

* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
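
To make the mapping from such a source to the proposed fields more concrete, here is a minimal sketch. The input keys (`timestamp`, `machine_type`, `entry_point`) and the function name `to_ecs_pe` are assumptions for illustration and do not reflect the actual output format of any of the tools above. The two machine type values are the `IMAGE_FILE_MACHINE_I386` (332) and `IMAGE_FILE_MACHINE_AMD64` (34404) constants of the PE format, with labels taken from the `machine_type` example in `pe.yml`.

```python
from datetime import datetime, timezone

# Numeric PE machine types mapped to the labels used in the pe.yml example.
MACHINE_TYPES = {
    0x014C: "Intel 386 or later, and compatibles",  # 332
    0x8664: "AMD AMD64",                            # 34404
}


def to_ecs_pe(report: dict) -> dict:
    """Map a hypothetical analysis report to a few proposed file.pe.* fields."""
    # Assumption: the report carries the compile time as epoch seconds.
    compiled = datetime.fromtimestamp(report["timestamp"], tz=timezone.utc)
    machine = report["machine_type"]
    return {
        "file.pe.compile_timestamp": compiled.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "file.pe.entry_point": report["entry_point"],
        # Fall back to the raw number when the machine type is unknown.
        "file.pe.machine_type": MACHINE_TYPES.get(machine, str(machine)),
    }


doc = to_ecs_pe({"timestamp": 1604597147, "machine_type": 34404, "entry_point": 25856})
```

A real ingestion pipeline would need one such mapping per source, since each platform names and encodes these attributes differently.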

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->

<!--
Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting.
-->

<!--
Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2.
-->

## Scope of impact

There should be no breaking changes, deprecation strategies, or significant refactoring, as this RFC only extends the existing fieldset.

While this is likely not a large-scale ECS project, documentation updates would be needed to explain the new fields.

<!--
Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into:
* Ingestion mechanisms (e.g. beats/logstash)
* Usage mechanisms (e.g. Kibana applications, detections)
* ECS project (e.g. docs, tooling)
The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each.
-->

## Concerns

<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->

<!--
Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns.
-->

<!--
Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns.
-->

<!--
Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed.
-->

## Real-world implementations

<!--
Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana.
-->

## People

The following are the people who consulted on the contents of this RFC.

* @peasead | author
* @devonakerr | sponsor
* @dcode, @peasead | subject matter expert

## References

* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 1: https://github.com/elastic/ecs/pull/1071

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
193 changes: 193 additions & 0 deletions rfcs/text/pe/pe.yml
@@ -0,0 +1,193 @@
---
- name: pe
  type: group
  fields:
    - name: main_icon
      level: extended
      type: object
      description: >
        Hashes of embedded program icon.
      fields:
        - name: dhash
          level: extended
          type: keyword
          description: >
            Difference Hash (dhash) for a given PE file.
          example: b806e17c8e330d82

        - name: md5
          level: extended
          type: keyword
          description: >
            MD5 hash of raw icon data.
          example: 6d1cae6272afbb88876ed6476b990d8c

    - name: debug
      level: extended
      type: keyword
      description: Debug information, if present
      example: { "offset" : 1296336, "size" : 816, "type_str" : "IMAGE_DEBUG_TYPE_POGO", "type" : 13, "timestamp" : "Wed Oct 21 09:01:33 2020" }

    - name: import_list
      level: extended
      type: keyword
      description: List of all imported functions
      example: { "library_name" : "mscoree.dll", "imported_functions" : "GetFileVersionInfoSizeA" }

    - name: sections
      level: extended
      description: >
        Data about sections of compiled binary PE
      type: keyword
      example: { "chi2" : 3027194, "virtual_address": 8192, "entropy": 6.24, "flags": "rx", "name": ".text", "raw_size": 198144 }

    - name: resource_details
      level: extended
      type: keyword
      description: >
        If the PE contains resources, some info about them
      example: { "chi2": -1, "filetype": "English text", "entropy": 0, "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", "lang": "CHINESE SIMPLIFIED" }

    - name: resource_languages
      level: extended
      type: keyword
      short: List of resource languages.
      description: >
        Digest of languages found in resources. Key is language (as string) and
        value is how many resources there are having that language (as integer)
      example: { "ENGLISH US": 1, "CHINESE SIMPLIFIED": 760 }

    - name: resource_types
      level: extended
      type: keyword
      short: List of resource types.
      description: >
        Digest of resource types. Key is resource type (as string) and value
        is how many resources there are of that specific type (as integer)
      example: { "RT_VERSION": 1, "RT_MANIFEST": 1 }

    - name: packers
      level: extended
      type: keyword
      short: Identifies packers used.
      description: >
        Identifies packers used on Windows PE files by several tools and AVs.
        Keys are tool names and values are identified packers, both strings.
        see `file.pe.packers` for merged list of packers from all tools.
      example: { "tool_name": "PEiD", "name": ".NET executable" }

    - name: exports
      level: extended
      type: keyword
      description: >
        List of symbols exported by PE
      example: DllInstall, DllRegisterServer, DllUnregisterServer

    - name: creation_date
      level: extended
      short: Build or compile date.
      description: >
        Extracted when possible from the file's metadata. Indicates when it was
        built or compiled. It can also be faked by malware creators.
      type: date
      example: "2020-11-05T17:25:47.000Z"

    - name: authentihash
      level: extended
      description: >
        Authentihash of the PE file.
      type: keyword
      example: ac9555d914bbb112ecc5f15bb9887ca8371f493ab0941344e976bb8410c8aa78

    - name: compile_timestamp
      level: extended
      description: >
        Compile timestamp of the PE file.
      type: date
      example: "2020-11-05T17:25:47.000Z"

    - name: compiler_product_versions
      level: extended
      type: keyword
      description: >
        Version of the compiler.
      example: VS98 (6.0) build 8168

    - name: rich_pe_header_hash
      level: extended
      type: keyword
      description: >
        Hash of the PE header.
      example: 5aa1aa0f2b4be70397a1e9e2b87627cd

    - name: entry_point
      level: extended
      description: >
        Entry point of the PE file.
      format: string
      type: long
      example: 25856

    - name: machine_type
      level: extended
      description: >
        Machine type of the PE file.
      type: keyword
      example: "Intel 386 or later, and compatibles", "AMD AMD64", 332, 34404

    - name: overlay
      level: extended
      description: >
        Overlay information of the PE file.
      type: object
      fields:
        - name: chi2
          level: extended
          description: >
            Chi2 information of the PE file.
          type: float
          format: bytes
          example: 6047

        - name: entropy
          level: extended
          description: >
            Entropy information of the PE file.
          type: float
          example: 5.221

        - name: filetype
          level: extended
          description: >
            Filetype of the PE file.
          type: keyword
          example: Data, "ASCII text"

        - name: md5
          level: extended
          description: >
            Overlay MD5 hash of the PE file.
          type: keyword
          example: 9ac2c4965776e2483ffd11718d653a77

        - name: offset
          level: extended
          description: >
            Offset of the overlay information of the PE file.
          type: long
          example: 32256

        - name: size
          level: extended
          description: >
            Size of the PE file.
          format: bytes
          type: long
          example: 512, 7168

        - name: rich_pe_header_hash
          level: extended
          description: >
            Hash of the header for the PE file.
          type: keyword
          example: 5aa1aa0f2b4be70397a1e9e2b87627cd