diff --git a/rfcs/text/0014-extend-file-pe.md b/rfcs/text/0014-extend-file-pe.md
new file mode 100644
index 0000000000..f1cddf435c
--- /dev/null
+++ b/rfcs/text/0014-extend-file-pe.md
@@ -0,0 +1,139 @@
+# 0014: Extend the PE field set
+
+- Stage: **1 (draft)**
+- Date: **2021-02-08**
+
+The Portable Executable (PE) sub-fieldset of the top-level `file` fieldset can be extended with more file attributes to aid file analysis. This additional document metadata can be used for malware research, as well as for coding and other application development efforts.
+
+## Fields
+
+This RFC proposes 28 additional sub-fields, plus their parent objects, within the `file.pe` fieldset.
+
+| Name | Type | Description |
+| ---- | ---- | ----------- |
+| pe.authentihash | keyword | Authentihash of the PE file. |
+| pe.compile_timestamp | date | Compile timestamp of the PE file. |
+| pe.compiler | nested | Compiler information. |
+| pe.compiler.version | keyword | Version of the compiler. |
+| pe.compiler.name | keyword | Name of the compiler. |
+| pe.creation_date | date | Extracted when possible from the file's metadata. Indicates when it was built or compiled. It can also be faked by malware creators. |
+| pe.entry_point | keyword | Relative byte offset to the base of the PE file. |
+| pe.exports | keyword | List of symbols exported by the PE file. |
+| pe.debug | nested | Debug information, if present. |
+| pe.debug.offset | keyword | Debug offset information. |
+| pe.debug.size | long | Size of the debug information. |
+| pe.debug.type | keyword | Information type generated by the debug options. |
+| pe.debug.timestamp | date | Timestamp of the debug information. |
+| pe.imports | flattened | List of all imported functions. |
+| pe.sections | nested | Data about the sections of the compiled PE binary. |
+| pe.sections.chi2 | long | Chi-square probability distribution. |
+| pe.sections.virtual_address | long | Virtual address available to the file.
|
+| pe.sections.entropy | float | Measurement of entropy randomness in the file. |
+| pe.sections.flags | keyword | Section flags of the file. |
+| pe.sections.name | keyword | Section names of the file. |
+| pe.sections.raw_size | long | Size of the section or the size of the initialized data on disk. |
+| pe.resources | nested | Information about the resources contained in the PE file, if any. |
+| pe.resources.chi2 | long | Chi-square probability distribution. |
+| pe.resources.filetype | keyword | File type of the resources section. |
+| pe.resources.entropy | long | Measurement of entropy randomness in the resources section. |
+| pe.resources.sha256 | keyword | SHA256 hash of the resources section. |
+| pe.resources.language | keyword | Language identification. |
+| pe.resources.type | keyword | List of resource types. |
+| pe.machine_type | keyword | Machine type of the PE file. |
+| pe.packers | keyword | List of packers and tools used. |
+| pe.rich_header.hash.md5 | keyword | MD5 hash of the header for the PE file. |
+| pe.icon | nested | Information about the embedded program icon. |
+| pe.icon.hash | nested | Hash information for the embedded program icon. |
+| pe.icon.hash.dhash | keyword | Difference Hash (dhash) to find files with a visually similar icon or thumbnail. |
+
+
+[New `pe.yml` fields](0014/pe.yml)
+
+
+
+## Usage
+
+In file analysis, particularly malware research, file similarities can be used to link malware samples and families, identify campaigns, and possibly support attribution. Knowing how malware components are reused also helps in interpreting malware telemetry, especially in measuring the impact of newly introduced defensive countermeasures.
+
+As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we then observe a change in, or a relationship between, the headers, import tables, packers, etc. of that malware family, we can infer that the new model is having an impact on that family.
+
+As another example, tracking file metadata for specific families is useful in predicting new campaigns when similar file metadata appears in new samples. One [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/) is the Maze ransomware family shutting down and re-purposing as Egregor.
+
+## Source data
+
+This type of data can be provided by logs from VirusTotal, ReversingLabs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.
+
+* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
+* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
+* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
+* [Target Strelka](https://github.com/target/strelka)
+* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
+
+
+
+
+
+
+
+## Scope of impact
+
+There should be no breaking changes, deprecation strategies, or significant refactoring, as this RFC only extends the existing fieldset.
+
+While this is likely not a large-scale ECS project, documentation updates would be needed to explain the new fields.
+
+
+
+## Concerns
+
+
+
+
+
+
+
+## People
+
+The following people consulted on the contents of this RFC.
+
+* @peasead | author
+* @devonakerr | sponsor
+* @dcode, @peasead | subject matter expert
+
+## References
+
+* [VirusTotal Filebeat module PR](https://github.com/elastic/beats/pull/21815)
+* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
+* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
+* [Target Strelka](https://github.com/target/strelka)
+* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
+
+### RFC Pull Requests
+
+
+
+* Stage 1: https://github.com/elastic/ecs/pull/1071
+
+
diff --git a/rfcs/text/0014/pe.yml b/rfcs/text/0014/pe.yml
new file mode 100644
index 0000000000..302464a001
--- /dev/null
+++ b/rfcs/text/0014/pe.yml
@@ -0,0 +1,216 @@
+---
+- name: pe
+
+  fields:
+
+    - name: icon.hash.dhash
+      level: extended
+      type: keyword
+      description: >
+        Difference Hash (dhash) to find files with a visually similar icon or thumbnail.
+
+      example: b806e17c8e330d82
+
+    - name: debug
+      level: extended
+      type: nested
+      description: >
+        Debug information, if present
+
+    - name: debug.offset
+      level: extended
+      type: keyword
+      description: Debug offset information.
+      example: 1296336
+
+    - name: debug.size
+      level: extended
+      type: long
+      format: bytes
+      description: Size of the debug information.
+      example: 816
+
+    - name: debug.type
+      level: extended
+      type: keyword
+      description: Information type generated by the debug options.
+      example: IMAGE_DEBUG_TYPE_POGO
+
+    - name: debug.timestamp
+      level: extended
+      type: date
+      description: Timestamp of the debug information.
+      example: "2020-11-05T17:25:47.000Z"
+
+    - name: imports
+      level: extended
+      type: flattened
+      description: List of all imported functions
+      example: '{ "library_name" : "mscoree.dll", "imported_functions" : "GetFileVersionInfoSizeA" }'
+
+    - name: sections
+      level: extended
+      description: >
+        Data about sections of compiled binary PE
+      type: nested
+
+    - name: sections.chi2
+      level: extended
+      description: Chi-square probability distribution.
+      type: long
+      example: 3027194
+
+    - name: sections.virtual_address
+      level: extended
+      description: Virtual address available to the file.
+      type: long
+      format: bytes
+      example: 8192
+
+    - name: sections.entropy
+      level: extended
+      description: Measurement of entropy randomness in the file.
+      type: float
+      example: 6.24
+
+    - name: sections.flags
+      level: extended
+      description: Section flags of the file.
+      type: keyword
+      example: rx
+
+    - name: sections.name
+      level: extended
+      description: Section names of the file.
+      type: keyword
+      example: .text, .data
+
+    - name: sections.raw_size
+      level: extended
+      description: Size of the section or the size of the initialized data on disk.
+      type: long
+      format: bytes
+      example: 198144
+
+    - name: resources
+      level: extended
+      type: nested
+      description: >
+        If the PE contains resources, some info about them
+
+    - name: resources.chi2
+      level: extended
+      description: Chi-square probability distribution.
+      type: long
+      example: -1
+
+    - name: resources.filetype
+      level: extended
+      description: File type of the resources section.
+      type: keyword
+      example: Data
+
+    - name: resources.entropy
+      level: extended
+      description: Measurement of entropy randomness in the resources section.
+      type: long
+      example: 0, 1
+
+    - name: resources.sha256
+      level: extended
+      description: SHA256 hash of resources section.
+      type: keyword
+      example: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+
+    - name: resources.language
+      level: extended
+      description: Language identification.
+      type: keyword
+      example: "CHINESE SIMPLIFIED"
+
+    - name: resources.type
+      level: extended
+      type: keyword
+      short: List of resource types.
+      description: >
+        Digest of resource types.
+      example: '["RT_VERSION", "RT_MANIFEST"]'
+      normalize:
+        - array
+
+    - name: exports
+      level: extended
+      type: keyword
+      description: >
+        List of symbols exported by PE
+      example: '["DllInstall", "DllRegisterServer", "DllUnregisterServer"]'
+      normalize:
+        - array
+
+    - name: creation_date
+      level: extended
+      short: Build or compile date.
+      description: >
+        Extracted when possible from the file's metadata. Indicates when it was
+        built or compiled. It can also be faked by malware creators.
+      type: date
+      example: "2020-11-05T17:25:47.000Z"
+
+    - name: authentihash
+      level: extended
+      description: >
+        Authentihash of the PE file.
+      type: keyword
+      example: ac9555d914bbb112ecc5f15bb9887ca8371f493ab0941344e976bb8410c8aa78
+
+    - name: compile_timestamp
+      level: extended
+      description: >
+        Compile timestamp of the PE file.
+      type: date
+      example: "2020-11-05T17:25:47.000Z"
+
+    - name: compiler.name
+      level: extended
+      type: keyword
+      description: >
+        Name of the compiler
+      example: Clang
+
+    - name: compiler.version
+      level: extended
+      type: keyword
+      description: >
+        Version of the compiler.
+      example: 11.0.0
+
+    - name: rich_header.hash.md5
+      level: extended
+      type: keyword
+      description: >
+        MD5 hash of the header for the PE file.
+
+      example: 5aa1aa0f2b4be70397a1e9e2b87627cd
+
+    - name: entry_point
+      level: extended
+      description: >
+        Relative byte offset to the base of the PE file.
+      type: keyword
+      example: 25856
+
+    - name: machine_type
+      level: extended
+      description: >
+        Machine type of the PE file.
+      type: keyword
+      example: "Intel 386 or later, and compatibles"
+
+    - name: packers
+      level: extended
+      description: >
+        List of packers and tools used.
+      type: keyword
+      example: '["ASPack v2.12", ".NET executable"]'
+      normalize:
+        - array
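
For anyone producing data for these fields, the sketch below shows one way the `sections.entropy` and `sections.chi2` values could be derived from a section's raw bytes. The RFC does not define either formula, so this assumes Shannon entropy in bits per byte and a chi-square statistic against a uniform byte distribution. The function names (`shannon_entropy`, `chi_square`, `describe_section`) are hypothetical and are not part of ECS or of any of the analysis platforms listed in the RFC.

```python
import json
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1 / p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def chi_square(data: bytes) -> float:
    """Chi-square statistic of byte counts against a uniform distribution (assumed formula)."""
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def describe_section(name: str, flags: str, virtual_address: int, raw: bytes) -> dict:
    """Build one entry shaped like the proposed `file.pe.sections` fields."""
    return {
        "name": name,
        "flags": flags,
        "virtual_address": virtual_address,
        "raw_size": len(raw),
        "entropy": round(shannon_entropy(raw), 2),  # proposed type: float
        "chi2": round(chi_square(raw)),             # proposed type: long
    }


if __name__ == "__main__":
    # Synthetic section content: every byte value appears equally often.
    section = describe_section(".text", "rx", 8192, bytes(range(256)) * 4)
    print(json.dumps({"file": {"pe": {"sections": [section]}}}, indent=2))
```

Rounding `chi2` to an integer follows the proposed `long` type. A platform that reports a different statistic, or a sentinel such as the `-1` shown in the `resources.chi2` example, would need its own mapping.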