Closed
Commits
49 commits
7080358
Merge pull request #1 from elastic/master
peasead Oct 20, 2020
314f9ab
Merge pull request #2 from elastic/master
peasead Nov 3, 2020
1448cd6
Merge pull request #3 from elastic/master
peasead Nov 4, 2020
16aae5f
Merge pull request #4 from elastic/master
peasead Nov 5, 2020
ef7bd12
initial commit
peasead Nov 5, 2020
0107542
added PR#
peasead Nov 5, 2020
de73a01
removed field present in code_signature
peasead Nov 10, 2020
714c859
removed field present in code_signature
peasead Nov 10, 2020
07c011d
updated work in signature
peasead Nov 20, 2020
aeadc6b
move executable fields to segments.
peasead Nov 20, 2020
29ecf43
removed signature fields
peasead Dec 23, 2020
6d77439
removed file. from field names
peasead Dec 23, 2020
16ad2bc
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
6969054
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
692cc5a
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
f64a08d
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
805f6c5
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
cd6a5e0
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
cdd9766
Update rfcs/text/0000-create-file-mach-o.md
peasead Dec 23, 2020
b8c02ce
renamed mach-o to macho
peasead Dec 23, 2020
6e8e729
Merge branch 'file.macho-create' of github.com:peasead/ecs into file.…
peasead Dec 23, 2020
72ee845
removed plurality from "header"
peasead Dec 23, 2020
c6f20b2
created usage doc
peasead Dec 23, 2020
bbd1afd
removed header plurality, sections to flattened
peasead Jan 13, 2021
e0e5a1a
changed macho.segments to nested
peasead Feb 1, 2021
689fa39
typo in segments.size
peasead Feb 1, 2021
ccf1b88
corrected segments.sections fieldtype
peasead Feb 1, 2021
84bdb2e
added cdhash to RFC doc.
peasead Feb 1, 2021
c049773
Fixed segments.offset fieldtype
peasead Feb 1, 2021
3fd0931
typo on rfc doc for segments.flags
peasead Feb 1, 2021
a53a52b
back to headers from header
peasead Feb 3, 2021
276acfe
Update 0000-create-file-macho.md
peasead Feb 9, 2021
d996d6f
Update macho.yml
peasead Feb 9, 2021
fc30c23
ecs housekeeping edits
ebeahan Feb 10, 2021
442d212
Update rfcs/text/0000-create-file-macho.md
peasead Feb 16, 2021
a7ff6ae
Update rfcs/text/0000-create-file-macho.md
peasead Feb 16, 2021
a96fd55
Update rfcs/text/0000-create-file-macho.md
peasead Feb 16, 2021
5177e2e
Update rfcs/text/0000-create-file-macho.md
peasead Mar 11, 2021
1347c0d
Update rfcs/text/0000-create-file-macho.md
peasead Mar 11, 2021
3f68b02
Make macho a nested object.
dcode Apr 7, 2021
8fc5ca4
Update 0000-create-file-macho.md
peasead May 4, 2021
4766435
Update macho.yml
peasead May 4, 2021
9ddb33f
Update 0000-create-file-macho.md
peasead May 4, 2021
2f65d5b
syncing field descriptions
peasead May 4, 2021
9253abb
removed examples with multiple values
peasead May 4, 2021
70b5b22
Update rfcs/text/0000-create-file-macho.md
peasead May 18, 2021
1db5732
Merge branch 'master' into file.macho-create
peasead May 28, 2021
867eecb
Merge branch 'master' into file.macho-create
djptek Jun 15, 2021
1e7fcf8
Merge branch 'master' into file.macho-create
djptek Jul 1, 2021
152 changes: 152 additions & 0 deletions rfcs/text/0000-create-file-macho.md
@@ -0,0 +1,152 @@
# 0000: Create the Mach-O sub-field of the File fieldset

- Stage: **1 (draft)**
- Date: **TBD**

Create the Mach Object (Mach-O) sub-field of the `file` or `process` top-level fieldsets. This metadata can be used for malware research, as well as coding and other application development efforts.

## Fields

**Stage 0**

This RFC is to create the Mach-O sub-field within the `file.` fieldset. This will include 35 sub-fields. `macho` itself is a nested
Member


I'd prefer to leave the top-level ECS fields all as objects.

What about defining macho.architectures as nested and having each architecture's fields defined underneath? Borrowing from what @andrewstucki proposed in elastic/beats#24195.

- name: macho
  title: Mach-O file information.
  type: group
  description: These fields contain macOS Mach Object (Mach-O) metadata.
  fields:
    - name: architectures
      description: Object files contained inside this file by architecture
      type: nested
      fields:
        - name: cpu
          description: CPU architecture target for the file.
          type: keyword
          example: 64-bit

        - name: byte_order
          description: Byte order for the file.
          type: keyword
          example: little-endian

        - name: type
          description: Mach-O file type.
          type: keyword

Contributor Author


So, you're suggesting changing:

  • macho.cpu.{architecture,byte_order,type,...} to macho.architecture.{cpu,byte_order,type,...}?
  • macho.* being a group instead of nested and macho.architecture.{cpu,byte_order,type,...} being nested?

Member


Yes, mostly. One difference - I'm proposing macho.architecture being the nested field that all the other fields would fall under for each architecture type.

Contributor


The goal is that the "interface" to ELF, MachO, and PE has a similar feel. This will make building analytics and understanding the data model easier. I discussed means to do that with @andrewstucki a while back. The problem is that MachO fat binaries are literally two fully self-sufficient executables in one big MachO wrapper. This means that any field we decide to nest as architecture-specific will have to contain all the architecture-dependent data. The result of that discussion was that it probably makes the most sense to make file.macho a nested type, which drastically simplifies the data.

We should meet to discuss.

field to account for [multiarchitecture binaries](https://en.wikipedia.org/wiki/Fat_binary). Each architecture will be represented
by a nested object located at `file.macho`.
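
As a rough illustration of the nested layout described above, the following sketch builds a hypothetical event for a universal binary with two architectures, plus an Elasticsearch `nested` query body that keeps conditions scoped to a single architecture object. The file name, the x86 strings, and the helper names `cpu_types` and `nested_cpu_query` are assumptions made for this example; they are not part of the RFC.

```python
# Hypothetical event for a universal (fat) binary. Every value is illustrative.
event = {
    "file": {
        "name": "example-universal-binary",
        "macho": [
            {
                "cpu": {
                    "architecture": "64-bit",
                    "byte_order": "Little endian",
                    "type": "ARM 64-bit",
                    "subtype": "ARM (all) 64-bit",
                }
            },
            {
                "cpu": {
                    "architecture": "64-bit",
                    "byte_order": "Little endian",
                    "type": "x86 64-bit",  # assumed label, not defined by the RFC
                    "subtype": "x86 (all) 64-bit",  # assumed label
                }
            },
        ],
    }
}


def cpu_types(doc: dict) -> list:
    """Return the cpu.type of every architecture object under file.macho."""
    return [arch["cpu"]["type"] for arch in doc["file"]["macho"]]


def nested_cpu_query(cpu_type: str, subtype: str, path: str = "file.macho") -> dict:
    """Build a query body whose two conditions must hold in the same nested object."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {f"{path}.cpu.type": cpu_type}},
                            {"term": {f"{path}.cpu.subtype": subtype}},
                        ]
                    }
                },
            }
        }
    }


print(cpu_types(event))
print(nested_cpu_query("ARM 64-bit", "ARM (all) 64-bit"))
```

Without the `nested` mapping, Elasticsearch flattens arrays of objects, so a search for an ARM `cpu.type` combined with an x86 `cpu.subtype` could match this document even though no single architecture has both values.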

| Name | Type | Description |
|--------------------------------|---------|-----------------------------------------------------------------|
| macho.cdhash | keyword | Code Digest (CD) SHA256 hash of the first 20-bytes of the file. |
| macho.cpu | object | CPU information for the file. |
| macho.cpu.architecture | keyword | CPU architecture target for the file. |
| macho.cpu.byte_order | keyword | CPU byte order for the file. |
| macho.cpu.subtype | keyword | CPU subtype for the file. |
| macho.cpu.type | keyword | CPU type for the file. |
| macho.headers | nested | Header information for the file. |
| macho.headers.commands.number | long | Number of load commands for the Mach-O header. |
| macho.headers.commands.size | long | Size of load commands of the Mach-O header. |
| macho.headers.commands.type | keyword | Type of the load commands for the Mach-O header. |
| macho.headers.magic | keyword | Magic field of the Mach-O header. |
| macho.headers.flags | keyword | Flags set in the Mach-O header. |
| macho.page_size | long | Page size of the file. |
| macho.sections | nested | Section information for the segment of the file. |
| macho.sections.chi2 | float | Chi-square probability distribution of the section. |
| macho.sections.entropy | float | Shannon entropy calculation from the section. |
| macho.sections.flags | keyword | Section flags for the segment of the file. |
| macho.sections.name | keyword | Section name for the segment of the file. |
| macho.sections.type | keyword | Section type for the segment of the file. |
| macho.sections.physical_offset | keyword | Section List offset. |
| macho.sections.physical_size | long | Section List physical size. |
| macho.sections.virtual_address | keyword | Section List virtual address. |
| macho.sections.virtual_size | long | Section List virtual size. |
| macho.segments | nested | Segment information for the file. |
| macho.segments.name | keyword | Name of this segment. |
| macho.segments.physical_offset | keyword | File offset of this segment. |
| macho.segments.physical_size | long | Amount of memory to map from the file. |
| macho.segments.sections | keyword | Section names contained in this segment. |
| macho.segments.virtual_address | keyword | Memory address of this segment. |
| macho.segments.virtual_size | long | Memory size of this segment. |
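
The two statistical fields in the table, `macho.sections.entropy` and `macho.sections.chi2`, can be sketched as follows. This is a minimal illustration that assumes entropy is measured in bits per byte and that `chi2` holds the raw chi-square statistic of the byte counts against a uniform distribution; the RFC does not pin down either convention, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chi_square(data: bytes) -> float:
    """Chi-square statistic of observed byte counts against a uniform expectation."""
    if not data:
        return 0.0
    expected = len(data) / 256  # each of the 256 byte values is equally likely
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


section = bytes(range(256))  # stand-in for the raw contents of one section
print(shannon_entropy(section), chi_square(section))
```

High entropy together with a low chi-square value suggests compressed or encrypted content, which is why analysis tools commonly report the two values side by side.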

**Stage 1**

[New `macho.yml` candidate](macho/macho.yml)

<!--
Stage 3: Add or update all remaining field definitions. The list should now be exhaustive. The goal here is to validate the technical details of all remaining fields and to provide a basis for releasing these field definitions as beta in the schema. Use GitHub code blocks with yml syntax formatting.
-->

## Usage

**Stage 1**

When performing file analysis, specifically for malware research, file similarities can be used to link malware samples and families, identify campaigns, and possibly support attribution. Understanding how malware components are reused also helps in interpreting malware telemetry, especially when measuring the impact of newly introduced defensive countermeasures.

As an example, if an XDR vendor deploys a new malware model to defeat a specific type of ransomware and we start observing a change in, or relationship to, the headers, import tables, libraries, etc. of that malware family, we can infer that the changes to the malware model are having an impact on the malware family.

As another example, tracking file metadata for specific families is useful in predicting new campaigns if we see similar file metadata being used for new samples. One [example](https://www.bleepingcomputer.com/news/security/maze-ransomware-is-shutting-down-its-cybercrime-operation/) is the Maze ransomware family shutting down and re-purposing as Egregor (this is Windows malware, but the concept is the same).

## Source data

**Stage 1**

This type of data can be provided by logs from VirusTotal, ReversingLabs, Lockheed Martin's LAIKABOSS, Emerson's File Scanning Framework, Target's Strelka, or other file/malware analysis platforms.

* [VirusTotal API](https://developers.virustotal.com/v3.0/reference)
* [Emerson FSF](https://github.com/EmersonElectricCo/fsf)
* [Target Strelka](https://github.com/target/strelka)
* [Lockheed Martin LAIKABOSS](https://github.com/lmco/laikaboss)
* [LIEF Analysis Library](https://lief.quarkslab.com/doc/latest/api/python/macho.html)
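
To show where the `macho.headers.*` values originate in the file itself, here is a minimal, standard-library-only sketch that reads the leading fields of a Mach-O header and maps them onto the proposed field names. It assumes a thin (single-architecture), little-endian file; fat binaries and byte-swapped headers are deliberately not handled. The function name `parse_header` and the returned dictionary layout are hypothetical; the platforms listed above expose far more complete parsers.

```python
import struct

MH_MAGIC = 0xFEEDFACE  # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O

# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
HEADER_FORMAT = "<IiiIIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 28 bytes


def parse_header(data: bytes) -> dict:
    """Map the first mach_header fields onto macho.headers.* style keys."""
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough data for a Mach-O header")
    magic, _cputype, _cpusubtype, _filetype, ncmds, sizeofcmds, _flags = (
        struct.unpack_from(HEADER_FORMAT, data, 0)
    )
    if magic not in (MH_MAGIC, MH_MAGIC_64):
        raise ValueError(f"unsupported magic: {magic:#x}")
    return {
        "headers": {
            "magic": hex(magic),
            "commands": {"number": ncmds, "size": sizeofcmds},
        }
    }


# Synthetic header bytes for demonstration; not taken from a real binary.
sample = struct.pack(HEADER_FORMAT, MH_MAGIC_64, 0x0100000C, 0, 2, 23, 3888, 0)
print(parse_header(sample))
```

An ingest pipeline would typically take these values from the analysis platform's output rather than re-parsing the file; the sketch only clarifies what the fields mean.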

<!--
Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list.
-->

<!--
Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting.
-->

<!--
Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2.
-->

## Scope of impact

**Stage 2**

There should be no breaking changes, deprecation strategies, or significant refactoring as this is creating a sub-field for the existing `file.` fieldset.

While likely not a large-scale ECS project, there would be documentation updates needed to explain the new fields.

<!--
Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into:
* Ingestion mechanisms (e.g. beats/logstash)
* Usage mechanisms (e.g. Kibana applications, detections)
* ECS project (e.g. docs, tooling)
The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each.
-->

## Concerns

<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->

<!--
Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns.
-->

<!--
Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns.
-->

<!--
Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed.
-->

## Real-world implementations

<!--
Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana.
-->

## People

The following are the people that consulted on the contents of this RFC.

* @peasead | author
* @devonakerr | sponsor
* @dcode, @peasead | subject matter expert

## References

<!-- Insert any links appropriate to this RFC in this section. -->

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 1: https://github.com/elastic/ecs/pull/1346

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
19 changes: 19 additions & 0 deletions rfcs/text/macho/docs/usage/macho.asciidoc
@@ -0,0 +1,19 @@
[[ecs-macho-usage]]
=== Mach-O Usage

--Description--

[discrete]
=== Mach-O Field Details
| Field | Description | Level |
| ---- | ---- | ----------- |
| macho.cpu | CPU information for the file. | extended |
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |

[discrete]
=== Field Reuse
The `macho` fields are expected to be nested at: `dll.macho`, `file.macho`, `process.macho`.

Note also that the `macho` fields are not expected to be used directly at the root of the events.
169 changes: 169 additions & 0 deletions rfcs/text/macho/macho.yml
@@ -0,0 +1,169 @@
---
- name: macho
  title: Mach-O file information.
  group: 2
  description: >
    These fields contain macOS Mach Object (Mach-O) metadata.
  type: nested
  reusable:
    top_level: false
    expected:
      - file
      - process
  fields:
    - name: cdhash
      description: Code Digest (CD) SHA256 hash of the first 20-bytes of the file.
      type: keyword
      level: extended
      example: 2035094a7065b29421e7a51f51db9bd61807c3628f210b1f8e667235777dc592

    - name: cpu.architecture
      description: CPU architecture target for the file.
      type: keyword
      level: extended
      example: 64-bit

    - name: cpu.byte_order
      description: CPU byte order for the file.
      type: keyword
      level: extended
      example: Little endian

    - name: cpu.subtype
      description: CPU subtype for the file.
      type: keyword
      level: extended
      example: ARM (all) 64-bit

    - name: cpu.type
      description: CPU type for the file.
      type: keyword
      level: extended
      example: ARM 64-bit

    - name: headers.commands.number
      description: Number of load commands for the Mach-O header.
      type: long
      level: extended
      example: 23

    - name: headers.commands.size
      description: Size of load commands of the Mach-O header.
      type: long
      level: extended
      format: bytes
      example: 3888

    - name: headers.commands.type
      description: Type of the load commands for the Mach-O header.
      type: keyword
      level: extended
      example: LC_SYMTAB

    - name: headers.magic
      description: Magic field of the Mach-O header.
      type: keyword
      level: extended
      example: 0xfeedfacf

    - name: headers.flags
      description: Flags set in the Mach-O header.
      type: keyword
      level: extended
      example: TWOLEVEL

    - name: page_size
      description: Page size of the file.
      type: long
      format: bytes
      level: extended
      example: 4096

    - name: sections.chi2
      description: Chi-squared probability distribution of the section.
      type: float
      level: extended
      example: 3.413

    - name: sections.entropy
      description: Shannon entropy calculation from the section.
      type: float
      level: extended
      example: 1.5

    - name: sections.flags
      description: Section flags for the segment of the file.
      type: keyword
      level: extended
      example: SECTION_ATTRIBUTES_SYS

    - name: sections.name
      description: Section name for the segment of the file.
      type: keyword
      level: extended
      example: __text

    - name: sections.type
      description: Section type for the segment of the file.
      type: keyword
      level: extended
      example: S_REGULAR

    - name: sections.physical_offset
      description: Section List offset.
      type: keyword
      level: extended
      example: 0x0

    - name: sections.physical_size
      description: Section List physical size.
      type: long
      level: extended
      example: 311296

    - name: sections.virtual_address
      description: Section List virtual address.
      type: keyword
      level: extended
      example: 0x0

    - name: sections.virtual_size
      description: Section List virtual size.
      type: long
      level: extended
      example: 311296

    - name: segments.name
      description: Name of this segment.
      type: keyword
      level: extended
      example: __TEXT

    - name: segments.physical_offset
      description: File offset of this segment.
      type: keyword
      level: extended
      example: 0x0

    - name: segments.physical_size
      description: Amount of memory to map from the file.
      type: long
      level: extended
      example: 311296

    - name: segments.sections
      level: extended
      description: Section names contained in this segment.
      type: keyword

    - name: segments.virtual_address
      description: Memory address of this segment.
      type: keyword
      level: extended
      example: 0x0

    - name: segments.virtual_size
      description: Memory size of this segment.
      type: long
      level: extended
      example: 311296