[RFC] E-Mail #999
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,144 @@ | ||||||
| # 0008: Email | ||||||
| <!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. --> | ||||||
|
|
||||||
| - Stage: **1** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html --> | ||||||
| - Date: **Oct 5th 2020** <!-- The ECS team sets this date at merge time. This is the date of the latest stage advancement. --> | ||||||
|
webmat marked this conversation as resolved.
Outdated
|
||||||
|
|
||||||
| This RFC proposes a new top-level field to facilitate email use cases. | ||||||
|
Member
Email has similar challenges to web browsing when you start looking at how to model it. Obviously, what is generally just referred to as "email" is actually a complex system of processes and protocols: SMTP, IMAP, POP3, SPF, DMARC, DKIM, DNS, TLS, x509, etc. I think having the top-level fieldset for I mostly bring this up to set the expectation that, to accurately capture the different facets of email in ECS, we'll likely be adding several new fieldsets over time. 😄
Contributor
I'd like us to capture two aspects of Eric's point in the Concerns section for now, so let's add two subsections there:
1 - There are many types of legitimate "email events" that could be captured. This RFC is currently focusing on a subset. But we'll want to keep in mind the whole list of possible types of email events, to make sure there's room in the schema for all. Here's the list:
2 - One design decision we'll have to make is whether we also introduce fields for the 3 main "email protocols" (SMTP, IMAP and POP3), or do we try to fit most things under
Member
@webmat Would there be any reason to discuss having the protocols under email rather than a top field level? We could for example treat
Contributor
Quick question about these comments. Is email authentication (i.e. SPF/DKIM/DMARC) in scope for this RFC? Mainly wondering because they're really a product of alignment on Return-Path (MAIL FROM), From, and DKIM-Signature mail headers. Not that I recommend all implementations that can capture these headers attempt to validate SPF/DKIM/DMARC, but it's totally possible given only the message content (and some fun DNS lookups). I think that they fall under a pretty different realm than say spam/reputation scoring, analytics reports, MTA actions, or even email protocols that were mentioned.
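To make the alignment point concrete, here is a minimal sketch, assuming a raw message is available as text. It only compares the domains found in the `Return-Path`, `DKIM-Signature` (`d=` tag) and `From` headers using strict equality. It does not verify signatures, perform DNS lookups, or apply DMARC's relaxed (organizational domain) alignment. The function names and the sample message are hypothetical and not part of the RFC.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(address: str) -> str:
    """Return the lowercased domain part of an address, or '' if there is none."""
    _, addr = parseaddr(address)
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def dkim_domain(signature: str) -> str:
    """Extract the d= tag from a DKIM-Signature header value, or '' if absent."""
    for tag in signature.split(";"):
        name, _, value = tag.strip().partition("=")
        if name.strip() == "d":
            return value.strip().lower()
    return ""


def alignment(raw_message: str) -> dict:
    """Compare the header From domain with the Return-Path and DKIM d= domains."""
    msg = message_from_string(raw_message)
    from_domain = domain_of(msg.get("From", ""))
    return {
        "from_domain": from_domain,
        # Strict comparison only; an empty From domain never counts as aligned.
        "return_path_aligned": bool(from_domain)
        and domain_of(msg.get("Return-Path", "")) == from_domain,
        "dkim_aligned": bool(from_domain)
        and dkim_domain(msg.get("DKIM-Signature", "")) == from_domain,
    }


# Hypothetical sample message, for illustration only.
SAMPLE = (
    "Return-Path: <bounce@mailer.example.net>\n"
    "DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=sel1; bh=abc; b=def\n"
    "From: Alice <alice@example.com>\n"
    "To: bob@example.org\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

result = alignment(SAMPLE)
```

In this sample the DKIM domain matches the `From` domain while the `Return-Path` domain does not, which is the kind of distinction an authentication-related field set would need to record.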
||||||
|
|
||||||
| <!-- | ||||||
| As you work on your RFC, use the "Stage N" comments to guide you in what you should focus on, for the stage you're targeting. | ||||||
| Feel free to remove these comments as you go along. | ||||||
| --> | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 0: Provide a high level summary of the premise of these changes. Briefly describe the nature, purpose, and impact of the changes. ~2-5 sentences. | ||||||
| --> | ||||||
|
|
||||||
| ## Fields | ||||||
|
Member
Another item that came to mind, and I think makes sense to capture for further discussion later: Would
Member
I would say +1 on this. @ebeahan if we would create this new
Contributor
Let's mention this in the RFC "Fields" section.
||||||
|
|
||||||
| <!-- | ||||||
| Stage 1: Describe at a high level how this change affects fields. Which fieldsets will be impacted? How many fields overall? Are we primarily adding fields, removing fields, or changing existing fields? The goal here is to understand the fundamental technical implications and likely extent of these changes. ~2-5 sentences. | ||||||
| --> | ||||||
|
|
||||||
| | field | type | description | | ||||||
| | --- | --- | --- | | ||||||
| | `email.action` | keyword | Action taken by the source device, e.g. delivered, blocked, quarantined, deleted | | ||||||
|
Contributor
I would rather have integrations use
Member
I agree with this as well, we should reuse the ECS fields we already have when we can.
||||||
| | `email.bcc.address` | keyword | Addresses of Bcc's | | ||||||
| | `email.bcc.domain` | keyword | Domains of the Bcc's | | ||||||
| | `email.cc.address` | keyword | Addresses of Cc's | | ||||||
| | `email.cc.domain` | keyword | Domains of Cc addresses | | ||||||
|
Contributor
This is a good first start, for capturing each array of emails. I would make all of them plural, however. Looking at this, I wonder if we should consider the If not, perhaps two distinct arrays of keywords are totally fine, like * My feeling is that in most cases, the sender's email address would be a suspicious one we want to break down all the way, but recipients are not "suspicious", they're the unwitting recipients for the email :-)
Member
@webmat I think it would make sense to have the structure you have above, but in the sense of looking for a specific address or domain, we should also accompany this with
Member
Under the current guidance with the Whether we continue with this guidance or adjust and add additional
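As an illustration of the "two distinct arrays of keywords" idea discussed in this thread, the sketch below splits a recipient header into an array of addresses and an array of unique domains. The output keys `address` and `domain` mirror the `email.cc.address` / `email.cc.domain` pair from the table. The function name and sample header are assumptions made for the example.

```python
from email.utils import getaddresses


def split_recipients(header_value: str) -> dict:
    """Turn a To/Cc/Bcc header value into parallel keyword arrays."""
    # getaddresses returns (display name, address) pairs; keep only real addresses.
    addresses = [
        addr.lower() for _, addr in getaddresses([header_value]) if "@" in addr
    ]
    # Domains are de-duplicated and sorted so the array is stable across events.
    domains = sorted({addr.rpartition("@")[2] for addr in addresses})
    return {"address": addresses, "domain": domains}


# Hypothetical header value, for illustration only.
cc = split_recipients("Bob <bob@example.org>, carol@example.org, dave@example.com")
```

Keeping the domains in their own array lets a query match on a recipient domain without wildcard searches over full addresses, at the cost of losing the pairing between a given address and its domain.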
||||||
| | `email.cipher` | keyword | Cipher used e.g. TLS | | ||||||
| | `email.file.count` | value | Number of attachments included in the message | | ||||||
|
Why not allow multiple files, as in an array?
I would use the term 'attachment' instead of 'file'.
file already exists as an ECS field; it would be great to reuse it.
Member
Agreed that The
Contributor
I agree with the name "attachments", and that it should support capturing multiple attachments. Note that I think it should be pluralized, since there can be many. This would be in line with how we named Here's a few suggestions on changes we could make to the attachment fields as we currently have them:
This will be an array of objects. In order for Elasticsearch to be able to index it so that we can query on multiple attachment attributes at a time (e.g. Querying nested fields is slightly different than normal fields, however (API, KQL). It looks to me like it has good support across the stack all the way to KQL, which is great. But since this will be the first use of this type in ECS, and since querying these fields is a bit different than usual, I think this would be worth a mention in the Concerns section. I would still adjust the field listing assuming we'll use I like @vpiserchia's suggestion of reusing |
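The reason an array of attachment objects calls for the `nested` type can be shown without a cluster. The sketch below is illustrative only: `flatten` mimics how a plain object array is indexed (one list of values per field, with the per-object pairing lost), and `nested_query` shows the general shape of an Elasticsearch `nested` query. The path `email.attachments` and its subfields are assumptions based on the naming suggested in this thread, not final ECS field names.

```python
# Hypothetical attachments on a single email event.
attachments = [
    {"name": "invoice.pdf", "extension": "pdf", "size": 1200},
    {"name": "setup.zip", "extension": "zip", "size": 90000},
]


def flatten(items):
    """Mimic a non-nested object array: one value list per field."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(key, []).append(value)
    return flat


def match_flat(items, **criteria):
    """Each criterion is checked against the whole event, not a single attachment."""
    flat = flatten(items)
    return all(value in flat.get(key, []) for key, value in criteria.items())


def match_nested(items, **criteria):
    """All criteria must hold within the same attachment object."""
    return any(all(item.get(k) == v for k, v in criteria.items()) for item in items)


# Shape of a nested query combining two attributes of the same attachment.
nested_query = {
    "query": {
        "nested": {
            "path": "email.attachments",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"email.attachments.name": "invoice.pdf"}},
                        {"term": {"email.attachments.extension": "zip"}},
                    ]
                }
            },
        }
    }
}

false_positive = match_flat(attachments, name="invoice.pdf", extension="zip")
correct = match_nested(attachments, name="invoice.pdf", extension="zip")
```

With the flattened form, a search for an attachment named `invoice.pdf` with extension `zip` matches even though no single attachment has both values. The nested form keeps each attachment's attributes together, which is why it needs the different query syntax mentioned above.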
||||||
| | `email.file.extension` | keyword | Extensions of attachment, e.g. .zip, .docx | | ||||||
| | `email.file.hash` | keyword | Hash of attachments | | ||||||
| | `email.file.name` | keyword | File name of attachments | | ||||||
| | `email.file.size` | keyword | Total size of all attachments in bytes | | ||||||
| | `email.direction` | keyword | Direction of the message based on the sending and receiving domains | | ||||||
|
Contributor
What do we foresee as values for this, and which address fields (to, cc, bcc) would it be categorized on?
Contributor
Good question, @dainperkins. I assume the allowed values in there should be "inbound" and "outbound". Perhaps also "unknown" in the case of relays? Actually just like I agree populating this consistently may not be obvious in all scenarios. I don't think as a third party, our solutions can determine between "inbound", "outbound" and "internal" without specific configuration that says what are "my domains". But once we know that, I assume the heuristic is pretty straightforward:
So I'm +1 on adding the field. I think it makes sense. And unless I'm missing something, I think the heuristics are reasonable; and actually, perhaps some of the email-related event sources already provide such values? It's certainly useful for a spam filter to know which emails to filter. Not sure if it shows up in their logs though. Action item for the RFC, though: let's start listing expected values for this field. I'm providing ideas above as a strawperson, based on what we have in |
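One possible reading of the direction heuristic, sketched under stated assumptions: the caller supplies the set of "my domains", and the values are the "inbound", "outbound", "internal" and "unknown" strings floated in this thread. The exact rules below are an assumption for illustration, not an agreed ECS definition.

```python
def classify_direction(sender_domain, recipient_domains, my_domains):
    """Classify an email's direction from sender and recipient domains."""
    mine = {d.lower() for d in my_domains}
    sender_internal = sender_domain.lower() in mine
    recipients_internal = [d.lower() in mine for d in recipient_domains]

    if not recipients_internal:
        # No recipients known, so nothing can be said about direction.
        return "unknown"
    if sender_internal and all(recipients_internal):
        return "internal"
    if sender_internal:
        # At least one recipient is outside the configured domains.
        return "outbound"
    if any(recipients_internal):
        return "inbound"
    # Neither side belongs to us, e.g. a relay.
    return "unknown"


# Hypothetical configuration of owned domains.
MY_DOMAINS = {"example.com"}
```

For example, `classify_direction("other.org", ["example.com"], MY_DOMAINS)` yields `"inbound"`. A message with both internal and external recipients is treated as outbound here, which is one of several defensible choices.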
||||||
| | `email.from.address` | keyword | Sender's email address | | ||||||
|
The schema should support and distinguish both the envelope/smtp 'from' and the header/mime 'from'.
Contributor
@jeffrysleddens Do you think it would be worth considering a specific section in the schema for a breakdown of protocols like SMTP and POP3? Or do you think a general purpose On a related note, I think we should also consider adding
Member
How would we distinguish which hash is for which file, and the same with size? Looking at the comment above about changing file to attachments, we will still need to for example have a list of objects if we want to keep track about size, hash, extension belonging to a single file/attachment. Are we thinking all of these fields should just be an array?
||||||
| | `email.from.domain` | keyword | Sender's domain | | ||||||
|
Member
Just making a note that as we refine and iterate on this list of fields, we will want to consider which would be good candidates for using
||||||
| | `email.latency` | keyword | The time, in milliseconds, the delivery attempt took | | ||||||
| | `email.message_id` | keyword | Internet message ID of the message | | ||||||
|
Contributor
Message IDs can be pretty creative. For example one of the message IDs for this PR's email notifications was So I would make this one
Nevertheless, the message_id captures the uniqueness of a mail.
||||||
| | `email.process` | keyword | Name of the executable that carried out the transaction, e.g. outlook, sendmail | | ||||||
|
MTA?
Contributor
Good call out. Although I wonder what the intent is, with this field? Is it:
I'm curious what folks would like to have here. Would both of these fields be useful?
Why add a field for this, i.m.o this can be captured in This would be in line with the statement on
Member
I agree with @SHolzhauer on this, I think the name of the process should follow the current ECS format. When it comes to question number 2 from @webmat it is something that might be useful in a separate field.
||||||
| | `email.protocol` | keyword | The email protocol used, e.g. SMTP, IMAP | | ||||||
|
Contributor
I'm thinking we should favor
Member
+1 for this.
||||||
| | `email.reply_to.address` | keyword | Reply-to address | | ||||||
| | `object.return.address` | keyword | The return address for the message | | ||||||
|
Shouldn't this be email.return.address?
.address exists for email.reply_to and object.return, but .domain does not; this should be consistent, with .domain as an additional field.
||||||
| | `email.size` | keyword | Total size of the message, in bytes, including attachments | | ||||||
|
webmat marked this conversation as resolved.
Outdated
|
||||||
| | `email.subject` | keyword | Subject of the message | | ||||||
| | `email.to` | keyword | Recipient address | | ||||||
|
Contributor
I think 'recipient' is a more common term when it comes to email than 'to'.
This really depends on whether you want to capture the semantics of the envelope/smtp and/or the email headers. This applies to other fields as well (to, cc, bcc, from).
Contributor
I agree there's nuances in recipients, and in how closely we want to represent each protocol. After looking at a few raw emails, I'm starting to think we should keep the design of this field set pretty high level. If capturing each protocol's nuances is desired or necessary, then I think we should consider working on a specific breakdown per protocol, as a separate step. So I think the guiding principle we could use here is to capture the commonalities in One point you're raising is pretty interesting, however. Should we capture each "type" of recipient in a different field, or in one parent field with an additional label to indicate which type of recipient this is? The current proposal takes the former approach: The "recipient" suggestion would look like: 🤔
I really like this one:
this avoids the need for nested objects. And It also opens to a new one in the "related" field:
Member
From an ingest perspective and a visualization perspective I think it's better to keep the "to, cc, bcc" field structure over type.
||||||
| | `email.to.domain` | keyword | Recipient domain | | ||||||
|
email.to is already a field of type keyword, so how can you define email.to as an object having domain as a subfield? Sorry for the silly question.
Member
Good catch! 👀 I believe
||||||
|
|
||||||
| ## Usage | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 1: Describe at a high-level how these field changes will be used in practice. Real world examples are encouraged. The goal here is to understand how people would leverage these fields to gain insights or solve problems. ~1-3 paragraphs. | ||||||
| --> | ||||||
|
|
||||||
| Email use cases stretch across all three Elastic solutions - Search, Observe, Protect. Whether it's searching for content within email, ensuring email infrastructure is operational or detecting email-based attacks, there are many possibilities for email fields within ECS. | ||||||
|
|
||||||
| ## Source data | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 1: Provide a high-level description of example sources of data. This does not yet need to be a concrete example of a source document, but instead can simply describe a potential source (e.g. nginx access log). This will ultimately be fleshed out to include literal source examples in a future stage. The goal here is to identify practical sources for these fields in the real world. ~1-3 sentences or unordered list. | ||||||
| --> | ||||||
|
|
||||||
| - **Email Analytics**: [Hubspot](https://legacydocs.hubspot.com/docs/methods/email/email_events_overview), Marketo, Salesforce Pardot | ||||||
| - **Email Server**: [O365 Message Tracing](https://docs.microsoft.com/en-us/exchange/monitoring/trace-an-email-message/run-a-message-trace-and-view-results), [Postfix](https://nxlog.co/documentation/nxlog-user-guide/postfix.html) | ||||||
| - **Email Security**: [Barracuda](https://campus.barracuda.com/product/emailsecuritygateway/doc/12193950/syslog-and-the-barracuda-email-security-gateway/), [Forcepoint](https://www.websense.com/content/support/library/email/v85/email_siem/siem_log_map.pdf), [Mimecast](https://www.mimecast.com/tech-connect/documentation/tutorials/understanding-siem-logs/), [Proofpoint](https://help.proofpoint.com/Threat_Insight_Dashboard/API_Documentation/SIEM_API) | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 2: Included a real world example source document. Ideally this example comes from the source(s) identified in stage 1. If not, it should replace them. The goal here is to validate the utility of these field changes in the context of a real world example. Format with the source name as a ### header and the example document in a GitHub code block with json formatting. | ||||||
| --> | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 3: Add more real world example source documents so we have at least 2 total, but ideally 3. Format as described in stage 2. | ||||||
| --> | ||||||
|
|
||||||
| ## Scope of impact | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 2: Identifies scope of impact of changes. Are breaking changes required? Should deprecation strategies be adopted? Will significant refactoring be involved? Break the impact down into: | ||||||
| * Ingestion mechanisms (e.g. beats/logstash) | ||||||
| * Usage mechanisms (e.g. Kibana applications, detections) | ||||||
| * ECS project (e.g. docs, tooling) | ||||||
| The goal here is to research and understand the impact of these changes on users in the community and development teams across Elastic. 2-5 sentences each. | ||||||
| --> | ||||||
|
|
||||||
| ## Concerns | ||||||
|
When looking at the current fields provided, one of my concerns is that they don't appear to fit well with the rest of ECS. I think this can be partially fixed with the use of aliases, though I don't believe aliases are standard/common in ECS. Examples: email.from -> source
Contributor
latency might be redundant depending on the specific action being recorded, but I wouldn't equate email.to|from with source and destination (or client/server) network entities
Contributor
Agreed, email has enough subtleties in both the "senders" (sender, reply_to, return_path) and the receivers (to, cc, bcc) that I don't think it makes sense to put them in If an email server logs the
||||||
|
|
||||||
| <!-- | ||||||
| Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake. | ||||||
| --> | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 2: Document new concerns or resolutions to previously listed concerns. It's not critical that all concerns have resolutions at this point, but it would be helpful if resolutions were taking shape for the most significant concerns. | ||||||
| --> | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 3: Document resolutions for all existing concerns. Any new concerns should be documented along with their resolution. The goal here is to eliminate the risk of churn and instability by resolving outstanding concerns. | ||||||
| --> | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 4: Document any new concerns and their resolution. The goal here is to eliminate risk of churn and instability by ensuring all concerns have been addressed. | ||||||
| --> | ||||||
|
|
||||||
| ## Real-world implementations | ||||||
|
|
||||||
| <!-- | ||||||
| Stage 4: Identify at least one real-world, production-ready implementation that uses these updated field definitions. An example of this might be a GA feature in an Elastic application in Kibana. | ||||||
| --> | ||||||
|
|
||||||
| ## People | ||||||
|
P1llus marked this conversation as resolved.
Outdated
|
||||||
|
|
||||||
| The following are the people that consulted on the contents of this RFC. | ||||||
|
|
||||||
| Jamie Hynds | author | ||||||
| TBD | Sponsor | ||||||
|
|
||||||
| <!-- | ||||||
| Who will be or has been consulted on the contents of this RFC? Identify authorship and sponsorship, and optionally identify the nature of involvement of others. Link to GitHub aliases where possible. This list will likely change or grow stage after stage. | ||||||
|
|
||||||
| e.g.: | ||||||
|
|
||||||
| * @Yasmina | author | ||||||
| * @Monique | sponsor | ||||||
| * @EunJung | subject matter expert | ||||||
| * @JaneDoe | grammar, spelling, prose | ||||||
| * @Mariana | ||||||
| --> | ||||||
|
|
||||||
|
|
||||||
| ## References | ||||||
|
|
||||||
| <!-- Insert any links appropriate to this RFC in this section. --> | ||||||
|
P1llus marked this conversation as resolved.
|
||||||
|
|
||||||
| ### RFC Pull Requests | ||||||
|
|
||||||
| <!-- An RFC should link to the PRs for each of its stage advancements. --> | ||||||
|
|
||||||
| * Stage 0: https://github.com/elastic/ecs/pull/NNN | ||||||
|
Contributor
Suggested change
|
||||||
|
|
||||||
| <!-- | ||||||
| * Stage 1: https://github.com/elastic/ecs/pull/NNN | ||||||
| ... | ||||||
| --> | ||||||
Not required, but including the stage name with the stage number has become an informal RFC convention 😄