[RFC] Wildcard - stage 2 proposal #970
    @@ -0,0 +1,13 @@
    ---
    - name: process
Do these apply to process.parent as well?
Yes, now that process.parent is managed by the field reuse mechanism, this will indeed apply to it :-)
Thanks for the new table. I have a few nitpicks as comments below, but the meat of the review will be here.
- I agree with the `source`, `destination`, `file`, `os`, `registry` and `user_agent` fields you've selected
- I agree with the `url` fields you've selected so far
- I have a few challenges on some of the other fields selected, noted below. But mostly good selection there as well, overall 👍
Here are a few more fields I think we should migrate to wildcard now, and have reflected in the RFC's YAML files, for experimental release in 1.7.0:
- Changes to `source` & `destination` should be mirrored to `client` & `server` as well. So `client.domain`, `client.registered_domain` and same under `server.*`
- `agent.build.original`
- `error.type`
- `event.original`
  - add `index: true` in the RFC YAML file as well. Right now it's index false in ECS.
- `http.request.referrer`
- `log.file.path`, `log.logger`
- `organization.name`
- `as.organization.name` (this is not a reuse of the `organization` field set, it's defined explicitly)
- `tls.client.issuer`, `tls.client.subject`, `tls.server.issuer`, `tls.server.subject`
- `x509.issuer.distinguished_name`, `x509.subject.distinguished_name`
- `url.path`
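As a rough sketch of what the mirrored `client` entries could look like in an RFC YAML file: this assumes the same override layout as the `- name: process` file under review (a field set name with a nested `fields` list). The file name and layout are assumptions for illustration, not quoted from the PR.

```yaml
# Hypothetical rfcs/text/0001/client.yml; a server.yml would mirror it.
---
- name: client
  fields:
    # Field names are the ones listed in the review comment above.
    - name: domain
      type: wildcard
    - name: registered_domain
      type: wildcard
```

Only the attributes that change are listed; everything else would continue to come from the main schema definition, if the override mechanism works as assumed here.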
Here are a few I would like to consider migrating in the future. I'd say capture those just in markdown for now:
- `geo.name`
- `registry.data.strings`
- `pe.product`
- `dns.question.name`, `dns.answers.data`
Here are two fields I think we should migrate at 8.0, because they represent a breaking change. Therefore let's capture them in the RFC text, but not in the YAML files.
- `message`
- `error.message`
Questions / challenges
- Why migrate `agent.name`? I doubt this field is widely used. When it is used, I doubt the cardinality is very high.
- I would remove `host.name` from the list for now. It's the main identifier of a host for Elastic Security, so lots of aggregations and filtering around `host.name` would become a bit slower if we migrate it. However I think it's fine to leave `host.hostname` as `wildcard`.
- I would also not migrate `host.domain` as in this case, it's rarely going to be an FQDN, but rather an AD domain name. Moreover, this is not going to be suspicious data where users need to do wildcard searches, IMO.
- Why `process.name`? It's a short executable name in POSIX envs. Is it pretty long under Windows? (The long process title should be captured at `process.title`)
- I would remove `user.domain` for the same reason as `host.domain`
Maybe we can simply capture the "considered" and the "8.0" suggestions I'm making via an additional column in the table for now.
@rw-access Do you think
@dainperkins Do you think
pe.product - probably not.
@rw-access Yeah .path and .key are already part of the plan. We'll add
Agree with all the above 👍
Is the thinking to let these fields and their usage mature more and revisit
A separate conversation, but would we add a
I'm fine with removing it since it's not widely used.
👍
Fair. Even in fairly large AD environments, it's probably unlikely to have more than 100s or perhaps low 1000s of unique domains.
Not extremely long. From some I think I keyed off this one due to the
++
under consideration
No it's purely because I wasn't 100% sure myself. So I wanted to put them on the table for consideration without slowing things down. With Ross' feedback above, we can remove So this leaves only text to wildcard
For process.name Ok this could make sense, but I'm still hesitant. Perhaps we add it to the list of fields under consideration for now?
I'm on board with that approach.
Thanks for grabbing my commits @ebeahan. Here are a few more notes:
- When building artifacts with `build/ve/bin/python scripts/generator.py --include rfcs/text/0001/`, I see the field definition for `dns.answers` disappearing from the csv, beats/fields.ecs.yml, ecs_flat.yml and ecs_nested.yml. A problem we should tackle separately from this RFC.
- Note that I didn't end up adding anything as "considered", for now. I've directly added both DNS fields based on another discussion out of band. Since this only left `geo.name` to be considered, I've directly migrated it instead. Stage 2 is experimental, so we can walk this back if this is a problem.
- I see you've removed `process.name` since I cast doubt on it. But looking back at #570 I see Craig was pushing for more flexibility for that field. Let's ignore my doubts and go with your initial gut feeling, and migrate it to wildcard 😬
- In line with the point above, and with another request from Craig on #570, let's also move `pe.original_file_name` to wildcard
I think we'll be good after this.
Overall LGTM. One small question/worry.
Good point @leehinman, thanks. Right now ECS doesn't deal very well with non-indexed fields. Currently I think we could leave it non-indexed but still "migrate" it to wildcard. Separately from this RFC I think we could improve on these non-indexed fields, and say something along the lines of "this field is not indexed, but if users want to index it anyway, we suggest type: wildcard"
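A minimal sketch of the "non-indexed but still migrated" idea. It assumes the field in question is `event.original` (noted earlier in this thread as index false in ECS) and assumes the RFC YAML override layout of a field set name with a nested `fields` list; neither is confirmed by the comment itself.

```yaml
# Hypothetical override for illustration only.
---
- name: event
  fields:
    - name: original
      type: wildcard   # the type suggested if a user opts into indexing
      index: false     # the field stays non-indexed by default
```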
I wasn't seeing the same behavior in a quick test to reproduce. I'll try some more, and if I can reliably reproduce it I'll open an issue for later.
Sounds good. 👍 I addressed the two items @webmat listed and also added One outstanding question: From @leehinman's observation, should we edit: https://github.com/ebeahan/ecs/blob/wildcard-rfc-stage-2/rfcs/text/0001/event.yml#L5 and remove the
@ebeahan Yes for now, let's remove However we will address the rendering of this subtlety in the docs separately.
Good 👁️ on adding
Stumbled on a typo. Line 158 "Luence" => "Lucene"
Co-authored-by: Mathieu Martin <webmat@gmail.com>
webmat left a comment
LGTM 🚢
Thanks everyone for the input!
Summary
Revisions to the stage 1 wildcard adoption RFC, submitted for acceptance as a stage 2 proposal.
The most significant addition is the list of fields that are the current candidates for `wildcard`. Discussion on any fields that should be added or removed is welcome.
Criteria for consideration
Markdown preview of this RFC