From fb0c6a21abfda311428fa1ae0d9cbacbe2488a01 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Thu, 17 Sep 2020 11:44:15 -0500 Subject: [PATCH 01/30] stage two updates --- rfcs/text/0001-wildcard-data-type.md | 184 ++++++++++++++++++++++++--- 1 file changed, 164 insertions(+), 20 deletions(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index ff81ac08a0..0771a26a1e 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -1,7 +1,7 @@ # 0001: Wildcard Field Adoption into ECS -- Stage: **1 (proposal)** +- Stage: **2 (draft)** - Date: **TBD** Wildcard is a data type for Elasticsearch string fields being introduced in Elasticsearch 7.9. Wildcard optimizes performance for queries using wildcards (`*`) and regex, allowing users to perform `grep`-like searches without the limitations of the existing @@ -10,28 +10,36 @@ text[0] and keyword[1] types. ## Fields -For a field to use wildcard, it will require changing the the field's defined schema `type` from `keyword` to `wildcard`. The following fieldsets are expected to adopt `wildcard` in at least one of their fields: - -* `agent.*` -* `destination.*` -* `error.*` -* `file.*` -* `host.*` -* `http.*` -* `os.*` -* `process.*` -* `registry.*` -* `source.*` -* `url.*` -* `user.*` -* `user_agent.*` +### Identified Wildcard Fields + +For a field to use wildcard, it will require changing the the field's defined schema `type` from `keyword` to `wildcard`. These are the following fields identified to transition: + +| Field Set | Field(s) | +| --------- | -------- | +| [`agent`](text/0001/agent.yml) | `name` | +| [`destination`](text/0001/destination.yml) | `domain`
`registered_domain` | +| [`error`](text/0001/error.yml) | `stack_trace` | +| [`file`](text/0001/file.yml) | `directory`
`path`
`target_path` | +| [`host`](text/0001/host.yml) | `hostname`
`name`
`domain` | +| [`http`](text/0001/http.yml) | `request.body.content`
`response.body.content` | +| [`os`](text/0001/os.yml) | `name`
`full` | +| [`process`](text/0001/process.yml) | `command_line`
`executable`
`name`
`title`
`working_directory`
| +| [`registry`](text/0001/registry.yml) | `key`
`path` | +| [`source`](text/0001/source.yml) | `domain`
`registered_domain` | +| [`url`](text/0001/url.yml) | `original`
`full`
`domain`
`registered_domain` | +| [`user`](text/0001/user.yml) | `name`
`full_name`
`email`
`domain` | +| [`user_agent`](text/0001/user_agent.yml) | `original` | + +The full set of schema files which will be transitioning to `wildcard` are located [here](text/0001/). + +### Example definition Here's an example of applying this change to the `process.command_line` field: -**Definition as of ECS 1.5.0** +**Definition as of ECS 1.6.0** Schema definition: @@ -232,9 +240,11 @@ Additional cases for wildcard searching against command line executions: ## Source data +### Categories + * Windows events * Sysmon events * Powershell events @@ -244,6 +254,138 @@ Stage 1: Provide a high-level description of example sources of data. This does * Endpoint agents * Application stack traces +### Real world examples + +Each example in this section contains a partial index mapping, a partial event, and one wildcard search query. Each query example uses a leading wildcard on expected high-cardinality fields where `wildcard` is performs far better than `keyword`. + +**Windows registry event from sysmon:** + +``` +### Mapping (partial) +... + "registry" : { + "properties" : { + "key" : { + "type" : "wildcard" + } + } + } +... + +### Event (partial) +... + "registry": { + "path": "HKU\\S-1-5-21-1957236100-58272097-297103362-500\\Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Advanced\\HideFileExt", + "hive": "HKU", + "key": "S-1-5-21-1957236100-58272097-297103362-500\\Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Advanced\\HideFileExt", + "value": "HideFileExt", + "data": { + "strings": [ + "1" + ], + "type": "SZ_DWORD" + } +... + +### Query + +GET ecs-*/_search +{ + "query": { + "wildcard": { + "registry.key": { + "value": "*CurrentVersion*" + } + } + } +} + +``` + +**Windows Powershell logging event:** + +``` +### Mapping (partial) +... + "process" : { + "properties" : { + "command_line" : { + "type" : "wildcard", + "fields" : { + "text" : { + "type" : "text", + "norms" : false + } + } + } + } + } +... 
+ +### Event (partial) + + "process": { + "pid": 3540, + ... + "command_line": "C:\\Windows\\System32\\svchost.exe -k netsvcs -p -s NetSetupSvc" + } + +### Query + +GET ecs-winlogbeat-7.9.1-2020.09.16-000001/_search +{ + "_source": false, + "query": { + "wildcard": { + "process.command_line": { + "value": "*-k netsvcs -p*" + } + } + } +} +``` + +**Wildcard query against original URL from a squid web proxy event:** + +``` +### Mapping (partial) + +... + "url" : { + "original" : { + "type" : "wildcard", + "fields" : { + "text" : { + "type" : "text", + "norms" : false + } + } + } +... + +### Event (partial) + +... + "url": { + "original": "http://example.com/cart.do?action=view&itemId=HolyGouda", + "domain": "example.com" + } +... + +### Query + +GET filebeat-7.9.1-2020.09.17-000001/_search +{ + "_source": false, + "query": { + "wildcard": { + "url.original": { + "value": "*action=view*Gouda" + } + } + } +} +``` ## Scope of impact @@ -270,7 +412,7 @@ ECS is and will remain an open source licensed project. However, there will be f ## Concerns ### Wildcard and case-insensitivity @@ -287,6 +429,8 @@ Performance and storage characteristics between wildcard and keyword will be dif ECS applies the `ignore_above` setting to keyword fields to prevent strings longer than 1024 characters from being indexed or stored. While `ignore_above` can be raised, Lucene implements a term byte-length limit of 32766 which cannot be adjusted. Wildcard supports an unlimited max character size for a field value. The `wildcard` field type will still have the `ignore_above` option available, and a reasonable limit may need to be applied to mitigate unexpected side-effects. +For the initial adoption into ECS, `wildcard` fields will not have an `ignore_above` option defined. + ### Licensing Until now ECS has relied only on OSS licensed features, but ECS will also support Elastic licensed features. 
The ECS project will remain OSS licensed with the schema implementing Elastic licensed features as part of the specification. When ECS adopts a feature available only under a license, it will be noted in the documentation. ECS plans to provide tooling options which continue to support OSS consumers of ECS and the Elastic Stack. From da4aa73c399730d12511c43f75b869dba110d306 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Thu, 17 Sep 2020 11:44:39 -0500 Subject: [PATCH 02/30] adding wildcard schema files --- rfcs/text/0001/agent.yml | 5 +++++ rfcs/text/0001/destination.yml | 7 +++++++ rfcs/text/0001/error.yml | 6 ++++++ rfcs/text/0001/file.yml | 9 +++++++++ rfcs/text/0001/host.yml | 8 ++++++++ rfcs/text/0001/http.yml | 7 +++++++ rfcs/text/0001/os.yml | 7 +++++++ rfcs/text/0001/process.yml | 13 +++++++++++++ rfcs/text/0001/registry.yml | 7 +++++++ rfcs/text/0001/source.yml | 7 +++++++ rfcs/text/0001/url.yml | 11 +++++++++++ rfcs/text/0001/user.yml | 11 +++++++++++ rfcs/text/0001/user_agent.yml | 5 +++++ 13 files changed, 103 insertions(+) create mode 100644 rfcs/text/0001/agent.yml create mode 100644 rfcs/text/0001/destination.yml create mode 100644 rfcs/text/0001/error.yml create mode 100644 rfcs/text/0001/file.yml create mode 100644 rfcs/text/0001/host.yml create mode 100644 rfcs/text/0001/http.yml create mode 100644 rfcs/text/0001/os.yml create mode 100644 rfcs/text/0001/process.yml create mode 100644 rfcs/text/0001/registry.yml create mode 100644 rfcs/text/0001/source.yml create mode 100644 rfcs/text/0001/url.yml create mode 100644 rfcs/text/0001/user.yml create mode 100644 rfcs/text/0001/user_agent.yml diff --git a/rfcs/text/0001/agent.yml b/rfcs/text/0001/agent.yml new file mode 100644 index 0000000000..db33838ea6 --- /dev/null +++ b/rfcs/text/0001/agent.yml @@ -0,0 +1,5 @@ +--- +- name: agent + fields: + - name: name + type: wildcard diff --git a/rfcs/text/0001/destination.yml b/rfcs/text/0001/destination.yml new file mode 100644 index 0000000000..d64a84c6be 
--- /dev/null +++ b/rfcs/text/0001/destination.yml @@ -0,0 +1,7 @@ +--- + - name: destination + fields: + - name: domain + type: wildcard + - name: registered_domain + type: wildcard diff --git a/rfcs/text/0001/error.yml b/rfcs/text/0001/error.yml new file mode 100644 index 0000000000..6fab6d58c7 --- /dev/null +++ b/rfcs/text/0001/error.yml @@ -0,0 +1,6 @@ +--- +- name: error + fields: + - name: stack_trace + index: true + type: wildcard diff --git a/rfcs/text/0001/file.yml b/rfcs/text/0001/file.yml new file mode 100644 index 0000000000..f4938d38be --- /dev/null +++ b/rfcs/text/0001/file.yml @@ -0,0 +1,9 @@ +--- +- name: file + fields: + - name: directory + type: wildcard + - name: path + type: wildcard + - name: target_path + type: wildcard diff --git a/rfcs/text/0001/host.yml b/rfcs/text/0001/host.yml new file mode 100644 index 0000000000..79eb12001a --- /dev/null +++ b/rfcs/text/0001/host.yml @@ -0,0 +1,8 @@ +- name: host + fields: + - name: hostname + type: wildcard + - name: name + type: wildcard + - name: domain + type: wildcard diff --git a/rfcs/text/0001/http.yml b/rfcs/text/0001/http.yml new file mode 100644 index 0000000000..eded72da3d --- /dev/null +++ b/rfcs/text/0001/http.yml @@ -0,0 +1,7 @@ +--- +- name: http + fields: + - name: request.body.content + type: wildcard + - name: response.body.content + type: wildcard diff --git a/rfcs/text/0001/os.yml b/rfcs/text/0001/os.yml new file mode 100644 index 0000000000..ec9d71a79c --- /dev/null +++ b/rfcs/text/0001/os.yml @@ -0,0 +1,7 @@ +--- +- name: os + fields: + - name: name + type: wildcard + - name: full + type: wildcard diff --git a/rfcs/text/0001/process.yml b/rfcs/text/0001/process.yml new file mode 100644 index 0000000000..da492e4564 --- /dev/null +++ b/rfcs/text/0001/process.yml @@ -0,0 +1,13 @@ +--- +- name: process + fields: + - name: command_line + type: wildcard + - name: executable + type: wildcard + - name: name + type: wildcard + - name: title + type: wildcard + - name: working_directory + 
type: wildcard diff --git a/rfcs/text/0001/registry.yml b/rfcs/text/0001/registry.yml new file mode 100644 index 0000000000..8fdae7149e --- /dev/null +++ b/rfcs/text/0001/registry.yml @@ -0,0 +1,7 @@ +--- +- name: registry + fields: + - name: key + type: wildcard + - name: path + type: wildcard diff --git a/rfcs/text/0001/source.yml b/rfcs/text/0001/source.yml new file mode 100644 index 0000000000..d810a6cb79 --- /dev/null +++ b/rfcs/text/0001/source.yml @@ -0,0 +1,7 @@ +--- +- name: source + fields: + - name: domain + type: wildcard + - name: registered_domain + type: wildcard diff --git a/rfcs/text/0001/url.yml b/rfcs/text/0001/url.yml new file mode 100644 index 0000000000..4ff4a411ea --- /dev/null +++ b/rfcs/text/0001/url.yml @@ -0,0 +1,11 @@ +--- +- name: url + fields: + - name: original + type: wildcard + - name: full + type: wildcard + - name: domain + type: wildcard + - name: registered_domain + type: wildcard diff --git a/rfcs/text/0001/user.yml b/rfcs/text/0001/user.yml new file mode 100644 index 0000000000..412ed823cc --- /dev/null +++ b/rfcs/text/0001/user.yml @@ -0,0 +1,11 @@ +--- +- name: user + fields: + - name: name + type: wildcard + - name: full_name + type: wildcard + - name: email + type: wildcard + - name: domain + type: wildcard diff --git a/rfcs/text/0001/user_agent.yml b/rfcs/text/0001/user_agent.yml new file mode 100644 index 0000000000..c413a9d702 --- /dev/null +++ b/rfcs/text/0001/user_agent.yml @@ -0,0 +1,5 @@ +--- +- name: user_agent + fields: + - name: original + type: wildcard From 00eae5b898cf32b46cf14e9414cfefa8cc0c24e7 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Thu, 17 Sep 2020 11:49:54 -0500 Subject: [PATCH 03/30] add link for stage 2 PR --- rfcs/text/0001-wildcard-data-type.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 0771a26a1e..c388940cbd 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ 
-470,3 +470,4 @@ The following are the people that consulted on the contents of this RFC. * Stage 0: https://github.com/elastic/ecs/pull/890 * Stage 1: https://github.com/elastic/ecs/pull/904 +* Stage 2: https://github.com/elastic/ecs/pull/970 From e16ed89b985eb38dd1b804a545e4f54f9ab90228 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Thu, 17 Sep 2020 13:26:10 -0500 Subject: [PATCH 04/30] fix links to schema files --- rfcs/text/0001-wildcard-data-type.md | 30 ++++++++++++++-------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index c388940cbd..cc5b26d721 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -19,21 +19,21 @@ For a field to use wildcard, it will require changing the the field's defined sc | Field Set | Field(s) | | --------- | -------- | -| [`agent`](text/0001/agent.yml) | `name` | -| [`destination`](text/0001/destination.yml) | `domain`
`registered_domain` | -| [`error`](text/0001/error.yml) | `stack_trace` | -| [`file`](text/0001/file.yml) | `directory`
`path`
`target_path` | -| [`host`](text/0001/host.yml) | `hostname`
`name`
`domain` | -| [`http`](text/0001/http.yml) | `request.body.content`
`response.body.content` | -| [`os`](text/0001/os.yml) | `name`
`full` | -| [`process`](text/0001/process.yml) | `command_line`
`executable`
`name`
`title`
`working_directory`
| -| [`registry`](text/0001/registry.yml) | `key`
`path` | -| [`source`](text/0001/source.yml) | `domain`
`registered_domain` | -| [`url`](text/0001/url.yml) | `original`
`full`
`domain`
`registered_domain` | -| [`user`](text/0001/user.yml) | `name`
`full_name`
`email`
`domain` | -| [`user_agent`](text/0001/user_agent.yml) | `original` | - -The full set of schema files which will be transitioning to `wildcard` are located [here](text/0001/). +| [`agent`](0001/agent.yml) | `name` | +| [`destination`](0001/destination.yml) | `domain`
`registered_domain` | +| [`error`](0001/error.yml) | `stack_trace` | +| [`file`](0001/file.yml) | `directory`
`path`
`target_path` | +| [`host`](0001/host.yml) | `hostname`
`name`
`domain` | +| [`http`](0001/http.yml) | `request.body.content`
`response.body.content` | +| [`os`](0001/os.yml) | `name`
`full` | +| [`process`](0001/process.yml) | `command_line`
`executable`
`name`
`title`
`working_directory`
| +| [`registry`](0001/registry.yml) | `key`
`path` | +| [`source`](0001/source.yml) | `domain`
`registered_domain` | +| [`url`](0001/url.yml) | `original`
`full`
`domain`
`registered_domain` | +| [`user`](0001/user.yml) | `name`
`full_name`
`email`
`domain` | +| [`user_agent`](0001/user_agent.yml) | `original` | + +The full set of schema files which will be transitioning to `wildcard` are located [here](0001/). ### Example definition From da71db0fd4289adb8e6216625ba0976740b242bb Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Thu, 17 Sep 2020 13:43:39 -0500 Subject: [PATCH 05/30] rephrasing --- rfcs/text/0001-wildcard-data-type.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index cc5b26d721..b12da6b514 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -15,7 +15,7 @@ Stage 2: Include new or updated yml field definitions for all of the essential f ### Identified Wildcard Fields -For a field to use wildcard, it will require changing the the field's defined schema `type` from `keyword` to `wildcard`. These are the following fields identified to transition: +For a field to use wildcard, it will require changing the the field's defined schema `type` from `keyword` to `wildcard`. The following fields are candidates for `wildcard`: | Field Set | Field(s) | | --------- | -------- | From e5ea69fa6f1be39269c141dae17aa49990f688af Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Wed, 30 Sep 2020 16:31:12 -0500 Subject: [PATCH 06/30] refactor table for better readability --- rfcs/text/0001-wildcard-data-type.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index b12da6b514..922ede0b2e 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -19,19 +19,19 @@ For a field to use wildcard, it will require changing the the field's defined sc | Field Set | Field(s) | | --------- | -------- | -| [`agent`](0001/agent.yml) | `name` | -| [`destination`](0001/destination.yml) | `domain`
`registered_domain` | -| [`error`](0001/error.yml) | `stack_trace` | -| [`file`](0001/file.yml) | `directory`
`path`
`target_path` | -| [`host`](0001/host.yml) | `hostname`
`name`
`domain` | -| [`http`](0001/http.yml) | `request.body.content`
`response.body.content` | -| [`os`](0001/os.yml) | `name`
`full` | -| [`process`](0001/process.yml) | `command_line`
`executable`
`name`
`title`
`working_directory`
| -| [`registry`](0001/registry.yml) | `key`
`path` | -| [`source`](0001/source.yml) | `domain`
`registered_domain` | -| [`url`](0001/url.yml) | `original`
`full`
`domain`
`registered_domain` | -| [`user`](0001/user.yml) | `name`
`full_name`
`email`
`domain` | -| [`user_agent`](0001/user_agent.yml) | `original` | +| [`agent`](0001/agent.yml) | `agent.name` | +| [`destination`](0001/destination.yml) | `destination.domain`
`destination.registered_domain` | +| [`error`](0001/error.yml) | `error.stack_trace` | +| [`file`](0001/file.yml) | `file.directory`
`file.path`
`file.target_path` | +| [`host`](0001/host.yml) | `host.hostname`
`host.name`
`host.domain` | +| [`http`](0001/http.yml) | `http.request.body.content`
`http.response.body.content` | +| [`os`](0001/os.yml) | `os.name`
`os.full` | +| [`process`](0001/process.yml) | `process.command_line`
`process.executable`
`process.name`
`process.title`
`process.working_directory`
| +| [`registry`](0001/registry.yml) | `registry.key`
`registry.path` | +| [`source`](0001/source.yml) | `source.domain`
`source.registered_domain` | +| [`url`](0001/url.yml) | `url.original`
`url.full`
`url.domain`
`url.registered_domain` | +| [`user`](0001/user.yml) | `user.name`
`user.full_name`
`user.email`
`user.domain` | +| [`user_agent`](0001/user_agent.yml) | `user_agent.original` | The full set of schema files which will be transitioning to `wildcard` are located [here](0001/). From e004af7473b78d5c18ee0eb6f5f385aebb68a40d Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 16:46:31 -0400 Subject: [PATCH 07/30] Adjust index globs in query examples --- rfcs/text/0001-wildcard-data-type.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 922ede0b2e..883b84d91e 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -289,7 +289,7 @@ Each example in this section contains a partial index mapping, a partial event, ### Query -GET ecs-*/_search +GET winlogbeat-*/_search { "query": { "wildcard": { @@ -332,7 +332,7 @@ GET ecs-*/_search ### Query -GET ecs-winlogbeat-7.9.1-2020.09.16-000001/_search +GET winlogbeat-*/_search { "_source": false, "query": { @@ -374,7 +374,7 @@ GET ecs-winlogbeat-7.9.1-2020.09.16-000001/_search ### Query -GET filebeat-7.9.1-2020.09.17-000001/_search +GET filebeat-*/_search { "_source": false, "query": { From ab338c8b4eddb29a00866542edfd41c91630388c Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 16:51:49 -0400 Subject: [PATCH 08/30] Migrate same fields for client/server as for source/destination --- rfcs/text/0001/client.yml | 7 +++++++ rfcs/text/0001/server.yml | 7 +++++++ 2 files changed, 14 insertions(+) create mode 100644 rfcs/text/0001/client.yml create mode 100644 rfcs/text/0001/server.yml diff --git a/rfcs/text/0001/client.yml b/rfcs/text/0001/client.yml new file mode 100644 index 0000000000..14ed3a9a37 --- /dev/null +++ b/rfcs/text/0001/client.yml @@ -0,0 +1,7 @@ +--- + - name: client + fields: + - name: domain + type: wildcard + - name: registered_domain + type: wildcard diff --git a/rfcs/text/0001/server.yml b/rfcs/text/0001/server.yml new file mode 100644 
index 0000000000..70c285f374 --- /dev/null +++ b/rfcs/text/0001/server.yml @@ -0,0 +1,7 @@ +--- + - name: server + fields: + - name: domain + type: wildcard + - name: registered_domain + type: wildcard From fcbdb87476ae36671799a88be0a7877a28a20d2a Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 16:54:10 -0400 Subject: [PATCH 09/30] Don't migrate agent.name but migrate agent.build.original --- rfcs/text/0001/agent.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001/agent.yml b/rfcs/text/0001/agent.yml index db33838ea6..d09e77111d 100644 --- a/rfcs/text/0001/agent.yml +++ b/rfcs/text/0001/agent.yml @@ -1,5 +1,5 @@ --- - name: agent fields: - - name: name + - name: build.original type: wildcard From 6ac5ecdfbd20a7cd65ae49b0787eef3d48b30226 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:20:26 -0400 Subject: [PATCH 10/30] migrate error.type --- rfcs/text/0001/error.yml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rfcs/text/0001/error.yml b/rfcs/text/0001/error.yml index 6fab6d58c7..f2004d3fe0 100644 --- a/rfcs/text/0001/error.yml +++ b/rfcs/text/0001/error.yml @@ -4,3 +4,6 @@ - name: stack_trace index: true type: wildcard + + - name: type + type: wildcard From 5e5f4432c057c4071bc8f284606fa5e2ec0f65f1 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:20:43 -0400 Subject: [PATCH 11/30] Migrate event.original --- rfcs/text/0001/event.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 rfcs/text/0001/event.yml diff --git a/rfcs/text/0001/event.yml b/rfcs/text/0001/event.yml new file mode 100644 index 0000000000..0b50d6f942 --- /dev/null +++ b/rfcs/text/0001/event.yml @@ -0,0 +1,6 @@ +--- +- name: event + fields: + - name: original + index: true + type: wildcard From 145721baa8fc7da7a511b5211394664cfd5eda0b Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:21:27 -0400 Subject: [PATCH 12/30] Boldly migrate geo.name. 
I'm sure some folk capture semi structured custom names --- rfcs/text/0001/geo.yml | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 rfcs/text/0001/geo.yml diff --git a/rfcs/text/0001/geo.yml b/rfcs/text/0001/geo.yml new file mode 100644 index 0000000000..d3445a5a2b --- /dev/null +++ b/rfcs/text/0001/geo.yml @@ -0,0 +1,5 @@ +--- + - name: geo + fields: + - name: name + type: wildcard From 479917537b1b8e5d8bbf32f84df9db1fe967b07b Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:22:21 -0400 Subject: [PATCH 13/30] Migrate only host.hostname in the host field set --- rfcs/text/0001/host.yml | 4 ---- 1 file changed, 4 deletions(-) diff --git a/rfcs/text/0001/host.yml b/rfcs/text/0001/host.yml index 79eb12001a..91f3d1bbc2 100644 --- a/rfcs/text/0001/host.yml +++ b/rfcs/text/0001/host.yml @@ -2,7 +2,3 @@ fields: - name: hostname type: wildcard - - name: name - type: wildcard - - name: domain - type: wildcard From f51ecf876366abc5f0f600cee187b63fde098aa9 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:23:11 -0400 Subject: [PATCH 14/30] Don't migrate user.domain --- rfcs/text/0001/user.yml | 2 -- 1 file changed, 2 deletions(-) diff --git a/rfcs/text/0001/user.yml b/rfcs/text/0001/user.yml index 412ed823cc..89e182fbee 100644 --- a/rfcs/text/0001/user.yml +++ b/rfcs/text/0001/user.yml @@ -7,5 +7,3 @@ type: wildcard - name: email type: wildcard - - name: domain - type: wildcard From 48e656c2c267b221f30fcec658e27060a5eaa647 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:27:16 -0400 Subject: [PATCH 15/30] Migrate log.logger and log.file.path --- rfcs/text/0001/log.yml | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 rfcs/text/0001/log.yml diff --git a/rfcs/text/0001/log.yml b/rfcs/text/0001/log.yml new file mode 100644 index 0000000000..8a2f2dd397 --- /dev/null +++ b/rfcs/text/0001/log.yml @@ -0,0 +1,7 @@ +--- +- name: log + fields: + - name: file.path + type: wildcard + - name: logger 
+ type: wildcard From c231e250d4df11430f1f7573193a46c672d27831 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:28:19 -0400 Subject: [PATCH 16/30] Migrate organization.name and its cousin that lives in as.* --- rfcs/text/0001/as.yml | 5 +++++ rfcs/text/0001/organization.yml | 5 +++++ 2 files changed, 10 insertions(+) create mode 100644 rfcs/text/0001/as.yml create mode 100644 rfcs/text/0001/organization.yml diff --git a/rfcs/text/0001/as.yml b/rfcs/text/0001/as.yml new file mode 100644 index 0000000000..96cf45621c --- /dev/null +++ b/rfcs/text/0001/as.yml @@ -0,0 +1,5 @@ +--- +- name: as + fields: + - name: organization.name + type: wildcard diff --git a/rfcs/text/0001/organization.yml b/rfcs/text/0001/organization.yml new file mode 100644 index 0000000000..594581413b --- /dev/null +++ b/rfcs/text/0001/organization.yml @@ -0,0 +1,5 @@ +--- +- name: organization + fields: + - name: name + type: wildcard From 33b4caa5572f8be029f824ce46ab1943e3a18f0a Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:32:36 -0400 Subject: [PATCH 17/30] Migrate the certificate issuer and subject fields --- rfcs/text/0001/tls.yml | 11 +++++++++++ rfcs/text/0001/x509.yml | 7 +++++++ 2 files changed, 18 insertions(+) create mode 100644 rfcs/text/0001/tls.yml create mode 100644 rfcs/text/0001/x509.yml diff --git a/rfcs/text/0001/tls.yml b/rfcs/text/0001/tls.yml new file mode 100644 index 0000000000..4f5378a313 --- /dev/null +++ b/rfcs/text/0001/tls.yml @@ -0,0 +1,11 @@ +--- +- name: tls + fields: + - name: client.issuer + type: wildcard + - name: client.subject + type: wildcard + - name: server.issuer + type: wildcard + - name: server.subject + type: wildcard diff --git a/rfcs/text/0001/x509.yml b/rfcs/text/0001/x509.yml new file mode 100644 index 0000000000..d1c7d8af6b --- /dev/null +++ b/rfcs/text/0001/x509.yml @@ -0,0 +1,7 @@ +--- +- name: x509 + fields: + - name: issuer.distinguished_name + type: wildcard + - name: subject.distinguished_name + 
type: wildcard From b984f93446ff31b97a7612ba952e849fb16d35bc Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:34:34 -0400 Subject: [PATCH 18/30] Migrate registry.data.strings --- rfcs/text/0001/registry.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/text/0001/registry.yml b/rfcs/text/0001/registry.yml index 8fdae7149e..66f6f6b22c 100644 --- a/rfcs/text/0001/registry.yml +++ b/rfcs/text/0001/registry.yml @@ -5,3 +5,5 @@ type: wildcard - name: path type: wildcard + - name: data.strings + type: wildcard From 05cad53ccd355b501062e8b2969f119212680163 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:36:29 -0400 Subject: [PATCH 19/30] Migrate dns.question.name and dns.answers.data --- rfcs/text/0001/dns.yml | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 rfcs/text/0001/dns.yml diff --git a/rfcs/text/0001/dns.yml b/rfcs/text/0001/dns.yml new file mode 100644 index 0000000000..54f9ccd69a --- /dev/null +++ b/rfcs/text/0001/dns.yml @@ -0,0 +1,7 @@ +--- +- name: dns + fields: + - name: question.name + type: wildcard + - name: answers.data + type: wildcard From 721d43ebf8a1b4238a9aad35f605b9cae37ffcdb Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:37:51 -0400 Subject: [PATCH 20/30] Migrate url.path --- rfcs/text/0001/url.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/text/0001/url.yml b/rfcs/text/0001/url.yml index 4ff4a411ea..0d5f66c36a 100644 --- a/rfcs/text/0001/url.yml +++ b/rfcs/text/0001/url.yml @@ -5,6 +5,8 @@ type: wildcard - name: full type: wildcard + - name: path + type: wildcard - name: domain type: wildcard - name: registered_domain From 43f6c760d7344e8befc244b9707a5b2ca62b2ae3 Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:46:22 -0400 Subject: [PATCH 21/30] Adjust the table accordingly --- rfcs/text/0001-wildcard-data-type.md | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git 
a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 883b84d91e..3a0e3c88bd 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -19,21 +19,30 @@ For a field to use wildcard, it will require changing the the field's defined sc | Field Set | Field(s) | | --------- | -------- | -| [`agent`](0001/agent.yml) | `agent.name` | +| [`agent`](0001/agent.yml) | `agent.build.original` | +| [`as`](0001/as.yml) | `as.organization.name` | +| [`client`](0001/client.yml) | `client.domain`
`client.registered_domain` | | [`destination`](0001/destination.yml) | `destination.domain`
`destination.registered_domain` | -| [`error`](0001/error.yml) | `error.stack_trace` | +| [`dns`](0001/dns.yml) | `dns.question.name`
`dns.answers.data` | +| [`error`](0001/error.yml) | `error.stack_trace`
`error.type` | +| [`error`](0001/event.yml) | `event.original` | | [`file`](0001/file.yml) | `file.directory`
`file.path`
`file.target_path` | -| [`host`](0001/host.yml) | `host.hostname`
`host.name`
`host.domain` | -| [`http`](0001/http.yml) | `http.request.body.content`
`http.response.body.content` | +| [`geo`](0001/geo.yml) | `geo.name` | +| [`host`](0001/host.yml) | `host.hostname`
| +| [`http`](0001/http.yml) | `http.request.referrer`
`http.request.body.content`
`http.response.body.content` | +| [`log`](0001/log.yml) | `log.file.path`
`log.logger` | | [`os`](0001/os.yml) | `os.name`
`os.full` | | [`process`](0001/process.yml) | `process.command_line`
`process.executable`
`process.name`
`process.title`
`process.working_directory`
| -| [`registry`](0001/registry.yml) | `registry.key`
`registry.path` | +| [`registry`](0001/registry.yml) | `registry.key`
`registry.path`
`registry.data.strings` | +| [`server`](0001/server.yml) | `server.domain`
`server.registered_domain` | | [`source`](0001/source.yml) | `source.domain`
`source.registered_domain` | -| [`url`](0001/url.yml) | `url.original`
`url.full`
`url.domain`
`url.registered_domain` | +| [`tls`](0001/tls.yml) | `tls.client.issuer`
`tls.client.subject`
`tls.server.issuer`
`tls.server.subject` | +| [`url`](0001/url.yml) | `url.full`
`url.original`
`url.path`
`url.domain`
`url.registered_domain` | | [`user`](0001/user.yml) | `user.name`
`user.full_name`
`user.email`
`user.domain` | | [`user_agent`](0001/user_agent.yml) | `user_agent.original` | +| [`x509`](0001/x509.yml) | `x509.issuer.distinguished_name`
`x509.subject.distinguished_name` | -The full set of schema files which will be transitioning to `wildcard` are located [here](0001/). +The full set of schema files which will be transitioning to `wildcard` are located in directory [rfcs/text/0001/](0001/). ### Example definition From e94e8a5dabeca5a040cee8da10880c06c1f0989d Mon Sep 17 00:00:00 2001 From: Mathieu Martin Date: Thu, 1 Oct 2020 17:46:39 -0400 Subject: [PATCH 22/30] Add a section about migrating text fields to wildcard --- rfcs/text/0001-wildcard-data-type.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 3a0e3c88bd..1d5fe1b196 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -448,6 +448,23 @@ Until now ECS has relied only on OSS licensed features, but ECS will also suppor A data shipper which uses the `wildcard` field type may need to verify that the configured output Elasticsearch destination can support it (>= 7.9.0). For example, if a future version of Beats adopts `wildcard` in index mappings, Beats may need to gracefully handle a scenario where the targeted Elasticsearch instance doesn't support the data type. +### Text fields migrating to wildcard + +ECS currently has two `text` fields that would likely benefit from migrating to `wildcard`. +Doing so on the canonical field (as opposed to adding a multi-field) would be a breaking change. +However, adding a `.wildcard` multi-field may cause confusion, as they would be the only +places where `wildcard` appears as a multi-field. + +The fields are: + +- `message` +- `error.message` + +Paradoxically, in some cases they also benefit from the `text` data type. +A prime example is the main message of a Windows Event Log, which is stored in the `message` field. + +The situation is captured here for addressing at a later stage. 
+ ## People The following are the people that consulted on the contents of this RFC. From df4f974fca2235c33ccb9d51fffda6e41b444786 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 13:11:42 -0500 Subject: [PATCH 23/30] fix typo --- rfcs/text/0001-wildcard-data-type.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 1d5fe1b196..a79c353295 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -25,7 +25,7 @@ For a field to use wildcard, it will require changing the the field's defined sc | [`destination`](0001/destination.yml) | `destination.domain`
`destination.registered_domain` | | [`dns`](0001/dns.yml) | `dns.question.name`
`dns.answers.data` | | [`error`](0001/error.yml) | `error.stack_trace`
`error.type` | -| [`error`](0001/event.yml) | `event.original` | +| [`event`](0001/event.yml) | `event.original` | | [`file`](0001/file.yml) | `file.directory`
`file.path`
`file.target_path` | | [`geo`](0001/geo.yml) | `geo.name` | | [`host`](0001/host.yml) | `host.hostname`
| From 7a4a3e94fd7da0de22e824691223754a2487fc3e Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 13:12:54 -0500 Subject: [PATCH 24/30] add pe.original_file_name --- rfcs/text/0001-wildcard-data-type.md | 1 + rfcs/text/0001/pe.yml | 5 +++++ 2 files changed, 6 insertions(+) create mode 100644 rfcs/text/0001/pe.yml diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index a79c353295..20f31070f8 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -32,6 +32,7 @@ For a field to use wildcard, it will require changing the the field's defined sc | [`http`](0001/http.yml) | `http.request.referrer`
`http.request.body.content`
`http.response.body.content` | | [`log`](0001/log.yml) | `log.file.path`
`log.logger` | | [`os`](0001/os.yml) | `os.name`
`os.full` | +| [`pe`](0001/pe.yml) | `pe.original_file_name` | | [`process`](0001/process.yml) | `process.command_line`
`process.executable`
`process.name`
`process.title`
`process.working_directory`
| | [`registry`](0001/registry.yml) | `registry.key`
`registry.path`
`registry.data.strings` | | [`server`](0001/server.yml) | `server.domain`
`server.registered_domain` | diff --git a/rfcs/text/0001/pe.yml b/rfcs/text/0001/pe.yml new file mode 100644 index 0000000000..52773c17a4 --- /dev/null +++ b/rfcs/text/0001/pe.yml @@ -0,0 +1,5 @@ +--- + - name: process + fields: + - name: original_final_name + type: wildcard From 560a080da385db5dab3c8af0f9a7e8dfc7319106 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 13:30:12 -0500 Subject: [PATCH 25/30] adding request.referrer --- rfcs/text/0001/http.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/text/0001/http.yml b/rfcs/text/0001/http.yml index eded72da3d..1722cdc5e7 100644 --- a/rfcs/text/0001/http.yml +++ b/rfcs/text/0001/http.yml @@ -3,5 +3,7 @@ fields: - name: request.body.content type: wildcard + - name: request.referrer + type: wildcard - name: response.body.content type: wildcard From 2ccc1bda46b7bd33c9b1dc03b3604ad15448a555 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 13:31:05 -0500 Subject: [PATCH 26/30] final to file --- rfcs/text/0001/pe.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001/pe.yml b/rfcs/text/0001/pe.yml index 52773c17a4..0052e51b22 100644 --- a/rfcs/text/0001/pe.yml +++ b/rfcs/text/0001/pe.yml @@ -1,5 +1,5 @@ --- - name: process fields: - - name: original_final_name + - name: original_file_name type: wildcard From 34694acfb49b1432b2dc082754872b6feb43766f Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 14:17:22 -0500 Subject: [PATCH 27/30] continue not indexing event.original --- rfcs/text/0001/event.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/rfcs/text/0001/event.yml b/rfcs/text/0001/event.yml index 0b50d6f942..07daa3ac87 100644 --- a/rfcs/text/0001/event.yml +++ b/rfcs/text/0001/event.yml @@ -2,5 +2,4 @@ - name: event fields: - name: original - index: true type: wildcard From 688efd90573bf3730d05c6f89fdee0ad6f44cfe7 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 14:53:35 -0500 Subject: [PATCH 28/30] Update 
rfcs/text/0001/pe.yml Co-authored-by: Mathieu Martin --- rfcs/text/0001/pe.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001/pe.yml b/rfcs/text/0001/pe.yml index 0052e51b22..6e729b39f4 100644 --- a/rfcs/text/0001/pe.yml +++ b/rfcs/text/0001/pe.yml @@ -1,5 +1,5 @@ --- - - name: process + - name: pe fields: - name: original_file_name type: wildcard From 497fb5008187a21cdfd57816651c3b034c45989b Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 14:56:39 -0500 Subject: [PATCH 29/30] typo --- rfcs/text/0001-wildcard-data-type.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 20f31070f8..2ad927beef 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -155,7 +155,7 @@ The following table is a comparison of `wildcard` vs. `keyword` [2]: | Searched by "all fields" queries | Y | Y | | Disk costs for mostly unique values | high (see *5) | lower (see *5) | | Dist costs for mostly identical values | low (see *5) | medium (see *5) | -| Max character size for a field value | 256 for default JSON string mapping (1024 for ECS), 32766 Luence max | unlimited | +| Max character size for a field value | 256 for default JSON string mapping (1024 for ECS), 32766 Lucene max | unlimited | | Supports normalizers in mappings | Y | N | | Indexing speeds | Fast | Slower (see *6) | From cae4d8bd07764f9df07e3926c2b843e6d358ab14 Mon Sep 17 00:00:00 2001 From: Eric Beahan Date: Fri, 2 Oct 2020 14:58:56 -0500 Subject: [PATCH 30/30] setting advancement date --- rfcs/text/0001-wildcard-data-type.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/text/0001-wildcard-data-type.md b/rfcs/text/0001-wildcard-data-type.md index 2ad927beef..448a929f4e 100644 --- a/rfcs/text/0001-wildcard-data-type.md +++ b/rfcs/text/0001-wildcard-data-type.md @@ -2,7 +2,7 @@ - Stage: **2 (draft)** -- Date: 
**TBD** +- Date: **2020-10-02** Wildcard is a data type for Elasticsearch string fields being introduced in Elasticsearch 7.9. Wildcard optimizes performance for queries using wildcards (`*`) and regex, allowing users to perform `grep`-like searches without the limitations of the existing text[0] and keyword[1] types.