[Draft] Potential ECS mappings in Elasticsearch #89743

ruflin · 2022-08-31T07:54:02Z

This PR is intended to have a conversation about potential mappings for ECS in Elasticsearch. It might not ever get merged into the ES code based but allows to keep a history of the conversation and changes. The core idea of the PR are as following:

Map all core ECS fields through dynamic mappings on the paths. This will make sure only the fields that are contained in each index will be part of the mapping itself
All fields except @timestamp are runtime fields by default and not indexed
long and double values are matched with a path to ensure also string values are mapped correct. For example "7" in JSON needs to be mapped to long.
subobject: false is set to flatten all the documents that are ingested. This ensures no conflicts can happen on fields like host and host.name

Implementation

Even though this PR contains the potential template that could be used for ECS it does not contain any code around the implementation. Implementation could happen in multipel phases:

ECS component template for each ECS version exists in Elasticsearch that can easily be used
ECS config flag in data stream / mapping to use ECS template. This heavily simplifies the usage

Discussions

ECS versions: Should this template be versioned? The current suggestion is yes to prevent any unexpected changes to users. At the same time not versioning it would have the benefits of users get the benefits automatically when upgrading Elasticsearch.
Fields indexed by default: Which fields should be indexed by default?
How do we handle keyword_text multifields?
What about extended ECS fields?

x-pack/plugin/core/src/main/resources/ecs-template.yml

felixbarny · 2022-08-31T07:59:54Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+      # Matching all long fields
+      # This matches the following fields: bytes, packets, port, duration, severity, pid
+      # No path matching is needed as it is assumed these fields are already in the json doc a long
+      # Elasticsearch would do this mapping automatically but it is specified here for documentation purpose


Personal preference but I think it's easier to reason about when we only add things that differ from the default. It makes the mapping more concise. I do see that it's less explicit and requires readers to know what dynamic: runtime does. Overall, IMHO, it's a better trade-off to not include redundant mappings.

The part that is important for me here is line 70 which documents which ECS fields it should match.

BTW maybe a middle ground that could be found is leave the comments in but comment out the full block for documentation purpose. My assumption is that as soon as this would get loaded into ES / cluster state, comments would be kicked out anyways.

x-pack/plugin/core/src/main/resources/ecs-template.yml

felixbarny · 2022-08-31T08:05:28Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          # TODO: Does this also match the top level labels?
+          match: labels
+          runtime:
+            type: flattened


Runtime fields don't support flattened.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-mapping-fields.html for a list of supported field types.

Just relying on dynamic: runtime seems sufficient for labels. But if we want to index them, either flattened or subobjects: false may be a good choice.

Creating too many fields doesn't seem unique to labels, there may also be many top-level fields.

Labels by nature have a lot of fields as these are for example k8s labels and not fields that are directly user created.

@jsoriano I remember you were involved by the time flattened was introduced but it seems we don't use it for labels today but some other places. Did we just never switch over?

We had this [private] https://github.com/elastic/obs-infraobs-team/issues/461 conversation. The general output was to wait till there is better general support, and evaluate case per case. Now using subobjects may be also a good choice, except on places where we are also worried about fields explosion.

if we do want to use flattened, these would be indexed though, not runtime. How often are these labels queried / aggregated on?

I expect in a k8s use case, filter queries on these could happen frequently, not sure about aggregations. But I would prefer to not have these indexed by default but also allow it in runtime.

Is the following correct: If the field is runtime, keyword vs flattened does not really make a difference. Only when indexed there is a difference between flattened and keyword. So basically as long as it is a runtime, we keep it keyword, only for index we recommend flattened. If yes, this leaves us still with the topic that the number of fields could explode because of lots of labels but this is something that could be overcome by the user based on the setting or with the idea from @felixbarny to have a soft limit for # of fields.

If the field is runtime, keyword vs flattened does not really make a difference.

Runtime fields don't support flattened. Flattened implies a specific way of indexing the fields. If we want to use runtime fields for labels, the best option is to just not map them explicitly and rely on dynamic: runtime.

felixbarny · 2022-08-31T08:08:11Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          runtime:
+            # TODO: This is a field type that does not exist today. What ECS does is that for all name fields, it
+            # has a subfield .text to also do text indexing
+            type: text_keyword


Even if text_keyword will be created, it won't be supported for runtime fields. As runtime fields are not analyzed, there's only one type that's valid for string runtime fields, which is keyword.

What should we do about these text fields? Just not have them by default?

Currently, our only option is to map them as keyword. Therefore, we could remove this block and rely on dynamic: runtime.

x-pack/plugin/core/src/main/resources/ecs-template.yml

ruflin · 2022-08-31T08:19:36Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+      # Matching all long fields
+      # This matches the following fields: bytes, packets, port, duration, severity, pid
+      # No path matching is needed as it is assumed these fields are already in the json doc a long
+      # Elasticsearch would do this mapping automatically but it is specified here for documentation purpose


The part that is important for me here is line 70 which documents which ECS fields it should match.

x-pack/plugin/core/src/main/resources/ecs-template.yml

ruflin · 2022-08-31T08:25:57Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          # TODO: Does this also match the top level labels?
+          match: labels
+          runtime:
+            type: flattened


Labels by nature have a lot of fields as these are for example k8s labels and not fields that are directly user created.

@jsoriano I remember you were involved by the time flattened was introduced but it seems we don't use it for labels today but some other places. Did we just never switch over?

ruflin · 2022-08-31T08:26:59Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          runtime:
+            # TODO: This is a field type that does not exist today. What ECS does is that for all name fields, it
+            # has a subfield .text to also do text indexing
+            type: text_keyword


What should we do about these text fields? Just not have them by default?

ruflin · 2022-08-31T08:28:24Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          match_mapping_type: string
+          match: message
+          runtime:
+            type: match_only_text


@felixbarny Based on https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-mapping-fields.html it seems this will not work?

Correct, only keyword is supported. I'd just leave out this section and rely on dynamic: runtime. It is somewhat unfortunate that runtime fields don't support full-text searches, only exact matches.

We can prioritize #76261 if needed.

Would be great if we could prioritise it as it is kind of add that one of the most common types in ES is not supported by runtime fields. Even if there would be huge limitations compared to index, I would argue this is still good enough.

x-pack/plugin/core/src/main/resources/ecs-template.yml

felixbarny · 2022-08-31T08:53:57Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+      # This matches the following fields: bytes, packets, port, duration, severity, pid
+      # No path matching is needed as it is assumed these fields are already in the json doc a long
+      # Elasticsearch would do this mapping automatically but it is specified here for documentation purpose
+      # DISCUSS: If a field "7" as string comes in, will it still be mapped correctly?


No, it would be mapped as a runtime field of type keyword. However, if users override the mapping so that the runtime field has a type of long, I think ES will coerce the type.

That makes me think we should have the path match to have a good out of box experience. I assume the following would do the trick for bytes?

- bytes_match: path_match: bytes runtime: type: long

How many bytes fields are there? One? Is it always there or often or rarely?

It is not just about bytes but all these: bytes, packets, port, duration, severity, pid. There are quite a few, I would say 20-30 (can do an exact count if it helps). If these are there or not heavily depends on the data. If you have anything related to processes, you will have always a pid, if you have access logs, like you will always have a port etc.

I assume the following would do the trick for bytes?

dynamic: runtime should be good enough for all these cases. I don't think we need to (and should) map any of these explicitly.

felixbarny · 2022-08-31T08:55:25Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+
+# Keyword fields to discuss if should be indexed or not. Proposed rule for discussion. Quorum needs to be reached
+      # for a field to be indexed. Reason is going from runtime to index is easy, otherway around could be a breaking
+      # It should also be discussed if logs-*, metrics-* and traces-* have exact same default.


I suppose each type will come with their own defaults. For example, log.level doesn't make much sense in the context of metrics and traces.

I'll start some grouping of the fields below to make the triaging easier.

x-pack/plugin/core/src/main/resources/ecs-template.yml

javanna · 2022-09-01T09:29:35Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+      # Match location fields to geopoint
+      - match_location:
+          # TODO: What is the correct matching type here?
+          match_mapping_type: object


nit: when you have name conventions, I think you can safely drop the match_mapping_type

Agree, I had it more as a safeguard. What will happen, if a field is location and it is keyword instead of an object? As it is a runtime field, ES will just ignore it as conversion fails or is this the script error case?

javanna · 2022-09-01T09:30:46Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          match_mapping_type: string
+          match: strings
+          runtime:
+            type: wildcard


wildcard is also not supported by runtime fields, this would be keyword already with dynamic:runtime, unless you want it indexed.

Are there any plans to support wildcard?

A more general thought: There are cases like wildcard or text where runtime field will be less powerful then the indexed variant which I would argue is kind of expected. Would it be possible that instead of the person implementing the template having to think this through, ES would internally do a fallback? Lets say wildcard is internally just treated like a keyword field if used as runtime and if the user sends a query with * inside it might return a warning. The reason for the above is that with this we keep the nice property of getting from runtime to index is just moving around the definition instead of having to also select a different type.

felixbarny · 2022-09-06T09:14:24Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+      - match_ecs.version:
+          path_match: ecs.version
+          # DISCUSS: In ECS this is still a keyword field
+          type: version


This is mapped as an indexed field but is under the ### Runtime mappings section. The version type is currently not supported on runtime fields.

I don't think the ECS version should be an indexed field as I have not seen lots of query on it. I'll add it as runtime field.

Even in ECS today it is a keyword as by the time it was introduced, version did not exist in Elasticsearch. The simplest is to just map it to keyword as this is being supported by Elasticsearch and is what is in ECS. But is it future proof?

felixbarny · 2022-09-06T09:14:37Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+      # still be adjusted but not something we should have the user do by default.
+      # TODO: if path_match would be an array, this could be defined much nicer and more compact
+      - match_bytes:
+          path_match: bytes


Should these use match instead of path_match so that not only the top-level bytes field but also foo.bytes matches?

Yes, I keep mixing up match and patch_match. The idea is that it should match all *.bytes. Same is true for all the other match_*

felixbarny · 2022-09-06T09:15:59Z

x-pack/plugin/core/src/main/resources/ecs-template.yml

+          runtime:
+            type: ip
+      - match_forwarded_ip:
+          match: forwarded_ip


If we anticipate more ip fields that end with _ip

Suggested change

match: forwarded_ip

match: *_ip

I like the idea. Can't see a reason at the moment why something would be called _ip but not be an ip ...

ruflin · 2023-02-20T13:05:27Z

As a first step, this is now implemented in elastic-package with dynamic templates: elastic/integrations#5055 This allows each integration to adopt it when ready. Eventually, it should become the default in Elasticsearch and be available directly from Elasticsearch.

I'm temporarily closing this PR but I expect it to be reopened / fresh started again as soon as elastic/integrations#5055 is completed.

Initial take on future ECS mappings

62a5665

elasticsearchmachine added v8.5.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Aug 31, 2022

ruflin commented Aug 31, 2022

View reviewed changes

x-pack/plugin/core/src/main/resources/ecs-template.yml Outdated Show resolved Hide resolved

felixbarny reviewed Aug 31, 2022

View reviewed changes

ruflin added 3 commits August 31, 2022 10:20

remove float_as_double as json does not have float

732e6b7

adjust to path_match and remove match_mapping_type for message

4e1d9b9

fix geopoint to geo_point

21da68b

ruflin commented Aug 31, 2022

View reviewed changes

update descriptoin

2963e17

felixbarny reviewed Aug 31, 2022

View reviewed changes

x-pack/plugin/core/src/main/resources/ecs-template.yml Show resolved Hide resolved

felixbarny reviewed Aug 31, 2022

View reviewed changes

ruflin commented Aug 31, 2022

View reviewed changes

x-pack/plugin/core/src/main/resources/ecs-template.yml Outdated Show resolved Hide resolved

javanna added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 1, 2022

javanna self-assigned this Sep 1, 2022

javanna reviewed Sep 1, 2022

View reviewed changes

x-pack/plugin/core/src/main/resources/ecs-template.yml Show resolved Hide resolved

javanna reviewed Sep 1, 2022

View reviewed changes

ruflin added 4 commits September 5, 2022 15:58

comment out data_stream.* fields

095fa81

add more detailed mappings

325eb3b

clenaup date fields

d8d4744

cleanup double mappings and naming

2542c81

This was referenced Sep 6, 2022

Using ECS fields in Elasticsearch indices / data streams elastic/ecs#1869

Open

Optionally use ECS conventions for dynamic mappings #85692

Closed

ruflin self-assigned this Sep 6, 2022

felixbarny reviewed Sep 6, 2022

View reviewed changes

ruflin mentioned this pull request Sep 6, 2022

Pattern based ECS elastic/ecs#2047

Open

csoulios removed the v8.5.0 label Sep 21, 2022

csoulios added the v8.6.0 label Sep 21, 2022

ruflin mentioned this pull request Nov 16, 2022

Be more lenient when ingesting data elastic/integrations#4659

Closed

kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022

ruflin mentioned this pull request Dec 20, 2022

Inconsistencies in response codes for _bulk API request body JSON errors #92443

Open

P1llus mentioned this pull request Jan 19, 2023

Dynamic ECS mapping progress elastic/integrations#5055

Closed

rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023

javanna removed the v8.8.0 label Feb 16, 2023

ruflin closed this Feb 20, 2023

ruflin mentioned this pull request May 16, 2023

[Logs+] Adding ECS dynamic templates #96171

Merged

[Draft] Potential ECS mappings in Elasticsearch #89743

[Draft] Potential ECS mappings in Elasticsearch #89743

Conversation

ruflin commented Aug 31, 2022 • edited Loading

Implementation

Discussions

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ruflin commented Feb 20, 2023

ruflin commented Aug 31, 2022 •

edited

Loading