From 287a3df66f592f0b75ec1e564284b35053871fd2 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Fri, 17 Feb 2023 10:47:53 -0500 Subject: [PATCH 01/20] save --- website/content/en/blog/load-namespacing.md | 77 +++++++++++++++++++++ 1 file changed, 77 insertions(+) create mode 100644 website/content/en/blog/load-namespacing.md diff --git a/website/content/en/blog/load-namespacing.md b/website/content/en/blog/load-namespacing.md new file mode 100644 index 0000000000000..e498a92cb4ff4 --- /dev/null +++ b/website/content/en/blog/load-namespacing.md @@ -0,0 +1,77 @@ +--- +title: Log Namespacing. Changing Vector's data model. +short: Log Namespacing +description: Improving reliability and performance across your entire observability infrastructure +authors: ["fuchsnj"] +date: "2023-??-??" +badges: + type: announcement + domains: ["data model"] +tags: [] +--- + +The Vector team has been hard at work improving the data model of events in Vector. These +changes are now available for beta testing for those who want to try it out and give feedback. +This is an opt-in feature. Nothing should change unless you specifically enable it. + +## Why + +Currently, all data for events is placed at the root of the event, regardless of where the data came +from or how it was obtained. Not only can that make it confusing to understand what a certain field +represents (eg: was the `timestamp` field generated by vector when it was ingested, or is it when +the source originally created the event) but it can easily cause data collisions. + +We have also been working on adding internal type definitions to allow end to end type checking of +data in Vector. While this is not yet fully supported, Log Namespacing removes all the +possible data collisions that made calculating the types difficult. + + +## How to enable + +TODO + +## What + +There are three distinct types of data that Vector handles. + +(Examples are from the `http_client` source) +- Event Data: The decoded event data. 
(eg: the decoded HTTP body) +- Source Metadata: Metadata provided by the source of the event. (eg: the HTTP headers) +- Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event) + +### Event Data +An event in Vector now _only_ contains the event data. + +### Metadata +TODO + + + + +[aimd]: https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease +[adaptive_concurrency]: https://github.com/vectordotdev/vector/tree/master/src/sinks/util/adaptive_concurrency +[bruce guenter]: https://github.com/bruceg +[buffer]: /docs/reference/configuration/sinks/http/#buffer +[clickhouse]: /docs/reference/configuration/sinks/clickhouse +[congestion control]: https://en.wikipedia.org/wiki/TCP_congestion_control +[controller]: https://github.com/vectordotdev/vector/blob/master/src/sinks/util/adaptive_concurrency/controller.rs#L23-L31 +[dead ends]: https://github.com/vectordotdev/vector/pull/3671 +[elasticsearch]: /docs/reference/configuration/sinks/elasticsearch +[http_test_server]: https://github.com/vectordotdev/http_test_server +[issue_325]: https://github.com/vectordotdev/vector/issues/3255 +[kubernetes]: https://kubernetes.io +[max_size]: /docs/reference/configuration/sinks/http/#buffer.max_size +[open issue]: https://github.com/vectordotdev/vector/issues/3887 +[performance under load]: https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581 +[prior_art]: https://github.com/vectordotdev/vector/blob/master/rfcs/2020-04-06-1858-automatically-adjust-request-limits.md#prior-art +[rate limit]: /docs/reference/configuration/sinks/http/#rate-limits-adaptive-concurrency +[rate_limit_duration_secs]: /docs/reference/configuration/sinks/http/#request.rate_limit_duration_secs +[rate_limit_num]: /docs/reference/configuration/sinks/http/#request.rate_limit_num +[request_concurrency]: /docs/reference/configuration/sinks/http/#request.concurrency +[rfc 1858]: 
https://github.com/vectordotdev/vector/blob/master/rfcs/2020-04-06-1858-automatically-adjust-request-limits.md +[rust]: https://rust-lang.org +[sinks]: /docs/reference/configuration/sinks +[sources]: /docs/reference/configuration/sources +[splunk]: https://splunk.com +[transforms]: /docs/reference/configuration/transforms +[vector]: / From a204d303bc5e0816efb61331c3654087738fe2e2 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Tue, 21 Feb 2023 16:06:54 -0500 Subject: [PATCH 02/20] save --- website/content/en/blog/load-namespacing.md | 77 ----------- website/content/en/blog/log-namespacing.md | 141 ++++++++++++++++++++ 2 files changed, 141 insertions(+), 77 deletions(-) delete mode 100644 website/content/en/blog/load-namespacing.md create mode 100644 website/content/en/blog/log-namespacing.md diff --git a/website/content/en/blog/load-namespacing.md b/website/content/en/blog/load-namespacing.md deleted file mode 100644 index e498a92cb4ff4..0000000000000 --- a/website/content/en/blog/load-namespacing.md +++ /dev/null @@ -1,77 +0,0 @@ ---- -title: Log Namespacing. Changing Vector's data model. -short: Log Namespacing -description: Improving reliability and performance across your entire observability infrastructure -authors: ["fuchsnj"] -date: "2023-??-??" -badges: - type: announcement - domains: ["data model"] -tags: [] ---- - -The Vector team has been hard at work improving the data model of events in Vector. These -changes are now available for beta testing for those who want to try it out and give feedback. -This is an opt-in feature. Nothing should change unless you specifically enable it. - -## Why - -Currently, all data for events is placed at the root of the event, regardless of where the data came -from or how it was obtained. 
Not only can that make it confusing to understand what a certain field -represents (eg: was the `timestamp` field generated by vector when it was ingested, or is it when -the source originally created the event) but it can easily cause data collisions. - -We have also been working on adding internal type definitions to allow end to end type checking of -data in Vector. While this is not yet fully supported, Log Namespacing removes all the -possible data collisions that made calculating the types difficult. - - -## How to enable - -TODO - -## What - -There are three distinct types of data that Vector handles. - -(Examples are from the `http_client` source) -- Event Data: The decoded event data. (eg: the decoded HTTP body) -- Source Metadata: Metadata provided by the source of the event. (eg: the HTTP headers) -- Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event) - -### Event Data -An event in Vector now _only_ contains the event data. - -### Metadata -TODO - - - - -[aimd]: https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease -[adaptive_concurrency]: https://github.com/vectordotdev/vector/tree/master/src/sinks/util/adaptive_concurrency -[bruce guenter]: https://github.com/bruceg -[buffer]: /docs/reference/configuration/sinks/http/#buffer -[clickhouse]: /docs/reference/configuration/sinks/clickhouse -[congestion control]: https://en.wikipedia.org/wiki/TCP_congestion_control -[controller]: https://github.com/vectordotdev/vector/blob/master/src/sinks/util/adaptive_concurrency/controller.rs#L23-L31 -[dead ends]: https://github.com/vectordotdev/vector/pull/3671 -[elasticsearch]: /docs/reference/configuration/sinks/elasticsearch -[http_test_server]: https://github.com/vectordotdev/http_test_server -[issue_325]: https://github.com/vectordotdev/vector/issues/3255 -[kubernetes]: https://kubernetes.io -[max_size]: /docs/reference/configuration/sinks/http/#buffer.max_size -[open issue]: 
https://github.com/vectordotdev/vector/issues/3887 -[performance under load]: https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581 -[prior_art]: https://github.com/vectordotdev/vector/blob/master/rfcs/2020-04-06-1858-automatically-adjust-request-limits.md#prior-art -[rate limit]: /docs/reference/configuration/sinks/http/#rate-limits-adaptive-concurrency -[rate_limit_duration_secs]: /docs/reference/configuration/sinks/http/#request.rate_limit_duration_secs -[rate_limit_num]: /docs/reference/configuration/sinks/http/#request.rate_limit_num -[request_concurrency]: /docs/reference/configuration/sinks/http/#request.concurrency -[rfc 1858]: https://github.com/vectordotdev/vector/blob/master/rfcs/2020-04-06-1858-automatically-adjust-request-limits.md -[rust]: https://rust-lang.org -[sinks]: /docs/reference/configuration/sinks -[sources]: /docs/reference/configuration/sources -[splunk]: https://splunk.com -[transforms]: /docs/reference/configuration/transforms -[vector]: / diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md new file mode 100644 index 0000000000000..ea9d3088ac5ba --- /dev/null +++ b/website/content/en/blog/log-namespacing.md @@ -0,0 +1,141 @@ +--- +title: Log Namespacing. Changing Vector's data model. +short: Log Namespacing +description: Improving reliability and performance across your entire observability infrastructure +authors: ["fuchsnj"] +date: "2023-??-??" +badges: + type: announcement + domains: ["data model"] +tags: [] +--- + +The Vector team has been hard at work improving the data model of events in Vector. These +changes are now available for beta testing for those who want to try it out and give feedback. +This is an opt-in feature. Nothing should change unless you specifically enable it. + +## Why + +Currently, all data for events is placed at the root of the event, regardless of where the data came +from or how it was obtained. 
Not only can that make it confusing to understand what a certain field +represents (eg: was the `timestamp` field generated by vector when it was ingested, or is it when +the source originally created the event) but it can easily cause data collisions. + +We have also been working on adding internal type definitions to allow end to end type checking of +data in Vector. While this is not yet fully supported, Log Namespacing removes all the +possible data collisions that made calculating the types difficult. + + +## How to enable + +The global Vector configuration `schema.log_namespace` can be set to `true` to enable the new +Log Namespacing feature for all components. The default is `false`. + +Every source also has a `log_namespace` config option. This will override the global setting, +so you can try out Log Namespacing on individual sources. + +The following example enables the `log_namespace` feature globally, then disables it for a single +source. + +```toml +schema.log_namespace = true + +[sources.input_with_log_namespace] +type = "demo_logs" +format = "shuffle" +lines = ["input_with_log_namespace"] +interval = 1 + +[sources.input_without_log_namespace] +type = "demo_logs" +format = "shuffle" +lines = ["input_without_log_namespace"] +interval = 1 +log_namespace = false + +[sinks.console] +type = "console" +inputs = ["input_with_log_namespace", "input_without_log_namespace"] +encoding.codec = "json" + +``` + +## Features + + +### Data Layout + +There are three distinct types of data that Vector handles. + +(Examples are from the `http_client` source) +- Event Data: The decoded event data. (eg: the decoded HTTP body) +- Source Metadata: Metadata provided by the source of the event. (eg: the HTTP headers) +- Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event) + +#### Without Log Namespacing +All three of these are placed at the root of the event. 
The exact layout depends on the source, +some fields are configurable, and the [global log schema] can change the name / location of some +fields. + +Example event from the `datadog agent logs` source (with JSON decoder) + +```json +{ + "ddsource": "keaton", + "ddtags": "env:prod", + "hostname": "alpha", + "foo": "foo field", + "service": "cernan", + "source_type": "datadog_agent", + "bar": "bar field", + "status": "warning", + "timestamp": "1970-02-14T20:44:57.570Z" +} +``` + +#### With Log Namespacing +When enabled, the layout of this data is well-defined and consistent. + +Event Data (and _only_ Event Data) is placed at the root of the event (eg: `.`). +Source metadata is placed in event metadata, prefixed by the source name. (eg: `%datadog_agent`) +Vector metadata is placed in event metadata, prefixed by `vector`. (eg: `%vector`) + +Example event from the `datadog agent logs` source. (same data as the example above) + +Event root (`.`) +```json +{ + "foo": "foo field", + "bar": "bar field" +} +``` + +Source metadata fields (`%datadog_agent`) + +```json +{ + "ddsource": "keaton", + "ddtags": "env:prod", + "hostname": "alpha", + "service": "cernan", + "status": "warning", + "timestamp": "1970-02-14T20:44:57.570Z" + } +``` + +Source vector fields (`%vector`) +```json +{ +"source_type": "datadog_agent", +"ingest_timestamp": "1970-02-14T20:44:58.236Z" +} +``` + +### Global Log Schema + +The global log schema will no longer be used when Log Namespacing is enabled. 
+ + + + +[global log schema]: /docs/reference/configuration/global-options/#log_schema From 5cc4988a2db5b18c9b30bc89dadf60f31f69c11c Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Wed, 22 Feb 2023 14:02:57 -0500 Subject: [PATCH 03/20] save --- website/content/en/blog/log-namespacing.md | 14 ++++++-- .../remap/functions/set_semantic_meaning.cue | 36 +++++++++++++++++++ 2 files changed, 48 insertions(+), 2 deletions(-) create mode 100644 website/cue/reference/remap/functions/set_semantic_meaning.cue diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md index ea9d3088ac5ba..fdb64bfd544dc 100644 --- a/website/content/en/blog/log-namespacing.md +++ b/website/content/en/blog/log-namespacing.md @@ -131,11 +131,21 @@ Source vector fields (`%vector`) } ``` -### Global Log Schema +### Semantic Meaning -The global log schema will no longer be used when Log Namespacing is enabled. +Before Log Namespacing, Vector used the [global log schema] to keep certain types of information +at known locations. This is changing, and when log namespacing is enabled, the [global log schema] +will no longer be used. To replace it, a new feature called "semantic meaning" will be used instead. +This allows assigning meaning to different fields of an event, which allows sinks to access +information needed, such as timestamps, hostname, the message, etc. + +Semantic meaning will automatically be assigned by all sources. Sinks will check on startup to make +sure a meaning exists for all required fields. If a source does not provide a required field, or +a meaning needs to be manually adjusted for any reason, the VRL function [set_semantic_meaning] can +be used. 
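As a rough illustration of the paragraph above, a manual meaning assignment inside a `remap` transform might look like the following. This is only a sketch under stated assumptions: the component names (`my_source`, `assign_meaning`) and the `.custom_time` field are hypothetical, and `timestamp` is assumed to be an accepted meaning name.

```toml
# Hypothetical transform; `my_source` stands in for any existing source.
[transforms.assign_meaning]
type = "remap"
inputs = ["my_source"]
source = '''
# Assumed: `.custom_time` holds the event time and "timestamp" is a valid meaning name.
set_semantic_meaning(.custom_time, "timestamp")
'''
```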
[global log schema]: /docs/reference/configuration/global-options/#log_schema +[set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning diff --git a/website/cue/reference/remap/functions/set_semantic_meaning.cue b/website/cue/reference/remap/functions/set_semantic_meaning.cue new file mode 100644 index 0000000000000..0cc13a7fe1de7 --- /dev/null +++ b/website/cue/reference/remap/functions/set_semantic_meaning.cue @@ -0,0 +1,36 @@ +package metadata + +remap: functions: set_semantic_meaning: { + category: "Event" + description: """ + Sets a semantic meaning for an event. Note that this function assigns + meaning at Vector startup, and has _no_ runtime behavior. It is suggested + to put all calls to this function at the beginning of a VRL function. The function + cannot be conditionally called (eg: using an if statement cannot stop the meaning + from being assigned). + """ + + arguments: [ + { + name: "key" + description: """ + The name of the secret. + """ + required: true + type: ["string"] + }, + ] + internal_failure_reasons: [ + ] + return: types: ["string"] + + examples: [ + { + title: "Get the Datadog API key from the event metadata." + source: #""" + get_secret("datadog_api_key") + """# + return: "secret value" + }, + ] +} From cf3e1bc3235556862c872ad8c68c27a715cfbdab Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 14:11:11 -0500 Subject: [PATCH 04/20] save --- website/content/en/blog/log-namespacing.md | 15 ++++++++----- website/cue/reference/configuration.cue | 17 +++++++++++++- .../remap/functions/set_semantic_meaning.cue | 22 +++++++++++++------ website/cue/reference/urls.cue | 1 + 4 files changed, 42 insertions(+), 13 deletions(-) diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md index fdb64bfd544dc..3605faa0b3c11 100644 --- a/website/content/en/blog/log-namespacing.md +++ b/website/content/en/blog/log-namespacing.md @@ -1,12 +1,12 @@ --- title: Log Namespacing. 
Changing Vector's data model. short: Log Namespacing -description: Improving reliability and performance across your entire observability infrastructure +description: Introducing log namespacing. authors: ["fuchsnj"] date: "2023-??-??" badges: type: announcement - domains: ["data model"] + domains: ["log namespacing"] tags: [] --- @@ -60,7 +60,7 @@ encoding.codec = "json" ``` -## Features +## How It Works ### Data Layout @@ -100,6 +100,12 @@ Event Data (and _only_ Event Data) is placed at the root of the event (eg: `.`). Source metadata is placed in event metadata, prefixed by the source name. (eg: `%datadog_agent`) Vector metadata is placed in event metadata, prefixed by `vector`. (eg: `%vector`) +Generally sinks will only send the event data. If you want to include any metadata fields, +it's recommended to use a [remap] transform to add data to the event as needed. + +It's important to note that previously the type of an event (`.`) would always be an object +with fields. Now it is possible for event to be any type, such as a string. + Example event from the `datadog agent logs` source. (same data as the example above) Event root (`.`) @@ -145,7 +151,6 @@ a meaning needs to be manually adjusted for any reason, the VRL function [set_se be used. - - [global log schema]: /docs/reference/configuration/global-options/#log_schema [set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning +[remap]: /docs/reference/configuration/transforms/remap/ diff --git a/website/cue/reference/configuration.cue b/website/cue/reference/configuration.cue index 6fcacb5c6c3c6..976fa567c3bdf 100644 --- a/website/cue/reference/configuration.cue +++ b/website/cue/reference/configuration.cue @@ -252,12 +252,27 @@ configuration: { } } + log_namespacing: { + common: false + description: """ + Globally enables / disables log namespacing. See [Log Namespacing](\(urls.log_namespacing)) + for more details. 
If you want to enable individual sources, there is a config + option in the source configuration. + """ + required: false + description: "Controls if log namespacing will be enabled globally." + warnings: [] + required: false + type: bool: default: false + } + log_schema: { common: false description: """ Configures default log schema for all events. This is used by - Vector source components to assign the fields on incoming + Vector components to assign the fields on incoming events. + These values are ignored if log namespacing is enabled. (See [Log Namespacing](\(urls.log_namespacing))) """ required: false type: object: { diff --git a/website/cue/reference/remap/functions/set_semantic_meaning.cue b/website/cue/reference/remap/functions/set_semantic_meaning.cue index 0cc13a7fe1de7..6cb076a5b890e 100644 --- a/website/cue/reference/remap/functions/set_semantic_meaning.cue +++ b/website/cue/reference/remap/functions/set_semantic_meaning.cue @@ -12,25 +12,33 @@ remap: functions: set_semantic_meaning: { arguments: [ { - name: "key" + name: "target" description: """ - The name of the secret. + The path of the value that will be assigned a meaning. """ required: true - type: ["string"] + type: ["path"] + }, + { + name: "meaning" + description: """ + The name of the meaning to assign. + """ + required: true + type: ["path"] }, ] internal_failure_reasons: [ ] - return: types: ["string"] + return: types: ["null"] examples: [ { - title: "Get the Datadog API key from the event metadata." 
+ title: "Sets custom field semantic meaning" source: #""" - get_secret("datadog_api_key") + set_semantic_meaning(.foo, "bar") """# - return: "secret value" + return: "null" }, ] } diff --git a/website/cue/reference/urls.cue b/website/cue/reference/urls.cue index babc130d6d0c8..54fb7f6233174 100644 --- a/website/cue/reference/urls.cue +++ b/website/cue/reference/urls.cue @@ -309,6 +309,7 @@ urls: { logfmt_specs: "https://pkg.go.dev/github.com/kr/logfmt#section-documentation" logstash: "https://www.elastic.co/logstash" logstash_protocol: "https://github.com/elastic/logstash-forwarder/blob/master/PROTOCOL.md" + log_namespacing_blog: "/blog/log-namespacing/" loki: "https://grafana.com/oss/loki/" loki_multi_tenancy: "\(github)/grafana/loki/blob/master/docs/operations/multi-tenancy.md" log_event_source: "\(vector_repo)/blob/master/src/event/" From c21cbb1db406f955872adf61f45512b110c4d014 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 14:15:34 -0500 Subject: [PATCH 05/20] fix url --- website/cue/reference/configuration.cue | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/cue/reference/configuration.cue b/website/cue/reference/configuration.cue index 976fa567c3bdf..7d81e2ff049d7 100644 --- a/website/cue/reference/configuration.cue +++ b/website/cue/reference/configuration.cue @@ -255,7 +255,7 @@ configuration: { log_namespacing: { common: false description: """ - Globally enables / disables log namespacing. See [Log Namespacing](\(urls.log_namespacing)) + Globally enables / disables log namespacing. See [Log Namespacing](\(urls.log_namespacing_blog)) for more details. If you want to enable individual sources, there is a config option in the source configuration. """ @@ -272,7 +272,7 @@ configuration: { Configures default log schema for all events. This is used by Vector components to assign the fields on incoming events. - These values are ignored if log namespacing is enabled. 
(See [Log Namespacing](\(urls.log_namespacing))) + These values are ignored if log namespacing is enabled. (See [Log Namespacing](\(urls.log_namespacing_blog))) """ required: false type: object: { From 63145175b9ab759b4dec28978c1736dd1535cf27 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 15:25:33 -0500 Subject: [PATCH 06/20] fix --- website/cue/reference/configuration.cue | 2 -- 1 file changed, 2 deletions(-) diff --git a/website/cue/reference/configuration.cue b/website/cue/reference/configuration.cue index 7d81e2ff049d7..39d578286065b 100644 --- a/website/cue/reference/configuration.cue +++ b/website/cue/reference/configuration.cue @@ -260,9 +260,7 @@ configuration: { option in the source configuration. """ required: false - description: "Controls if log namespacing will be enabled globally." warnings: [] - required: false type: bool: default: false } From 12c73c93207fb04b9ce35eccbc58456a8152d118 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 15:50:15 -0500 Subject: [PATCH 07/20] test --- website/content/en/blog/log-namespacing.md | 3 ++- website/layouts/partials/data.html | 4 ++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md index 3605faa0b3c11..17bf9a27c9895 100644 --- a/website/content/en/blog/log-namespacing.md +++ b/website/content/en/blog/log-namespacing.md @@ -28,7 +28,7 @@ possible data collisions that made calculating the types difficult. ## How to enable -The global Vector configuration `schema.log_namespace` can be set to `true` to enable the new +The [global config] `schema.log_namespace` can be set to `true` to enable the new Log Namespacing feature for all components. The default is `false`. Every source also has a `log_namespace` config option. This will override the global setting, @@ -154,3 +154,4 @@ be used. 
[global log schema]: /docs/reference/configuration/global-options/#log_schema [set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning [remap]: /docs/reference/configuration/transforms/remap/ +[global config]: /docs/reference/configuration/global-options/#log_namespacing diff --git a/website/layouts/partials/data.html b/website/layouts/partials/data.html index 9a7b71d4fd8a1..a53d91eccd50e 100644 --- a/website/layouts/partials/data.html +++ b/website/layouts/partials/data.html @@ -257,6 +257,10 @@ +
+ Test test +
+
{{ template "logs_output" . }}
From 8661cbfe9c8707110a05a05b192324fd3600f228 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 16:08:28 -0500 Subject: [PATCH 08/20] add warning --- website/content/en/blog/log-namespacing.md | 4 +-- website/layouts/partials/data.html | 37 +++++++++++++++++++++- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md index 17bf9a27c9895..fda741d074b2b 100644 --- a/website/content/en/blog/log-namespacing.md +++ b/website/content/en/blog/log-namespacing.md @@ -1,7 +1,7 @@ --- -title: Log Namespacing. Changing Vector's data model. +title: Log Namespacing: Changing Vector's data model. short: Log Namespacing -description: Introducing log namespacing. +description: Introducing log namespacing authors: ["fuchsnj"] date: "2023-??-??" badges: diff --git a/website/layouts/partials/data.html b/website/layouts/partials/data.html index a53d91eccd50e..aedd4304fab58 100644 --- a/website/layouts/partials/data.html +++ b/website/layouts/partials/data.html @@ -258,7 +258,42 @@
- Test test + The fields shown below will be different if log namespacing is enabled. + See Log Namespacing for + more details +
+ +
+ +

+ Warning + +

+
+
+
+ + + +
+
The fields shown below will be + different if log namespacing is enabled. + See Log Namespacing for + more details +
+
From 730ec6e5c3d9cd7b1aab850b448d14c0f50f0714 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 16:22:14 -0500 Subject: [PATCH 09/20] fix --- website/content/en/blog/log-namespacing.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md index fda741d074b2b..209e72b8c5ee7 100644 --- a/website/content/en/blog/log-namespacing.md +++ b/website/content/en/blog/log-namespacing.md @@ -1,7 +1,7 @@ --- -title: Log Namespacing: Changing Vector's data model. +title: Log Namespacing short: Log Namespacing -description: Introducing log namespacing +description: Changing Vector's data model authors: ["fuchsnj"] date: "2023-??-??" badges: From bc3198538e4b309ef72d12e39e94d860f916ebb3 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Thu, 23 Feb 2023 16:31:12 -0500 Subject: [PATCH 10/20] cleanup --- website/layouts/partials/data.html | 6 ------ 1 file changed, 6 deletions(-) diff --git a/website/layouts/partials/data.html b/website/layouts/partials/data.html index aedd4304fab58..3feefe90cd353 100644 --- a/website/layouts/partials/data.html +++ b/website/layouts/partials/data.html @@ -257,12 +257,6 @@ -
- The fields shown below will be different if log namespacing is enabled. - See Log Namespacing for - more details -
-
From c590b7ae14f95c32e7e54f3c8bec2311d64811c5 Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Fri, 24 Feb 2023 08:51:44 -0500 Subject: [PATCH 11/20] fix docs test --- website/cue/reference/remap/functions/set_semantic_meaning.cue | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/cue/reference/remap/functions/set_semantic_meaning.cue b/website/cue/reference/remap/functions/set_semantic_meaning.cue index 6cb076a5b890e..fab89ec2fb307 100644 --- a/website/cue/reference/remap/functions/set_semantic_meaning.cue +++ b/website/cue/reference/remap/functions/set_semantic_meaning.cue @@ -38,7 +38,7 @@ remap: functions: set_semantic_meaning: { source: #""" set_semantic_meaning(.foo, "bar") """# - return: "null" + return: null }, ] } From f7d95a87f0f6abc954b17bf085b6659b91abdc9f Mon Sep 17 00:00:00 2001 From: Nathan Fox Date: Fri, 24 Feb 2023 09:22:02 -0500 Subject: [PATCH 12/20] fix cue formatting --- website/cue/reference/configuration.cue | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/website/cue/reference/configuration.cue b/website/cue/reference/configuration.cue index 39d578286065b..eda6d1bce483d 100644 --- a/website/cue/reference/configuration.cue +++ b/website/cue/reference/configuration.cue @@ -253,26 +253,26 @@ configuration: { } log_namespacing: { - common: false + common: false description: """ Globally enables / disables log namespacing. See [Log Namespacing](\(urls.log_namespacing_blog)) for more details. If you want to enable individual sources, there is a config option in the source configuration. """ - required: false + required: false warnings: [] type: bool: default: false } log_schema: { - common: false + common: false description: """ Configures default log schema for all events. This is used by Vector components to assign the fields on incoming events. These values are ignored if log namespacing is enabled. 
 (See [Log Namespacing](\(urls.log_namespacing_blog)))
 """
- required: false
+ required: false
 type: object: {
 examples: []
 options: {

From e91a32f0f1722babd1e9d427d9938f73d49d90db Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Wed, 28 Jun 2023 13:49:03 -0400
Subject: [PATCH 13/20] blog updates

---
 website/content/en/blog/log-namespacing.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index 209e72b8c5ee7..c4a9d117036f6 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -21,9 +21,8 @@ from or how it was obtained. Not only can that make it confusing to understand w
 represents (eg: was the `timestamp` field generated by vector when it was ingested, or is it when
 the source originally created the event) but it can easily cause data collisions.

-We have also been working on adding internal type definitions to allow end to end type checking of
-data in Vector. While this is not yet fully supported, Log Namespacing removes all the
-possible data collisions that made calculating the types difficult.
+Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking
+of events in Vector.


 ## How to enable

From 3965bdcbb5def8baca5528cf6b5608052f1359f0 Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Wed, 28 Jun 2023 14:02:56 -0400
Subject: [PATCH 14/20] change fake source name because spell check doesnt like it

---
 website/content/en/blog/log-namespacing.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index c4a9d117036f6..948329a0176ec 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -80,7 +80,7 @@ Example event from the `datadog agent logs` source (with JSON decoder)

 ```json
 {
- "ddsource": "keaton",
+ "ddsource": "vector",
 "ddtags": "env:prod",
 "hostname": "alpha",
 "foo": "foo field",
@@ -119,7 +119,7 @@ Source metadata fields (`%datadog_agent`)

 ```json
 {
- "ddsource": "keaton",
+ "ddsource": "vector",
 "ddtags": "env:prod",
 "hostname": "alpha",
 "service": "cernan",

From ca90422c39ae295fc0e83110f0de85d3ca5e0892 Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Thu, 29 Jun 2023 12:23:09 -0400
Subject: [PATCH 15/20] Apply suggestions from code review

Co-authored-by: Spencer Gilbert
---
 website/content/en/blog/log-namespacing.md | 23 +++++++++++-----------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index 948329a0176ec..6ce275d93b098 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -18,7 +18,7 @@ This is an opt-in feature. Nothing should change unless you specifically enable

 Currently, all data for events is placed at the root of the event, regardless of where the data came
 from or how it was obtained. Not only can that make it confusing to understand what a certain field
-represents (eg: was the `timestamp` field generated by vector when it was ingested, or is it when
+represents (eg: was the `timestamp` field generated by Vector when it was ingested, or is it when
 the source originally created the event) but it can easily cause data collisions.

 Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking
@@ -76,7 +76,7 @@ All three of these are placed at the root of the event. The exact layout depends
 some fields are configurable, and the [global log schema] can change the name / location of some
 fields.

-Example event from the `datadog agent logs` source (with JSON decoder)
+Example log event from the `datadog_agent` source (with the JSON decoder)

 ```json
 {
@@ -119,20 +119,20 @@ Source metadata fields (`%datadog_agent`)

 ```json
 {
- "ddsource": "vector",
- "ddtags": "env:prod",
- "hostname": "alpha",
- "service": "cernan",
- "status": "warning",
- "timestamp": "1970-02-14T20:44:57.570Z"
- }
+ "ddsource": "vector",
+ "ddtags": "env:prod",
+ "hostname": "alpha",
+ "service": "cernan",
+ "status": "warning",
+ "timestamp": "1970-02-14T20:44:57.570Z"
+}
 ```

 Source vector fields (`%vector`)
 ```json
 {
-"source_type": "datadog_agent",
-"ingest_timestamp": "1970-02-14T20:44:58.236Z"
+ "source_type": "datadog_agent",
+ "ingest_timestamp": "1970-02-14T20:44:58.236Z"
 }
 ```

@@ -149,7 +149,6 @@ sure a meaning exists for all required fields. If a source does not provide a re
 a meaning needs to be manually adjusted for any reason, the VRL function [set_semantic_meaning] can
 be used.

-
 [global log schema]: /docs/reference/configuration/global-options/#log_schema
 [set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning
 [remap]: /docs/reference/configuration/transforms/remap/

From c77ff54ad87351be68bc43e5c5ee714669fd7afa Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Fri, 30 Jun 2023 09:01:50 -0400
Subject: [PATCH 16/20] updates

---
 website/content/en/blog/log-namespacing.md | 24 +++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index 948329a0176ec..02a09bd91882a 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -6,7 +6,7 @@ authors: ["fuchsnj"]
 date: "2023-??-??"
 badges:
 type: announcement
- domains: ["log namespacing"]
+ domains: ["data model"]
 tags: []
 ---

@@ -24,7 +24,6 @@ the source originally created the event) but it can easily cause data collisions
 Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking
 of events in Vector.

-
 ## How to enable

 The [global config] `schema.log_namespace` can be set to `true` to enable the new
@@ -61,14 +60,13 @@ encoding.codec = "json"

 ## How It Works

-
 ### Data Layout

 There are three distinct types of data that Vector handles.

-(Examples are from the `http_client` source)
-- Event Data: The decoded event data. (eg: the decoded HTTP body)
-- Source Metadata: Metadata provided by the source of the event. (eg: the HTTP headers)
+(Examples are from the `datadog_agent` source)
+- Event Data: The decoded event data. (eg: the log itself)
+- Source Metadata: Metadata provided by the source of the event. (eg: hostname / tags)
 - Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event)

 #### Without Log Namespacing
@@ -105,7 +103,7 @@ it's recommended to use a [remap] transform to add data to the event as needed.
 It's important to note that previously the type of an event (`.`) would always be an object
 with fields. Now it is possible for event to be any type, such as a string.

-Example event from the `datadog agent logs` source. (same data as the example above)
+Example log event from the `datadog agent` source. (same data as the example above)

 Event root (`.`)
 ```json
@@ -136,6 +134,18 @@ Source vector fields (`%vector`)
 }
 ```

+Here is a sample VRL script accessing different parts of an event when log namespacing is enabled.
+
+```coffee
+event = .
+field_from_event = .foo
+
+all_metadata = %
+tags = %datadog_agent.ddtags
+timestamp = %vector.ingest_timestamp
+
+```
+
 ### Semantic Meaning

 Before Log Namespacing, Vector used the [global log schema] to keep certain types of information

From 4cdd37a6ef7bd3251b6b230525d92d8d3aed6e37 Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Fri, 30 Jun 2023 09:04:55 -0400
Subject: [PATCH 17/20] fix markdown format

---
 website/content/en/blog/log-namespacing.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index 7982c3ac1b08d..11083de86036f 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -65,11 +65,13 @@ encoding.codec = "json"
 There are three distinct types of data that Vector handles.

 (Examples are from the `datadog_agent` source)
+
 - Event Data: The decoded event data. (eg: the log itself)
 - Source Metadata: Metadata provided by the source of the event. (eg: hostname / tags)
 - Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event)

 #### Without Log Namespacing
+
 All three of these are placed at the root of the event. The exact layout depends on the source,
 some fields are configurable, and the [global log schema] can change the name / location of some
 fields.
@@ -91,6 +93,7 @@ Example log event from the `datadog_agent` source (with the JSON decoder)
 ```

 #### With Log Namespacing
+
 When enabled, the layout of this data is well-defined and consistent.

 Event Data (and _only_ Event Data) is placed at the root of the event (eg: `.`).
@@ -106,6 +109,7 @@ with fields. Now it is possible for event to be any type, such as a string.
 Example log event from the `datadog agent` source. (same data as the example above)

 Event root (`.`)
+
 ```json
 {
 "foo": "foo field",
@@ -127,6 +131,7 @@ Source metadata fields (`%datadog_agent`)
 ```

 Source vector fields (`%vector`)
+
 ```json
 {
 "source_type": "datadog_agent",

From f989f0cf1db93713355fe6ae532c724bfd226bf1 Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Fri, 30 Jun 2023 10:18:03 -0400
Subject: [PATCH 18/20] reword sentence

---
 website/content/en/blog/log-namespacing.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index 11083de86036f..0fe360a237ded 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -62,8 +62,7 @@ encoding.codec = "json"

 ### Data Layout

-There are three distinct types of data that Vector handles.
-
+When handling log events, information is categorized into one of the following groups:
 (Examples are from the `datadog_agent` source)

 - Event Data: The decoded event data. (eg: the log itself)

From af93a41c3ff0d37fc8bfebabeb6f7f8d4604e6fc Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Fri, 30 Jun 2023 10:19:44 -0400
Subject: [PATCH 19/20] Update website/cue/reference/remap/functions/set_semantic_meaning.cue

Co-authored-by: Spencer Gilbert
---
 website/cue/reference/remap/functions/set_semantic_meaning.cue | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/cue/reference/remap/functions/set_semantic_meaning.cue b/website/cue/reference/remap/functions/set_semantic_meaning.cue
index fab89ec2fb307..d21ca5b121b27 100644
--- a/website/cue/reference/remap/functions/set_semantic_meaning.cue
+++ b/website/cue/reference/remap/functions/set_semantic_meaning.cue
@@ -25,7 +25,7 @@ remap: functions: set_semantic_meaning: {
 The name of the meaning to assign.
 """
 required: true
- type: ["path"]
+ type: ["string"]
 },
 ]
 internal_failure_reasons: [

From 5a20ea9d46281ba8dc396265ecb7f23408a213a0 Mon Sep 17 00:00:00 2001
From: Nathan Fox
Date: Fri, 30 Jun 2023 10:43:06 -0400
Subject: [PATCH 20/20] add date

---
 website/content/en/blog/log-namespacing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/content/en/blog/log-namespacing.md b/website/content/en/blog/log-namespacing.md
index 0fe360a237ded..4516b5685794b 100644
--- a/website/content/en/blog/log-namespacing.md
+++ b/website/content/en/blog/log-namespacing.md
@@ -3,7 +3,7 @@ title: Log Namespacing
 short: Log Namespacing
 description: Changing Vector's data model
 authors: ["fuchsnj"]
-date: "2023-??-??"
+date: "2023-06-30"
 badges:
 type: announcement
 domains: ["data model"]