chore(docs): Add Log Namespacing docs #16571
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,169 @@ | ||
| --- | ||
| title: Log Namespacing | ||
| short: Log Namespacing | ||
| description: Changing Vector's data model | ||
| authors: ["fuchsnj"] | ||
| date: "2023-06-30" | ||
| badges: | ||
| type: announcement | ||
| domains: ["data model"] | ||
| tags: [] | ||
| --- | ||
|
|
||
| The Vector team has been hard at work improving the data model of events in Vector. These | ||
| changes are now available for beta testing for those who want to try them out and give feedback. | ||
| This is an opt-in feature. Nothing should change unless you specifically enable it. | ||
|
|
||
| ## Why | ||
|
|
||
| Currently, all data for events is placed at the root of the event, regardless of where the data came | ||
| from or how it was obtained. Not only can that make it hard to tell what a certain field | ||
| represents (eg: is the `timestamp` field the time Vector ingested the event, or the time | ||
| the source originally created it?), but it can also easily cause data collisions. | ||
|
|
||
| Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking | ||
| of events in Vector. | ||
|
|
||
| ## How to Enable | ||
|
|
||
| The [global config] `schema.log_namespace` can be set to `true` to enable the new | ||
| Log Namespacing feature for all components. The default is `false`. | ||
|
|
||
| Every source also has a `log_namespace` config option. This will override the global setting, | ||
| so you can try out Log Namespacing on individual sources. | ||
|
|
||
| The following example enables the `log_namespace` feature globally, then disables it for a single | ||
| source. | ||
|
|
||
| ```toml | ||
| schema.log_namespace = true | ||
|
|
||
| [sources.input_with_log_namespace] | ||
| type = "demo_logs" | ||
| format = "shuffle" | ||
| lines = ["input_with_log_namespace"] | ||
| interval = 1 | ||
|
|
||
| [sources.input_without_log_namespace] | ||
| type = "demo_logs" | ||
| format = "shuffle" | ||
| lines = ["input_without_log_namespace"] | ||
| interval = 1 | ||
| log_namespace = false | ||
|
|
||
| [sinks.console] | ||
| type = "console" | ||
| inputs = ["input_with_log_namespace", "input_without_log_namespace"] | ||
| encoding.codec = "json" | ||
|
|
||
| ``` | ||
|
|
||
| ## How It Works | ||
|
|
||
|
||
| ### Data Layout | ||
|
|
||
| When handling log events, information is categorized into one of the following groups: | ||
| (Examples are from the `datadog_agent` source) | ||
|
|
||
| - Event Data: The decoded event data. (eg: the log itself) | ||
| - Source Metadata: Metadata provided by the source of the event. (eg: hostname / tags) | ||
| - Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event) | ||
|
|
||
| #### Without Log Namespacing | ||
|
|
||
| All three of these are placed at the root of the event. The exact layout depends on the source, | ||
| some fields are configurable, and the [global log schema] can change the name / location of some | ||
| fields. | ||
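As a rough illustration of the last point, the fragment below uses the legacy `log_schema` global options to relocate two well-known fields. The replacement key names (`ts`, `host_name`) are illustrative choices only, and the fragment assumes log namespacing is left disabled.

```toml
# Legacy behavior: only relevant while log namespacing is disabled.
[log_schema]
timestamp_key = "ts"      # illustrative; the default is "timestamp"
host_key = "host_name"    # illustrative; the default is "host"
```

With settings like these, two deployments of the same source can produce differently shaped events, which is part of what the namespaced layout is meant to avoid.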
|
|
||
| Example log event from the `datadog_agent` source (with the JSON decoder) | ||
|
|
||
| ```json | ||
| { | ||
| "ddsource": "vector", | ||
| "ddtags": "env:prod", | ||
| "hostname": "alpha", | ||
| "foo": "foo field", | ||
| "service": "cernan", | ||
| "source_type": "datadog_agent", | ||
| "bar": "bar field", | ||
| "status": "warning", | ||
| "timestamp": "1970-02-14T20:44:57.570Z" | ||
| } | ||
| ``` | ||
|
|
||
| #### With Log Namespacing | ||
|
|
||
| When enabled, the layout of this data is well-defined and consistent. | ||
|
|
||
| - Event Data (and _only_ Event Data) is placed at the root of the event (eg: `.`). | ||
| - Source metadata is placed in event metadata, prefixed by the source name (eg: `%datadog_agent`). | ||
| - Vector metadata is placed in event metadata, prefixed by `vector` (eg: `%vector`). | ||
|
|
||
| Generally sinks will only send the event data. If you want to include any metadata fields, | ||
| it's recommended to use a [remap] transform to add data to the event as needed. | ||
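A minimal sketch of that recommendation, assuming a `datadog_agent` source with the JSON decoder so that the event root is an object. The component IDs, the listen address, and the choice of which metadata fields to copy are all illustrative.

```toml
# Illustrative component IDs; the source ID is chosen to match its type.
[sources.datadog_agent]
type = "datadog_agent"
address = "0.0.0.0:8080"    # illustrative listen address
decoding.codec = "json"     # assumed, so that the event root (`.`) is an object
log_namespace = true

[transforms.add_metadata]
type = "remap"
inputs = ["datadog_agent"]
source = '''
# Copy selected metadata into the event data so that sinks will send it.
.hostname = %datadog_agent.hostname
.ingest_timestamp = %vector.ingest_timestamp
'''

[sinks.console]
type = "console"
inputs = ["add_metadata"]
encoding.codec = "json"
```

Copying only the fields a sink needs keeps the event root free of the collisions that the namespaced layout is designed to prevent.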
|
|
||
| It's important to note that previously the type of an event (`.`) would always be an object | ||
| with fields. Now it is possible for an event to be any type, such as a string. | ||
|
Contributor: Currently in Remap, even with Log Namespacing, if you assign We should either change it so it assigns the value to root in the Vector namespace, or call that out in the docs here. I'm in favour of the former.
Member (Author): I'm working on making this change. Will have a new PR up shortly.
|
||
|
|
||
| Example log event from the `datadog_agent` source (same data as the example above): | ||
|
|
||
| Event root (`.`) | ||
|
|
||
| ```json | ||
| { | ||
| "foo": "foo field", | ||
| "bar": "bar field" | ||
| } | ||
| ``` | ||
|
|
||
| Source metadata fields (`%datadog_agent`) | ||
|
|
||
| ```json | ||
| { | ||
| "ddsource": "vector", | ||
| "ddtags": "env:prod", | ||
| "hostname": "alpha", | ||
| "service": "cernan", | ||
| "status": "warning", | ||
| "timestamp": "1970-02-14T20:44:57.570Z" | ||
| } | ||
| ``` | ||
|
|
||
| Vector metadata fields (`%vector`) | ||
|
|
||
| ```json | ||
| { | ||
| "source_type": "datadog_agent", | ||
| "ingest_timestamp": "1970-02-14T20:44:58.236Z" | ||
| } | ||
| ``` | ||
|
|
||
| Here is a sample VRL script accessing different parts of an event when log namespacing is enabled. | ||
|
|
||
| ```coffee | ||
| event = . | ||
| field_from_event = .foo | ||
|
|
||
| all_metadata = % | ||
| tags = %datadog_agent.ddtags | ||
| timestamp = %vector.ingest_timestamp | ||
|
|
||
| ``` | ||
|
|
||
| ### Semantic Meaning | ||
|
|
||
| Before Log Namespacing, Vector used the [global log schema] to keep certain types of information | ||
| at known locations. This is changing: when log namespacing is enabled, the [global log schema] | ||
| is no longer used. A new feature called "semantic meaning" replaces it. | ||
| It assigns a meaning to individual fields of an event, so that sinks can find the | ||
| information they need, such as the timestamp, hostname, or message. | ||
|
|
||
| Semantic meaning will automatically be assigned by all sources. Sinks will check on startup to make | ||
| sure a meaning exists for all required fields. If a source does not provide a required field, or | ||
| a meaning needs to be manually adjusted for any reason, the VRL function [set_semantic_meaning] can | ||
| be used. | ||
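A minimal sketch of a manual assignment inside a remap transform. The component IDs and the `.custom_host` field are hypothetical, and `"host"` is assumed here to be the meaning name a sink would look for.

```toml
[transforms.assign_meaning]
type = "remap"
inputs = ["my_source"]    # hypothetical source ID
source = '''
# Placed first: the meaning is assigned at startup, not per event.
# `.custom_host` is a hypothetical field; "host" is an assumed meaning name.
set_semantic_meaning(.custom_host, "host")
'''
```

Because the assignment has no runtime behavior, wrapping the call in a condition would not change the result.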
|
|
||
|
||
| [global log schema]: /docs/reference/configuration/global-options/#log_schema | ||
| [set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning | ||
| [remap]: /docs/reference/configuration/transforms/remap/ | ||
| [global config]: /docs/reference/configuration/global-options/#log_namespacing | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| package metadata | ||
|
|
||
| remap: functions: set_semantic_meaning: { | ||
| category: "Event" | ||
| description: """ | ||
| Sets a semantic meaning for an event. Note that this function assigns | ||
| meaning at Vector startup, and has _no_ runtime behavior. It is suggested | ||
| to put all calls to this function at the beginning of a VRL program. The function | ||
| cannot be conditionally called (eg: wrapping the call in an if statement does not stop the meaning | ||
| from being assigned). | ||
| """ | ||
|
|
||
| arguments: [ | ||
| { | ||
| name: "target" | ||
| description: """ | ||
| The path of the value that will be assigned a meaning. | ||
|
||
| """ | ||
| required: true | ||
| type: ["path"] | ||
| }, | ||
| { | ||
| name: "meaning" | ||
| description: """ | ||
| The name of the meaning to assign. | ||
| """ | ||
| required: true | ||
| type: ["string"] | ||
| }, | ||
| ] | ||
| internal_failure_reasons: [ | ||
| ] | ||
| return: types: ["null"] | ||
|
||
|
|
||
| examples: [ | ||
| { | ||
| title: "Sets custom field semantic meaning" | ||
| source: #""" | ||
| set_semantic_meaning(.foo, "bar") | ||
| """# | ||
| return: null | ||
| }, | ||
| ] | ||
| } | ||