Support otel metrics mapping #1397
Closed
YANG-DB
wants to merge
66
commits into
opensearch-project:main
from
YANG-DB:Support-OTEL-Metrics-mapping
Changes from 58 commits
326faeb
Add Traces schema support for SSO which is OTEL compliant
YANG-DB 5d93e67
add basic trace samples
YANG-DB 3009d69
Merge remote-tracking branch 'origin/Support-OTEL-Trace-mapping' into…
YANG-DB 7636ac5
Merge remote-tracking branch 'origin/Support-OTEL-Trace-mapping' into…
YANG-DB 95a081a
Merge remote-tracking branch 'origin/Support-OTEL-Trace-mapping' into…
YANG-DB b0e20e0
add support for data-flow structure as part of the general span attri…
YANG-DB 6770d98
Merge remote-tracking branch 'origin/Support-OTEL-Trace-mapping' into…
YANG-DB cfcd2cb
add support for Metrics types in Simple Schema for Observability (OTE…
YANG-DB 38a8d88
add support for Metrics types in Simple Schema for Observability (OTE…
YANG-DB 8872130
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 7783f4a
add samples for the schema validation
YANG-DB 2c3c623
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB e193ef5
fix histogram.json sample
YANG-DB 8806dd2
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 9fe5828
add exemplar & instrumentationScope
YANG-DB a8163e2
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 041ffe8
add dropped attribute count for instrumentation scope
YANG-DB fe27416
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 1d67f61
add dropped attribute count for instrumentation scope
YANG-DB c82c14a
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 833e19f
add schemaUrl support for the outer most level
YANG-DB 2cbe27c
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 436ab33
add template section for mapping document
YANG-DB 23c0cf5
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 24aead1
change time to @timestamp
YANG-DB cd9775b
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 9e58033
rename resources to resource
YANG-DB 55b8a6e
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 1514a77
add specific info and support for data-stream API
YANG-DB 08d99f8
add specific info and support for data-stream API
YANG-DB 8e8191d
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 0fa5808
add `instrumentationScope.attributes.identification` for explicitly i…
YANG-DB 1eef6f9
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB dd9df78
add support for observedTimestamp
YANG-DB d0d842f
add creation of metrics.mapping template & default data-stream indice…
YANG-DB 62903bd
Merge remote-tracking branch 'origin/main' into Support-OTEL-Metrics-…
YANG-DB bd76420
add lifeCycle component for the Plugin support eager actions once clu…
YANG-DB 515d6e2
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 818dec1
move IT test under the REST section
YANG-DB bfe925c
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB fffe526
add ClusterPlugin interface for notification of readiness for index a…
YANG-DB 5f5d41a
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 8bdb55b
fix tests and update @After IT cleanUp phase
YANG-DB 5a2947e
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB f593ecf
update document with context
YANG-DB f8d4caf
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 8072b27
add additional context with reference for RFC
YANG-DB 8b0f996
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 15edfac
remove auto creation of default index
YANG-DB da8b916
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB f98ad79
add README.md for the schema folder
YANG-DB e54587b
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 442519d
fix readme.md index naming references
YANG-DB 84d499d
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 03e1845
update according to PR comments
YANG-DB 824c809
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 83e5e74
update according to PR comments
YANG-DB e09ddf5
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 15921be
update comments
YANG-DB f9a722a
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 051d573
remove default data_stream related things
YANG-DB 392892f
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 0554e11
fix documentation related default references
YANG-DB f1ff9c2
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB 9044213
fix linting issues
YANG-DB 3ac00f5
Merge remote-tracking branch 'origin/Support-OTEL-Metrics-mapping' in…
YANG-DB
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| # Simple Schema for Observability | ||
|
|
||
| ## Background | ||
| Observability is the ability to measure a system’s current state based on the data it generates, such as logs, metrics, and traces. Observability relies on telemetry derived from instrumentation that comes from the endpoints and services. | ||
|
|
||
| Observability telemetry signals (logs, metrics, traces) arriving from the system would contain all the necessary information needed to observe and monitor. | ||
|
|
||
| Modern applications can have a complicated distributed architecture that combines cloud-native and microservices layers. Each layer produces telemetry signals that may differ in structure and information. | ||
|
|
||
| Using Observability telemetry schema we can organize, correlate and investigate system behavior in a standard and well-defined manner. | ||
|
|
||
| Observability telemetry schema defines the following components - logs, traces and metrics. | ||
|
|
||
| Logs provide comprehensive system details, such as a fault and the specific time when the fault occurred. By analyzing the logs, one can troubleshoot code and identify where and why the error occurred. | ||
|
|
||
| Traces represent the entire journey of a request or action as it moves through all the layers of a distributed system. Traces allow you to profile and observe systems, especially containerized applications, serverless architectures, or microservices architecture. | ||
|
|
||
| Metrics provide a numerical representation of data that can be used to determine a service or component’s overall behaviour over time. | ||
|
|
||
| In many cases, correlating logs, traces, and metrics is necessary to monitor and understand how the system behaves. In addition, the distributed nature of the application produces telemetry signals in multiple formats, arriving from different components (network router, web server, database). | ||
|
|
||
| For such correlation to be possible the industry has formulated several protocols ([OTEL](https://github.com/open-telemetry), [ECS](https://github.com/elastic/ecs), [OpenMetrics](https://github.com/OpenObservability/OpenMetrics)) for communicating these signals - the Observability schemas. | ||
|
|
||
| --- | ||
| ## Schema Aware Components | ||
|
|
||
| The Observability plugin is intended to allow maximum flexibility and does not impose a strict index structure on the data source. Nevertheless, the nature of modern distributed applications and the vast number of telemetry producers are changing this perception. | ||
|
|
||
| Today, most Observability solutions (Splunk, Datadog, Dynatrace) recommend using a consolidated schema to represent the full variety of log, trace, and metric producers. | ||
|
|
||
| This makes monitoring, incident investigation, and the correction process simpler, more maintainable, and reproducible. | ||
|
|
||
| A Schema-Aware visualization component is a component which assumes the existence of specific index/indices name patterns and expects these indices to have a specific structure - a schema. | ||
|
|
||
| As an example we can see that Trace-Analytics is schema-aware since it directly assumes the traces & serviceMap indices exist and expects them to follow a specific schema. | ||
|
|
||
| This definition doesn’t change the status of existing visualization components that are not “Schema Aware”; it only distinguishes the visual components that benefit from a schema from those that are agnostic of its content. | ||
|
|
||
| Operation Panels, for example, are not “schema aware” since they don’t assume in advance that a specific index exists, nor do they expect the index they display to have a specific structure. | ||
|
|
||
| ## Data Model | ||
|
|
||
| Simple Schema for Observability needs to allow ingestion of both formats (OTEL and ECS) and internally consolidate them, to the best of its capabilities, to present a unified Observability platform. | ||
|
|
||
| The data model is highly coupled with the visual components, for example - the Application visual component & Trace analytics are directly coupled with all the Observability schemas (Logs, Traces, Spans). | ||
|
|
||
| ## Observability index naming | ||
|
|
||
| The Observability indices would follow the recommended immutable data stream ingestion pattern using the [data_stream concepts](https://opensearch.org/docs/latest/opensearch/data-streams/) | ||
|
|
||
| The index pattern follows this naming structure: `sso_{type}`-`{dataset}`-`{namespace}` | ||
|
|
||
| - **type** - indicates the high-level observability type: "logs", "metrics", or "traces" (prefixed by the `sso_` schema convention) | ||
| - **dataset** - can contain anything that classifies the source of the data, such as `nginx.access` (if none is specified, "**default**" is used). | ||
| - **namespace** - a user-defined namespace, mainly useful for grouping data, for example by production grade or geography | ||
|
|
||
| This strategy allows two degrees of naming freedom: dataset and namespace. For example, a customer may want to route the nginx logs from two geographical areas into two different indices: | ||
|
|
||
| - `sso_logs-nginx-us` | ||
| - `sso_logs-nginx-eu` | ||
|
|
||
| This type of distinction also allows crosscutting queries, either by using the index query pattern `sso_logs-nginx-*` or by using a geography-based pattern such as `sso_logs-*-eu`. | ||
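
The naming convention and the two crosscutting patterns can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's code: `index_name` and `SUPPORTED_TYPES` are hypothetical names, the namespace is made a required argument because the text only specifies a default for the dataset, and `fnmatchcase` merely approximates locally how a wildcard index pattern selects indices.

```python
from fnmatch import fnmatchcase

# The three signal types named in the text above.
SUPPORTED_TYPES = {"logs", "metrics", "traces"}


def index_name(signal_type: str, *, namespace: str, dataset: str = "default") -> str:
    """Build an index name following sso_{type}-{dataset}-{namespace}."""
    if signal_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported signal type: {signal_type!r}")
    return f"sso_{signal_type}-{dataset}-{namespace}"


indices = [
    index_name("logs", dataset="nginx", namespace="us"),
    index_name("logs", dataset="nginx", namespace="eu"),
    index_name("logs", dataset="mysql", namespace="eu"),
]

# All nginx logs, regardless of namespace.
nginx_everywhere = [i for i in indices if fnmatchcase(i, "sso_logs-nginx-*")]

# All logs in the "eu" namespace, regardless of dataset.
everything_in_eu = [i for i in indices if fnmatchcase(i, "sso_logs-*-eu")]
```

Keeping the dataset and namespace in fixed positions is what makes both wildcard queries possible with a single pattern each.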
|
|
||
| ## Data index routing | ||
| The [ingestion component](https://github.com/opensearch-project/data-prepper), which is responsible for ingesting the Observability signals, should route the data into the relevant indices. | ||
| The `sso_{type}-{dataset}-{namespace}` combination dictates the target index. `{type}` carries the `sso_` prefix and is one of the supported types: | ||
|
|
||
| - Traces - `sso_traces` | ||
| - Metrics - `sso_metrics` | ||
| - Logs - `sso_logs` | ||
|
|
||
| For example, if the ingested signal contains the following section: | ||
| ```json | ||
| { | ||
| ... | ||
| "attributes": { | ||
| "data_stream": { | ||
| "type": "span", | ||
| "dataset": "mysql", | ||
| "namespace": "prod" | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
| This indicates that the target index for this observability signal should be the `sso_traces`-`mysql`-`prod` index, which uses the traces schema mapping. | ||
|
|
||
| If the `data_stream` information is not present inside the signal, the default index should be used. | ||
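
A rough sketch of this routing rule follows. Several things in it are assumptions rather than documented behavior: the sample uses `"type": "span"` while the stated target is an `sso_traces` index, so the sketch uses a hypothetical alias table (`TYPE_TO_PREFIX`) to bridge the two; the name of the default index is not given in the text, so it is passed in by the caller; and `target_index` itself is an illustrative name.

```python
from typing import Any, Mapping

# Hypothetical alias table: maps a data_stream "type" value to an index prefix.
# The "span" entry is an assumption based on the sample above.
TYPE_TO_PREFIX = {
    "span": "sso_traces",
    "traces": "sso_traces",
    "metrics": "sso_metrics",
    "logs": "sso_logs",
}


def target_index(signal: Mapping[str, Any], default_index: str) -> str:
    """Pick the target index from attributes.data_stream, else fall back."""
    data_stream = (signal.get("attributes") or {}).get("data_stream")
    if not data_stream:
        # No routing information in the signal: use the default index.
        return default_index
    prefix = TYPE_TO_PREFIX[data_stream["type"]]
    dataset = data_stream.get("dataset", "default")
    namespace = data_stream["namespace"]
    return f"{prefix}-{dataset}-{namespace}"


sample = {
    "attributes": {
        "data_stream": {"type": "span", "dataset": "mysql", "namespace": "prod"}
    }
}
routed = target_index(sample, default_index="my-default-index")
```

With the sample document, `routed` is `sso_traces-mysql-prod`; a signal without a `data_stream` section is sent to whatever default the caller supplies.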
|
|
||
| ## Observability Index templates | ||
| With multiple Observability data providers expected and the need to consolidate them all into a single common schema, the Observability plugin takes on the following responsibilities: | ||
|
|
||
| - Define and create the index templates for all signal types upon loading | ||
| - Create a default data_stream for each signal type upon explicit request | ||
| - This is not done eagerly, since the customer may want to change some index template settings before generating the default indices | ||
| - Publish a versioned schema file (JSON Schema) for each signal type, for general validation use by any third party | ||
|
|
||
| ### Note | ||
| Note that these new capabilities will not change or prevent existing customer usage of the system, and proprietary usage will continue to be allowed. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| # Metrics Schema Support | ||
|
|
||
| Observability refers to the ability to monitor and diagnose systems and applications in real-time, in order to understand how they are behaving and identify potential issues. | ||
| Metrics are a critical component of observability, providing quantifiable data about the performance and behavior of systems and applications. | ||
| Supporting a structured metrics schema is important because it enables better analysis and understanding of system behavior. | ||
|
|
||
| A structured schema provides a clear, consistent format, making it easier for observability tools to process and aggregate the data. | ||
| This in turn makes it easier for engineers to understand the performance and behavior of their systems, and quickly identify potential issues. | ||
|
|
||
| When metrics are unstructured, it can be difficult for observability tools to extract meaningful information from them. | ||
| For example, if the data for a particular metric is not consistently recorded in the same format, it can be difficult to compare and analyze performance data over time. | ||
| Similarly, if metrics are not consistently named or categorized, it can be difficult to understand their context and significance. | ||
|
|
||
| With a structured schema in place, observability tools can automatically extract and aggregate data, making it easier to understand system behavior at a high level. | ||
| This can help teams quickly identify performance bottlenecks, track changes in system behavior over time, and make informed decisions about system performance optimization. | ||
|
|
||
| ## Details | ||
| The following section describes the Simple Schema for Observability support, which conforms with the OTEL specification. | ||
|
|
||
| - metrics.mapping presents the template mapping for creating the Simple Schema for Observability index | ||
| - metrics.schema presents the JSON schema used to verify that a metrics document conforms to the mapping structure | ||
|
|
||
| ## Metrics | ||
| see [OTEL metrics convention](https://opentelemetry.io/docs/reference/specification/metrics/) | ||
| see [OTEL metrics protobuf](https://github.com/open-telemetry/opentelemetry-proto/tree/main/opentelemetry/proto/metrics/v1) | ||
|
|
||
| Simple Schema for Observability conforms with the OTEL metrics protocol, which defines the following data model: | ||
|
|
||
| #### Timestamp field | ||
| As part of the data-stream definition, the `@timestamp` field is mandatory. If the field is not present in the original signal, populate it using the `ObservedTimestamp` value. | ||
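
The fallback rule can be sketched as a small pre-ingestion step. This is an illustration under assumptions: the JSON key is taken to be `observedTimestamp` (the text writes `ObservedTimestamp`, so the exact casing in a given document may differ), and `ensure_timestamp` is a hypothetical helper name.

```python
from typing import Any, Dict

# Assumed JSON key for the observed timestamp; adjust to the actual document casing.
OBSERVED_KEY = "observedTimestamp"


def ensure_timestamp(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a document that is guaranteed to carry an @timestamp field."""
    if "@timestamp" in doc:
        # The signal already has the mandatory field: leave it untouched.
        return doc
    observed = doc.get(OBSERVED_KEY)
    if observed is None:
        raise ValueError("signal has neither @timestamp nor an observed timestamp")
    # Copy instead of mutating, so the caller's original signal is preserved.
    return {**doc, "@timestamp": observed}


fixed = ensure_timestamp({"name": "cpu.load", OBSERVED_KEY: "2023-01-01T00:00:00Z"})
```

An existing `@timestamp` always wins; the observed timestamp is used only when the field is missing.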
|
|
||
| ### Instrumentation scope | ||
| This is a logical unit of the application with which the emitted telemetry can be associated. It is typically the developer’s choice to decide what denotes a reasonable instrumentation scope. | ||
| The most common approach is to use the instrumentation library as the scope, however other scopes are also common, e.g. a module, a package, or a class can be chosen as the instrumentation scope. | ||
|
|
||
| The instrumentation scope may have zero or more attributes that provide additional information about the scope. For example, the field | ||
| `instrumentationScope.attributes.identification` is used to determine the resource origin of the signal and can be used to filter accordingly. | ||
|
|
||
| ### Overview | ||
| Metrics are a specific kind of telemetry data. They represent a snapshot of the current state for a set of data. | ||
| Metrics are distinct from logs or events, which focus on records or information about individual events. | ||
|
|
||
| Metrics express all system states as numerical values: counts, current values, and so on. | ||
| Metrics tend to aggregate data temporally. While this can lose information, the reduction in overhead is an engineering trade-off commonly chosen in many modern monitoring systems. | ||
|
|
||
| Time series are a record of changing information over time. While time series can support arbitrary strings or binary data, only numeric data is in our scope. | ||
| Common examples of metric time series would be network interface counters, device temperatures, BGP connection states, and alert states. | ||
|
|
||
| ### Metric streams | ||
| In a similar way to the data_stream attribute field representing the category of a trace, the metric streams are grouped into individual Metric objects, identified by: | ||
|
|
||
| - The originating Resource attributes | ||
| - The instrumentation Scope (e.g., instrumentation library name, version) | ||
| - The metric stream’s name | ||
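
To make the grouping concrete, the three identifying parts can be combined into a single key. The sketch below is only an illustration: the document field names (`resource`, `instrumentationScope`, `name`) follow names used elsewhere on this page but their exact layout is an assumption, resource attributes are assumed to be flat key-value pairs with hashable values, and `stream_key` and `group_by_stream` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Tuple


def stream_key(doc: Mapping[str, Any]) -> Tuple:
    """Identity of a metric stream: resource attributes, scope, and name."""
    # Sort the attributes so that key order in the document does not matter.
    resource = tuple(sorted((doc.get("resource") or {}).items()))
    scope = doc.get("instrumentationScope") or {}
    return (resource, scope.get("name"), scope.get("version"), doc.get("name"))


def group_by_stream(docs: List[Mapping[str, Any]]) -> Dict[Tuple, List[Mapping[str, Any]]]:
    """Group metric documents that belong to the same metric stream."""
    groups: Dict[Tuple, List[Mapping[str, Any]]] = defaultdict(list)
    for doc in docs:
        groups[stream_key(doc)].append(doc)
    return dict(groups)


a = {"resource": {"service.name": "api", "host.name": "h1"},
     "instrumentationScope": {"name": "http-lib", "version": "1.0"},
     "name": "http.requests"}
b = {"resource": {"host.name": "h1", "service.name": "api"},
     "instrumentationScope": {"name": "http-lib", "version": "1.0"},
     "name": "http.requests"}
c = {**a, "name": "http.errors"}

groups = group_by_stream([a, b, c])
```

Here `a` and `b` land in the same group because they differ only in attribute order, while `c` forms its own stream because its name differs.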
|
|
||
| ### Metrics | ||
| A Metric object is defined by the following properties: | ||
|
|
||
| - The data point type (e.g. Sum, Gauge, Histogram, ExponentialHistogram, Summary) | ||
| - The metric stream’s unit | ||
| - The data point properties, where applicable: AggregationTemporality, Monotonic | ||
|
|
||
| The description is also present in the metrics object but is not part of the identification fields | ||
| _- The metric stream’s description_ | ||
|
|
||
|
|
||
| ### Data Types | ||
|
|
||
| **Values:** Metric values MUST be either floating points or integers. | ||
|
||
|
|
||
| **Attributes:** Labels are key-value pairs with strings as keys and values of any type (string, object, array) | ||
|
|
||
| **MetricPoint:** Each MetricPoint consists of a set of values, depending on the MetricFamily type. | ||
|
|
||
| **Metric:** A Metric is defined by a unique set of attributes (dimensions) within a MetricFamily. | ||
|
|
||
| --- | ||
|
|
||
| Metrics MUST contain a list of one or more MetricPoints. Metrics with the same name for a given MetricFamily SHOULD have the same set of label names in their LabelSet. | ||
|
|
||
| * Metrics.name: String value representing the metric's purpose | ||
| * Metrics.type: Valid values are "gauge", "counter", "histogram", and "summary". | ||
| * Metrics.Unit: specifies MetricFamily units. | ||
|
|
||
| ## Metric Types | ||
|
|
||
| ### Gauge | ||
| Gauges are current measurements, such as bytes of memory currently used or the number of items in a queue. For gauges the absolute value is what is of interest to a user. | ||
| **_A MetricPoint in a Metric with the type gauge MUST have a single value._** | ||
| Gauges MAY increase, decrease, or stay constant over time. Even if they only ever go in one direction, they might still be gauges and not counters. | ||
|
|
||
| ### Counter | ||
| Counters measure discrete events. Common examples are the number of HTTP requests received, CPU seconds spent, or bytes sent. For counters how quickly they are increasing over time is what is of interest to a user. | ||
| **_A MetricPoint in a Metric with the type Counter MUST have one value called Total._** | ||
|
|
||
| ### Histogram / Exponential-Histogram | ||
| Histograms measure distributions of discrete events. Common examples are the latency of HTTP requests, function runtimes, or I/O request sizes. | ||
| **_A Histogram MetricPoint MUST contain at least one bucket_**, and SHOULD contain Sum and Created values. Every bucket MUST have a threshold and a value. | ||
|
|
||
| ### Summary | ||
| Summaries also measure distributions of discrete events and MAY be used when Histograms are too expensive and/or an average event size is sufficient. | ||
| **_A Summary MetricPoint MAY consist of a Count, Sum, Created, and a set of quantiles._** | ||
| Semantically, Count and Sum values are counters and MUST be integers. | ||
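
The MUST rules for the four metric types can be expressed as a small validator. This is a sketch, not the published `metrics.schema`: the point field names (`value`, `total`, `buckets`, `threshold`, `count`, `sum`) are lower-cased guesses at how the values named above would appear in a document, and `validate_point` is a hypothetical function.

```python
from typing import Any, List, Mapping


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_integer(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def validate_point(metric_type: str, point: Mapping[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the point is valid."""
    errors: List[str] = []
    if metric_type == "gauge":
        if not _is_number(point.get("value")):
            errors.append("gauge point must have a single numeric value")
    elif metric_type == "counter":
        if not _is_number(point.get("total")):
            errors.append("counter point must have a numeric value called total")
    elif metric_type == "histogram":
        buckets = point.get("buckets") or []
        if not buckets:
            errors.append("histogram point needs at least one bucket")
        for i, bucket in enumerate(buckets):
            if "threshold" not in bucket or "value" not in bucket:
                errors.append(f"bucket {i} must have a threshold and a value")
    elif metric_type == "summary":
        # Count and Sum are optional (MAY), but when present they must be integers.
        for field in ("count", "sum"):
            if field in point and not _is_integer(point[field]):
                errors.append(f"summary {field} must be an integer")
    else:
        errors.append(f"unknown metric type: {metric_type!r}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation in a document at once.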
|
|
||
|
|
||