diff --git a/CHANGELOG.md b/CHANGELOG.md index 1c681117836..9bdb8919686 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,7 @@ New: Updates: +- Versioning and stability guarantees for OpenTelemetry clients ([#1291](https://github.com/open-telemetry/opentelemetry-specification/pull/1291)) - Additional Cassandra semantic attributes ([#1217](https://github.com/open-telemetry/opentelemetry-specification/pull/1217)) - OTEL_EXPORTER environment variable replaced with OTEL_TRACE_EXPORTER and diff --git a/README.md b/README.md index 0d7e558c1db..23be119f567 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,7 @@ Technical committee holds regular meetings, notes are held - [Overview](specification/overview.md) - [Glossary](specification/glossary.md) +- [Versioning and stability for OpenTelemetry clients](specification/versioning-and-stability.md) - [Library Guidelines](specification/library-guidelines.md) - [Package/Library Layout](specification/library-layout.md) - [General error handling guidelines](specification/error-handling.md) @@ -44,7 +45,7 @@ Technical committee holds regular meetings, notes are held - About the Project - [Timeline](#project-timeline) - [Notation Conventions and Compliance](#notation-conventions-and-compliance) - - [Versioning](#versioning) + - [Versioning the Specification](#versioning-the-specification) - [Acronym](#acronym) - [Contributions](#contributions) - [License](#license) @@ -64,7 +65,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S An implementation of the [specification](./specification/overview.md) is not compliant if it fails to satisfy one or more of the "MUST", "MUST NOT", "REQUIRED", "SHALL", or "SHALL NOT" requirements defined in the [specification](./specification/overview.md). 
Conversely, an implementation of the [specification](./specification/overview.md) is compliant if it satisfies all the "MUST", "MUST NOT", "REQUIRED", "SHALL", and "SHALL NOT" requirements defined in the [specification](./specification/overview.md). -## Versioning +## Versioning the Specification Changes to the [specification](./specification/overview.md) are versioned according to [Semantic Versioning 2.0](https://semver.org/spec/v2.0.0.html) and described in [CHANGELOG.md](CHANGELOG.md). Layout changes are not versioned. Specific implementations of the specification should specify which version they implement. diff --git a/internal/img/api-lifecycle.png b/internal/img/api-lifecycle.png new file mode 100644 index 00000000000..04efb594779 Binary files /dev/null and b/internal/img/api-lifecycle.png differ diff --git a/internal/img/architecture.png b/internal/img/architecture.png new file mode 100644 index 00000000000..66216f6cb61 Binary files /dev/null and b/internal/img/architecture.png differ diff --git a/internal/img/long-term-support.png b/internal/img/long-term-support.png new file mode 100644 index 00000000000..2e41399e28c Binary files /dev/null and b/internal/img/long-term-support.png differ diff --git a/specification/error-handling.md b/specification/error-handling.md index fbf828320c4..7ec15579053 100644 --- a/specification/error-handling.md +++ b/specification/error-handling.md @@ -62,7 +62,7 @@ The mechanism by which end users set or register a custom error handler should f ### Examples These are examples of how end users might register custom error handlers. -Examples are for illustration purposes only. Language library authors +Examples are for illustration purposes only. OpenTelemetry client authors are free to deviate from these provided that their design matches the requirements outlined above. 
#### Go diff --git a/specification/glossary.md b/specification/glossary.md index a17b032505e..e03830b959c 100644 --- a/specification/glossary.md +++ b/specification/glossary.md @@ -8,11 +8,21 @@ Some other fundamental terms are documented in the [overview document](overview. +- [User Roles](#user-roles) + * [Application Owner](#application-owner) + * [Library Author](#library-author) + * [Instrumentation Author](#instrumentation-author) + * [Plugin Author](#plugin-author) - [Common](#common) + * [Signals](#signals) + * [Packages](#packages) + * [ABI Compatibility](#abi-compatibility) * [In-band and Out-of-band Data](#in-band-and-out-of-band-data) * [Manual Instrumentation](#manual-instrumentation) * [Automatic Instrumentation](#automatic-instrumentation) * [Telemetry SDK](#telemetry-sdk) + * [Constructors](#constructors) + * [SDK Plugins](#sdk-plugins) * [Exporter Library](#exporter-library) * [Instrumented Library](#instrumented-library) * [Instrumentation Library](#instrumentation-library) @@ -28,8 +38,46 @@ Some other fundamental terms are documented in the [overview document](overview. +## User Roles + +### Application Owner + +The maintainer of an application or service, responsible for configuring and managing the lifecycle of the OpenTelemetry SDK. + +### Library Author + +The maintainer of a shared library which is depended upon by many applications, and targeted by OpenTelemetry instrumentation. + +### Instrumentation Author + +The maintainer of OpenTelemetry instrumentation written against the OpenTelemetry API. +This may be instrumentation written within application code, within a shared library, or within an instrumentation library. + +### Plugin Author + +The maintainer of an OpenTelemetry SDK Plugin, written against OpenTelemetry SDK plugin interfaces. + ## Common +### Signals + +OpenTelemetry is structured around signals, or categories of telemetry. +Metrics, logs, traces, and baggage are examples of signals. 
+Each signal represents a coherent, stand-alone set of functionality. +Each signal follows a separate lifecycle, defining its current stability level. + +### Packages + +In this specification, the term **package** describes a set of code which represents a single dependency, which may be imported into a program independently from other packages. +This concept may map to a different term in some languages, such as "module." +Note that in some languages, the term "package" refers to a different concept. + +### ABI Compatibility + +An ABI (application binary interface) is an interface which defines interactions between software components at the machine code level, for example between an application executable and a compiled binary of a shared object library. ABI compatibility means that a new compiled version of a library may be correctly linked to a target executable without the need for that executable to be recompiled. + +ABI compatibility is important for some languages, especially those which provide a form of machine code. For other languages, ABI compatibility may not be a relevant requirement. + @@ -66,9 +114,17 @@ Denotes the library that implements the *OpenTelemetry API*. See [Library Guidelines](library-guidelines.md#sdk-implementation) and [Library resource semantic conventions](resource/semantic_conventions/README.md#telemetry-sdk). +### Constructors + +Constructors are public code used by Application Owners to initialize and configure the OpenTelemetry SDK and contrib packages. Examples of constructors include configuration objects, environment variables, and builders. + +### SDK Plugins + +Plugins are libraries which extend the OpenTelemetry SDK. Examples of plugin interfaces are the `SpanProcessor`, `Exporter`, and `Sampler` interfaces. + ### Exporter Library -Libraries which are compatible with the [Telemetry SDK](#telemetry-sdk) and provide functionality to emit telemetry to consumers. 
+Exporters are SDK Plugins which implement the `Exporter` interface, and emit telemetry to consumers. ### Instrumented Library diff --git a/specification/library-guidelines.md b/specification/library-guidelines.md index 9b285d54e01..de0b881e253 100644 --- a/specification/library-guidelines.md +++ b/specification/library-guidelines.md @@ -1,52 +1,53 @@ -# OpenTelemetry Language Library Design Principles +# OpenTelemetry Client Design Principles -This document defines common principles that will help designers create language libraries that are easy to use, are uniform across all supported languages, yet allow enough flexibility for language-specific expressiveness. +This document defines common principles that will help designers create OpenTelemetry clients that are easy to use, are uniform across all supported languages, yet allow enough flexibility for language-specific expressiveness. -The language libraries are expected to provide full features out of the box and allow for innovation and experimentation through extensibility points. +OpenTelemetry clients are expected to provide full features out of the box and allow for innovation and experimentation through extensibility. -The document does not attempt to describe a language library API. For API specs see [specification](../README.md). +Please read the [overview](overview.md) first, to understand the fundamental architecture of OpenTelemetry. -_Note to Language Library Authors:_ OpenTelemetry specification, API and SDK implementation guidelines are work in progress. If you notice incomplete or missing information, contradictions, inconsistent styling and other defects please let specification writers know by creating an issue in this repository or posting in [Gitter](https://gitter.im/open-telemetry/opentelemetry-specification). As implementors of the specification you will often have valuable insights into how the specification can be improved. 
The Specification SIG and members of Technical Committee highly value your opinion and welcome your feedback. +This document does not attempt to describe the details or functionality of the OpenTelemetry client API. For API specs see the [API specifications](../README.md). + +_Note to OpenTelemetry client Authors:_ OpenTelemetry specification, API and SDK implementation guidelines are work in progress. If you notice incomplete or missing information, contradictions, inconsistent styling and other defects please let specification writers know by creating an issue in this repository or posting in [Gitter](https://gitter.im/open-telemetry/opentelemetry-specification). As implementors of the specification you will often have valuable insights into how the specification can be improved. The Specification SIG and members of Technical Committee highly value your opinion and welcome your feedback. ## Requirements 1. The OpenTelemetry API must be well-defined and clearly decoupled from the implementation. This allows end users to consume API only without also consuming the implementation (see points 2 and 3 for why it is important). -2. Third party libraries and frameworks that add instrumentation to their code will have a dependency only on the API of OpenTelemetry language library. The developers of third party libraries and frameworks do not care (and cannot know) what specific implementation of OpenTelemetry is used in the final application. +2. Third party libraries and frameworks that add instrumentation to their code will have a dependency only on the API of OpenTelemetry client. The developers of third party libraries and frameworks do not care (and cannot know) what specific implementation of OpenTelemetry is used in the final application. 3. The developers of the final application normally decide how to configure OpenTelemetry SDK and what extensions to use. 
They should be also free to choose to not use any OpenTelemetry implementation at all, even though the application and/or its libraries are already instrumented. The rationale is that third-party libraries and frameworks which are instrumented with OpenTelemetry must still be fully usable in the applications which do not want to use OpenTelemetry (so this removes the need for framework developers to have "instrumented" and "non-instrumented" versions of their framework). 4. The SDK must be clearly separated into wire protocol-independent parts that implement common logic (e.g. batching, tag enrichment by process information, etc.) and protocol-dependent telemetry exporters. Telemetry exporters must contain minimal functionality, thus enabling vendors to easily add support for their specific protocol. 5. The SDK implementation should include the following exporters: + - OTLP. - Jaeger. - Zipkin. - Prometheus. - - OpenTelemetry Protocol (when the protocol is specified and approved). - Standard output (or logging) to use for debugging and testing as well as an input for the various log proxy tools. - In-memory (mock) exporter that accumulates telemetry data in the local memory and allows to inspect it (useful for e.g. unit tests). Note: some of these support multiple protocols (e.g. gRPC, Thrift, etc). The exact list of protocols to implement in the exporters is TBD. - Other vendor-specific exporters (exporters that implement vendor protocols) should not be included in language libraries and should be placed elsewhere (the exact approach for storing and maintaining vendor-specific exporters will be defined in the future). + Other vendor-specific exporters (exporters that implement vendor protocols) should not be included in OpenTelemetry clients and should be placed elsewhere (the exact approach for storing and maintaining vendor-specific exporters will be defined in the future). 
-## Language Library Generic Design +## OpenTelemetry Client Generic Design -Here is a generic design for a language library (arrows indicate calls): +Here is a generic design for an OpenTelemetry client (arrows indicate calls): -![Language Library Design Diagram](../internal/img/library-design.png) +![OpenTelemetry client Design Diagram](../internal/img/library-design.png) ### Expected Usage -The OpenTelemetry Language Library is composed of 2 packages: API package and SDK package. -In this specification, _package_ is used as a conceptual separation and does not prescribe the exact structure of the artifacts making up the language implementations. -Whether the API and SDK packages are bundled as two all-in-one artifacts or split across multiple ones (e.g. one for api-trace, one for api-metric, one for sdk-trace, one for sdk-metric) is considered an implementation detail as long as the API artifact(s) stay separate from the SDK artifact(s). +The OpenTelemetry client is composed of 4 types of [packages](glossary.md#packages): API packages, SDK packages, a Semantic Conventions package, and plugin packages. +The API and the SDK are split into multiple packages, based on signal type (e.g. one for api-trace, one for api-metric, one for sdk-trace, one for sdk-metric). The exact split is considered an implementation detail as long as the API artifact(s) stay separate from the SDK artifact(s). -Third-party libraries and frameworks that want to be instrumented in OpenTelemetry-compatible way will have a dependency on the API package. The developers of these third-party libraries will add calls to telemetry API to produce telemetry data. +Libraries, frameworks, and applications that want to be instrumented with OpenTelemetry take a dependency only on the API packages. The developers of these third-party libraries will make calls to the API to produce telemetry data. 
-Applications that use third-party libraries that are instrumented with OpenTelemetry API will have a choice to enable or not enable the actual delivery of telemetry data. The application can also call telemetry API directly to produce additional telemetry data. +Applications that use third-party libraries that are instrumented with OpenTelemetry API control whether or not to install the SDK and generate telemetry data. When the SDK is not installed, the API calls should be no-ops which generate minimal overhead. -In order to enable telemetry the application must take a dependency on the OpenTelemetry SDK, which implements the delivery of the telemetry. The application must also configure exporters so that the SDK knows where and how to deliver the telemetry. The details of how exporters are enabled and configured are language specific. +In order to enable telemetry the application must take a dependency on the OpenTelemetry SDK. The application must also configure exporters and other plugins so that telemetry can be correctly generated and delivered to their analysis tool(s) of choice. The details of how plugins are enabled and configured are language specific. ### API and Minimal Implementation @@ -102,9 +103,9 @@ The end-user application may decide to take a dependency on alternative implemen SDK provides flexibility and extensibility that may be used by many implementations. Before developing an alternative implementation, please, review extensibility points provided by OpenTelemetry. -An example use case for alternate implementations is automated testing. A mock implementation can be plugged in during automated tests. For example it can store all generated telemetry data in memory and provide a capability to inspect this stored data. This will allow the tests to verify that the telemetry is generated correctly. Language Library authors are encouraged to provide such mock implementation. +An example use-case for alternate implementations is automated testing. 
A mock implementation can be plugged in during automated tests. For example, it can store all generated telemetry data in memory and provide a capability to inspect this stored data. This will allow the tests to verify that the telemetry is generated correctly. OpenTelemetry client authors are encouraged to provide such a mock implementation. -Note that mocking is also possible by using SDK and a Mock `Exporter` without needed to swap out the entire SDK. +Note that mocking is also possible by using SDK and a Mock `Exporter` without needing to swap out the entire SDK. The mocking approach chosen will depend on the testing goals and at which point exactly it is desirable to intercept the telemetry data path during the test. @@ -112,11 +113,11 @@ The mocking approach chosen will depend on the testing goals and at which point API and SDK packages must use semantic version numbering. API package version number and SDK package version number are decoupled and can be different (and they both can be also different from the Specification version number that they implement). API and SDK packages MUST be labeled with their own version number. -This decoupling of version numbers allows language library authors to make API and SDK package releases independently without the need to coordinate and match version numbers with the Specification. +This decoupling of version numbers allows OpenTelemetry client authors to make API and SDK package releases independently without the need to coordinate and match version numbers with the Specification. -Because API and SDK package version numbers are not coupled, every API and SDK package release MUST clearly mention the Specification version number that they implement. In addition, if a particular version of SDK package is only compatible with a specific version of API package, then this compatibility information must be also published by language library authors. 
Language library authors MUST include this information in the release notes. For example, the SDK package release notes may say: "SDK 0.3.4, use with API 0.1.0, implements OpenTelemetry Specification 0.1.0". +Because API and SDK package version numbers are not coupled, every API and SDK package release MUST clearly mention the Specification version number that they implement. In addition, if a particular version of SDK package is only compatible with a specific version of API package, then this compatibility information must be also published by OpenTelemetry client authors. OpenTelemetry client authors MUST include this information in the release notes. For example, the SDK package release notes may say: "SDK 0.3.4, use with API 0.1.0, implements OpenTelemetry Specification 0.1.0". -_TODO: How should third party library authors who use OpenTelemetry for instrumentation guide their end users to find the correct SDK package?_ +_TODO: How should third-party library authors who use OpenTelemetry for instrumentation guide their end users to find the correct SDK package?_ ### Performance and Blocking diff --git a/specification/metrics/sdk.md b/specification/metrics/sdk.md index b65ee88a895..5e131e75d6d 100644 --- a/specification/metrics/sdk.md +++ b/specification/metrics/sdk.md @@ -169,7 +169,7 @@ failed or timed out. `Shutdown` SHOULD complete or abort within some timeout. `Shutdown` can be implemented as a blocking API or an asynchronous API which notifies the caller -via a callback or an event. Language library authors can decide if they want to +via a callback or an event. OpenTelemetry client authors can decide if they want to make the shutdown timeout configurable. 
#### SDK: Instrument Registration diff --git a/specification/overview.md b/specification/overview.md index 05a9ce2b32f..a542c2b04b9 100644 --- a/specification/overview.md +++ b/specification/overview.md @@ -8,37 +8,91 @@ Table of Contents -- [Distributed Tracing](#distributed-tracing) - * [Trace](#trace) - * [Span](#span) +- [OpenTelemetry Client Architecture](#opentelemetry-client-architecture) + * [API](#api) + * [SDK](#sdk) + * [Semantic Conventions](#semantic-conventions) + * [Contrib Packages](#contrib-packages) + * [Versioning and Stability](#versioning-and-stability) +- [Tracing Signal](#tracing-signal) + * [Traces](#traces) + * [Spans](#spans) * [SpanContext](#spancontext) * [Links between spans](#links-between-spans) -- [Metrics](#metrics) +- [Metric Signal](#metric-signal) * [Recording raw measurements](#recording-raw-measurements) + [Measure](#measure) + [Measurement](#measurement) * [Recording metrics with predefined aggregation](#recording-metrics-with-predefined-aggregation) * [Metrics data model and SDK](#metrics-data-model-and-sdk) -- [Logs](#logs) +- [Log Signal](#log-signal) * [Data model](#data-model) -- [Baggage](#baggage) +- [Baggage Signal](#baggage-signal) - [Resources](#resources) - [Context Propagation](#context-propagation) - [Propagators](#propagators) - [Collector](#collector) - [Instrumentation Libraries](#instrumentation-libraries) -- [Semantic Conventions](#semantic-conventions) -This document provides an overview of the pillars of telemetry that -OpenTelemetry supports and defines important fundamental terms. +This document provides an overview of the OpenTelemetry project and defines important fundamental terms. Additional term definitions can be found in the [glossary](glossary.md). -## Distributed Tracing +## OpenTelemetry Client Architecture + +![Cross cutting concerns](../internal/img/architecture.png) + +At the highest architectural level, OpenTelemetry clients are organized into [**signals**](glossary.md#signals). 
+Each signal provides a specialized form of observability. For example, tracing, metrics, and baggage are three separate signals. +Signals share a common subsystem – **context propagation** – but they function independently from each other. + +Each signal provides a mechanism for software to describe itself. A codebase, such as a web framework or a database client, takes a dependency on various signals in order to describe itself. OpenTelemetry instrumentation code can then be mixed into the other code within that codebase. +This makes OpenTelemetry a **cross-cutting concern** - a piece of software which is mixed into many other pieces of software in order to provide value. Cross-cutting concerns, by their very nature, violate a core design principle – separation of concerns. As a result, OpenTelemetry client design requires extra care and attention to avoid creating issues for the codebases which depend upon these cross-cutting APIs. + +OpenTelemetry clients are designed to separate the portion of each signal which must be imported as cross-cutting concerns from the portions which can be managed independently. OpenTelemetry clients are also designed to be an extensible framework. +To accomplish these goals, each signal consists of four types of packages: API, SDK, Semantic Conventions, and Contrib. + +### API + +API packages consist of the cross-cutting public interfaces used for instrumentation. Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API. + +### SDK + +The SDK is the implementation of the API provided by the OpenTelemetry project. Within an application, the SDK is installed and managed by the [application owner](glossary.md#application-owner). +Note that the SDK includes additional public interfaces which are not considered part of the API package, as they are not cross-cutting concerns. 
These public interfaces are defined as [constructors](glossary.md#constructors) and [plugin interfaces](glossary.md#sdk-plugins). +Application owners use the SDK constructors; [plugin authors](glossary.md#plugin-author) use the SDK plugin interfaces. +[Instrumentation authors](glossary.md#instrumentation-author) MUST NOT directly reference any SDK package of any kind, only the API. + +### Semantic Conventions + +The **Semantic Conventions** define the keys and values which describe commonly observed concepts, protocols, and operations used by applications. + +* [Resource Conventions](resource/semantic_conventions/README.md) +* [Span Conventions](trace/semantic_conventions/README.md) +* [Metrics Conventions](metrics/semantic_conventions/README.md) + +### Contrib Packages + +The OpenTelemetry project maintains integrations with popular OSS projects which have been identified as important for observing modern web services. +Example API integrations include instrumentation for web frameworks, database clients, and message queues. +Example SDK integrations include plugins for exporting telemetry to popular analysis tools and telemetry storage systems. + +Some plugins, such as OTLP Exporters and TraceContext Propagators, are required by the OpenTelemetry specification. These required plugins are included as part of the SDK. + +Plugins and instrumentation packages which are optional and separate from the SDK are referred to as **Contrib** packages. +**API Contrib** refers to packages which depend solely upon the API; **SDK Contrib** refers to packages which also depend upon the SDK. + +The term Contrib specifically refers to the collection of plugins and instrumentation maintained by the OpenTelemetry project; it does not refer to third-party plugins hosted elsewhere. + +### Versioning and Stability + +OpenTelemetry values stability and backwards compatibility. Please see the [versioning and stability guide](./versioning-and-stability.md) for details. 
+ +## Tracing Signal A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A @@ -48,7 +102,7 @@ to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed. -### Trace +### Traces **Traces** in OpenTelemetry are defined implicitly by their **Spans**. In particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of @@ -86,9 +140,10 @@ Temporal relationships between Spans in a single Trace [Span E·······] [Span F··] ``` -### Span +### Spans -Each **Span** encapsulates the following state: +A span represents an operation within a transaction. Each **Span** encapsulates +the following state: - An operation name - A start and finish timestamp @@ -149,7 +204,7 @@ represents a single parent scenario, in many cases the parent **Span** fully encloses the child **Span**. This is not the case in scatter/gather and batch scenarios. -## Metrics +## Metric Signal OpenTelemetry allows to record raw measurements or metrics with predefined aggregation and set of labels. @@ -224,14 +279,14 @@ validation and sanitization of the Metrics data. Instead, pass the data to the backend, rely on the backend to perform validation, and pass back any errors from the backend. -## Logs +## Log Signal ### Data model [Log Data Model](logs/data-model.md) defines how logs and events are understood by OpenTelemetry. -## Baggage +## Baggage Signal In addition to trace propagation, OpenTelemetry provides a simple mechanism for propagating name/value pairs, called `Baggage`. `Baggage` is intended for @@ -327,15 +382,3 @@ name itself. 
Examples include: * opentelemetry-instrumentation-flask (Python) * @opentelemetry/instrumentation-grpc (Javascript) - -## Semantic Conventions - -OpenTelemetry defines standard names and values of Resource attributes and -Span attributes. - -* [Resource Conventions](resource/semantic_conventions/README.md) -* [Span Conventions](trace/semantic_conventions/README.md) -* [Metrics Conventions](metrics/semantic_conventions/README.md) - -The type of the attribute SHOULD be specified in the semantic convention -for that attribute. See more details about [Attributes](./common/common.md#attributes). diff --git a/specification/performance.md b/specification/performance.md index ed32064e526..46fb2960f41 100644 --- a/specification/performance.md +++ b/specification/performance.md @@ -1,6 +1,6 @@ # Performance and Blocking of OpenTelemetry API -This document defines common principles that will help designers create language libraries that are safe to use. +This document defines common principles that will help designers create OpenTelemetry clients that are safe to use. ## Key principles @@ -17,27 +17,27 @@ See also [Concurrency and Thread-Safety](library-guidelines.md#concurrency-and-t Incomplete asynchronous I/O tasks or background tasks may consume memory to preserve their state. In such a case, there is a tradeoff between dropping some tasks to prevent memory starvation and keeping all tasks to prevent information loss. 
-If there is such tradeoff in language library, it should provide the following options to end-user: +If there is such a tradeoff in an OpenTelemetry client, it should provide the following options to the end user: - **Prevent information loss**: Preserve all information but possible to consume many resources - **Prevent blocking**: Dropping some information under overwhelming load and show warning log to inform when information loss starts and when recovered - Should provide option to change threshold of the dropping - Better to provide metric that represents effective sampling ratio - - Language library might provide this option for Logging + - The OpenTelemetry client might provide this option for Logging ### End-user application should be aware of the size of logs Logging could consume much memory by default if the end-user application emits too many logs. This default behavior is intended to preserve logs rather than dropping it. To make resource usage bounded, the end-user should consider reducing logs that are passed to the exporters. -Therefore, the language library should provide a way to filter logs to capture by OpenTelemetry. End-user applications may want to log so much into log file or stdout (or somewhere else) but not want to send all of the logs to OpenTelemetry exporters. +Therefore, the OpenTelemetry client should provide a way to filter which logs are captured by OpenTelemetry. End-user applications may want to write many logs to a log file or stdout (or somewhere else) without sending all of them to OpenTelemetry exporters. -In a documentation of the language library, it is a good idea to point out that too many logs consume many resources by default then guide how to filter logs. +In the documentation of the OpenTelemetry client, it is a good idea to point out that too many logs consume many resources by default, and then explain how to filter logs. 
### Shutdown and explicit flushing could block -The language library could block the end-user application when it shut down. On shutdown, it has to flush data to prevent information loss. The language library should support user-configurable timeout if it blocks on shut down. +The OpenTelemetry client could block the end-user application when it shuts down. On shutdown, it has to flush data to prevent information loss. The OpenTelemetry client should support a user-configurable timeout if it blocks on shutdown. -If the language library supports an explicit flush operation, it could block also. But should support a configurable timeout. +If the OpenTelemetry client supports an explicit flush operation, that operation could also block, and it should likewise support a configurable timeout. ## Documentation diff --git a/specification/protocol/otlp.md b/specification/protocol/otlp.md index 8255abf4380..6a93c7063da 100644 --- a/specification/protocol/otlp.md +++ b/specification/protocol/otlp.md @@ -89,7 +89,7 @@ that is not yet acknowledged by the server. Sequential operation is recommended when simplicity of implementation is desirable and when the client and the server are connected via very low-latency network, such as for example when the client is an instrumented application and -the server is a OpenTelemetry Collector running as a local daemon (agent). +the server is an OpenTelemetry Collector running as a local daemon (agent). The implementations that need to achieve high throughput SHOULD support concurrent Unary calls to achieve higher throughput. The client SHOULD send new diff --git a/specification/trace/sdk.md b/specification/trace/sdk.md index c1ffe1c6040..afc57e49d02 100644 --- a/specification/trace/sdk.md +++ b/specification/trace/sdk.md @@ -252,7 +252,7 @@ failed or timed out. `Shutdown` SHOULD complete or abort within some timeout. `Shutdown` can be implemented as a blocking API or an asynchronous API which notifies the caller -via a callback or an event. 
Language library authors can decide if they want to +via a callback or an event. OpenTelemetry client authors can decide if they want to make the shutdown timeout configurable. `Shutdown` MUST be implemented at least by invoking `Shutdown` within all internal processors. @@ -403,7 +403,7 @@ failed or timed out. `Shutdown` SHOULD complete or abort within some timeout. `Shutdown` can be implemented as a blocking API or an asynchronous API which notifies the caller -via a callback or an event. Language library authors can decide if they want to +via a callback or an event. OpenTelemetry client authors can decide if they want to make the shutdown timeout configurable. #### ForceFlush() @@ -419,7 +419,7 @@ invocation, but before the `Processor` exports the completed spans. `ForceFlush` SHOULD complete or abort within some timeout. `ForceFlush` can be implemented as a blocking API or an asynchronous API which notifies the caller -via a callback or an event. Language library authors can decide if they want to +via a callback or an event. OpenTelemetry client authors can decide if they want to make the flush timeout configurable. ### Built-in span processors @@ -520,7 +520,7 @@ call to `Shutdown` subsequent calls to `Export` are not allowed and should return a `Failure` result. `Shutdown` should not block indefinitely (e.g. if it attempts to flush the data -and the destination is unavailable). Language library authors can decide if they +and the destination is unavailable). OpenTelemetry client authors can decide if they want to make the shutdown timeout configurable. ### Further Language Specialization @@ -541,7 +541,7 @@ telemetry data generation. #### Examples These are examples on what the `Exporter` interface can look like in specific -languages. Examples are for illustration purposes only. Language library authors +languages. Examples are for illustration purposes only. 
OpenTelemetry client authors are free to deviate from these provided that their design remain true to the spirit of `Exporter` concept. diff --git a/specification/versioning-and-stability.md b/specification/versioning-and-stability.md new file mode 100644 index 00000000000..ee0d212767c --- /dev/null +++ b/specification/versioning-and-stability.md @@ -0,0 +1,258 @@ +# Versioning and stability for OpenTelemetry clients + + + + +- [Design goals](#design-goals) +- [Signal lifecycle](#signal-lifecycle) + * [Experimental](#experimental) + * [Stable](#stable) + + [API Stability](#api-stability) + + [SDK Stability](#sdk-stability) + + [Contrib Stability](#contrib-stability) + + [NOT DEFINED: Telemetry Stability](#not-defined-telemetry-stability) + + [NOT DEFINED: Semantic Conventions Stability](#not-defined-semantic-conventions-stability) + * [Deprecated](#deprecated) + * [Removed](#removed) + * [A note on replacing signals](#a-note-on-replacing-signals) +- [Version numbers](#version-numbers) + * [Major version bump](#major-version-bump) + * [Minor version bump](#minor-version-bump) + * [Patch version bump](#patch-version-bump) +- [Long Term Support](#long-term-support) + * [API support](#api-support) + * [SDK Support](#sdk-support) + * [Contrib Support](#contrib-support) +- [OpenTelemetry GA](#opentelemetry-ga) + + + +This document defines the stability guarantees offered by the OpenTelemetry clients, along with the rules and procedures for meeting those guarantees. + +In this document, the terms "OpenTelemetry" and "language implementations" both specifically refer to the OpenTelemetry clients. +These terms do not refer to the specification or the Collector in this document. + +Each language implementation MUST take these versioning and stability requirements, and produce a language-specific document which details how these requirements will be met. +This document SHALL be placed in the root of each repo and named `VERSIONING`. 
+ +## Design goals + +Versioning and stability procedures are designed to meet the following goals. + +**Ensure that application owners stay up to date with the latest release of the SDK.** +We want all users to stay up to date with the latest version of the OpenTelemetry SDK. +We do not want to create hard breaks in support, of any kind, which leave users stranded on older versions. +It MUST always be possible to upgrade to the latest minor version of the OpenTelemetry SDK, without creating compilation or runtime errors. + +**Never create a dependency conflict between packages which rely on different versions of OpenTelemetry. Avoid breaking all stable public APIs.** +Backwards compatibility is a strict requirement. +Instrumentation APIs cannot create a version conflict, ever. Otherwise, the OpenTelemetry API cannot be embedded in widely shared libraries, such as web frameworks. +Code written against older versions of the API MUST work with all newer versions of the API. +Transitive dependencies of the API cannot create a version conflict. The OpenTelemetry API cannot depend on a particular package if there is any chance that any library or application may require a different, incompatible version of that package. +A library that imports the OpenTelemetry API should never become incompatible with other libraries due to a version conflict in one of OpenTelemetry's dependencies. +Theoretically, APIs can be deprecated and eventually removed, but this is a process measured in years and we have no plans to do so. + +**Allow for multiple levels of package stability within the same release of an OpenTelemetry component.** +Provide maintainers a clear process for developing new, experimental [signals](glossary.md#signals) alongside stable signals. +Different packages within the same release may have different levels of stability. 
+This means that an implementation wishing to release stable tracing today MUST ensure that experimental metrics are factored out in such a way that breaking changes to metrics API do not destabilize the trace API packages. + +## Signal lifecycle + +The development of each signal follows a lifecycle: experimental, stable, deprecated, removed. + +The infographic below shows an example of the lifecycle of an API component. + +![API Lifecycle](../internal/img/api-lifecycle.png) + +### Experimental + +Signals start as **experimental**, which covers alpha, beta, and release candidate versions of the signal. +While signals are experimental, breaking changes and performance issues MAY occur. +Components SHOULD NOT be expected to be feature-complete. +In some cases, the experiment MAY be discarded and removed entirely. +Long-term dependencies SHOULD NOT be taken against experimental signals. + +OpenTelemetry clients MUST be designed in a manner that allows experimental signals to be created without breaking the stability guarantees of existing signals. + +OpenTelemetry clients MUST NOT be designed in a manner that breaks existing users when a signal transitions from experimental to stable. This would punish users of the release candidate, and hinder adoption. + +Terms which denote stability, such as "experimental," MUST NOT be used as part of a directory or import name. +Package **version numbers** MAY include a suffix, such as -alpha, -beta, -rc, or -experimental, to differentiate stable and experimental packages. + +### Stable + +Once an experimental signal has gone through rigorous beta testing, it MAY transition to **stable**. +Long-term dependencies MAY now be taken against this signal. + +All signal components MAY become stable together, or MAY transition to stability component-by-component. The API MUST become stable before the other components. + +Once a signal component is marked as stable, the following rules MUST apply until the end of that signal’s existence. 
+ +#### API Stability + +Backward-incompatible changes to API packages MUST NOT be made unless the major version number is incremented. +All existing API calls MUST continue to compile and function against all future minor versions of the same major version. + +Languages which ship binary artifacts SHOULD offer [ABI compatibility](glossary.md#abi-compatibility) for API packages. + +#### SDK Stability + +Public portions of SDK packages MUST remain backwards compatible. +There are two categories of public features: **plugin interfaces** and **constructors**. +Examples of plugins include the SpanProcessor, Exporter, and Sampler interfaces. +Examples of constructors include configuration objects, environment variables, and SDK builders. + +Languages which ship binary artifacts SHOULD offer [ABI compatibility](glossary.md#abi-compatibility) for SDK packages. + +#### Contrib Stability + +**NOTE: Until telemetry stability is defined, Contrib instrumentation MUST NOT be marked as stable. See below.** + +Plugins, instrumentation, and other contrib packages SHOULD be kept up to date and compatible with the latest versions of the API, SDK, and Semantic Conventions. +If a release of the API, SDK, or Semantic Conventions contains changes which are relevant to a contrib package, that package SHOULD be updated and released in a timely fashion. +The goal is to ensure users can update to the latest version of OpenTelemetry, and not be held back by the plugins that they depend on. + +Public portions of contrib packages (constructors, configuration, interfaces) SHOULD remain backwards compatible. + +Languages which ship binary artifacts SHOULD offer [ABI compatibility](glossary.md#abi-compatibility) for contrib packages. + +**Exception:** Contrib packages MAY break stability when a required downstream dependency breaks stability. +For example, a database integration may break stability if the required database client breaks stability. 
+However, it is strongly RECOMMENDED that older contrib packages remain stable. +A new, incompatible version of an integration SHOULD be released as a separate contrib package, rather than break the existing contrib package. + +#### NOT DEFINED: Telemetry Stability + +**Telemetry stability guarantees are TBD.** + +Changes to telemetry produced by OpenTelemetry instrumentation SHOULD avoid breaking analysis tools, such as dashboards and alerts. +However, it is not clear at this time what type of instrumentation changes (for example, adding additional spans and labels) would actually cause a breaking change. + +#### NOT DEFINED: Semantic Conventions Stability + +Telemetry stability, including semantic conventions, is not currently defined. The following practices are recommended. + +Semantic Conventions SHOULD NOT be removed once they are added. +New conventions MAY be added to replace usage of older conventions, but the older conventions SHOULD NOT be removed. +Older conventions SHOULD be marked as deprecated when they are replaced by newer conventions. + +### Deprecated + +Signals MAY eventually be replaced. When this happens, they are marked as deprecated. + +Signals SHALL only be marked as deprecated when the replacement becomes stable. +Deprecated code MUST abide by the same support guarantees as stable code. + +### Removed + +Support is ended by the removal of a signal from the release. +The release MUST make a major version bump when this happens. + +### A note on replacing signals + +Note that we currently have no plans for creating a major version of OpenTelemetry past v1.0. + +For clarity, it is still possible to create new, backwards incompatible versions of existing signals without actually moving to v2.0 and breaking support. + +For example, imagine we develop a new, better tracing API - let's call it AwesomeTrace. +We will never mutate the current tracing API into AwesomeTrace. 
+Instead, AwesomeTrace would be added as an entirely new signal which coexists and interoperates with the current tracing signal. +This would make adding AwesomeTrace a minor version bump, *not* v2.0. +v2.0 would mark the end of support for current tracing, not the addition of AwesomeTrace. +And we don't want to ever end that support, if we can help it. + +This is not actually a theoretical example. +OpenTelemetry already supports two tracing APIs: OpenTelemetry and OpenTracing. +We invented a new tracing API, but continue to support the old one. + +## Version numbers + +OpenTelemetry clients follow [Semantic Versioning 2.0.0](https://semver.org/spec/v2.0.0.html), with the following clarifications. + +OpenTelemetry clients have four components: API, SDK, Semantic Conventions, and Contrib. + +For the purposes of versioning, all code within a component MUST be treated as if it were part of a single package, and versioned with the same version number, +except for Contrib, which may be a collection of packages versioned separately. + +* All stable API packages MUST version together, across all signals. +Stable signals MUST NOT have separate version numbers. +There is one version number that applies to all signals that are included in the API release that is labeled with that particular version number. +* SDK packages for all signals MUST version together, across all signals. +Signals MUST NOT have separate version numbers. +There is one version number that applies to all signals that are included in the SDK release that is labeled with that particular version number. +* Semantic Conventions are a single package with a single version number. +* Each contrib package MAY have its own version number. +* The API, SDK, Semantic Conventions, and contrib components have independent version numbers. +For example, the latest version of `opentelemetry-python-api` MAY be at v1.2.3 while the latest version of `opentelemetry-python-sdk` is at v2.3.1.
+* Different language implementations have independent version numbers. +For example, it is fine to have `opentelemetry-python-api` at v1.2.8 when `opentelemetry-java-api` is at v1.3.2. +* Language implementations have version numbers which are independent of the specification they implement. +For example, it is fine for v1.8.2 of `opentelemetry-python-api` to implement v1.1.1 of the specification. + +**Exception:** in some languages, package managers may react poorly to experimental packages having a version higher than 0.X. +In these cases, experimental signals MAY version independently from stable signals, in order to retain a 0.X version number. +When a signal becomes stable, the version MUST be bumped to match the other stable signals in the release. + +### Major version bump + +Major version bumps MUST occur when there is a breaking change to a stable interface, the removal of a deprecated signal, or a drop in support for a language or runtime version. +Major version bumps SHOULD NOT occur for changes which do not result in a drop in support of some form. + +### Minor version bump + +Most changes to OpenTelemetry clients result in a minor version bump. + +* New backward-compatible functionality added to any component. +* Breaking changes to internal SDK components. +* Breaking changes to experimental signals. +* New experimental signals are added. +* Experimental signals become stable. +* Stable signals are deprecated. + +### Patch version bump + +Patch versions make no changes which would require recompilation or potentially break application code. +The following are examples of patch fixes. + +* Bug fixes which don't require minor version bump per rules above. +* Security fixes. +* Documentation. + +Currently, the OpenTelemetry project does NOT have plans to backport bug and security fixes to prior minor versions of the SDK. +Security and bug fixes MAY only be applied to the latest minor version. 
+We are committed to making it feasible for end users to stay up to date with the latest version of the OpenTelemetry SDK. + +## Long Term Support + +![long term support](../internal/img/long-term-support.png) + +### API support + +Major versions of the API MUST be supported for a minimum of **three years** after the release of the next major API version. +API support is defined as follows. + +* API stability, as defined above, MUST be maintained. + +* A version of the SDK which supports the latest minor version of the last major version of the API will continue to be maintained during LTS. +Bug and security fixes MUST be backported. Additional feature development is NOT RECOMMENDED. + +* Contrib packages available when the API is versioned MUST continue to be maintained for the duration of LTS. +Bug and security fixes will be backported. +Additional feature development is NOT RECOMMENDED. + +### SDK Support + +SDK stability, as defined above, will be maintained for a minimum of **one year** after the release of the next major SDK version. + +### Contrib Support + +Contrib stability, as defined above, will be maintained for a minimum of **one year** after the release of the next major version of a contrib package. + +## OpenTelemetry GA + +The term “OpenTelemetry GA” refers to the point at which OpenTracing and OpenCensus will be fully deprecated. +The **minimum requirements** for declaring GA are as follows. + +* A stable version of both tracing and metrics MUST be released in at least four languages. +* CI/CD, performance, and integration tests MUST be implemented for these languages.