Add support for partial success in an OTLP export response [2] #2696

Merged
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -64,6 +64,9 @@ release.

### OpenTelemetry Protocol

- Add support for partial success in an OTLP export response
([#2696](https://github.com/open-telemetry/opentelemetry-specification/pull/2696)).

### SDK Configuration

- Mark `OTEL_METRIC_EXPORT_INTERVAL`, `OTEL_METRIC_EXPORT_TIMEOUT`
139 changes: 116 additions & 23 deletions specification/protocol/otlp.md
@@ -16,14 +16,18 @@ nodes such as collectors and telemetry backends.
* [OTLP/gRPC](#otlpgrpc)
+ [OTLP/gRPC Concurrent Requests](#otlpgrpc-concurrent-requests)
+ [OTLP/gRPC Response](#otlpgrpc-response)
- [Success](#success)
- [Partial Success](#partial-success)
- [Failures](#failures)
+ [OTLP/gRPC Throttling](#otlpgrpc-throttling)
+ [OTLP/gRPC Service and Protobuf Definitions](#otlpgrpc-service-and-protobuf-definitions)
+ [OTLP/gRPC Default Port](#otlpgrpc-default-port)
* [OTLP/HTTP](#otlphttp)
+ [OTLP/HTTP Request](#otlphttp-request)
+ [OTLP/HTTP Response](#otlphttp-response)
- [Success](#success)
- [Failures](#failures)
- [Success](#success-1)
- [Partial Success](#partial-success-1)
- [Failures](#failures-1)
- [Bad Data](#bad-data)
- [OTLP/HTTP Throttling](#otlphttp-throttling)
- [All Other Responses](#all-other-responses)
@@ -35,7 +39,7 @@ nodes such as collectors and telemetry backends.
- [Known Limitations](#known-limitations)
* [Request Acknowledgements](#request-acknowledgements)
+ [Duplicate Data](#duplicate-data)
* [Partial Success](#partial-success)
+ [Partial Success Retry](#partial-success-retry)
- [Future Versions and Interoperability](#future-versions-and-interoperability)
- [Glossary](#glossary)
- [References](#references)
@@ -145,16 +149,59 @@ was not delivered.

#### OTLP/gRPC Response

The server may respond with either a success or an error to the requests.
The response MUST be the appropriate serialized Protobuf message (see below for
the specific message to use in the [Success](#success),
[Partial Success](#partial-success) and [Failure](#failures) cases).

##### Success

The success response indicates telemetry data is successfully accepted by the
server.

The success response indicates telemetry data is successfully processed by the
server. If the server receives an empty request (a request that does not carry
If the server receives an empty request (a request that does not carry
any telemetry data) the server SHOULD respond with success.

Success response is returned via
[Export*ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
message (`ExportTraceServiceResponse` for traces, `ExportMetricsServiceResponse`
for metrics, `ExportLogsServiceResponse` for logs).
On success, the server response MUST be a Protobuf-encoded
[Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
message (`ExportTraceServiceResponse` for traces,
`ExportMetricsServiceResponse` for metrics and
`ExportLogsServiceResponse` for logs).

The server MUST leave the `partial_success` field unset
in case of a successful response.

##### Partial Success

If the request is only partially accepted
(i.e. when the server accepts only parts of the data and rejects the rest), the
server response MUST be a Protobuf-encoded
[Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
message (`ExportTraceServiceResponse` for traces,
`ExportMetricsServiceResponse` for metrics and
`ExportLogsServiceResponse` for logs).

Additionally, the server MUST initialize the `partial_success` field
(`ExportTracePartialSuccess` message for traces,
`ExportMetricsPartialSuccess` message for metrics and
`ExportLogsPartialSuccess` message for logs), and it MUST set the respective
`rejected_spans`, `rejected_data_points` or `rejected_log_records` field with
the number of spans/data points/log records it rejected.

A `partial_success` with the `rejected_<signal>` field holding a `0` value
is an invalid result. In such cases, senders MUST ignore the `partial_success`
field and handle the response in the same way as defined in the
[Success](#success) section.

The server SHOULD populate the `error_message` field with a human-readable
error message in English. The message should explain why the
server rejected parts of the data, and might offer guidance on how users
can address the issues.
The protocol does not attempt to define the structure of the error message.

The client MUST NOT retry the request when it receives a partial success
response in which the `partial_success` field is populated.
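
As an illustration only, the sender-side rules above could be sketched as follows. The `PartialSuccess` and `ExportResponse` classes and the `classify` function are hypothetical stand-ins written for this example, not the generated Protobuf classes; `rejected_count` stands for whichever of `rejected_spans`, `rejected_data_points` or `rejected_log_records` applies to the signal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartialSuccess:
    # Hypothetical stand-in for the Export<signal>PartialSuccess messages.
    rejected_count: int = 0  # rejected_spans / rejected_data_points / rejected_log_records
    error_message: str = ""


@dataclass
class ExportResponse:
    # Hypothetical stand-in for Export<signal>ServiceResponse.
    # None models an unset partial_success field.
    partial_success: Optional[PartialSuccess] = None


def classify(response: ExportResponse) -> str:
    """Classify a non-error export response as "success" or "partial_success"."""
    partial = response.partial_success
    # An unset field is a plain success. A populated field with a rejected
    # count of 0 is an invalid result, which senders ignore and treat as success.
    if partial is None or partial.rejected_count == 0:
        return "success"
    # Some data was rejected. The sender must not retry this request; it can
    # surface partial.error_message to the user instead.
    return "partial_success"
```

In this sketch neither outcome leads to a retry: a success needs none, and a partial success must not be retried.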

##### Failures

When an error is returned by the server it falls into 2 broad categories:
retryable and not-retryable:
@@ -382,8 +429,9 @@ numbers or strings are accepted when decoding.

#### OTLP/HTTP Response

Response body MUST be the appropriate serialized Protobuf message (see below for
the specific message to use in the Success and Failure cases).
The response body MUST be the appropriate serialized Protobuf message (see below for
the specific message to use in the [Success](#success-1),
[Partial Success](#partial-success-1) and [Failure](#failures-1) cases).

The server MUST set the "Content-Type: application/x-protobuf" header if the
response body is a binary-encoded Protobuf payload. The server MUST set
@@ -397,13 +445,51 @@ header.

##### Success

On success the server MUST respond with `HTTP 200 OK`. Response body MUST be
Protobuf-encoded `ExportTraceServiceResponse` message for traces,
`ExportMetricsServiceResponse` message for metrics and
`ExportLogsServiceResponse` message for logs.
The success response indicates telemetry data is successfully accepted by the
server.

If the server receives an empty request (a request that does not carry
any telemetry data) the server SHOULD respond with success.

The server SHOULD respond with success no sooner than after successfully
decoding and validating the request.
On success, the server MUST respond with `HTTP 200 OK`. The response body MUST be
a Protobuf-encoded [Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
message (`ExportTraceServiceResponse` for traces,
`ExportMetricsServiceResponse` for metrics and
`ExportLogsServiceResponse` for logs).

The server MUST leave the `partial_success` field unset
in case of a successful response.

##### Partial Success

If the request is only partially accepted
(i.e. when the server accepts only parts of the data and rejects the rest), the
server MUST respond with `HTTP 200 OK`. The response body MUST be
a Protobuf-encoded [Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
message (`ExportTraceServiceResponse` for traces,
`ExportMetricsServiceResponse` for metrics and
`ExportLogsServiceResponse` for logs).

Additionally, the server MUST initialize the `partial_success` field
(`ExportTracePartialSuccess` message for traces,
`ExportMetricsPartialSuccess` message for metrics and
`ExportLogsPartialSuccess` message for logs), and it MUST set the respective
`rejected_spans`, `rejected_data_points` or `rejected_log_records` field with
the number of spans/data points/log records it rejected.

A `partial_success` with the `rejected_<signal>` field holding a `0` value
is an invalid result. In such cases, senders MUST ignore the `partial_success`
field and handle the response in the same way as defined in the
[Success](#success-1) section.

The server SHOULD populate the `error_message` field with a human-readable
error message in English. The message should explain why the
server rejected parts of the data, and might offer guidance on how users
can address the issues.
The protocol does not attempt to define the structure of the error message.

The client MUST NOT retry the request when it receives a partial success
response in which the `partial_success` field is populated.
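
On the server side, the success and partial success rules for OTLP/HTTP might be sketched like this. The classes and the `build_http_response` helper are hypothetical names chosen for this example, not part of any OTLP library. The sketch assumes the server has already decoded the request, accepted at least part of it, and counted the rejected items. Serializing the body to Protobuf is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PartialSuccess:
    # Hypothetical stand-in for the Export<signal>PartialSuccess messages.
    rejected_count: int  # rejected_spans / rejected_data_points / rejected_log_records
    error_message: str = ""


@dataclass
class ExportResponse:
    # Hypothetical stand-in for Export<signal>ServiceResponse.
    # None models an unset partial_success field.
    partial_success: Optional[PartialSuccess] = None


def build_http_response(
    rejected_count: int, error_message: str = ""
) -> Tuple[int, ExportResponse]:
    """Return (HTTP status, response message) for a fully or partially accepted request."""
    if rejected_count < 0:
        raise ValueError("rejected_count must not be negative")
    if rejected_count == 0:
        # Full success: HTTP 200 OK with partial_success left unset.
        return 200, ExportResponse()
    # Partial success: still HTTP 200 OK, with the rejected count set and,
    # ideally, a human-readable explanation in English.
    return 200, ExportResponse(PartialSuccess(rejected_count, error_message))
```

Leaving the field unset when nothing was rejected keeps the server from emitting the `0`-valued `partial_success` that senders would have to ignore.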

##### Failures

@@ -520,12 +606,19 @@ received yet. The client will typically choose to re-send such data to guarantee
delivery, which may result in duplicate data on the server side. This is a
deliberate choice and is considered to be the right tradeoff for telemetry data.

### Partial Success
#### Partial Success Retry

The partial success defined by the protocol is neither designed nor intended
to be used as a mechanism for clients to automatically retry an export request.

Servers should return a partial success response when they fully understand that
resending the same bundle of telemetry would lead to the same error again,
thus preventing retry loops.

The protocol does not attempt to communicate partial reception success from the
server to the client (i.e. when part of the data can be received by the server
and part of it cannot). Attempting to do so would complicate the protocol and
implementations significantly and is left out as a possible future area of work.
The protocol does not attempt to define how clients should automatically retry
a partially successful request.
Attempting to do so would complicate the protocol and implementations
significantly and is left out as a possible future area of work.

## Future Versions and Interoperability
