
PoC for batching in PeriodicReader #7930

Closed
dashpole wants to merge 1 commit into open-telemetry:main from dashpole:metric_batch_poc


@dashpole
Contributor

Batching logic is based on the collector's batchprocessor: https://github.com/open-telemetry/opentelemetry-collector/blob/587b90b9ecc1db959ee9104d5bf993591f80ca43/processor/batchprocessor/splitmetrics.go

PoC for open-telemetry/opentelemetry-specification#4895

This PR adds metric.WithMaxExportBatchSize to go.opentelemetry.io/sdk/metric, and causes the SDK to split batches before passing them to the exporter.

One potential issue: I was hoping the export timeout could apply to each batch individually, rather than to all of the serial export calls together. Currently, however, we apply a single timeout to collect + export combined. I've changed it to apply the timeout separately to collect and to each export, but I'm curious how acceptable other maintainers find this kind of change. In practice, I suspect Collect() is not a notable source of latency or timeouts.

The spec for the timeout says:

exportTimeoutMillis - how long the export can run before it is cancelled. The default value is 30000 (milliseconds).

Given that wording, it would strike me as a bit odd if we split metrics into batches but still applied the timeout to the entire collect plus multiple exports.

@dashpole added the "Skip Changelog" label (PRs that do not require a CHANGELOG.md entry) on Feb 20, 2026
@codecov

codecov Bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 57.59494% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.5%. Comparing base (64f28b0) to head (a93066b).
⚠️ Report is 64 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sdk/metric/splitmetrics.go | 48.8% | 64 Missing and 1 partial ⚠️ |
| sdk/metric/periodic_reader.go | 93.5% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@           Coverage Diff           @@
##            main   #7930     +/-   ##
=======================================
- Coverage   81.7%   81.5%   -0.2%     
=======================================
  Files        304     305      +1     
  Lines      23283   23430    +147     
=======================================
+ Hits       19032   19116     +84     
- Misses      3864    3926     +62     
- Partials     387     388      +1     
| Files with missing lines | Coverage Δ |
| --- | --- |
| sdk/metric/periodic_reader.go | 86.9% <93.5%> (+1.5%) ⬆️ |
| sdk/metric/splitmetrics.go | 48.8% <48.8%> (ø) |

... and 1 file with indirect coverage changes


@dashpole closed this on Mar 9, 2026
github-merge-queue Bot pushed a commit to open-telemetry/opentelemetry-specification that referenced this pull request Mar 17, 2026
…ader (#4895)

Fixes #4852

## Prior Art

The Trace SDK and Logging SDK both support a `maxExportBatchSize`
parameter to limit the number of spans/logs exported in a batch. The
collector's exporter helper and batch processor support a
`send_batch_max_size` configuration option, which (by default) applies
to the number of spans, logs, or metric data points. In all cases, the
configured timeout applies to a single request.

## Requirements

* Apply a limit to the number of metric data points exported in a single
OTLP batch.
* Maintain existing ordering of metric data points. Batching must not
cause metric data from a subsequent Collect to be exported before
data from an earlier Collect call.
* Apply the timeout to individual requests, not to multiple requests.
* The batch size must apply to a single exporter, and if multiple
exporters are used, each must be able to have its own batch size.

## Non-goals

* Introduce any parallelism into the metric export path
* Limit by bytes, or anything else

## Proposal

Add `maxExportBatchSize` to the periodic exporting MetricReader. The
periodic exporting MetricReader splits the batch of metric data points
received from Collect, if necessary, and then serially invokes `Export`
on each split batch with the configured timeout.
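
A simplified sketch of the splitting step in this proposal is shown below. It is an illustration only: the `metric` type and `splitMetrics` function are hypothetical, the real SDK data model nests resources, scopes, and metrics, and treating a non-positive limit as "no limit" is an assumption of the sketch. Metrics with no data points are dropped here for brevity.

```go
package main

import "fmt"

// metric is a hypothetical, simplified stand-in for an SDK metric:
// a name plus its data points.
type metric struct {
	Name   string
	Points []float64
}

// splitMetrics splits metrics into batches that hold at most maxSize
// data points each, preserving the original order. A metric whose points
// straddle a batch boundary appears in both batches under the same name.
func splitMetrics(metrics []metric, maxSize int) [][]metric {
	if maxSize <= 0 {
		return [][]metric{metrics} // assumption: non-positive means no limit
	}
	var batches [][]metric
	var cur []metric
	room := maxSize // data points still allowed in the current batch
	for _, m := range metrics {
		pts := m.Points
		for len(pts) > 0 {
			n := len(pts)
			if n > room {
				n = room
			}
			cur = append(cur, metric{Name: m.Name, Points: pts[:n]})
			pts = pts[n:]
			room -= n
			if room == 0 {
				batches = append(batches, cur)
				cur, room = nil, maxSize
			}
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	in := []metric{
		{Name: "a", Points: []float64{1, 2, 3}},
		{Name: "b", Points: []float64{10, 20}},
	}
	for i, batch := range splitMetrics(in, 2) {
		fmt.Println("batch", i, batch)
	}
}
```

Each resulting batch would then be passed to `Export` serially, in order, which keeps data from one Collect ahead of data from the next.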

## Alternatives considered

### maxExportBatchSize for all MetricReaders

Instead of applying to only periodic readers, the batch size could apply
to all readers. This alternative is not chosen because:

* Splitting batches is only required for push exporters.
* It makes more sense to group the batching configuration with timeout
configuration (which is on the periodic exporting MetricReader).

### maxExportBatchSize on OTLP exporters

Instead of being on the periodic exporting MetricReader, we could add
this configuration on the OTLP http and grpc exporters. This alternative
is not chosen because:

* The timeout should apply to individual batches, not to the whole set
of split batches, in order to match the behavior of other SDKs and the
collector. This is only possible if batches are split _before_ the
exporter, since the Periodic MetricReader applies the timeout.
* It is more helpful to provide this functionality for all exporters, so
it doesn't need to be re-implemented or copied.

Prototypes:

* Go: open-telemetry/opentelemetry-go#7930

* [x] Links to the prototypes (when adding or changing features)
* [x] [`CHANGELOG.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md)
file updated for non-trivial changes
* [x] [Spec compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix/template.yaml)
updated if necessary
dashpole added a commit that referenced this pull request Apr 9, 2026
Adds experimental support for maxExportBatchSize using the
`OTEL_GO_X_METRIC_EXPORT_BATCH_SIZE=<size>` environment variable.

Previous prototype:
#7930

This preserves existing behavior for timeouts when batching is not used,
but individually applies the timeout to export calls when batching is
used.
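
For illustration, reading such an environment variable might look like the sketch below. Only the variable name comes from the commit message; `batchSizeFromEnv` is a hypothetical helper, and ignoring unset, non-numeric, or non-positive values is an assumption rather than documented SDK behavior.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// batchSizeEnv is the variable named in the commit message.
const batchSizeEnv = "OTEL_GO_X_METRIC_EXPORT_BATCH_SIZE"

// batchSizeFromEnv is a hypothetical helper. It returns the configured
// size and true only when the variable holds a positive integer.
// Assumption: any other value means batching stays disabled.
func batchSizeFromEnv(lookup func(string) (string, bool)) (int, bool) {
	v, ok := lookup(batchSizeEnv)
	if !ok {
		return 0, false
	}
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

func main() {
	size, enabled := batchSizeFromEnv(os.LookupEnv)
	fmt.Println("batch size:", size, "enabled:", enabled)
}
```

Taking the lookup function as a parameter keeps the helper testable without mutating the process environment.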
pellared pushed a commit to pellared/opentelemetry-go that referenced this pull request Apr 23, 2026
…try#8071)

Adds experimental support for maxExportBatchSize using the
`OTEL_GO_X_METRIC_EXPORT_BATCH_SIZE=<size>` environment variable.

Previous prototype:
open-telemetry#7930

This preserves existing behavior for timeouts when batching is not used,
but individually applies the timeout to export calls when batching is
used.