sdk/log: self observability: batch log processor metrics by mahendrabishnoi2 · Pull Request #7124 · open-telemetry/opentelemetry-go

mahendrabishnoi2 · 2025-08-04T06:33:02Z

Adds support for below metrics for sdk/log#BatchProcessor

otel.sdk.processor.log.processed
otel.sdk.processor.log.queue.size
otel.sdk.processor.log.queue.capacity

These metrics are defined at https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md and are experimental. Because of this, these use global MeterProvider and are behind a feature flag OTEL_GO_X_OBSERVABILITY.

Observability Implementation Checklist

Based on the project Observability guidelines, ensure the following are completed:

Environment Variable Activation

Observability features are disabled by default
Features are activated through the OTEL_GO_X_OBSERVABILITY environment variable
Use consistent pattern with x.Observability.Enabled() check ¹
Follow established experimental feature pattern ²³

Encapsulation

Instrumentation is encapsulated within a dedicated struct (e.g., Instrumentation)
Instrumentation is not mixed into the instrumented component
Instrumentation code is in its own file or package if complex/reused
Instrumentation setup doesn't bloat the main component code

Initialization

Initialization is only done when observability is enabled
Setup is explicit and side-effect free
Return errors from initialization when appropriate
Use the global Meter provider (e.g., otel.GetMeterProvider())
Include proper meter configuration with:
Instrumentation package name is used for the Meter
Instrumentation version (e.g. Version)
Schema URL (e.g. SchemaURL)

Performance

Little to no overhead when observability is disabled
Expensive operations are only executed when observability is enabled
When enabled, instrumentation code paths are optimized to reduce allocation/computation overhead

Attribute and Option Allocation Management

Use sync.Pool for attribute slices and options with dynamic attributes
Pool objects are properly reset before returning to pool
Pools are scoped for maximum efficiency while ensuring correctness

Caching

Static attribute sets known at compile time are pre-computed and cached
Common attribute combinations use lookup tables/maps

Benchmarking

Benchmarks provided for all instrumentation code
Benchmark scenarios include both enabled and disabled observability
Benchmark results show impact on allocs/op, B/op, and ns/op (use b.ReportAllocs() in benchmarks)

Error Handling and Robustness

Errors are reported back to caller when possible
Partial failures are handled gracefully
Use partially initialized components when available
Return errors to caller instead of only using otel.Handle()
Use otel.Handle() only when component cannot report error to user

Context Propagation

Observability measurements receive the context from the function being measured (don't break context propagation by using context.Background())

Semantic Conventions Compliance

All metrics follow OpenTelemetry Semantic Conventions
Use the otelconv convenience package for metric semantic conventions
Component names follow semantic conventions
Use package path scope type as stable identifier for non-standard components
Component names are stable unique identifiers
Use global counter for uniqueness if necessary
Component ID counter is resettable for deterministic testing

Testing

Use deterministic testing with isolated state
Restore previous state after tests (t.Cleanup())
Isolate meter provider for testing
Use t.Setenv() for environment variable testing
Reset component ID counter for deterministic component names
Test order doesn't affect results

- setup and teardown for self observability. Includes setting up counters for queue capacity, size, log processed. Also includes callback registration for queue cap and size - doc comments pending - has some todo comments that I plan to address by discussion on PR review

…ation - increment logProcessedCounter on push to exporter or drops from queue - pending: flush methods where we are pushing to exporter

codecov · 2025-08-04T15:19:10Z

Codecov Report

❌ Patch coverage is 95.55556% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.9%. Comparing base (468c62d) to head (170e336).

Files with missing lines	Patch %	Lines
sdk/log/batch.go	89.4%	1 Missing and 1 partial ⚠️
sdk/log/internal/observ/batch_log_processor.go	96.6%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #7124   +/-   ##
=====================================
  Coverage   82.9%   82.9%           
=====================================
  Files        314     316    +2     
  Lines      24911   24995   +84     
=====================================
+ Hits       20661   20738   +77     
- Misses      3878    3882    +4     
- Partials     372     375    +3

Files with missing lines	Coverage Δ
sdk/log/exporter.go	`96.5% <100.0%> (+0.2%)`	⬆️
sdk/log/internal/counter/counter.go	`100.0% <100.0%> (ø)`
sdk/log/batch.go	`99.0% <89.4%> (-1.0%)`	⬇️
sdk/log/internal/observ/batch_log_processor.go	`96.6% <96.6%> (ø)`

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…eflect what it means for a log to be processed

MrAlias · 2025-09-15T20:55:32Z

@mahendrabishnoi2 please take a look at #7017. I have updated the issue with a check list of items from our project Observability guidelines that need to be completed in this PR.

MrAlias · 2025-10-07T17:53:41Z

@mahendrabishnoi2 I just wanted to check in with this PR. Are you still able to update this PR or open a new PR to address the instrumentation requirements?

mahendrabishnoi2 · 2025-10-07T18:00:22Z

@mahendrabishnoi2 I just wanted to check in with this PR. Are you still able to update this PR or open a new PR to address the instrumentation requirements?

Yes @MrAlias. I can finish it in 2-3 days as per the updated guidelines. It would be great if I could get some feedback on the approach I'm taking for this PR. Since we need to record the metric just before logs are passed to exporter, I have crated another wrapper (similar to timeoutExporter or chunkedExporter) to achieve this. Just trying to avoid the rework.
If this approach seems fine, I can go ahead and updated it as per guidelines, add tests.

…w guidelines

…ty and integrate it with processor.

dashpole

Not sure if you are still working on this, but this would be relevant if someone else picks this up.

mahendrabishnoi2 · 2026-04-10T02:41:30Z

Not sure if you are still working on this, but this would be relevant if someone else picks this up.

I'll raise an updated PR today. Was just waiting for confirmation on my approach as I mentioned in issue as well.
Thanks @dashpole

dashpole · 2026-04-10T12:50:30Z

Please remove WIP from the title and mark it ready for review when it is ready for others to take a look

mahendrabishnoi2 · 2026-04-12T04:38:56Z

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/log/internal/observ
cpu: Apple M1 Pro
                         │ blp_bench.txt │
                         │    sec/op     │
BLP/Processed-8             0.4520n ± 3%
BLP/ProcessedQueueFull-8    0.4463n ± 1%
BLP/Callback-8               2.692µ ± 8%
geomean                      8.159n

                         │ blp_bench.txt  │
                         │      B/op      │
BLP/Processed-8              0.000 ± 0%
BLP/ProcessedQueueFull-8     0.000 ± 0%
BLP/Callback-8             2.814Ki ± 0%
geomean                                 ¹
¹ summaries must be >0 to compute geomean

                         │ blp_bench.txt │
                         │   allocs/op   │
BLP/Processed-8             0.000 ± 0%
BLP/ProcessedQueueFull-8    0.000 ± 0%
BLP/Callback-8              26.00 ± 0%
geomean                                ¹
¹ summaries must be >0 to compute geomean

dashpole · 2026-04-14T15:41:25Z

 - Add experimental support for splitting metric data across multiple batches in `go.opentelemetry.io/otel/sdk/metric`.
  Set `OTEL_GO_X_METRIC_EXPORT_BATCH_SIZE=<max_size>` to enable for all periodic readers.
  See `go.opentelemetry.io/otel/sdk/metric/internal/x` for feature documentation. (#8071)
+- Add experimental observability metrics in `go.opentelemetry.io/otel/sdk/log`. (#7124)


Suggested change

- Add experimental observability metrics in `go.opentelemetry.io/otel/sdk/log`. (#7124)

- Add experimental observability metrics to the BatchProcessor in `go.opentelemetry.io/otel/sdk/log`. (#7124)

Copilot

Pull request overview

This PR adds experimental self-observability metrics for sdk/log.BatchProcessor (BatchLogProcessor) behind the OTEL_GO_X_OBSERVABILITY feature flag, using the global MeterProvider, aligning with the semantic conventions for OTel SDK metrics.

Changes:

Introduces sdk/log/internal/observ.BLP instrumentation emitting otel.sdk.processor.log.processed, otel.sdk.processor.log.queue.size, and otel.sdk.processor.log.queue.capacity.
Wires the instrumentation into BatchProcessor (export path + dropped-records path) and adds/updates tests and benchmarks.
Adds a generated internal counter utility to produce deterministic component IDs in tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
sdk/log/batch.go	Creates BLP instrumentation in `NewBatchProcessor`, records queue-full drops, shuts down instrumentation.
sdk/log/exporter.go	Adds `metricsExporter` wrapper to count processed log records right before export.
sdk/log/batch_test.go	Adds tests asserting metrics presence/absence and correctness when observability is enabled/disabled.
sdk/log/internal/observ/batch_log_processor.go	New BLP instrumentation implementation (queue size/capacity callback + processed counter).
sdk/log/internal/observ/batch_log_processor_test.go	New tests/benchmarks for BLP instrumentation behavior.
sdk/log/internal/observ/observ_test.go	Shared test helpers + erroring meter provider/meter fakes.
sdk/log/internal/counter/counter.go	Generated atomic counter for unique IDs (resettable for tests).
sdk/log/internal/counter/counter_test.go	Generated tests for the counter behavior and concurrency-safety.
sdk/log/internal/gen.go	Adds go:generate targets for the new internal counter.
sdk/log/internal/x/README.md	Documents the newly added BatchProcessor observability metrics.
sdk/log/logger_bench_test.go	Updates benchmark MeterProvider setup to drop internal observ scope metrics via a view.
CHANGELOG.md	Adds a changelog entry for the new experimental log SDK observability metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T17:57:15Z

+	}, err
+}
+
+func (b *BLP) Shutdown() error {


NewBLP can return a non-nil *BLP even when meter.RegisterCallback(...) fails. In that case reg is likely nil, and Shutdown() will panic when calling b.reg.Unregister(). Consider initializing reg to a no-op registration (e.g., from go.opentelemetry.io/otel/metric/noop) and only overwriting it on success, or guard Shutdown() with a nil check so it becomes safe/idempotent even when callback registration failed.

Suggested change

func (b *BLP) Shutdown() error {

func (b *BLP) Shutdown() error {

if b == nil || b.reg == nil {

return nil

}

Here is how Codex found the same issue:

If meter.RegisterCallback returns an error, reg is left nil here but NewBLP still returns a non-nil *BLP, so NewBatchProcessor installs it and later (*BLP).Shutdown calls b.reg.Unregister() on a nil registration. In any process that enables OTEL_GO_X_OBSERVABILITY with a custom or partially configured MeterProvider, shutdown turns a handled instrumentation error into a panic.

I have not added b == nil check as this is internal api and on all callsites, we already check for b.inst == nil.
Do let me know if this check also needs to be added.

Copilot · 2026-04-14T17:57:16Z

+	var err error
+	b.inst, err = observ.NewBLP(
+		counter.NextExporterID(),
+		func() int64 { return int64(b.q.Len()) },
+		int64(cfg.maxQSize.Value),
+	)
+	if err != nil {
+		otel.Handle(err)
+	}
+
+	// Wrap exporter with metrics recording if observability is enabled.
+	// This must be the innermost wrapper (closest to user exporter) to record
+	// metrics just before calling the actual exporter.
+	if b.inst != nil {
+		exporter = newMetricsExporter(exporter, b.inst)
+	}


observ.NewBLP(...) may return a non-nil b.inst together with a non-nil err (best-effort init). Right now the code logs the error via otel.Handle(err) but still wraps the exporter and later calls b.inst.Shutdown(), which can lead to runtime panics if the instrumentation is only partially initialized (e.g., callback registration failed). Consider treating any non-nil error as a signal to disable this instrumentation instance (b.inst = nil), or adjust NewBLP so it only returns a non-nil instance when it is safe to use/shutdown.

AI agents found issues.

pellared · 2026-04-14T18:06:42Z

 	// Flush remaining queued before exporter shutdown.
 	err := b.exporter.Export(ctx, b.q.Flush())
+	if b.inst != nil {
+		err = errors.Join(err, b.inst.Shutdown())


When records have overflowed the ring buffer since the last poll iteration, Shutdown closes pollKill and can exit the poll goroutine without ever consuming q.Dropped(). Because ProcessedQueueFull is only driven from that goroutine, an application that shuts down under backpressure can silently miss otel.sdk.processor.log.processed{error.type="queue_full"} increments for the last dropped log records, so the new observability metric under-reports exactly the shutdown path users care about most.

I think that addressing this would be very hard and I would rather create an issue for tracking this.

…ding

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T02:38:57Z

+	var err error
+	b.inst, err = observ.NewBLP(
+		counter.NextExporterID(),
+		func() int64 { return int64(b.q.Len()) },
+		int64(cfg.maxQSize.Value),
+	)


counter.NextExporterID() is evaluated unconditionally before observ.NewBLP(...) can check whether observability is enabled, so the global ID counter is incremented (atomic op) even when OTEL_GO_X_OBSERVABILITY is disabled. This adds avoidable overhead in the default (disabled) path and makes component IDs depend on how many batch processors were created while disabled. Consider gating the ID generation and NewBLP call behind the feature-flag check (or moving ID generation inside NewBLP after it confirms observability is enabled).

Copilot · 2026-04-16T02:38:57Z

+
+func TestBatchProcessorMetricsDisabled(t *testing.T) {
+	t.Setenv("OTEL_GO_X_OBSERVABILITY", "false")
+


This test mutates the global exporter ID counter via counter.SetExporterID(...) but does not restore the previous value. Since Go test execution order across tests in the package is not guaranteed, this can leak state into other tests that create batch processors and makes failures order-dependent. Consider saving the previous counter value and restoring it in t.Cleanup(...).

Suggested change

origExporterID := counter.ExporterID()

t.Cleanup(func() { counter.SetExporterID(origExporterID) })

similar pattern as other package tests, setting to 0

Copilot · 2026-04-16T02:38:57Z

+func TestBatchProcessorMetrics(t *testing.T) {
+	counter.SetExporterID(blpComponentID)
+
+	t.Setenv("OTEL_GO_X_OBSERVABILITY", "true")


This test mutates the global exporter ID counter via counter.SetExporterID(...) but does not restore the previous value, which can leak global state into other tests in the package depending on execution order. Consider saving the previous counter value and restoring it in t.Cleanup(...).

same as above

mahendrabishnoi2 added 3 commits August 4, 2025 11:25

- added hook to call configureSelfObservability on BatchProcessor cre…

9de991b

…ation - increment logProcessedCounter on push to exporter or drops from queue - pending: flush methods where we are pushing to exporter

- run make precommit

8d1edbf

pellared mentioned this pull request Aug 11, 2025

Logs SDK observability - batch processor metrics #7017

Open

46 tasks

mahendrabishnoi2 added 6 commits August 17, 2025 11:27

Merge branch 'main' into logs-batch-processor-metrics

ec0b55e

flatten setup for self-observability to make it transparent

385fd9f

add metricsExporter, a wrapper to record successful log processed metric

5c405d7

integrate with metricsExporter when self observability is enabled

1366caa

don't record metric when a log is added to queue, update comment to r…

b2dd092

…eflect what it means for a log to be processed

update CHANGELOG.md and README.md (sdk/log/internal/x/README.md)

4f24dff

mahendrabishnoi2 mentioned this pull request Aug 28, 2025

REQUEST: New membership for mahendrabishnoi2 open-telemetry/community#2955

Closed

6 tasks

mahendrabishnoi2 added 12 commits October 12, 2025 16:10

Merge branch 'main' into logs-batch-processor-metrics

c4bf81a

self observability -> observability

4cbaff1

instrumentation implementation in a separate observ package as per ne…

9d42e5c

…w guidelines

remove the wrapped exporter

3a35f8e

use newly created BLP abstraction for observability

96de62a

re-add metricsExporter with newly created struct (BLP) fo observabili…

001ddb6

…ty and integrate it with processor.

use generated counter package for component names

9371ede

test cases for BLP

f3a214a

make precommit

a931308

Merge branch 'main' into logs-batch-processor-metrics

df7b8f0

update CHANGELOG.md

7c2fd2b

remove duplicate ScopeName

7238dd4

dashpole reviewed Apr 9, 2026

View reviewed changes

Comment thread sdk/log/internal/observ/batch_log_processor.go Outdated

Comment thread sdk/log/internal/observ/batch_log_processor.go Outdated

Comment thread sdk/log/internal/observ/batch_log_processor.go Outdated

Merge branch 'main' into logs-batch-processor-metrics

3f2c046

mahendrabishnoi2 added 2 commits April 10, 2026 11:30

fix ci failures - add view to ignore batch log processor metrics

055f5ec

make precommit

eebda4b

mahendrabishnoi2 added 3 commits April 12, 2026 09:20

minor refactoring in benchmark tests to reduce verbosity/duplicate code

84e81c4

minor refactoring in benchmark tests to reduce verbosity/duplicate code

68025cc

Merge branch 'main' into logs-batch-processor-metrics

c82c8db

mahendrabishnoi2 marked this pull request as ready for review April 12, 2026 04:39

mahendrabishnoi2 requested review from MrAlias, XSAM, dmathieu, flc1125 and pellared as code owners April 12, 2026 04:39

mahendrabishnoi2 changed the title ~~[WIP] sdk/log: self observability: batch log processor metrics~~ sdk/log: self observability: batch log processor metrics Apr 12, 2026

mahendrabishnoi2 requested a review from dashpole April 12, 2026 04:39

dashpole approved these changes Apr 14, 2026

View reviewed changes

pellared previously approved these changes Apr 14, 2026

View reviewed changes

pellared requested a review from Copilot April 14, 2026 17:51

Copilot started reviewing on behalf of pellared April 14, 2026 17:52 View session

Copilot AI reviewed Apr 14, 2026

View reviewed changes

pellared reviewed Apr 14, 2026

View reviewed changes

mahendrabishnoi2 added 2 commits April 16, 2026 07:43

Merge branch 'main' into logs-batch-processor-metrics

81872b0

address review comments: fix nil reg panic in Shutdown, changelog wor…

52a7300

…ding

mahendrabishnoi2 requested a review from Copilot April 16, 2026 02:31

Copilot started reviewing on behalf of mahendrabishnoi2 April 16, 2026 02:32 View session

mahendrabishnoi2 mentioned this pull request Apr 16, 2026

sdk/log: BatchProcessor may under-report otel.sdk.processor.log.processed{error.type="queue_full"} on shutdown #8213

Open

Copilot AI reviewed Apr 16, 2026

View reviewed changes

mahendrabishnoi2 requested a review from pellared April 16, 2026 03:53

Merge branch 'main' into logs-batch-processor-metrics

170e336

	- Add experimental observability metrics in `go.opentelemetry.io/otel/sdk/log`. (#7124)
	- Add experimental observability metrics to the BatchProcessor in `go.opentelemetry.io/otel/sdk/log`. (#7124)


		func TestBatchProcessorMetricsDisabled(t *testing.T) {
		t.Setenv("OTEL_GO_X_OBSERVABILITY", "false")



	origExporterID := counter.ExporterID()
	t.Cleanup(func() { counter.SetExporterID(origExporterID) })

Conversation

mahendrabishnoi2 commented Aug 4, 2025 • edited by dashpole Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Observability Implementation Checklist

Environment Variable Activation

Encapsulation

Initialization

Performance

Attribute and Option Allocation Management

Caching

Benchmarking

Error Handling and Robustness

Context Propagation

Semantic Conventions Compliance

Testing

Footnotes

Uh oh!

codecov Bot commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

MrAlias commented Sep 15, 2025

Uh oh!

MrAlias commented Oct 7, 2025

Uh oh!

mahendrabishnoi2 commented Oct 7, 2025

Uh oh!

dashpole left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mahendrabishnoi2 commented Apr 10, 2026

Uh oh!

dashpole commented Apr 10, 2026

Uh oh!

mahendrabishnoi2 commented Apr 12, 2026

Uh oh!

dashpole Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

pellared Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

mahendrabishnoi2 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pellared Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

mahendrabishnoi2 Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

mahendrabishnoi2 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

mahendrabishnoi2 commented Aug 4, 2025 •

edited by dashpole

Loading

codecov Bot commented Aug 4, 2025 •

edited

Loading