Expose metric for log export failure #6709 #6779

Open
wants to merge 2 commits into
base: main
@@ -141,8 +141,10 @@ private static final class Worker implements Runnable {
     private static final Logger logger = Logger.getLogger(Worker.class.getName());

     private final LongCounter processedLogsCounter;
+    private final LongCounter logsExportFailureCounter;
     private final Attributes droppedAttrs;
     private final Attributes exportedAttrs;
+    private final Attributes exportFailureAttrs;

     private final LogRecordExporter logRecordExporter;
     private final long scheduleDelayNanos;
@@ -197,6 +199,12 @@ private Worker(
                   "The number of logs processed by the BatchLogRecordProcessor. "
                       + "[dropped=true if they were dropped due to high throughput]")
               .build();
+      logsExportFailureCounter =
+          meter
+              .counterBuilder("logsExportFailure")
@jack-berg (Member), Oct 11, 2024

While this seems like a small change, I'm reluctant to make it because there have been some attempts to standardize the SDKs' internal telemetry (e.g. OTEP#238).

The problem with continuing the pattern of these current metrics is that the structure doesn't conform to our semantic convention recommendations.

  • The unit is wrong - should probably be {export} instead of 1
  • The metric name doesn't include a namespace
  • The attributes don't have a namespace

Extending the instrumentation extends bad patterns. Fixing the bad patterns exposes our users to breaking changes, only to have more later if / when semantic conventions emerge. So we appear to be stuck. I'll bring it up at next week's Java SIG to see if we can reach any conclusion.
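
To make the three points above concrete, here is a minimal, stand-alone sketch of a check for them. It is not part of the SDK or of any semantic-convention tooling: the class `MetricConventionCheck`, its method, and the "namespaced" example names in the usage note are all hypothetical, and it treats "has a namespace" as simply "contains a dot", which is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lint for the three review points; not an OpenTelemetry API. */
public final class MetricConventionCheck {

  private MetricConventionCheck() {}

  /** Returns the problems found; an empty list means none of the three points applies. */
  public static List<String> problems(String metricName, String unit, List<String> attributeKeys) {
    List<String> found = new ArrayList<>();
    // Assumption: a namespaced name has at least one dot-separated prefix.
    if (!metricName.contains(".")) {
      found.add("metric name has no namespace: " + metricName);
    }
    // The review suggests an annotation such as {export} instead of the bare unit 1.
    if (unit.equals("1")) {
      found.add("unit is 1; consider an annotation such as {export}");
    }
    // Same dot-based assumption for attribute keys.
    for (String key : attributeKeys) {
      if (!key.contains(".")) {
        found.add("attribute has no namespace: " + key);
      }
    }
    return found;
  }
}
```

With the values from this PR, `MetricConventionCheck.problems("logsExportFailure", "1", List.of("dropped"))` reports all three problems, while a made-up namespaced variant such as `problems("example.sdk.export.failures", "{export}", List.of("example.sdk.dropped"))` reports none.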

Author

@jack-berg I agree that the current pattern (the existing metrics as well as any metric proposed in the future) doesn't conform to the semantic convention recommendations, such as namespaced metric names and well-defined units.
So we are stuck between extending or rectifying the existing instrumentation, with its bad semantics, and adopting the recommended conventions. Please let us know how the discussion goes, since this applies in general and not just here.

Member

@harshitrjpt (Author), Oct 14, 2024

@jack-berg Thanks. I think this does address the requirement. I tried to find out whether something already exists for exporters in general, since this is a generic need for any kind of exporter, not just BatchLogExporter.
I enabled 'OTEL_EXPORTER_METRICS_ENABLED' and got this output. Let me check with the original reporter of the issue.

ScopeMetrics #2
ScopeMetrics SchemaURL:
InstrumentationScope io.opentelemetry.exporters.otlp-grpc
Metric #0
Descriptor:
     -> Name: otlp.exporter.exported
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> success: Bool(false)
     -> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
Metric #1
Descriptor:
     -> Name: otlp.exporter.seen
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
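
One way to read the two data points above is as a failure ratio. The sketch below assumes that `otlp.exporter.seen` counts the records handed to the exporter and that `otlp.exporter.exported` with `success=false` counts the ones whose export failed; the class and method names are hypothetical and only illustrate the arithmetic.

```java
/** Hypothetical helper for reading the exporter metrics; not an OpenTelemetry API. */
public final class ExporterFailureRatio {

  private ExporterFailureRatio() {}

  /** Fraction of seen records reported as failed; 0.0 when nothing was seen. */
  public static double failureRatio(long seen, long exportedFailed) {
    if (seen <= 0) {
      return 0.0;
    }
    return (double) exportedFailed / seen;
  }

  public static void main(String[] args) {
    // Values taken from the output above: seen = 9, exported{success=false} = 9.
    System.out.println(failureRatio(9, 9)); // prints 1.0
  }
}
```

Under that assumption, the output shows every one of the 9 log records seen so far failing to export.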

Member

These metrics should be enabled by default if you are using autoconfigure. Note that if you are not using autoconfigure, you need to order the initialization carefully so that the configured meter provider can be passed to the OTLP exporters for spans and logs, which use it to collect internal telemetry.

I'm not sure what OTEL_EXPORTER_METRICS_ENABLED refers to. It's not a property that's used in this repository.

Author

My bad, I didn't back up the entire collector logs and mistakenly assumed that these metrics need to be enabled. They are present by default.

+              .setUnit("1")
+              .setDescription("Logs export failure in BatchLogRecordProcessor.")
+              .build();
       droppedAttrs =
           Attributes.of(
               LOG_RECORD_PROCESSOR_TYPE_LABEL,
@@ -209,6 +217,8 @@ private Worker(
               LOG_RECORD_PROCESSOR_TYPE_VALUE,
               LOG_RECORD_PROCESSOR_DROPPED_LABEL,
               false);
+      exportFailureAttrs =
+          Attributes.of(LOG_RECORD_PROCESSOR_TYPE_LABEL, LOG_RECORD_PROCESSOR_TYPE_VALUE);

       this.batch = new ArrayList<>(this.maxExportBatchSize);
     }
@@ -324,6 +334,7 @@ private void exportCurrentBatch() {
           processedLogsCounter.add(batch.size(), exportedAttrs);
         } else {
           logger.log(Level.FINE, "Exporter failed");
+          logsExportFailureCounter.add(1, exportFailureAttrs);
         }
       } catch (RuntimeException e) {
         logger.log(Level.WARNING, "Exporter threw an Exception", e);
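
The counting in the last hunk can be modeled without the SDK. The sketch below is a stand-alone approximation, not the actual `Worker` code: `AtomicLong` stands in for `LongCounter`, a `Predicate` returning `true` on success stands in for `LogRecordExporter`, and the class name is hypothetical. It shows that the new counter advances by 1 per failed export, whereas `processedLogsCounter` advances by the batch size.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Stand-alone model of the counting in exportCurrentBatch; not the SDK implementation. */
public final class BatchExportCounting {

  // AtomicLong is a stand-in for the SDK's LongCounter.
  private final AtomicLong processedLogs = new AtomicLong();
  private final AtomicLong exportFailures = new AtomicLong();

  /** The predicate is a hypothetical exporter that returns true when the export succeeds. */
  public void exportBatch(List<String> batch, Predicate<List<String>> exporter) {
    if (exporter.test(batch)) {
      // Success: count every log record in the batch.
      processedLogs.addAndGet(batch.size());
    } else {
      // Failure: count the failed export once, regardless of batch size.
      exportFailures.incrementAndGet();
    }
  }

  public long processedLogs() {
    return processedLogs.get();
  }

  public long exportFailures() {
    return exportFailures.get();
  }
}
```

Because a failed batch of any size adds exactly 1, the failure counter measures exports rather than log records, which is consistent with the reviewer's suggestion of `{export}` as the unit.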