Add support to record errors from lambda handlers #4677

austindrenski · 2023-12-06T14:55:42Z

Currently, otellambda doesn't update the instrumentation-provided span based on the result of the wrapped handler.

I was surprised by this, as most of my OTel instrumentation work involves open-telemetry/opentelemetry-dotnet, so this may well be a sensible default for golang, but given the breadth and scale of the OTel projects, I figured this might also just be an oversight, so wanted to open a PR and contribute some code.

This PR adds two config methods:

WithRecordError(bool) which instructs the wrappedHandler to call span.RecordError(...) if wrappedHandler.Handler.Invoke(...) returns an error.
WithSetStatus(bool) which instructs the wrappedHandler to call span.SetStatus(...) if wrappedHandler.Handler.Invoke(...) returns an error.

Off-topic, but in the dotnet OTel libraries, it's also common to provide enrichment delegates which can be used to filter/manipulate the instrumentation-provided span in other ways (e.g. setting custom attributes/tags based on the exception being thrown). I considered adding similar functionality here, but decided to hold off on doing anything more until the code owners have had an opportunity to weigh in.

Background

based on go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda v0.46.1

Sample code based on version `0.0.112` in the above screencap

func main() {
	lambda.Start(otellambda.InstrumentHandler(handle))
}

func handle(ctx context.Context, data []byte) error {
	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

Sample code based on version `0.0.113` in the above screencap

Note that our handler is no longer creating its own internal span. We can work around this by defining our own wrappingHandler, but the path of least resistance is to just work with the instrumentation-provided span.

func main() {
	lambda.Start(otellambda.InstrumentHandler(handle))
}

func handle(ctx context.Context, data []byte) error {
-	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")
+	span := trace.SpanFromContext(ctx)

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

Sample code based on this patch

Now we can create an internal span in our handler and choose whether errors recorded in this layer should also be passed back up to the instrumentation-provided span to be recorded there as well.

(Example here might be overly simplified, but underlying all of the required obfuscation there is a real-world business case for this. I can provide more details upon request if required.)

func main() {
-	lambda.Start(otellambda.InstrumentHandler(handle))
+	lambda.Start(otellambda.InstrumentHandler(handle, otellambda.WithRecordError(true), otellambda.WithSetStatus(true)))
}

func handle(ctx context.Context, data []byte) error {
-	span := trace.SpanFromContext(ctx)
+	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

dmathieu · 2023-12-06T15:32:08Z

Why would someone explicitly not want to record errors? (do we need the option? How about recording the error all the time?)

austindrenski · 2023-12-06T16:32:15Z

Why would someone explicitly not want to record errors? (do we need the option? How about recording the error all the time?)

Great question. I think there are two parts to the answer:

IMO we could elide WithSetStatus(...) and take the stance that otellambda will always call span.SetStatus(codes.Error, err.Error()) if the wrapped handler returns an error.

That's actually what I wrote initially, treating it was an outright bug to not propagate the error status to the parent span (it certainly caused me a headache in production!), but I eventually pulled back and hid it behind a config option to avoid breaking any legitimate workflows (doubtful as I am that they even exist).

That said, I'm more than happy to elide WitSetStatus(...) if everyone's amenable. Would indeed offer a better happy path out-of-the-box.
However, it's a slightly different story for WithRecordError(...). At least in the dotnet world, there's an effort to avoid adding exception events to a span without user opt-in.

There are better explanations out there, but IIRC they mostly boil down to (A) more span events == more data to export, (B) recording an exception == serialization == allocations.

(Should note that dotnet exceptions by default include the message + the stack trace, so recording an exception as a span event can have a significant impact on span size. This might matter less in golang where err.Error() seems to (often? always?) return just the error message.)

pellared · 2023-12-07T20:33:43Z

Does this comment applies to this PR ?

How about simply adding span.SetStatus(codes.Error, err.Error()) when an error is returned?

I think this is also inline with #4677 (comment).

austindrenski · 2023-12-08T00:57:17Z

Does this comment applies to this PR ?

I've had some real-world use cases where we wanted to record that an error occurred without setting the span status to error (or returning an error for that matter).

If you'll allow a super contrived example, imagine you're (for whatever reason) hand-rolling HTTP backoff-and-retry code wherein you may want to record the individual errors that occurred caused each of the eventual retries.

I think the value proposition is clearer in languages where the SDK opts to include stack traces by default, but for golang, it looks like we have to explicitly opt-in each time:

span.RecordError(err, trace.WithStackTrace(true))

which now makes me think we should just expose this at the same time, e.g.,

func WithRecordError(recordError bool, recordStackTrace bool) Option {
	return optionFunc(func(c *config) {
		c.RecordError = recordError
		c.RecordStackTrace = recordStackTrace
	})
}

	response, err := h.handler.Invoke(ctx, payload)
	if err != nil {
		if h.instrumentor.configuration.RecordError {
-			span.RecordError(err)
+			span.RecordError(err, span.WithStackTrace(h.instrumentation.RecordStackTrace))
		}
		if h.instrumentor.configuration.SetError {
			span.SetStatus(codes.Error, err.Error())
		}
		return nil, err
	}

How about simply adding span.SetStatus(codes.Error, err.Error()) when an error is returned?

I think this is also inline with #4677 (comment).

I'm definitely in favor of just calling span.SetStatus(codes.Error, err.Error()) by default. Since no one else has chimed in opposed to it so far, I'll go ahead and update the PR to reflect that change.

Aneurysm9 · 2023-12-11T22:43:52Z

instrumentation/github.com/aws/aws-lambda-go/otellambda/config.go

+// error returned by the wrapped lambda on the instrumentation-provided span.
+//
+// By default, returned errors are not recorded.
+func WithRecordError(recordError bool, recordStackTrace bool) Option {


Should this be two separate options? We don't typically accept multiple distinct configuration values in options and I don't think there's any need to check whether c.RecordError is set when setting c.RecordStackTrace as the stack trace config is only ever used when the instrumentation is configured to record errors.

Aneurysm9 · 2023-12-11T22:51:33Z

instrumentation/github.com/aws/aws-lambda-go/otellambda/wrapHandler.go

+		if h.instrumentor.configuration.RecordError {
+			span.RecordError(err, trace.WithStackTrace(h.instrumentor.configuration.RecordStackTrace))
+		}
+		span.SetStatus(codes.Error, err.Error())


I'm not sure this is correct. There are situations where an error may be returned from the handler but that should not result in the span status being set to error. For instance, if the handler is supporting an HTTP-based service and an error would indicate a 4xx status code, the span status MUST be left unset. Similarly, the status message shouldn't be set when it can be inferred from other data such as the HTTP status code.

Given that the instrumented handler has access to this span and can set the status as appropriate I would omit setting it here.

austindrenski requested a review from Aneurysm9 as a code owner December 6, 2023 14:55

austindrenski requested a review from a team December 6, 2023 14:55

Add support to record errors from lambda handlers

01f0f99

austindrenski force-pushed the aws-lambda-go-with-handler-set-error-on-error branch from 0257125 to 01f0f99 Compare December 6, 2023 14:56

Set error status by default + expose stackrace

e9c44c3

Aneurysm9 reviewed Dec 11, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support to record errors from lambda handlers #4677

Add support to record errors from lambda handlers #4677

austindrenski commented Dec 6, 2023

dmathieu commented Dec 6, 2023

austindrenski commented Dec 6, 2023

pellared commented Dec 7, 2023

austindrenski commented Dec 8, 2023

Aneurysm9 Dec 11, 2023

Aneurysm9 Dec 11, 2023

Add support to record errors from lambda handlers #4677

Are you sure you want to change the base?

Add support to record errors from lambda handlers #4677

Conversation

austindrenski commented Dec 6, 2023

Background

Sample code based on version 0.0.112 in the above screencap

Sample code based on version 0.0.113 in the above screencap

Sample code based on this patch

dmathieu commented Dec 6, 2023

austindrenski commented Dec 6, 2023

pellared commented Dec 7, 2023

austindrenski commented Dec 8, 2023

Aneurysm9 Dec 11, 2023

Choose a reason for hiding this comment

Aneurysm9 Dec 11, 2023

Choose a reason for hiding this comment

Sample code based on version `0.0.112` in the above screencap

Sample code based on version `0.0.113` in the above screencap