Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to record errors from lambda handlers #4677

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

austindrenski
Copy link

Currently, otellambda doesn't update the instrumentation-provided span based on the result of the wrapped handler.

I was surprised by this, as most of my OTel instrumentation work involves open-telemetry/opentelemetry-dotnet, so this may well be a sensible default for golang, but given the breadth and scale of the OTel projects, I figured this might also just be an oversight, so wanted to open a PR and contribute some code.

This PR adds two config methods:

  1. WithRecordError(bool) which instructs the wrappedHandler to call span.RecordError(...) if wrappedHandler.Handler.Invoke(...) returns an error.
  2. WithSetStatus(bool) which instructs the wrappedHandler to call span.SetStatus(...) if wrappedHandler.Handler.Invoke(...) returns an error.

Off-topic, but in the dotnet OTel libraries, it's also common to provide enrichment delegates which can be used to filter/manipulate the instrumentation-provided span in other ways (e.g. setting custom attributes/tags based on the exception being thrown). I considered adding similar functionality here, but decided to hold off on doing anything more until the code owners have had an opportunity to weigh in.

Background

based on go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda v0.46.1

sanitized screencap of Datadog showing a trace's flame graph and error list

sanitized screencap of Datadog showing an APM error graph showing a 0% error rate before and a non-zero error rate after

Sample code based on version 0.0.112 in the above screencap

func main() {
	lambda.Start(otellambda.InstrumentHandler(handle))
}

func handle(ctx context.Context, data []byte) error {
	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

Sample code based on version 0.0.113 in the above screencap

Note that our handler is no longer creating its own internal span. We can work around this by defining our own wrappingHandler, but the path of least resistance is to just work with the instrumentation-provided span.

func main() {
	lambda.Start(otellambda.InstrumentHandler(handle))
}

func handle(ctx context.Context, data []byte) error {
-	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")
+	span := trace.SpanFromContext(ctx)

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

Sample code based on this patch

Now we can create an internal span in our handler and choose whether errors recorded in this layer should also be passed back up to the instrumentation-provided span to be recorded there as well.

(Example here might be overly simplified, but underlying all of the required obfuscation there is a real-world business case for this. I can provide more details upon request if required.)

func main() {
-	lambda.Start(otellambda.InstrumentHandler(handle))
+	lambda.Start(otellambda.InstrumentHandler(handle, otellambda.WithRecordError(true), otellambda.WithSetStatus(true)))
}

func handle(ctx context.Context, data []byte) error {
-	span := trace.SpanFromContext(ctx)
+	ctx, span := trace.SpanFromContext(ctx).TracerProvider().Tracer("...").Start(ctx, "...")

	// lots of internal logic
	// ...
	// err := ...

	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	return nil
}

@austindrenski austindrenski requested a review from a team December 6, 2023 14:55
@austindrenski austindrenski force-pushed the aws-lambda-go-with-handler-set-error-on-error branch from 0257125 to 01f0f99 Compare December 6, 2023 14:56
@dmathieu
Copy link
Member

dmathieu commented Dec 6, 2023

Why would someone explicitly not want to record errors? (do we need the option? How about recording the error all the time?)

@austindrenski
Copy link
Author

Why would someone explicitly not want to record errors? (do we need the option? How about recording the error all the time?)

Great question. I think there are two parts to the answer:

  1. IMO we could elide WithSetStatus(...) and take the stance that otellambda will always call span.SetStatus(codes.Error, err.Error()) if the wrapped handler returns an error.

    That's actually what I wrote initially, treating it was an outright bug to not propagate the error status to the parent span (it certainly caused me a headache in production!), but I eventually pulled back and hid it behind a config option to avoid breaking any legitimate workflows (doubtful as I am that they even exist).

    That said, I'm more than happy to elide WitSetStatus(...) if everyone's amenable. Would indeed offer a better happy path out-of-the-box.

  2. However, it's a slightly different story for WithRecordError(...). At least in the dotnet world, there's an effort to avoid adding exception events to a span without user opt-in.

    There are better explanations out there, but IIRC they mostly boil down to (A) more span events == more data to export, (B) recording an exception == serialization == allocations.

    (Should note that dotnet exceptions by default include the message + the stack trace, so recording an exception as a span event can have a significant impact on span size. This might matter less in golang where err.Error() seems to (often? always?) return just the error message.)

@pellared
Copy link
Member

pellared commented Dec 7, 2023

Does this comment applies to this PR ?

How about simply adding span.SetStatus(codes.Error, err.Error()) when an error is returned?

I think this is also inline with #4677 (comment).

@austindrenski
Copy link
Author

Does this comment applies to this PR ?

I've had some real-world use cases where we wanted to record that an error occurred without setting the span status to error (or returning an error for that matter).

If you'll allow a super contrived example, imagine you're (for whatever reason) hand-rolling HTTP backoff-and-retry code wherein you may want to record the individual errors that occurred caused each of the eventual retries.

I think the value proposition is clearer in languages where the SDK opts to include stack traces by default, but for golang, it looks like we have to explicitly opt-in each time:

span.RecordError(err, trace.WithStackTrace(true))

which now makes me think we should just expose this at the same time, e.g.,

func WithRecordError(recordError bool, recordStackTrace bool) Option {
	return optionFunc(func(c *config) {
		c.RecordError = recordError
		c.RecordStackTrace = recordStackTrace
	})
}
	response, err := h.handler.Invoke(ctx, payload)
	if err != nil {
		if h.instrumentor.configuration.RecordError {
-			span.RecordError(err)
+			span.RecordError(err, span.WithStackTrace(h.instrumentation.RecordStackTrace))
		}
		if h.instrumentor.configuration.SetError {
			span.SetStatus(codes.Error, err.Error())
		}
		return nil, err
	}

How about simply adding span.SetStatus(codes.Error, err.Error()) when an error is returned?

I think this is also inline with #4677 (comment).

I'm definitely in favor of just calling span.SetStatus(codes.Error, err.Error()) by default. Since no one else has chimed in opposed to it so far, I'll go ahead and update the PR to reflect that change.

// error returned by the wrapped lambda on the instrumentation-provided span.
//
// By default, returned errors are not recorded.
func WithRecordError(recordError bool, recordStackTrace bool) Option {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be two separate options? We don't typically accept multiple distinct configuration values in options and I don't think there's any need to check whether c.RecordError is set when setting c.RecordStackTrace as the stack trace config is only ever used when the instrumentation is configured to record errors.

if h.instrumentor.configuration.RecordError {
span.RecordError(err, trace.WithStackTrace(h.instrumentor.configuration.RecordStackTrace))
}
span.SetStatus(codes.Error, err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct. There are situations where an error may be returned from the handler but that should not result in the span status being set to error. For instance, if the handler is supporting an HTTP-based service and an error would indicate a 4xx status code, the span status MUST be left unset. Similarly, the status message shouldn't be set when it can be inferred from other data such as the HTTP status code.

Given that the instrumented handler has access to this span and can set the status as appropriate I would omit setting it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants