-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support other propagators in Python Layer #124
Support other propagators in Python Layer #124
Conversation
parent_context = get_global_textmap().extract( | ||
lambda_event.get("headers", {}).update( | ||
{"X-Amzn-Trace-Id": environ.get("_X_AMZN_TRACE_ID", "")} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this work if you have both TraceContextPropagator
and AwsPropagator
? Will the context be a fix of both? With 2 spans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should always use the xray propagator format when reading from the environment variable, and use the configured propagator if reading from HTTP headers when the environment variable wasn't sampled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I think I've seen in other Python code that they're able to do just get_global_textmap().extract(headers)
. But we can't do that and have the behavior you mentioned if the user sets something like OTEL_PROPAGATORS=xray,tracecontext
for their environment variables.
Maybe the answer is to read the OTEL_PROPAGATORS
variable directly and instantiate the propagators? But that doesn't feel right... Would probably need to investigate the behavior of when we have multiple propagators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The env variable only ever fills in with AWS format so we can't really rely on configured propagator to read from it. In other languages we directly reference the awsxraypropagator through code without using the global can't you do that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what we intend to do here! We first try the XRay
propagator, but if that fails, we try the global propagator. That's what I interpret the specs on how to parent a Lambda trace saying we should do?
After updating the PR it looks like this:
parent_context = None
xray_env_var = os.environ.get("_X_AMZN_TRACE_ID")
if xray_env_var:
parent_context = AwsXRayFormat().extract(
{"X-Amzn-Trace-Id": xray_env_var}
)
event_span_attributes = {}
if not parent_context or not parent_context.trace_flags.sampled:
lambda_event = args[0]
# ... Try the global propagator
lambda_event, lambda_context = args[0], args[1] | ||
|
||
parent_context = get_global_textmap().extract( | ||
lambda_event.get("headers", {}).update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we know headers will always be located at lambda_event.headers. If a spec guaranteed all lambda entry points follow this convention we could do this, but I don't believe such a spec exists. See how eventContextExtractor in JS Instrumentation handles this by requiring user to provide header extraction function. Note JS requires user to fully extract into context which is unnecessary for here since you are using global propagator to extract. Here could require user to just return headers from event - similar to WIP Golang instrumentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I see what you're saying. Yeah this guide I found online gives an example on how to populate "headers"
in API Gateway, but it's not a general solution.
I think we could do something like JS for the manual instrumentation case, but not the auto instrumentation case. In auto instrumentation the AwsLambdaInstrumentor
will run without hooks and later on it will be too late to add the hooks.
So either we force users to manual instrument AwsLambdaInstrumentor
in their code before their lambda_handler
, or we define a mapping "ourselves" here as events["headers"]
and explain to users how they can add them...
I think we can add the hooks in a subsequent PR, the question here is just do we want to give users the expectation that we will looks for HTTP headers in the event["headers"]
? I think it could for us even though there is no spec, as long as we document it somewhere here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So event["headers"]
may or may not be pointing into the actual customer event. If you invoke lambda directly (cli/sdk) event
is your custom event type so yes we could point to one place the header must be, however, if lambda is invoked via some other AWS service (API Gateway, SQS, etc) event
is some AWS specified type and your custom event is within that type somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, many thanks for looking into this.
My 2c:
Lambda handlers always accept 2 arguments: event and context. Event is the message delivered by a Carrier
(API GW, SQS, etc) and should have both, data and metadata about the message that can include the context. Propagators define the rules about how the context should be propagated with a particular Carrier
.
So, it should be safe to assume that event
should contain the required metadata with the context. As the default implementation, it should be safe to assume that event["headers"]
will contain the context data and it may be 95% of all cases where people use API-GW and SQS triggered lambdas. In other cases, the users can provide their own extractor and injector implementations, as JS lambda implementation does.
In the case where there is no data in the message (i.e. event["header"]
) that carries the context for the defined propagator, it should default to the span with no parent and start a new trace, as opposed to defaulting to AWS X-Ray propagator.
Also, specifying AWS X-Ray propagator should be a choice, i.e. the user should explicitly define it in OTEL_PROPAGATORS=aws_xray
env variable.
Hope it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to find out the different branching paths for what an "event could be". For now, I followed the spec where it says how to determine the parent span in Lambda to guide a "guess" for what API Gateway would look like, but I haven't tested it so I wouldn't be comfortable merging this just yet. But if we test it and it works (In the X-Ray case, the not X-Ray case, and the not any propagator case), and people agree with it, we can make this incremental improvement.
@michaelhyatt in trying to follow this spec on how to determine the parent span in Lambda I made several updates to this PR. It still requires serious testing though:
- test with API Gateway configured in its 2 possible forms
- test when multiple combinations of propagators are set via
OTEL_PROPAGATORS
I can't test it this week but if we get a chance to test it in the near future and people are okay with checking event["headers"]
as a default we can probably make this incremental improvement to unblock you.
It makes sense @NathanielRN The only thing I wanted to add is highlight the behaviour when
|
I didn't see the document mention |
I was under the impression that I interpreted the requirement for X-Ray tracing to be switched off as turning the |
Thanks for that prompt! I did some investigation and found out that you're right about this, so I made a PR in the specifications to follow up on this: open-telemetry/opentelemetry-specification#1867
Yes I agree with you that sounds like the ideal customer experience! |
@NathanielRN I am curious if the fix made it into |
@michaelhyatt Sorry about that, yes this change hasn't been merged in. Haven't had bandwidth to test this solution even though I think it should work, once it gets tested by publishing a built version of this layer we can merge this change. |
5544d57
to
38bf31c
Compare
38bf31c
to
1e56587
Compare
1e56587
to
6d55f27
Compare
9ee1abe
to
e625603
Compare
e625603
to
da03b2f
Compare
Hey all, I've updated this PR to have the minimal changes we would need to support this feature 🙂 With these changes, we can parent the context of subsequent spans using the HTTP headers in the lambda event. Further, using Tests have also been added to validate this. Please let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just some small points
|
||
Args: | ||
**kwargs: Optional arguments | ||
``tracer_provider``: a TracerProvider, defaults to global | ||
``event_context_extractor``: a method which takes the Lambda | ||
Event as input and provides the object which contains the | ||
context as output (usually HTTP headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it returns the Context itself, not HTTP headers, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that catch! This was outdated, I've updated it now 🙂
): | ||
return parent_context | ||
|
||
logger.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X-Ray being disabled isn't a failure so we shouldn't log it as such. I guess we don't need this log message anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it could be useful as a "debug" log and so labeled it as such, but I agree it's not necessary so I'll remove it! 🙂
python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py
Outdated
Show resolved
Hide resolved
python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py
Outdated
Show resolved
Hide resolved
try: | ||
headers = lambda_event["headers"] | ||
except (TypeError, KeyError): | ||
logger.warning("Failed to extract context from Lambda Event.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add more on what a user should do if they see this error. Either X-Ray needs to be enabled or the function must be serving API Gateway as a HTTP proxy. Otherwise, this instrumentation should not be included. Something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. What do you think about
"Extracting context from Lambda Event failed: either enable X-Ray active tracing or configure API Gateway to trigger this Lambda function as a pure proxy. Otherwise, generated spans will have an invalid (empty) parent context."
? I'm wondering if there's a use case where someone would want instrumentation and they're okay with the empty parent context. That's why I though "instrumentation should not be included" to not fit as well.
As does the warning
level makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - I guess there is a use case where the function does not use X-Ray, and is not API Gateway but meant to always be a root - still it would be a bit weird to have no ability to parent at all but it's conceivable. Your message seems fine then, and yeah it should be debug level if there is a conceivable use case for it rather than always being in error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah like if they just trigger it from the console is the easiest case I can think of. Maybe that Lambda triggers another Lambda 😝
I've made the updates 👍
…bda/__init__.py Co-authored-by: Anuraag Agrawal <[email protected]>
efe66e7
to
930d98c
Compare
…bda/__init__.py Co-authored-by: Anuraag Agrawal <[email protected]>
We want to support other propagators besides the X-Ray propagator in the opentelemetry python lambda layer.
These changes allow that, by making the
AwsLambdaInstrumentor
class_X_AMZN_TRACE_ID
environment variable FIRST and then, if that fails, from the Lambda event HTTP headers (either using the defaultevent_context_extractor
or the user provided one)event_context_extractor
uses the global propagator to extract and parent spans with the context in the HTTP headersFixes #123