gRPC client + botocore + datadog exporter instrumentation error #262
Occasionally not all of the headers available to botocore are strings. It does allow this, using the `add_header` method rather than `__setitem__`. According to the docs for `botocore.compat.HTTPHeaders`: ``` If a parameter value contains non-ASCII characters it can be specified as a three-tuple of (charset, language, value), in which case it will be encoded according to RFC2231 rules. Otherwise it will be encoded using the utf-8 charset and a language of '' ``` Whereas using `__setitem__` does not do this conversion. Fixes open-telemetry#262
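To illustrate the difference being described (a rough sketch, not the actual instrumentation change; `HTTPHeaders` here is botocore's thin wrapper around `email.message.Message`, and the header names/values are made up):

```python
from botocore.compat import HTTPHeaders

headers = HTTPHeaders()

# __setitem__ stores the value verbatim; anything that is not already a
# str/bytes only blows up later, when the header is actually serialized.
headers["x-plain"] = "a plain ascii value"

# add_header() goes through email.message.Message.add_header, which accepts
# a (charset, language, value) three-tuple for parameter values and encodes
# it per RFC 2231 instead of failing downstream.
headers.add_header(
    "Content-Disposition", "attachment",
    filename=("utf-8", "", "résumé.txt"),
)
```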
Hi! I think some more information would be helpful there. Specifically:
Those are all things I was wondering too - all the tests I've tried so far ended up showing only strings here, but there was still an exception being thrown at some point, and just printing things out didn't turn up anything obviously wrong. I was mostly using the Datadog exporter and propagator, but I've also been testing with Jaeger and had the same result. My guess was not the propagator but the gRPC client adding a bad header - I've been working on that (see #269) but don't see anything there that'd be strange. Do you have any suggestions on how to get the information you're looking for? Absolutely happy to keep debugging, but I'm not sure where to start.
Can this be reproduced locally? If so, you can add a pdb statement or add print statements to the code path in the traceback (usually in site-packages/...). If this only reproduces when deployed somewhere, you could potentially add some additional log / print statements in your application code. In your instrumented gRPC service, before you call dynamodb, you can do:
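(The snippet itself didn't survive in this copy of the thread; a minimal sketch of the idea using the OpenTelemetry API - exact method names vary a bit between API versions:)

```python
from opentelemetry import trace

# Right before the DynamoDB call, dump the current span so we can see
# exactly what the propagator will try to turn into header values.
span = trace.get_current_span()
print(span)                     # the span object and its attributes
print(span.get_span_context())  # trace id, span id, trace flags, trace state
```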
That will at least print all of the state of the span, which we can use to look through the propagator code to see how that would be converted to a header value.
Got it. Taking a quick look through those, I don't see a code path that would lead to something invalid. I agree that maybe something funky is being introduced. Curious: have you also tried this with opentelemetry disabled / removed? Wondering if this is an external issue. The traceback is fairly generic with regards to just adding headers.
Oh I didn't think of printing the span out, let me try that. I'm running the two services locally, so I can hack whatever I need to add debugging. I did try a bunch of printing, but couldn't get anything useful before the exception was raised.
The actual code is stuff we use in production - I've just been trying to add opentelemetry to it, but keep running into parts that aren't working well or are unimplemented (see my other PRs) - so I know the code itself works without instrumentation, and as I said in the initial report, leaving off either the gRPC client instrumentation or the botocore instrumentation in the target server makes the bug go away. It's definitely a quirk of this specific combination; I'm still betting on the gRPC client somehow, but I'm just not seeing it yet. Thanks for the suggestion - I'll keep digging!
Have you tried adding a pdb statement to this snippet, within a try except?
You should be able to extract all the information you need from that pdb session. Prints work too, just a bit more time consuming.
So here I am, going back to work on this issue... and I'm having trouble recreating it. I'm working with a branch pulled off master with my other open PRs (#260, #261, #269) with fixes on gRPC stuff merged in, and thus far I haven't seen it. So maybe it was because I had a bit of a convoluted branch in order to keep making progress while the PRs were still open. I'll do some more testing just to be sure.
Sounds good! Feel free to update whenever you have something (or not).
I figured out the root cause - thanks @toumorokoshi!
I found what's going on, it's here: https://github.com/open-telemetry/opentelemetry-python-contrib/blob/master/exporter/opentelemetry-exporter-datadog/src/opentelemetry/exporter/datadog/propagator.py#L102 - the value at this key can be `None`.
Yeah that's the best I've got - the span looks like this:
If I skip setting that value in the propagator's `inject`, the exception goes away.
This is often `None`, but tags are always strings, and so things get broken when spans get passed along to other client calls. Fixes open-telemetry#262
Been digging a lot. I think I've narrowed down this issue, but I'm not entirely sure what the intent is. My test case (which duplicates how I use this in production) is all gRPC-based: basically a client which makes a call on a service, which then makes its own call to another service. That second call is going to have trace info from the first, and that's where the issue happens: the value being set there is `None`. It seems this code comes from this PR: open-telemetry/opentelemetry-python#705, and there was much discussion about it. Can someone explain what this is supposed to do? A better question, though, might be what we should do on line 73 when this is `None`.
I believe this is the root of the issue I've been having here. Thoughts?
Ah, one more thing I should point out - the "another service" I mentioned earlier does a DynamoDB call.
Thanks for the digging! I think you did find an issue.
I believe that really is the crux here: we just need an if condition around adding that field - TraceState should be a string-to-string mapping. I took a look through our AWS instrumentations and I don't see any logic that would imply that TraceState would be serialized by boto anyway, but regardless, that Datadog propagator should be fixed. The best thing would be to get a look at the actual values being passed in as headers.
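Roughly along these lines (a sketch of the guard only - the function, key, and header names here are illustrative placeholders, not the actual propagator source):

```python
def inject_origin(span_context, carrier, set_in_carrier):
    """Sketch: only inject the Datadog origin header when the trace state
    actually carries a value for it, since header values must be strings."""
    origin = span_context.trace_state.get("dd_origin")  # may be None
    if origin is not None:
        set_in_carrier(carrier, "x-datadog-origin", origin)
```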
There's something being added to the header there.
Not sure this is very useful:
Looking through those, it's unlikely that any of those variables caused the exception that you're encountering. Note the traceback:
Something in the "values" list is neither a string nor a bytes object. I would just throw a try/except around that line in urllib3/connection.py to print the variables, or add a pdb statement so you can look:
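Something like this, for example (a hypothetical debugging helper - `debug_putheader` is made up for illustration; the idea is just to catch the failure and inspect the offending value rather than ship this anywhere):

```python
import pdb


def debug_putheader(conn, header, *values):
    """Hypothetical wrapper around http.client's putheader(), debugging only:
    if a value is neither str nor bytes, print it and drop into pdb."""
    try:
        conn.putheader(header, *values)
    except (TypeError, ValueError):
        print("offending header:", header, "values:", values)
        pdb.set_trace()
        raise
```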
Ok, let me catch you up to where things currently stand with the latest code.
The code which causes this is from this PR: open-telemetry/opentelemetry-python#705. I'm pretty sure the value being set there is actually `None`. I'm going to close the PR I made for this, since it doesn't fix anything at this point, but there IS still a bug here - I'm just not sure where it is yet. Perhaps @majorgreys might know more?
Hey @alertedsnake! I've filed a PR for the datadog issue: it's not causing the problem you're seeing with boto - TraceState will not accept a `None` value, so the result is that the value never actually makes it into the injected headers. For understanding the boto error, I'd refer to my suggestion above.
That looks a lot like the PR for this which I closed, because it didn't fix the problem at the source - it just prevented it from propagating. If you guys think this is the best way to fix it, I guess that works for me.
This one is a bit complex - when using the gRPC client instrumentation to make a call to an instrumented gRPC service that then makes a call via boto to the DynamoDB API, the combination of instrumentation leads to this error:
Without either client instrumentation, this call works perfectly.
Steps to reproduce
As described above - I don't have a simplified sample, but I'll try to produce one.
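For reference, the rough shape of the setup looks like this (a sketch of the instrumentation entry points involved, not the real services or exporter wiring):

```python
from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
from opentelemetry.instrumentation.grpc import (
    GrpcInstrumentorClient,
    GrpcInstrumentorServer,
)

# Caller: instrument outgoing gRPC calls so trace context is injected.
GrpcInstrumentorClient().instrument()

# Target service: instrument incoming gRPC calls plus the boto/DynamoDB
# client it uses for its downstream call. The error only appears when both
# the gRPC client instrumentation (on the caller) and the botocore
# instrumentation (here) are enabled.
GrpcInstrumentorServer().instrument()
BotocoreInstrumentor().instrument()
```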
What is the expected behavior?
No exception :)
What is the actual behavior?
The above exception.
Additional context
I'm definitely working on this, but any suggestions would certainly be welcome - I'm not quite sure what the invalid header is or where it might get set.