Skip to content
This repository has been archived by the owner on Jul 11, 2022. It is now read-only.

3.8.0 doesn't report spans in Python 3 #137

Closed
worrel opened this issue Mar 8, 2018 · 16 comments · Fixed by #154
Closed

3.8.0 doesn't report spans in Python 3 #137

worrel opened this issue Mar 8, 2018 · 16 comments · Fixed by #154

Comments

@worrel
Copy link

worrel commented Mar 8, 2018

I have a set of apps using the Go and Python jaeger client libs and the jaeger all-in-one docker image. They're all running in a docker-compose stack - so the apps can see the jaeger container & reach it via its hostname 'jaeger'. I also have the jaeger ports exposed on localhost so I can run sanity checks on the python code locally (e.g. jaeger all-in-one's docker ps looks like 0.0.0.0:5775->5775/udp, 5778/tcp, 0.0.0.0:6831-6832->6831-6832/udp, 14268/tcp, 0.0.0.0:16686->16686/tcp)

With 3.7.1 I was able to report spans correctly - running the python code locally while relying on the default reporter config (localhost:5775 in 3.7.1) and in docker (by setting local_agent/reporting_host to 'jaeger').

However, I had to upgrade to 3.8.0 to propagate spans from the Go app to Python app (because we're on Python 3 & 'extract' was relying on a Python 2 collection check. In 3.8.0 the default reporter config is localhost:6831.

On 3.8.0 I'm unable to get any spans reported correctly from Python app, whether locally (relying on default config) or in docker (setting reporting_host to 'jaeger'). I'm also unable to get it to work changing the port back to 5775 (both locally or in docker).

The logger still prints that it's reporting spans e.g.

2018-03-08 12:23:23,226 Reporting span 346237e4f6ff27ab:346237e4f6ff27ab:0:1.....

but nothing makes it to the collector.

I see elsewhere that (#59) that Python3 is perhaps not supported - is that the root cause here? It seems like 3.8.0 was trying to address that somewhat. Is there anything else I need to change when upgrading from 3.7.1 to 3.8.0?

@yurishkuro
Copy link
Member

should have mentioned this in the release notes - the new version finally has support for sending native jaeger.thrift format, that goes to a different port

DEFAULT_REPORTING_PORT = 6831

@yurishkuro
Copy link
Member

Please let me know if this solves the issue.

@worrel
Copy link
Author

worrel commented Mar 8, 2018

Hi @yurishkuro! I figured the port change meant something like that, but it wasn't clear whether I have to make any other config changes to have Python client output jaeger thrift, or should it just work?

As I said above, I'm using jaeger all-in-one docker image with both the 5775 zipkin thrift and 6831 jaeger thrift ports exposed. But in 3.8.0, regardless of whether I leave config as default, or explicitly set the port to either 5775 or 6831, nothing gets shown in the Jaeger UI (whereas in 3.7.1 I see spans recorded when using 5775).

@yurishkuro
Copy link
Member

I just ran all-in-one

$ go run ./cmd/standalone/main.go

and ran the tutorial

$ python -m lesson01.solution.hello Bryan
Initializing Jaeger Tracer with UDP reporter
Using sampler ConstSampler(True)
opentracing.tracer initialized to <jaeger_client.tracer.Tracer object at 0x105904610>[app_name=hello-world]
Hello, Bryan!
Reporting span ef0a8a10e78fb8dc:ef0a8a10e78fb8dc:0:1 hello-world.say-hello

The span was reported correctly
image

There is no special configuration for the tracer: https://github.com/yurishkuro/opentracing-tutorial/blob/master/python/lib/tracing.py

@worrel
Copy link
Author

worrel commented Mar 8, 2018

@yurishkuro is that using Python 2 or 3? I suspect this might be the issue. I'm using Python 3.

If I run with 3.7.1 and in a separate terminal do tshark -f udp I see something like

7  1.952025    127.0.0.1 → 127.0.0.1    UDP 1070 63169 → 5775 Len=1038
8  18.404288    127.0.0.1 → 127.0.0.1    UDP 1084 63388 → 5775 Len=1052

and correspondingly the spans make it to the collector/UI.

If I run with 3.8.0 (changing the port to 6831), I see no UDP packets logged at all

@yurishkuro
Copy link
Member

I ran with python 2. We do not officially support Python 3, and don't test with it, so a regression is possible (but unfortunate). If you are able to troubleshoot this with py3 it would be very helpful.

@worrel
Copy link
Author

worrel commented Mar 9, 2018

One thing I noticed was a comment in setup.py about pinning the thrift version which I wasn't doing. So I set it to 0.9.2 (also tried 0.9.3), because I was getting 0.11 previously. That didn't make a difference though.

With println debugging I can see that the basic steps in reporter.py seem to happen indentically. However, I do see that the TUDPTransport.write() is never called in 3.8.0 while it is in 3.7.1. I'll keep trying to narrow down why that is.

@worrel
Copy link
Author

worrel commented Mar 9, 2018

@yurishkuro I'm made progress: I found that send_emitBatch never seemed to result in a call to TUDPTransport.write() as it should. It appears the Tag keys and values are coming in as byte strings e.g. (from my added debug logging)

in: Tag.write
struct write
write key b'jaeger.version'

and that causes the oprot.writeString calls to choke (no error is thrown or logged, but processing of the batch silently stops). This is from https://github.com/jaegertracing/jaeger-client-python/blob/master/jaeger_client/thrift.py#L55 . I'm not clear on why the one branch of the conditional returns bytes & the other a string - but I'm not familiar with Python 2 & python unicode handling in general.

If that method has to stay the way it is, we need to decode in write e.g.:

oprot.writeString(self.key.decode('utf-8'))

which enables it to get further.

There's an additional wrinkle: Later on in the set of tags it's trying to write the Const sampler tag with its "decision" value, which is a boolean, though it looks like the assumption is made in thrift.py that all tags are string tags. I edited https://github.com/jaegertracing/jaeger-client-python/blob/master/jaeger_client/sampler.py#L90 to wrap decision as str(decision) and that finally allows the batch to send properly.

I'd send the above as a PR but I'm not clear if they're the right fixes, nor what impact any changes would have on the Python 2 flow.

@worrel
Copy link
Author

worrel commented Mar 9, 2018

I should also add that the thrift version changes I mentioned above were a bad idea. Setting thrift to 0.9.2 or 0.9.3 actually seemed to break 3.7.1 for me. I didn't notice this the first time I made the change because I forgot to pip install the changes to the requirements file. I've gone back to explicitly using latest thrift (0.11) and that's working for 3.8.0 and 3.7.1 (on Python 3)

@worrel worrel changed the title 3.8.0 doesn't report spans 3.8.0 doesn't report spans in Python 3 Mar 9, 2018
@fox91
Copy link

fox91 commented Mar 12, 2018

Same problem for me with jaeger-client==3.8.0 and Python 3.5.2.
With jaeger-client==3.7.1 works fine.

@trondhindenes
Copy link
Contributor

trondhindenes commented Apr 2, 2018

I can't get a python3 (flask) app to run regardless of of the jager-client/opentracing versions.
In the exercises, the "'formatter" flask app always chokes on this:
span_ctx = tracer.extract(Format.HTTP_HEADERS, request.headers) with the error
opentracing.propagation.InvalidCarrierException: carrier not a collection

(everything works perfectly fine in Python2)

Desperately awaiting python3 support, all our code is based on python3.5 or 3.6.

@bohea
Copy link

bohea commented Apr 4, 2018

thrift 0.9.3 don't support python3

@yurishkuro
Copy link
Member

@bohea jaeger does not require you to use 0.9.3, I believe it works even if you pick a higher version that is py3 compatible (that's how people were able to run it before on py3).

@nziebart
Copy link
Contributor

I ran into an error today with python 3.6. It was due to this line: https://github.com/jaegertracing/jaeger-client-python/blob/master/jaeger_client/thrift.py#L55

In python3, the encode() function returns a bytes literal, which then causes the following error in thrift, which is expecting a str:

...
  File "/Users/nziebart/.virtualenvs/nummus/lib/python3.6/site-packages/jaeger_client/thrift_gen/jaeger/ttypes.py", line 146, in write
    oprot.writeString(self.key)
  File "/Users/nziebart/.virtualenvs/nummus/lib/python3.6/site-packages/thrift/protocol/TProtocol.py", line 121, in writeString
    self.writeBinary(str_to_binary(str_val))
  File "/Users/nziebart/.virtualenvs/nummus/lib/python3.6/site-packages/thrift/compat.py", line 41, in str_to_binary
    return bytes(str_val, 'utf8')
TypeError: encoding without a string argument

@nziebart
Copy link
Contributor

Was able to fix this by replacing the _to_string function linked above with:

        if six.PY2 and isinstance(s, unicode):
            return s.encode('utf8')
        else:
            return str(s)

After this I was able to report spans successfully. Will open a separate issue

@yurishkuro
Copy link
Member

Should be fixed by #154

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants