Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update StatusCodes enum to comply with specs and another opentelemetry libraries. #1402

Closed
DariaPlotnikova opened this issue Nov 20, 2020 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@DariaPlotnikova
Copy link

Hi. I'm trying to implement opentelemetry tracing with jaeger+elastichsearch from python app using jaegertracing/jaeger-opentelemetry-* latest docker images. I found that if python jaeger client do nothing specific with correct spans (without errors) it comes to jaeger-agent with code 1 (see print of batch at the last line of code snippet), which is wrong according to opentelemetry spec (UNSET is the first, so it should have code 0).

Describe your environment
I use Python 3.5, Linux and such opentelemetry packages:

# pip freeze | grep opentelemetry
opentelemetry-api==0.15b0
opentelemetry-exporter-jaeger==0.15b0
opentelemetry-instrumentation==0.15b0
opentelemetry-instrumentation-requests==0.15b0
opentelemetry-sdk==0.15b0

Steps to reproduce

  1. Setup python env as above.
  2. Download repo with code to reproduce
  3. Run jaeger+elastic from docker-compose.yaml: docker-compose up -d jaeger-query jaeger-agent jaeger-collector kibana
  4. Run code snippet: python main.py

What is the expected behavior?

  1. span.status.status_code is equal to StatusCode.UNSET with code is 0 instead of 1.
  2. status_code tag in elasticsearch is equal to python enum.

What is the actual behavior?

  1. This leads to invalid interpretation of span status by jaeger - correct spans with default status (UNSET) are accepted with code 1, which is STATUS_CODE_OK, according to opentelemetry-collector enum:
# span tag in elasticsearch 
{
  "key": "status.code",
  "type": "string",
  "value": "STATUS_CODE_OK"
},

image

Additional context

  1. There is opentelemetry client for .net, there codes are the same as in specification: UNSET=0, OK=1, ERROR=2.
  2. If you insert dirty hack at the begining of main.py, jaeger UI will display spans as expected (OK in python will be OK in jaeger, UNSET will be UNSET):
# dirty hack
from opentelemetry.trace import status
from enum import Enum


class StatusCode(Enum):
    """Status codes according to opentelemtery spec."""

    UNSET = 0
    """The default status."""

    OK = 1
    """The operation has been validated by an Application developer or Operator to have completed successfully."""

    ERROR = 2
    """The operation contains an error."""

status.StatusCode = StatusCode

# main.py content
@DariaPlotnikova DariaPlotnikova added the bug Something isn't working label Nov 20, 2020
@lzchen lzchen added the release:required-for-ga To be resolved before GA release label Nov 23, 2020
@lzchen
Copy link
Contributor

lzchen commented Nov 30, 2020

This might not necessarily be a discrepancy from the specs. The specs merely says that these options should be available, not that there should be any enum ordering enforced. Some implementations may not even use enums. Go, javascript and java all have different ordering implementations.

Perhaps instead of showing the numerical value in the exported span tag in the exporter, we should use the name instead, so we don't rely on the the enum value.

@lzchen
Copy link
Contributor

lzchen commented Nov 30, 2020

@DariaPlotnikova
Also curious, how did the output "become" STATUS_CODE_OK"? What is the mechanism that is reading the status from the span and displaying this value?

@DariaPlotnikova
Copy link
Author

@lzchen,

The specs merely says that these options should be available, not that there should be any enum ordering enforced. Some implementations may not even use enums. Go, javascript and java all have different ordering implementations.

Looked at code links and yes, here codes are different from python. But how do this clients send status code to jaeger (as string representation or as number)? I'm agree that in fact it shouldn't have matter which code is used in language specific library, while this code isn't sent as number to another library with differ code interpretation.

Also curious, how did the output "become" STATUS_CODE_OK"? What is the mechanism that is reading the status from the span and displaying this value?

The main things is that in python opentelemetry-exporter-jaeger StatusCode is transformed to number and goes to jaeger-agent as 0 or 1 or 2:

  1. JaegerSpanExporter sets tag status.code to each span.
  2. Tag status.code is jaeger.Tag object of type LONG and numerical value 0 or 1 or 2.
  3. Exporter calls Batch to transform spans to jaeger-compatible form.
  4. During the outputting spans to jaeger tags are transformed inside Batch object as jaeger.Tag objects.
  5. jaeger.Tag.write() transform tag of type LONG as numerical value (which are 0 (OK), 1 (UNSET), 2 (ERROR) according to StatusCode enum).

Then "magic" happens in jaeger-agent, where they try to transform number back to code and got STATUS_CODE_OK instead of STATUS_CODE_UNSET, because pdata.StatusCode structure has 0-value for UNSET, 1 - for OK, 2 - ERROR.

I'm not abolutely sure that problem is in jaegertracing agent, cause I also have .net services and there everything is ok. Changing status codes in jaegertracing agent will affect every client library which sends statuses as number. This is why I created issue here. Maybe we can change somehow the manner which is used to output status code to jaeger (string representation instead of number).

@lzchen
Copy link
Contributor

lzchen commented Dec 1, 2020

@DariaPlotnikova
Thanks for the clarification.

Maybe we can change somehow the manner which is used to output status code to jaeger (string representation instead of number).

Yes this is the avenue that would be best to pursue: if possible change our exporter behaviour instead of enforcing ordering in the specs for this use case. The only outstanding question is what the jaeger-agent EXPECTS from the tag. This line specifically:

Then "magic" happens in jaeger-agent, where they try to transform number back to code and got STATUS_CODE_OK instead of STATUS_CODE_UNSET, because pdata.StatusCode structure has 0-value for UNSET, 1 - for OK, 2 - ERROR.

If the jaeger backend EXPECTS 0 - OK, 1 - UNSET, 2 - ERROR, this might be a specs issue. Is there any documentation on the behaviour of jaeger detecting status tags and translating them into statuses?

@DariaPlotnikova
Copy link
Author

Is there any documentation on the behaviour of jaeger detecting status tags and translating them into statuses?

I didn't find anything in docs. The only thing I found is code in repo, but I'm not so good in golang to be sure that jaeger can understand tag status.code as string instead of number. I'll try to clarify this by opening question-issue in their repo during the week.

@lzchen lzchen removed release:required-for-ga To be resolved before GA release 1.0.0rc1 labels Dec 10, 2020
@lzchen
Copy link
Contributor

lzchen commented Dec 10, 2020

We will probably just modify the behaviour of the jaeger exporter to account for this. We can either remove the statuscode from the tags, use a different tag that's not a number (just the string representation), or manually switch the enum values before export.

@lzchen
Copy link
Contributor

lzchen commented Dec 10, 2020

@DariaPlotnikova
Copy link
Author

We can either remove the statuscode from the tags

Tryed without status.code (commented this string) - jaeger understands spans correctly (both success and error). It works but in both cases there is no status.code tag at all:
image

I'm not sure is that ok, because accordind to spec, span status should be presented (set by default UNSET or set by library as ERROR/OK).

Anyway, I asked in jaegertracing gitter community, is it possible to send string representation of status code. Hope they will answer.

@lzchen
Copy link
Contributor

lzchen commented Dec 11, 2020

@DariaPlotnikova
Thanks for looking into this. From the specs we will be using a string representation instead.

@lzchen
Copy link
Contributor

lzchen commented Jan 4, 2021

Closing this in favour of #1474

@lzchen lzchen closed this as completed Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants