Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set status for ended spans #297

Merged
merged 1 commit into from
Dec 11, 2019
Merged

Set status for ended spans #297

merged 1 commit into from
Dec 11, 2019

Conversation

ocelotl
Copy link
Contributor

@ocelotl ocelotl commented Nov 19, 2019

Fixes #292

A few test issues pending still. Opening up this PR to receive comments and feedback, specially related to the implementation of the status classes and exceptions.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main complaint is that you should not (need to) change the whole Status API in this PR. I think it is perfectly fine as it is now. See #297 (comment).

@@ -82,7 +84,7 @@ def succeeded(self, event: monitoring.CommandSucceededEvent):
span.set_attribute(
"db.mongo.duration_micros", event.duration_micros
)
span.set_status(Status(StatusCanonicalCode.OK, event.reply))
span.set_status(OkStatus())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you discard the description here? This seems like a regression to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why updating pymongo integration at all this in the first place?, referenced issue you added is suggesting we do similar stuff as this code, having a default status of unknown when there is an error

Copy link
Contributor Author

@ocelotl ocelotl Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready, I added the description back. I understand how it can be valuable to have it to provide a better explanation of the status code, possibly to describe specific details of the code in its context.

Still, I am still changing the API because this PR changes the Status class from being instantiated with a particular code to having separated classes for every code itself. I believe this to be advantageous since when setting a status we go from this:

set_status(Status(StatusCanonicalCode.OK, "description")))

to this:

set_status(OkStatus("description"))

Nevertheless, this may or may not be the desired approach for this project, please let me know in the comments.

Another important thing about this PR is the proposed dynamic generation of the statuses and their matching exceptions. Instead of having all them coded by hand, an OrderedDict is being used to define them and they are dynamically added to the module automatically. This saves code and saves us from having to keep the exceptions and the statuses synchronized by hand.

@lzchen points out here that in OpenCensus the status seems to be derived from the exception thrown. This implementation attempts to do that, by providing status-related exceptions that can be raised in the span code. For example, raising a InvalidArgumentError("description") exception in the span code automatically sets the span status to InvalidArgumentStatus(description="description").

This may or may not be the desired approach for this project, please let me know in the comments.

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irrespective of the merit of these changes, it is just best practice to submit independent changes in independent pull requests, at least when a controversial change is among them. Please split this PR up into:

  • One that actually just automatically sets the status for ended spans on exception
  • One that changes the Status API if you still want to do that

IMHO, this PR is mostly a discussion of the latter so it would make sense to make a new PR for (just) the exception Status thing and remove these changes from this PR.

@@ -92,7 +94,7 @@ def failed(self, event: monitoring.CommandFailedEvent):
span.set_attribute(
"db.mongo.duration_micros", event.duration_micros
)
span.set_status(Status(StatusCanonicalCode.UNKNOWN, event.failure))
span.set_status(UnknownStatus())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, event.failure could contain valuable information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 hiding the actual error would be pretty bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, adding the description field back 👍

@@ -144,9 +146,13 @@ class SpanKind(enum.Enum):
CONSUMER = 4


class Span:
class Span(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not Python2 compatible, pylint should issue an error here about useless object inheritance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, too much Python2 compatible code has made that almost a reflex response in me, removing. 😅

"""A span represents a single operation within a trace."""

def __init__(self, auto_update_status=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like adding constructor arguments here. Maybe we move the behavior controlled by this into the SDK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I moved it. Please let me know if this is what you were expecting.

self.status is None and
self._auto_update_status
):
if isinstance(exc_val, Status):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to have an exception derived from Status. Python enforces that all raised objects must derive from BaseException, otherwise a TypeError is raised. If we want to be able to set a status by throwing a special value, we should throw some exception type that has a status, not one that is a status. But I'd rather not support that since throwing an OpenTelemetry-specific exception type would mix observability and application logic and we definitely don't want an application's behavior to change depending on the used API implementation or observability configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixed that mistake. That being said, I acknowledge that the implementation as it is now is intended to allow the user to throw exceptions precisely in the way that you mention you don't want. I'm not trying to cause controversy here, is just that I feel like this implementation is what the issue submitter suggested, as OpenCensus does something similar 🙂

Again, this may or may not be the desired approach, please let me know in your comments about your thoughts on this 👍

Comment on lines 263 to 273
# FIXME https://docs.python.org/3.7/reference/datamodel.html?highlight=__exit__#object.__exit__
# says that a boolean value should be returned. Is the lack of the
# return of a boolean value here intentional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is intentional. We get pylint problems otherwise and it is completely idiomatic to not use a return statement in context managers that never swallow exceptions, like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed the FIXME.

):
self._canonical_code = canonical_code
self._description = description
from enum import Enum
Copy link
Member

@Oberon00 Oberon00 Nov 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the whole Status API in a PR that should just set a certain status value in certain cases.

In particular, there is no 1:1 mapping between description and status code (e.g. see the pymongo integration which makes good use of the status description to add additional information that could not be inferred from the status code alone). I have no idea why everyone seems to think that: #213 (comment)

Description is a free-form text that can and should be specified by the user. For example I could create a status with "NOT_FOUND" and Text "open: The system could not find the specified file" or "HTTP 404: Username does not exist".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I left the existing comments as documentation for the XYZStatus classes and introduced again the description field.

from sys import modules
from typing import Optional

_STATUS_META = OrderedDict([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing the enum to a orderedDict?

Copy link
Contributor Author

@ocelotl ocelotl Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not, actually 🙂. It may be not obvious, but the StatusCanonicalCode is still an Enum.

@ocelotl ocelotl force-pushed the issue_292 branch 3 times, most recently from 118d6f8 to 8b5bb8c Compare November 21, 2019 00:02
@codecov-io
Copy link

codecov-io commented Nov 22, 2019

Codecov Report

Merging #297 into master will increase coverage by 0.08%.
The diff coverage is 92.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #297      +/-   ##
==========================================
+ Coverage   84.51%   84.59%   +0.08%     
==========================================
  Files          33       33              
  Lines        1666     1675       +9     
  Branches      197      199       +2     
==========================================
+ Hits         1408     1417       +9     
  Misses        201      201              
  Partials       57       57
Impacted Files Coverage Δ
...ntelemetry-api/src/opentelemetry/trace/__init__.py 83.73% <ø> (ø) ⬆️
...pentelemetry-api/src/opentelemetry/trace/status.py 85.29% <100%> (ø) ⬆️
...emetry-sdk/src/opentelemetry/sdk/trace/__init__.py 90.38% <92.3%> (+0.43%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b25af7f...81d6b06. Read the comment docs.

@@ -82,7 +84,7 @@ def succeeded(self, event: monitoring.CommandSucceededEvent):
span.set_attribute(
"db.mongo.duration_micros", event.duration_micros
)
span.set_status(Status(StatusCanonicalCode.OK, event.reply))
span.set_status(OkStatus())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irrespective of the merit of these changes, it is just best practice to submit independent changes in independent pull requests, at least when a controversial change is among them. Please split this PR up into:

  • One that actually just automatically sets the status for ended spans on exception
  • One that changes the Status API if you still want to do that

IMHO, this PR is mostly a discussion of the latter so it would make sense to make a new PR for (just) the exception Status thing and remove these changes from this PR.

@ocelotl ocelotl force-pushed the issue_292 branch 5 times, most recently from 10524af to 1be426a Compare November 27, 2019 23:37
@ocelotl ocelotl requested a review from Oberon00 November 28, 2019 00:03
Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of changes and a question regarding what to do in the cause of auto_update without a status. The current behaviour is for the status to be OK and based on the conversation in #292, I believe that's the desired behaviour. If so, that needs to be implemented here.

tox.ini Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/trace/__init__.py Outdated Show resolved Hide resolved
@@ -340,6 +343,7 @@ class DefaultSpan(Span):
"""

def __init__(self, context: "SpanContext") -> None:
super(DefaultSpan, self).__init__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed since Span isn't doing anything with its __init__?

Copy link
Contributor Author

@ocelotl ocelotl Nov 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, I think tox -e lint was failing before if this line was not there. It is not failing now after I removed it. 🤔

self.assertIs(root.status, None)

def test_error_status(self):
with raises(StatusError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be using with self.assertRaises(StatusError): instead and remove the need to import raises above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's new to me! Updated 👍

)

else:
self.set_status(Status(StatusCanonicalCode.UNKNOWN))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If auto_update_status is true, should the status be set to OK or UNKNOWN if there's no exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will update.

opentelemetry-api/src/opentelemetry/util/loader.py Outdated Show resolved Hide resolved
@@ -134,6 +136,7 @@ def __init__(
links: Sequence[trace_api.Link] = (),
kind: trace_api.SpanKind = trace_api.SpanKind.INTERNAL,
span_processor: SpanProcessor = SpanProcessor(),
auto_update_status: bool = True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit odd that this becomes a property of the span. But since the span is also the context manager, I guess it's OK.


self.span_processor = span_processor
self.status = trace_api.Status()
# FIXME: check the statement below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that FIXME resolved yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed it. What I want to point out is that self.status would be no longer be OK by default, but None. This makes it possible to detect if the status has been set manually by the user, in that case, the auto-update-status mechanism will not try to update it (this was requested here. That specific request asked for the status to be automatically updated only if the status was OK, but what happens if the user had manually set the status to OK? In that case, the OK status set by the user would be overridden, which is not wanted.

Also, to the best of my understanding, the status is a response code, so, in my opinion, it should be set only after there has been a response, not from the very beginning of the span life.

Please share your opinions on this particular approach (having self.status = None as default).

else:
self.set_status(Status(StatusCanonicalCode.OK))

super(Span, self).__exit__(exc_type, exc_val, exc_tb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when to use super vs just NameOfBaseClass. In any case, if we decide to use super() we can use it without arguments, since we are not Python 2 compatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the result should be the same either way. Just because of personal preference I have used super still, but removed the arguments as you indicated.

self.set_status(Status(StatusCanonicalCode.UNKNOWN))

else:
self.set_status(Status(StatusCanonicalCode.OK))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the span without a context manager, so this code should be moved to end().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that may be against this request. I also think is cleaner if we put the automatic status setting mechanisms only in the context manager. In this way, it is possible to simply state that if the context manager is not used, then the user has full control and is the responsibility of the user to handle all the starting, ending and status setting of the span manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, should __enter__ call self.start?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, by the time __enter__ is called, the span will always be started already (we don't have unstarted spans anymore).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that may be against this request.

I think there is a slight misunderstanding here. I only meant the else: part should be moved to end(). Because right now there is actually a bug:

my_span = tracer.start_span('mycoolspanname')
my_span.end() # This will cause a span with `status == None` to be sent to the exporter, which is invalid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, moved this to end inside an if that guards against overriding a status previously set by the user.

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
@@ -183,3 +183,25 @@ def description(self) -> typing.Optional[str]:
def is_ok(self) -> bool:
"""Returns false if this represents an error, true otherwise."""
return self._canonical_code is StatusCanonicalCode.OK


class StatusError(Exception):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against having a status error exception. It forces the app/library developer to mix observability code with application logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but without a status-specific exception like this one, how can the auto-status-update mechanism know which status to set? @lzchen mentions here: In OpenCensus the status seems to be derived from the exception thrown.

This PR tries to set the status depending on the exception thrown by simply taking the status from the exception and using it to set the status. That, of course, makes it necessary that the user raises a StatusError with the desired status in their application code. Just to be sure that I understand you correctly, is that necessity what you consider a mix of observability code with application logic?

I wonder if we can work around this problem by making the context manager "smart" in some way, with that I mean capable of setting the status without being explicitly informed which status to set by the raised exception. I'll try to come up with some idea, please let me know your thoughts about this too, @Oberon00.

Copy link
Member

@Oberon00 Oberon00 Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can the auto-status-update mechanism know which status to set

I think it would be nice to have an API like addExceptionToStatusTranslator(translator: Callable[[Exception], Status]) that returns None if it doesn't know the exception and otherwise a Status for the exception. These could then be queried in order. But I also think this is beyond the scope of this PR and for now just always using Unknown is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another argument against the StatusErrorException is that you can always just set the span status directly instead of raising a special exception, if you care about the actual status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have updated the PR with the removal of StatusError. After taking a look at OpenCensus here I realized that they basically just set the status to Unknown. This matches what you have suggested above. I implemented like this, so basically, the behavior is equivalent to OpenCensus on this matter.

I kept the removal of StatusError in a separate commit, just in case. I'll squash when this has your LGTM, @Oberon00.

Now, regarding the mixture of observability and application code, I have just a slight concern. I think that you have a valid point when you mention that this is not desired, but then you mention that: you can always just set the span status directly instead of raising a special exception

Isn't setting the status directly also a mix of observability and application code?

The previously implemented mechanism allowed the user to raise an exception while also setting the status, if desired. Without this, I guess the user will set the status directly and then raise an exception. I find these two approaches kind of equivalent, both apparently mix observability and application code.

So, if we want to avoid this mix, should we also make the status setting method at least not part of the interface that the user has access to?

Well, this is quickly getting in the realm of design philosophy 😄 Just for the record, and to get things moving, I am ok with merging your requested changes that are now implemented. Just wanted to know your thoughts on this issue of mixing application and observability code. 👍

Thank you, @Oberon00!

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
@ocelotl ocelotl force-pushed the issue_292 branch 3 times, most recently from b423d16 to 152821a Compare December 3, 2019 19:26
@ocelotl ocelotl requested a review from Oberon00 December 4, 2019 20:34
@Oberon00 Oberon00 dismissed their stale review December 5, 2019 13:05

Most blocking comments were addressed.

@ocelotl ocelotl force-pushed the issue_292 branch 2 times, most recently from 6d0fcf3 to 9923ea8 Compare December 9, 2019 16:26
Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most important: Default the status to OK before passing the span to the span processor.

@@ -392,6 +395,7 @@ def start_span(
attributes: typing.Optional[types.Attributes] = None,
links: typing.Sequence[Link] = (),
start_time: typing.Optional[int] = None,
auto_update_status: bool = True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'd like to have a more descriptive name for this, but the best I can come up with is set_error_status_on_exception which is a mouthful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed!

@@ -423,6 +427,8 @@ def start_span(
attributes: The span's attributes.
links: Links span to other spans
start_time: Sets the start time of a span
auto_update_status: Defines if the status should be updated
automatically when the span finishes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too vague. How does it update?

I suggest: "Whether the span status should automatically be set to an error when an exception leaves the with-block where the span is used. A manually set status overrides this, even if set before the exception."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed too!

"""Ends context manager and calls `end` on the `Span`."""

if self.status is None and self._auto_update_status:
if exc_val is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be and-ed to the above line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Oberon00
Copy link
Member

@codeboten could you please re-review?

@ocelotl ocelotl requested a review from a team December 10, 2019 17:24
@Oberon00
Copy link
Member

@ocelotl Please don't force-push over reviewed code, it makes it very hard to see what changed since the last review (in this case it seems to be nothing, just a rebase, but I had to carefully look at the diff to find this out).

@ocelotl
Copy link
Contributor Author

ocelotl commented Dec 10, 2019

Ok, will remember that, @Oberon00 👍 This was just a squash of 2 commits into one.

Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good

@c24t c24t merged commit cb0169c into open-telemetry:master Dec 11, 2019
@ocelotl ocelotl deleted the issue_292 branch December 11, 2019 21:17
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
* feat: add metrics SDK package

* feat: add metrics sdk package
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define default status behavior for spans when ending in failure
6 participants