-
Notifications
You must be signed in to change notification settings - Fork 657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set status for ended spans #297
Conversation
49bb8c5
to
4411184
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main complaint is that you should not (need to) change the whole Status API in this PR. I think it is perfectly fine as it is now. See #297 (comment).
@@ -82,7 +84,7 @@ def succeeded(self, event: monitoring.CommandSucceededEvent): | |||
span.set_attribute( | |||
"db.mongo.duration_micros", event.duration_micros | |||
) | |||
span.set_status(Status(StatusCanonicalCode.OK, event.reply)) | |||
span.set_status(OkStatus()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you discard the description here? This seems like a regression to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why updating pymongo integration at all this in the first place?, referenced issue you added is suggesting we do similar stuff as this code, having a default status of unknown when there is an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready, I added the description back. I understand how it can be valuable to have it to provide a better explanation of the status code, possibly to describe specific details of the code in its context.
Still, I am still changing the API because this PR changes the Status
class from being instantiated with a particular code to having separated classes for every code itself. I believe this to be advantageous since when setting a status we go from this:
set_status(Status(StatusCanonicalCode.OK, "description")))
to this:
set_status(OkStatus("description"))
Nevertheless, this may or may not be the desired approach for this project, please let me know in the comments.
Another important thing about this PR is the proposed dynamic generation of the statuses and their matching exceptions. Instead of having all them coded by hand, an OrderedDict
is being used to define them and they are dynamically added to the module automatically. This saves code and saves us from having to keep the exceptions and the statuses synchronized by hand.
@lzchen points out here that in OpenCensus the status seems to be derived from the exception thrown. This implementation attempts to do that, by providing status-related exceptions that can be raised in the span code. For example, raising a InvalidArgumentError("description")
exception in the span code automatically sets the span status to InvalidArgumentStatus(description="description")
.
This may or may not be the desired approach for this project, please let me know in the comments.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Irrespective of the merit of these changes, it is just best practice to submit independent changes in independent pull requests, at least when a controversial change is among them. Please split this PR up into:
- One that actually just automatically sets the status for ended spans on exception
- One that changes the Status API if you still want to do that
IMHO, this PR is mostly a discussion of the latter so it would make sense to make a new PR for (just) the exception Status thing and remove these changes from this PR.
@@ -92,7 +94,7 @@ def failed(self, event: monitoring.CommandFailedEvent): | |||
span.set_attribute( | |||
"db.mongo.duration_micros", event.duration_micros | |||
) | |||
span.set_status(Status(StatusCanonicalCode.UNKNOWN, event.failure)) | |||
span.set_status(UnknownStatus()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, event.failure
could contain valuable information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 hiding the actual error would be pretty bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, adding the description field back 👍
@@ -144,9 +146,13 @@ class SpanKind(enum.Enum): | |||
CONSUMER = 4 | |||
|
|||
|
|||
class Span: | |||
class Span(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not Python2 compatible, pylint should issue an error here about useless object inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, too much Python2 compatible code has made that almost a reflex response in me, removing. 😅
"""A span represents a single operation within a trace.""" | ||
|
||
def __init__(self, auto_update_status=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like adding constructor arguments here. Maybe we move the behavior controlled by this into the SDK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I moved it. Please let me know if this is what you were expecting.
self.status is None and | ||
self._auto_update_status | ||
): | ||
if isinstance(exc_val, Status): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to have an exception derived from Status. Python enforces that all raised objects must derive from BaseException
, otherwise a TypeError
is raised. If we want to be able to set a status by throwing a special value, we should throw some exception type that has a status, not one that is a status. But I'd rather not support that since throwing an OpenTelemetry-specific exception type would mix observability and application logic and we definitely don't want an application's behavior to change depending on the used API implementation or observability configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixed that mistake. That being said, I acknowledge that the implementation as it is now is intended to allow the user to throw exceptions precisely in the way that you mention you don't want. I'm not trying to cause controversy here, is just that I feel like this implementation is what the issue submitter suggested, as OpenCensus does something similar 🙂
Again, this may or may not be the desired approach, please let me know in your comments about your thoughts on this 👍
# FIXME https://docs.python.org/3.7/reference/datamodel.html?highlight=__exit__#object.__exit__ | ||
# says that a boolean value should be returned. Is the lack of the | ||
# return of a boolean value here intentional? | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is intentional. We get pylint problems otherwise and it is completely idiomatic to not use a return statement in context managers that never swallow exceptions, like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, removed the FIXME
.
): | ||
self._canonical_code = canonical_code | ||
self._description = description | ||
from enum import Enum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't change the whole Status API in a PR that should just set a certain status value in certain cases.
In particular, there is no 1:1 mapping between description and status code (e.g. see the pymongo integration which makes good use of the status description to add additional information that could not be inferred from the status code alone). I have no idea why everyone seems to think that: #213 (comment)
Description is a free-form text that can and should be specified by the user. For example I could create a status with "NOT_FOUND" and Text "open: The system could not find the specified file" or "HTTP 404: Username does not exist".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I left the existing comments as documentation for the XYZStatus
classes and introduced again the description field.
from sys import modules | ||
from typing import Optional | ||
|
||
_STATUS_META = OrderedDict([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you changing the enum to a orderedDict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not, actually 🙂. It may be not obvious, but the StatusCanonicalCode
is still an Enum
.
118d6f8
to
8b5bb8c
Compare
Codecov Report
@@ Coverage Diff @@
## master #297 +/- ##
==========================================
+ Coverage 84.51% 84.59% +0.08%
==========================================
Files 33 33
Lines 1666 1675 +9
Branches 197 199 +2
==========================================
+ Hits 1408 1417 +9
Misses 201 201
Partials 57 57
Continue to review full report at Codecov.
|
@@ -82,7 +84,7 @@ def succeeded(self, event: monitoring.CommandSucceededEvent): | |||
span.set_attribute( | |||
"db.mongo.duration_micros", event.duration_micros | |||
) | |||
span.set_status(Status(StatusCanonicalCode.OK, event.reply)) | |||
span.set_status(OkStatus()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Irrespective of the merit of these changes, it is just best practice to submit independent changes in independent pull requests, at least when a controversial change is among them. Please split this PR up into:
- One that actually just automatically sets the status for ended spans on exception
- One that changes the Status API if you still want to do that
IMHO, this PR is mostly a discussion of the latter so it would make sense to make a new PR for (just) the exception Status thing and remove these changes from this PR.
10524af
to
1be426a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of changes and a question regarding what to do in the cause of auto_update without a status. The current behaviour is for the status to be OK and based on the conversation in #292, I believe that's the desired behaviour. If so, that needs to be implemented here.
@@ -340,6 +343,7 @@ class DefaultSpan(Span): | |||
""" | |||
|
|||
def __init__(self, context: "SpanContext") -> None: | |||
super(DefaultSpan, self).__init__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed since Span
isn't doing anything with its __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny, I think tox -e lint
was failing before if this line was not there. It is not failing now after I removed it. 🤔
self.assertIs(root.status, None) | ||
|
||
def test_error_status(self): | ||
with raises(StatusError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be using with self.assertRaises(StatusError):
instead and remove the need to import raises
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's new to me! Updated 👍
) | ||
|
||
else: | ||
self.set_status(Status(StatusCanonicalCode.UNKNOWN)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If auto_update_status
is true, should the status be set to OK or UNKNOWN if there's no exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will update.
@@ -134,6 +136,7 @@ def __init__( | |||
links: Sequence[trace_api.Link] = (), | |||
kind: trace_api.SpanKind = trace_api.SpanKind.INTERNAL, | |||
span_processor: SpanProcessor = SpanProcessor(), | |||
auto_update_status: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit odd that this becomes a property of the span. But since the span is also the context manager, I guess it's OK.
|
||
self.span_processor = span_processor | ||
self.status = trace_api.Status() | ||
# FIXME: check the statement below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that FIXME resolved yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I removed it. What I want to point out is that self.status
would be no longer be OK
by default, but None
. This makes it possible to detect if the status has been set manually by the user, in that case, the auto-update-status mechanism will not try to update it (this was requested here. That specific request asked for the status to be automatically updated only if the status was OK
, but what happens if the user had manually set the status to OK
? In that case, the OK
status set by the user would be overridden, which is not wanted.
Also, to the best of my understanding, the status is a response code, so, in my opinion, it should be set only after there has been a response, not from the very beginning of the span life.
Please share your opinions on this particular approach (having self.status = None
as default).
else: | ||
self.set_status(Status(StatusCanonicalCode.OK)) | ||
|
||
super(Span, self).__exit__(exc_type, exc_val, exc_tb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure when to use super
vs just NameOfBaseClass
. In any case, if we decide to use super()
we can use it without arguments, since we are not Python 2 compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the result should be the same either way. Just because of personal preference I have used super
still, but removed the arguments as you indicated.
self.set_status(Status(StatusCanonicalCode.UNKNOWN)) | ||
|
||
else: | ||
self.set_status(Status(StatusCanonicalCode.OK)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the span without a context manager, so this code should be moved to end()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that may be against this request. I also think is cleaner if we put the automatic status setting mechanisms only in the context manager. In this way, it is possible to simply state that if the context manager is not used, then the user has full control and is the responsibility of the user to handle all the starting, ending and status setting of the span manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, should __enter__
call self.start
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, by the time __enter__
is called, the span will always be started already (we don't have unstarted spans anymore).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that may be against this request.
I think there is a slight misunderstanding here. I only meant the else:
part should be moved to end()
. Because right now there is actually a bug:
my_span = tracer.start_span('mycoolspanname')
my_span.end() # This will cause a span with `status == None` to be sent to the exporter, which is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, moved this to end inside an if
that guards against overriding a status previously set by the user.
@@ -183,3 +183,25 @@ def description(self) -> typing.Optional[str]: | |||
def is_ok(self) -> bool: | |||
"""Returns false if this represents an error, true otherwise.""" | |||
return self._canonical_code is StatusCanonicalCode.OK | |||
|
|||
|
|||
class StatusError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm against having a status error exception. It forces the app/library developer to mix observability code with application logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but without a status-specific exception like this one, how can the auto-status-update mechanism know which status to set? @lzchen mentions here: In OpenCensus the status seems to be derived from the exception thrown.
This PR tries to set the status depending on the exception thrown by simply taking the status from the exception and using it to set the status. That, of course, makes it necessary that the user raises a StatusError
with the desired status in their application code. Just to be sure that I understand you correctly, is that necessity what you consider a mix of observability code with application logic?
I wonder if we can work around this problem by making the context manager "smart" in some way, with that I mean capable of setting the status without being explicitly informed which status to set by the raised exception. I'll try to come up with some idea, please let me know your thoughts about this too, @Oberon00.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can the auto-status-update mechanism know which status to set
I think it would be nice to have an API like addExceptionToStatusTranslator(translator: Callable[[Exception], Status])
that returns None
if it doesn't know the exception and otherwise a Status for the exception. These could then be queried in order. But I also think this is beyond the scope of this PR and for now just always using Unknown is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think another argument against the StatusErrorException is that you can always just set the span status directly instead of raising a special exception, if you care about the actual status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have updated the PR with the removal of StatusError
. After taking a look at OpenCensus here I realized that they basically just set the status to Unknown. This matches what you have suggested above. I implemented like this, so basically, the behavior is equivalent to OpenCensus on this matter.
I kept the removal of StatusError
in a separate commit, just in case. I'll squash when this has your LGTM, @Oberon00.
Now, regarding the mixture of observability and application code, I have just a slight concern. I think that you have a valid point when you mention that this is not desired, but then you mention that: you can always just set the span status directly instead of raising a special exception
Isn't setting the status directly also a mix of observability and application code?
The previously implemented mechanism allowed the user to raise an exception while also setting the status, if desired. Without this, I guess the user will set the status directly and then raise an exception. I find these two approaches kind of equivalent, both apparently mix observability and application code.
So, if we want to avoid this mix, should we also make the status setting method at least not part of the interface that the user has access to?
Well, this is quickly getting in the realm of design philosophy 😄 Just for the record, and to get things moving, I am ok with merging your requested changes that are now implemented. Just wanted to know your thoughts on this issue of mixing application and observability code. 👍
Thank you, @Oberon00!
b423d16
to
152821a
Compare
6d0fcf3
to
9923ea8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most important: Default the status to OK before passing the span to the span processor.
@@ -392,6 +395,7 @@ def start_span( | |||
attributes: typing.Optional[types.Attributes] = None, | |||
links: typing.Sequence[Link] = (), | |||
start_time: typing.Optional[int] = None, | |||
auto_update_status: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'd like to have a more descriptive name for this, but the best I can come up with is set_error_status_on_exception
which is a mouthful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fixed!
@@ -423,6 +427,8 @@ def start_span( | |||
attributes: The span's attributes. | |||
links: Links span to other spans | |||
start_time: Sets the start time of a span | |||
auto_update_status: Defines if the status should be updated | |||
automatically when the span finishes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too vague. How does it update?
I suggest: "Whether the span status should automatically be set to an error when an exception leaves the with
-block where the span is used. A manually set status overrides this, even if set before the exception."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed too!
"""Ends context manager and calls `end` on the `Span`.""" | ||
|
||
if self.status is None and self._auto_update_status: | ||
if exc_val is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be and
-ed to the above line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@codeboten could you please re-review? |
@ocelotl Please don't force-push over reviewed code, it makes it very hard to see what changed since the last review (in this case it seems to be nothing, just a rebase, but I had to carefully look at the diff to find this out). |
Ok, will remember that, @Oberon00 👍 This was just a squash of 2 commits into one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 looks good
* feat: add metrics SDK package * feat: add metrics sdk package
Fixes #292
A few test issues pending still. Opening up this PR to receive comments and feedback, specially related to the implementation of the status classes and exceptions.