update tracing error tag for grpc status codes#20090
update tracing error tag for grpc status codes#20090lizan merged 14 commits intoenvoyproxy:mainfrom bryanwux:main
Conversation
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
|
Hi @bryanwux, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
|
Retrying Azure Pipelines: |
| span.setTag(Tracing::Tags::get().Error, Tracing::Tags::get().True); | ||
| if (grpc_status_code.has_value()) { | ||
| const auto& status = grpc_status_code.value(); | ||
| if (status != Grpc::Status::WellKnownGrpcStatus::InvalidCode && |
There was a problem hiding this comment.
Yes, detailed introduction can be found here: https://grpc.github.io/grpc/core/md_doc_statuscodes.html
There was a problem hiding this comment.
What do you think of converting to a switch, so if gRPC adds new error codes and we pick up on import, we compile fail?
There was a problem hiding this comment.
Also can we link https://grpc.github.io/grpc/core/md_doc_statuscodes.html here in the comment as well?
There was a problem hiding this comment.
What do you think of converting to a switch, so if gRPC adds new error codes and we pick up on import, we compile fail?
Yes, I agree a switch makes more sense in this case.
Also can we link https://grpc.github.io/grpc/core/md_doc_statuscodes.html here in the comment as well?
Done
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
|
Looks like this is a continuation for #19603 which got mistakenly closed. /assign @alyssawilk |
alyssawilk
left a comment
There was a problem hiding this comment.
Sorry for the delay in review - I was out and it got misplaced when reopened
/wait
| span.setTag(Tracing::Tags::get().Error, Tracing::Tags::get().True); | ||
| if (grpc_status_code.has_value()) { | ||
| const auto& status = grpc_status_code.value(); | ||
| if (status != Grpc::Status::WellKnownGrpcStatus::InvalidCode && |
There was a problem hiding this comment.
What do you think of converting to a switch, so if gRPC adds new error codes and we pick up on import, we compile fail?
| span.setTag(Tracing::Tags::get().Error, Tracing::Tags::get().True); | ||
| if (grpc_status_code.has_value()) { | ||
| const auto& status = grpc_status_code.value(); | ||
| if (status != Grpc::Status::WellKnownGrpcStatus::InvalidCode && |
There was a problem hiding this comment.
Also can we link https://grpc.github.io/grpc/core/md_doc_statuscodes.html here in the comment as well?
| absl::optional<Grpc::Status::GrpcStatus> grpc_status_code = Grpc::Common::getGrpcStatus(headers); | ||
| if (grpc_status_code && grpc_status_code.value() != Grpc::Status::WellKnownGrpcStatus::Ok) { | ||
| span.setTag(Tracing::Tags::get().Error, Tracing::Tags::get().True); | ||
| if (grpc_status_code.has_value()) { |
There was a problem hiding this comment.
I think we're going to want to runtime guard this change, as described in CONTRIBUTING.md
Let's also add a comment where the tracing error code is defined to be more clear about what error means in this case (upstream or envoy error, not client error)
|
Retrying Azure Pipelines: |
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
|
/retest |
|
Retrying Azure Pipelines: |
alyssawilk
left a comment
There was a problem hiding this comment.
Looking better and better!
| case Grpc::Status::WellKnownGrpcStatus::OutOfRange: | ||
| case Grpc::Status::WellKnownGrpcStatus::Unauthenticated: | ||
| break; | ||
| default: |
There was a problem hiding this comment.
so if you have a default we still won't catch new error codes when added. Can we avoid this?
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
Signed-off-by: bryanwux <jiayu1.wu@intel.com>
Signed-off-by: Jiayu Wu <jiayu1.wu@intel.com>
alyssawilk
left a comment
There was a problem hiding this comment.
Looking good! Sorry for the review delay I was out Friday.
Just two more questions, so I'll add in a second reviewer as well
/wait
| case Grpc::Status::WellKnownGrpcStatus::AlreadyExists: | ||
| case Grpc::Status::WellKnownGrpcStatus::PermissionDenied: | ||
| case Grpc::Status::WellKnownGrpcStatus::FailedPrecondition: | ||
| case Grpc::Status::WellKnownGrpcStatus::Aborted: |
There was a problem hiding this comment.
hm, canceled says "usually by the caller" but aborted says it's a concurrency issue. Would that be client side or server side error do you think?
There was a problem hiding this comment.
aborted is used due to a concurrency issue which means the client should restart a read-modify-write sequence. I think this would be a client side error.
|
@lizan I think this is ready for your pass (only 2 nits left on my end) |
Below is the proposed change for tracing error tag behaviour. A brief description about whether each response code should be indicated as an error has also been listed. ``` OK | 0 | error=false CANCELLED | 1 | error=false Not a server error. The operation is cancelled by the client. UNKNOWN | 2 | error=true Server error. Status value received from unknown address space. INVALID_ARGUMENT | 3 | error=false Not a server error. The client enters an invalid argument. DEADLINE_EXCEEDED | 4 | error=true Server error. Response from the server has been delayed long, even if it may be successful. NOT_FOUND | 5 | error=false Not a server error. The requested file or directory is not found. ALREADY_EXISTS | 6 | error=false Not a server error. The file or directory the client attempts to create already exists. PERMISSION_DENIED | 7 | error=false Not a server error. The client doesn't have permission to execute the operation. RESOURCE_EXHAUSTED | 8 | error=false Not a server error. Some resource has been exhausted like the file system is out of space. FAILED_PRECONDITION | 9 | error=false Not a server error. The requested operation is rejected because it doesn't meet preconditions. ABORTED | 10 | error=false Not a server error. The operation is aborted, typically due to concurrency issues. OUT_OF_RANGE | 11 | error=false Not a server error. The operation was attempted to read past the valid range. UNIMPLEMENTED | 12 | error=true Server error. The service is not implemented or not enabled. INTERNAL | 13 | error=true Server error. The system is broken, typically reserved for serious cases. UNAVAILABLE | 14 | error=true Server error. The service is currently not available. DATA_LOSS | 15 | error=true Server error. Unrecoverable data loss or corruption. UNAUTHENTICATED | 16 | error=false Not a server error. The request does not have valid credentials for the operation. ``` Risk Level: Low Testing: Docs Changes: Release Notes: Platform Specific Features: Fixes envoyproxy#18877 Signed-off-by: bryanwux <jiayu1.wu@intel.com> Signed-off-by: kuochunghsu <kuochunghsu@pinterest.com>
Signed-off-by: Jiayu Wu jiayu1.wu@intel.com
Commit Message: Change tracing error tag behaviour for grpc codes #18877
Additional Description:
Below is the proposed change for tracing error tag behaviour. A brief description about whether each response code should be indicated as an error has also been listed.
Risk Level: Low
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]: #18877
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]