Merged
3 changes: 2 additions & 1 deletion docs/root/intro/arch_overview/observability/tracing.rst
@@ -134,7 +134,8 @@ associated with it. Each span generated by Envoy contains the following data:
* Upstream cluster name, observability name, and address.
* HTTP response status code.
* GRPC response status and message (if available).
* An error tag when HTTP status is 5xx or GRPC status is not "OK".
* An error tag when the HTTP status is 5xx, or when the GRPC status is not "OK" and represents a server-side error.
See `GRPC's documentation <https://grpc.github.io/grpc/core/md_doc_statuscodes.html>`_ for more information about GRPC status codes.
* Tracing system-specific metadata.

The span also includes a name (or operation) which by default is defined as the host of the invoked
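The revised bullet combines two conditions. As an illustration only, the rule can be written as one predicate. `shouldSetErrorTag` and `isServerSideGrpcError` are hypothetical names; the numeric codes come from gRPC's status-code documentation, and the server-side subset is the one listed in this change's `http_tracer_impl.cc` (Unknown, DeadlineExceeded, Unimplemented, Internal, Unavailable, DataLoss, Unauthenticated). Merging the HTTP and gRPC checks into a single function is a simplification for this sketch, not a description of how Envoy's tracer is structured.

```cpp
#include <optional>

// Hypothetical helper: true when a numeric gRPC status code (values from
// https://grpc.github.io/grpc/core/md_doc_statuscodes.html) is one of the
// codes this change treats as a server-side error.
constexpr bool isServerSideGrpcError(int code) {
  return code == 2      // UNKNOWN
         || code == 4   // DEADLINE_EXCEEDED
         || code == 12  // UNIMPLEMENTED
         || code == 13  // INTERNAL
         || code == 14  // UNAVAILABLE
         || code == 15  // DATA_LOSS
         || code == 16; // UNAUTHENTICATED
}

// Hypothetical summary of the documented rule: tag the span as an error when
// the HTTP status is 5xx, or when a gRPC status is present and represents a
// server-side error. Client-caused codes such as PERMISSION_DENIED (7) or
// NOT_FOUND (5) do not set the tag under this change.
constexpr bool shouldSetErrorTag(int http_status, std::optional<int> grpc_status) {
  const bool http_5xx = http_status >= 500 && http_status <= 599;
  const bool grpc_error = grpc_status.has_value() && isServerSideGrpcError(*grpc_status);
  return http_5xx || grpc_error;
}

// gRPC responses normally carry HTTP 200, so the gRPC status decides here.
static_assert(shouldSetErrorTag(200, 14), "UNAVAILABLE is a server-side error");
static_assert(!shouldSetErrorTag(200, 7), "PERMISSION_DENIED is not");
```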
1 change: 1 addition & 0 deletions docs/root/version_history/current.rst
@@ -32,6 +32,7 @@ Bug Fixes
* jwt_authn: fixed the crash when a CONNECT request is sent to JWT filter configured with regex match on the Host header.
* tcp_proxy: fix a crash that occurs when configured for :ref:`upstream tunneling <envoy_v3_api_field_extensions.filters.network.tcp_proxy.v3.TcpProxy.tunneling_config>` and the downstream connection disconnects while the upstream connection or http/2 stream is still being established.
* tls: fix a bug while matching a certificate SAN with an exact value in ``match_typed_subject_alt_names`` of a listener where the wildcard ``*`` character is not the only character of the DNS label. For example, ``baz*.example.net``, ``*baz.example.net`` and ``b*z.example.net`` will match ``baz1.example.net``, ``foobaz.example.net`` and ``buzz.example.net``, respectively.
* tracing: set the tracing error tag for gRPC non-OK response codes only when they represent a server-side error.
* upstream: fix stack overflow when a cluster with large number of idle connections is removed.
* xray: fix the AWS X-Ray tracer extension to not sample the trace if ``sampled=`` keyword is not present in the header ``x-amzn-trace-id``.

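The tls entry above describes a wildcard that makes up only part of a DNS label. Below is a hypothetical sketch of that matching rule, not Envoy's implementation, under these assumptions: the pattern has at most one `*`, it must sit in the left-most label, it matches zero or more characters but never a `.`, and the comparison is exact (case-sensitive). Whether Envoy's matcher accepts an empty match for `*` is not stated in the entry. `wildcardLabelMatch` is an invented name.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical matcher for a pattern whose first DNS label may contain a
// single '*' (e.g. "baz*.example.net"). Assumptions: the '*' matches zero or
// more characters but never a '.', and comparison is exact (case-sensitive).
constexpr bool wildcardLabelMatch(std::string_view pattern, std::string_view name) {
  const std::size_t star = pattern.find('*');
  if (star == std::string_view::npos) {
    return pattern == name;  // No wildcard: exact match only.
  }
  const std::size_t pattern_dot = pattern.find('.');
  const std::size_t name_dot = name.find('.');
  // The '*' must sit in the first label, and both names need a second label.
  if (pattern_dot == std::string_view::npos || star > pattern_dot ||
      name_dot == std::string_view::npos) {
    return false;
  }
  // Everything from the first '.' onward must match exactly.
  if (pattern.substr(pattern_dot) != name.substr(name_dot)) {
    return false;
  }
  const std::string_view pattern_label = pattern.substr(0, pattern_dot);
  const std::string_view name_label = name.substr(0, name_dot);
  const std::string_view prefix = pattern_label.substr(0, star);
  const std::string_view suffix = pattern_label.substr(star + 1);
  // The first label must start with the prefix and end with the suffix,
  // without the two overlapping.
  return name_label.size() >= prefix.size() + suffix.size() &&
         name_label.substr(0, prefix.size()) == prefix &&
         name_label.substr(name_label.size() - suffix.size()) == suffix;
}

// The three examples from the changelog entry.
static_assert(wildcardLabelMatch("baz*.example.net", "baz1.example.net"), "");
static_assert(wildcardLabelMatch("*baz.example.net", "foobaz.example.net"), "");
static_assert(wildcardLabelMatch("b*z.example.net", "buzz.example.net"), "");
```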
19 changes: 15 additions & 4 deletions source/common/tracing/http_tracer_impl.cc
@@ -84,10 +84,21 @@ static void addGrpcRequestTags(Span& span, const Http::RequestHeaderMap& headers
template <class T> static void addGrpcResponseTags(Span& span, const T& headers) {
addTagIfNotNull(span, Tracing::Tags::get().GrpcStatusCode, headers.GrpcStatus());
addTagIfNotNull(span, Tracing::Tags::get().GrpcMessage, headers.GrpcMessage());
// Set error tag when status is not OK.
// Set error tag when Grpc status code represents an error. See
// https://github.com/envoyproxy/envoy/issues/18877
absl::optional<Grpc::Status::GrpcStatus> grpc_status_code = Grpc::Common::getGrpcStatus(headers);
if (grpc_status_code && grpc_status_code.value() != Grpc::Status::WellKnownGrpcStatus::Ok) {
span.setTag(Tracing::Tags::get().Error, Tracing::Tags::get().True);
if (grpc_status_code.has_value()) {

Contributor

I think we're going to want to runtime guard this change, as described in CONTRIBUTING.md.

Let's also add a comment where the tracing error code is defined, to make it clearer what "error" means in this case (an upstream or Envoy error, not a client error).

bryanwux (Contributor Author), Mar 1, 2022

Done.

const auto& status = grpc_status_code.value();
if (status != Grpc::Status::WellKnownGrpcStatus::InvalidCode &&

Do these cover all the error statuses?

Contributor Author

Yes, a detailed description can be found here: https://grpc.github.io/grpc/core/md_doc_statuscodes.html

Contributor

What do you think of converting this to a switch, so that if gRPC adds new error codes and we pick them up on import, compilation fails?

Contributor

Also, can we link https://grpc.github.io/grpc/core/md_doc_statuscodes.html in the comment here as well?

Contributor Author

> What do you think of converting this to a switch, so that if gRPC adds new error codes and we pick them up on import, compilation fails?

Yes, I agree a switch makes more sense in this case.

> Also, can we link https://grpc.github.io/grpc/core/md_doc_statuscodes.html in the comment here as well?

Done.

(status == Grpc::Status::WellKnownGrpcStatus::Unknown ||
status == Grpc::Status::WellKnownGrpcStatus::DeadlineExceeded ||
status == Grpc::Status::WellKnownGrpcStatus::Unimplemented ||
status == Grpc::Status::WellKnownGrpcStatus::Internal ||
status == Grpc::Status::WellKnownGrpcStatus::Unavailable ||
status == Grpc::Status::WellKnownGrpcStatus::DataLoss ||
status == Grpc::Status::WellKnownGrpcStatus::Unauthenticated)) {
span.setTag(Tracing::Tags::get().Error, Tracing::Tags::get().True);
}
}
}

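The reviewers suggested replacing the `||` chain above with a `switch`, so that a newly added gRPC status breaks the build until it is classified. A minimal sketch of that idea follows. It assumes a hypothetical local enum `GrpcCode` that mirrors the canonical codes (Envoy's `Grpc::Status::WellKnownGrpcStatus` is not reproduced here), and a build that treats `-Wswitch` warnings as errors, for example via `-Werror=switch`.

```cpp
// Hypothetical mirror of the canonical gRPC status codes; the numeric values
// follow https://grpc.github.io/grpc/core/md_doc_statuscodes.html.
enum class GrpcCode : int {
  Ok = 0,
  Canceled = 1,
  Unknown = 2,
  InvalidArgument = 3,
  DeadlineExceeded = 4,
  NotFound = 5,
  AlreadyExists = 6,
  PermissionDenied = 7,
  ResourceExhausted = 8,
  FailedPrecondition = 9,
  Aborted = 10,
  OutOfRange = 11,
  Unimplemented = 12,
  Internal = 13,
  Unavailable = 14,
  DataLoss = 15,
  Unauthenticated = 16,
};

// Every enumerator is listed and there is no `default:` label, so with
// -Wswitch enabled (GCC enables it via -Wall) the compiler warns, or fails
// under -Werror, when a new GrpcCode enumerator is not handled here.
constexpr bool isServerSideError(GrpcCode code) {
  switch (code) {
  // Codes this change treats as server-side errors.
  case GrpcCode::Unknown:
  case GrpcCode::DeadlineExceeded:
  case GrpcCode::Unimplemented:
  case GrpcCode::Internal:
  case GrpcCode::Unavailable:
  case GrpcCode::DataLoss:
  case GrpcCode::Unauthenticated:
    return true;
  // Success and client-side errors: no error tag.
  case GrpcCode::Ok:
  case GrpcCode::Canceled:
  case GrpcCode::InvalidArgument:
  case GrpcCode::NotFound:
  case GrpcCode::AlreadyExists:
  case GrpcCode::PermissionDenied:
  case GrpcCode::ResourceExhausted:
  case GrpcCode::FailedPrecondition:
  case GrpcCode::Aborted:
  case GrpcCode::OutOfRange:
    return false;
  }
  // Reached only for integers cast to GrpcCode that match no enumerator;
  // treat them as "not a server-side error".
  return false;
}
```

Leaving out `default:` is the design choice that matters: a `default` label would silence `-Wswitch` and let a new enumerator fall through unclassified. The trailing `return false` only covers out-of-range values, playing a role similar to the `InvalidCode` check in the diff.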
@@ -267,4 +278,4 @@ SpanPtr HttpTracerImpl::startSpan(const Config& config, Http::RequestHeaderMap&
}

} // namespace Tracing
} // namespace Envoy
} // namespace Envoy
3 changes: 0 additions & 3 deletions test/common/tracing/http_tracer_impl_test.cc
@@ -634,7 +634,6 @@ TEST_F(HttpConnManFinalizerImplTest, GrpcErrorTag) {
stream_info.downstream_connection_info_provider_->setDirectRemoteAddressForTest(remote_address);

EXPECT_CALL(span, setTag(_, _)).Times(testing::AnyNumber());
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().Error), Eq(Tracing::Tags::get().True)));
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().HttpMethod), Eq("POST")));
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().HttpProtocol), Eq("HTTP/2")));
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().HttpStatusCode), Eq("200")));
@@ -679,7 +678,6 @@ TEST_F(HttpConnManFinalizerImplTest, GrpcTrailersOnly) {
stream_info.downstream_connection_info_provider_->setDirectRemoteAddressForTest(remote_address);

EXPECT_CALL(span, setTag(_, _)).Times(testing::AnyNumber());
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().Error), Eq(Tracing::Tags::get().True)));
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().HttpMethod), Eq("POST")));
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().HttpProtocol), Eq("HTTP/2")));
EXPECT_CALL(span, setTag(Eq(Tracing::Tags::get().HttpStatusCode), Eq("200")));
@@ -825,7 +823,6 @@ TEST_F(HttpTracerImplTest, ChildUpstreamSpanTest) {
EXPECT_CALL(*second_span, setTag(Eq(Tracing::Tags::get().HttpStatusCode), Eq("200")));
EXPECT_CALL(*second_span, setTag(Eq(Tracing::Tags::get().GrpcStatusCode), Eq("7")));
EXPECT_CALL(*second_span, setTag(Eq(Tracing::Tags::get().GrpcMessage), Eq("permission denied")));
EXPECT_CALL(*second_span, setTag(Eq(Tracing::Tags::get().Error), Eq(Tracing::Tags::get().True)));

HttpTracerUtility::finalizeUpstreamSpan(*child_span, &response_headers_, &response_trailers_,
stream_info_, config_);
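The last test hunk drops the `Error` expectation for an upstream span whose gRPC status is `7` ("permission denied"). The hypothetical sketch below traces that path from the raw `grpc-status` header value. `parseGrpcStatus` is an invented stand-in for `Grpc::Common::getGrpcStatus`, and its handling of malformed values (returning a sentinel `kInvalidCode`, by analogy with `WellKnownGrpcStatus::InvalidCode` in the diff) is an assumption, not Envoy's documented behavior.

```cpp
#include <optional>
#include <string_view>

// Hypothetical sentinel for a grpc-status value that is present but not a
// known code, loosely analogous to WellKnownGrpcStatus::InvalidCode.
constexpr int kInvalidCode = -1;

// Hypothetical parser: std::nullopt when the header is absent, kInvalidCode
// when the value is not a decimal number in the canonical range 0..16.
constexpr std::optional<int> parseGrpcStatus(std::optional<std::string_view> header) {
  if (!header.has_value()) {
    return std::nullopt;
  }
  const std::string_view value = *header;
  if (value.empty() || value.size() > 2) {
    return kInvalidCode;
  }
  int code = 0;
  for (const char c : value) {
    if (c < '0' || c > '9') {
      return kInvalidCode;
    }
    code = code * 10 + (c - '0');
  }
  if (code > 16) {
    return kInvalidCode;
  }
  return code;
}

// Hypothetical end-to-end check: parse the header, then apply the
// server-side subset listed in this change.
constexpr bool setsErrorTag(std::optional<std::string_view> grpc_status_header) {
  const std::optional<int> status = parseGrpcStatus(grpc_status_header);
  // Mirrors the InvalidCode guard in the diff; the switch below would also
  // return false for the sentinel.
  if (!status.has_value() || *status == kInvalidCode) {
    return false;
  }
  switch (*status) {
  case 2:   // UNKNOWN
  case 4:   // DEADLINE_EXCEEDED
  case 12:  // UNIMPLEMENTED
  case 13:  // INTERNAL
  case 14:  // UNAVAILABLE
  case 15:  // DATA_LOSS
  case 16:  // UNAUTHENTICATED
    return true;
  default:
    return false;
  }
}

// The case exercised by ChildUpstreamSpanTest: status "7" sets no error tag.
static_assert(!setsErrorTag(std::string_view{"7"}), "PERMISSION_DENIED");
```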