-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats/internal: OpenTelemetry tracing GRPCTraceBinPropagator #7677
base: master
Are you sure you want to change the base?
stats/internal: OpenTelemetry tracing GRPCTraceBinPropagator #7677
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7677 +/- ##
==========================================
- Coverage 81.91% 81.85% -0.07%
==========================================
Files 374 376 +2
Lines 37930 37988 +58
==========================================
+ Hits 31071 31095 +24
- Misses 5566 5588 +22
- Partials 1293 1305 +12
|
92de02a
to
9a9336b
Compare
d0ddcc0
to
5e524a8
Compare
I'll let @zasweq do the primary review here and just take a 2nd pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on implementation; didn't review tests yet. My next pass will be more on correctness, this pass was more general Go style/API and docstring semantics.
// GRPCTraceBinPropagator is TextMapPropagator to propagate cross-cutting | ||
// concerns as both text and binary key-value pairs within a carrier that | ||
// travels in-band across process boundaries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere: I don't know what these comments semantically refer to. What does "cross-cutting concerns" mean? I don't think this is a strict requirement "within a carrier that travels in-band across process boundaries."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both "cross-cutting concerns" and "within a carrier that travels in-band across process boundaries." are taken directly from OpenTelemetry propagation.TextMapCarrier
interface https://pkg.go.dev/go.opentelemetry.io/otel/propagation#TextMapPropagator.
"cross-cutting concerns" simply refers to functionalities that are not central to the core logic of your application but are needed across different parts of it (eg. tracing, logging etc.) which are transported "in-band" (within the main data flow) across process boundaries (separate applications).
However, you are right to point out that we should keep the docstrings simplified for anyone to understand the purpose of the method. I have simplified the comments everwhere now. Let me know what you think.
func (c CustomCarrier) GetBinary() ([]byte, error) { | ||
values := stats.Trace(*c.Ctx) | ||
if len(values) == 0 { | ||
return nil, errors.New("`grpc-trace-bin` header not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct error message? I think both in the case it's not present in md and if it's set to an empty byte string it'll be not found. Is an empty byte string distinct from nil in the return? Is the empty byte string a valid thing to transmit for this header? Do we want this to return an error at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed returning error from here to be consistent with Get(..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the error. What is the intended usage of this function? Do you intended to treat not set and set to an empty string equivalent? Is empty string a valid key for this? In that case you should make the error message fail with that. But that relates to the intended usage of this, and the failure modes (i.e. what happens in the error case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are only looking trace context which needs to conform to following struct. So the propagator will not allow to set any such value (like empty) as it won't be a valid SpanContext
. Hence, the only reason for returning nil is client didn't set the grpc-trace-bin
header.
// Fields:
//
// TraceId: (field_id = 0, len = 16, default = "0000000000000000") -
// 16-byte array representing the trace_id.
//
// SpanId: (field_id = 1, len = 8, default = "00000000") - 8-byte array
// representing the span_id.
//
// TraceOptions: (field_id = 2, len = 1, default = "0") - 1-byte array
representing the trace_options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is empty string a valid key for this?
Binary methods will not deal with string keys because we are using existing stats package
Line 316 in d9d8f34
func SetTrace(ctx context.Context, b []byte) context.Context { |
|
||
// SetBinary sets the binary value to the gRPC context, which will be sent in | ||
// the outgoing RPC with the header grpc-trace-bin. | ||
func (c CustomCarrier) SetBinary(value []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the point of this API. It seems to do things by deferring to operations on a context either to the stats package or the metadata package? What is the function of this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this requires a bit more background on migration path from opencensus
to opentelemetry
(see A72).
The design proposes that while gRPC OpenCensus directly interacts with metadata API, gRPC Open Telemetry will use standardized https://pkg.go.dev/go.opentelemetry.io/otel/propagation package for context propagation by encoding them in metadata for the following benefits:
- Full integration with OpenTelemetry APIs that is easier for users to reason about.
- Make it possible to plugin other propagators that the community supports.
- Flexible API that allows clean and simple migration paths to a different propagator.
As of today, OpenTelemetry propagator API only supports https://pkg.go.dev/go.opentelemetry.io/otel/propagation#TextMapPropagator, that is to send string key/value pairs between the client and server, which is different from the binary header that gRPC currently uses. The future roadmap to support binary propagators at OpenTelemetry is unclear. So, gRPC will use propagator API in TextMap format with optimization path to work around the binary propagator API. Once the Open Telemetry binary propagator API is available in the future, we can continuously integrate with those API with little effort.
Therefore, we need a custom implementation for the Carrier that supports both binary and text values. For binary header grpc-trace-bin
we are reusing the existing stats package to conform with existing grpc design.
// context. It returns an empty string if the key is not present in the | ||
// context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also returns empty string if it's set in the context but has no values. But I don't know if this is the right behavior, should we return an error? What is the intended usage of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I replied above the reason why empty value is not possible. We check if the SpanContext is valid in propagator's Inject()
method before setting in carrier
span := oteltrace.SpanFromContext(ctx)
if !span.SpanContext().IsValid() {
return
}
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add to documentation the case if it's set in context but has no values. We don't assume this operation comes after that in the callsite, this is a documentation for this layer/operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
t.Run("Fast path with CustomCarrier", func(t *testing.T) { | ||
ctx, cancel := context.WithCancel(context.Background()) | ||
carrier := internaltracing.NewCustomCarrier(metadata.NewOutgoingContext(ctx, metadata.MD{})) | ||
propagator.Inject(traceCtx, carrier) | ||
|
||
got := stats.OutgoingTrace(*carrier.Context()) | ||
want := Binary(spanContext) | ||
if string(got) != string(want) { | ||
t.Fatalf("got = %v, want %v", got, want) | ||
} | ||
cancel() | ||
}) | ||
|
||
t.Run("Slow path with TextMapCarrier", func(t *testing.T) { | ||
carrier := otelpropagation.MapCarrier{} | ||
propagator.Inject(traceCtx, carrier) | ||
|
||
got := carrier.Get(internaltracing.GRPCTraceBinHeaderKey) | ||
want := base64.StdEncoding.EncodeToString(Binary(spanContext)) | ||
if got != want { | ||
t.Fatalf("got = %v, want %v", got, want) | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make t-test.
See https://github.com/grpc/grpc-go/blob/master/stats/opentelemetry/csm/observability_test.go#L200 for an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i had tabular test before but both tests have different veriffication logic so I wrote in this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i had them as tabular test before but verification logic is significantly different because of binary and string format so i wrote it this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw there are some existing tests written in similar way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can declare the want in the t-test as a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see logic to retrieve got is different as well. For fast path which will use SetBinary
we get it through stats package because we are using CustomCarrier
where as for slow path which will use Set
, we just retrieve directly using carrier's Get
because we are using a different carrier (not implemented by us). Would you prefer to have separate tests altogether like TestInject_FastPath, TestInject_SlowPath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah separate tests are preferred over t.Run(). Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated the fast and slow path tests
} | ||
cancel() | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments on this test as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
refactor GRPCTraceBinPropagator to be exported externally Use context instead of metadata in CustomCarrier
7f1ffdd
to
0126d32
Compare
@dfawley as we discussed offline, I have modified the Carrier and GRPCTraceBinPropagator to directly set the binary header as string in carrier's metadata. However, I still use stats package stats package is to check if trace data was inserted using opencensus during injection and to insert binary header to outgoing context during extraction |
…d remove base64 usages
0126d32
to
062e769
Compare
000f90c
to
ee2869c
Compare
@dfawley as discussed offline. I have removed the |
This PR adds the
GRPCTraceBinPropagator
that implementsTextMapPropagator
as proposed in gRFC A72 which outlines a custom propagator to handle both binary and text formats for trace context propagation. This will allow gRPC to keep usinggrpc-trace-bin
header for context propagation and also support any other text propagators. When usinggrpc-trace-bin
the OpenCensus SpanContext and OpenTelemetry SpanContext are identical, therefore a gRPC OpenCensus client can speak with a gRPC OpenTelemetry server and vice versa. It is encouraged to useGRPCTraceBinPropagator
for the migration. Using the same header greatly simplifies rollout.Injecting trace context
GRPCTraceBinPropagator
first attempts to retrieve any existing binary trace data from the provided context usingstats.Trace()
. If found, it means that a trace data was injected by a system using gRPC OpenCensus plugin. Hence, we inject this trace data into the carrier, allowing trace from the systems which are using gRPC OpenCensus, to continue to propagate downstream. However, we set the value in string format againstgrpc-trace-bin
key so that downstream systems which are using gRPC OpenTelemetry plugin are able to extract it usingGRPCTraceBinPropagator
.GRPCTraceBinPropagator
then attempts to retrieve an OpenTelemetry span context from the provided context. If not found, that means either there is no trace context or previous system had injected it using gRPC OpenCensus plugin. Therefore, it returns early without doing anymore modification to carrier. If found, that means previous system had injected using OpenTelemetry plugin so it converts the span context to binary and set in carrier in string format againstgrpc-trace-bin
key.Extracting trace context
GRPCTraceBinPropagator
first attempts to readgrpc-trace-bin
header value from carrier. If found, that means the trace data was injected using gRPC OpenTelemetry plugin usingGRPCTraceBinPropagator
. It then set the trace data into context usingstats.SetTrace
for downstream systems still using gRPC OpenCensus plugin to be able to use this context.GRPCTraceBinPropagator
then also extracts the OpenTelemetry span context from binary header value. If span context is not valid, it just returns the context as parent. If span context is valid, it creates a new context containing the extracted OpenTelemetry span context marked as remote so that downstream systems using OpenTelemetry are able use this context.RELEASE NOTES: None