-
Notifications
You must be signed in to change notification settings - Fork 771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SuppressInstrumentation to Grpc client #1184
Add SuppressInstrumentation to Grpc client #1184
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1184 +/- ##
==========================================
+ Coverage 78.92% 79.01% +0.09%
==========================================
Files 215 216 +1
Lines 6262 6270 +8
==========================================
+ Hits 4942 4954 +12
+ Misses 1320 1316 -4
|
using OpenTelemetry.Context; | ||
|
||
namespace OpenTelemetry | ||
{ | ||
public sealed class SuppressInstrumentationScope : IDisposable | ||
{ | ||
private static readonly RuntimeContextSlot<bool> Slot = RuntimeContext.RegisterSlot<bool>("otel.suppress_instrumentation"); | ||
private static readonly RuntimeContextSlot<ActivitySpanId?> ActivitySlot = RuntimeContext.RegisterSlot<ActivitySpanId?>("otel.suppress_instrumentation_activity"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even I wanted to avoid this allocation. Suppress is called in OnStartActivity(Activity activity, object payload)
of GrpcClientDiagnosticListener
, OnStopActivity(Activity activity, object payload)
will not get called if we don't have this special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh... this is kinda a bugger, but I understand why you need it. Just thinking out loud, what if this were not a sealed class and instead we had a derived class for this use case of suppressing downstream instrumentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alanwest I feel like this is indicative of a bug in the design.
Am I understanding this flow correctly?
- Grpc activity starts. OnStart is called. Suppression is turned on.
- Http activity starts. OnStart skipped.
- Http activity stops. OnStop skipped.
- Grpc activity stops. OnStop is skipped?
We (somehow) need to make sure that last OnStop fires?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeBlanch , You're correct. This is the current behavior, Grpc activity Onstop is skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alanwest I feel like this is indicative of a bug in the design.
Either that or maybe this is not quite the right tool for the job. I think the SuppressInstrumentationScope works well from the standpoint of exporters, but from the standpoint of suppressing things within a specific trace I think this is where things get trickier.
I spoke with @reyang a little while back. I was questioning whether this was going to be the right tool for the job of suppressing child activities. I felt that the concept of a "leaf" span or as I believe you called it a "terminal" span may make more sense. Though, Reiley's concerns are valid in that the concept of a terminal span is specific to traces, and that in the future we'll want to be able to suppress downstream metrics and logs as well.
The fact that we're bringing in the notion of an Activity/SpanId into the SuppressInstrumentationScope brings me some pause for similar reasons, but all may not be lost...
Metrics and logs do not have a notion of start/stop like activities do, so I think the standard implementation of the SuppressInstrumentationScope
would work for them. This is why I was pondering an derived version of the SuppressInstrumentationScope
that was "Activity" aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we make the otel.suppress_instrumentation
slot a ref-count (similar like how a reentrant mutex is implemented)?
For example:
- The default value of the slot is
-1
, which means "do not suppress". - Each
SuppressInstrumentationScope
will increment the slot value (no contention since it is async/thread local). - For gRPC case, if slot value > 0, just suppress everything.
- For gPRC case, if the slot value is
0
, theOnStart
adapter will increment the value if user opt-in to omit underlying operations; theOnStop
adapter will decrement the value (if user opt-in to omit underlying operation); adapter will use the value to decide if the processor should be called or not.
In a nutshell, use an integer to count how deep we are in the stack, and based on whether we're stepping into / out the stack, perform suppression accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like where you're going @reyang, though I think there's a little more to this...
OnStop
does not get invoked because of this check here
opentelemetry-dotnet/src/OpenTelemetry/Instrumentation/DiagnosticSourceListener.cs
Lines 40 to 45 in bced8cf
public void OnNext(KeyValuePair<string, object> value) | |
{ | |
if (Sdk.SuppressInstrumentation) | |
{ | |
return; | |
} |
This occurs above the level of the gRPC instrumentation and prevents all instrumentation operations from occurring OnStart
, OnStop
, OnException
, OnCustom
. Similarly, if this were a library that was instrumented with ActivitySource
then the the check happens in the ActivityListener
.
In order to do something like you're suggesting, we'd have to push the check down into the actual instrumentation. This might be ok for the "legacy" instrumentation, but I think things may become less than ideal if this were a library instrumented with ActivitySource
... though I guess in this case it would be the concern of the library author. Which bring me back to the thought, it would be ideal if .NET solved this issue for us 😄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if this would work #1202.
I would imagine the usage should be as simple as:
public override void OnStartActivity(Activity activity, object payload)
{
if (options.StopGoingAnyFurther)
{
SuppressInstrumentationScope.Enter();
}
}
/// <summary> | ||
/// Gets or sets a value indicating whether instrumentation is suppressed (disabled). | ||
/// </summary> | ||
public bool SuppressInstrumentation { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be named something like "SupressChildHttpInstrumentation". And description should explain the impact. @alanwest can probably suggest alternate wordings :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alanwest , Do you think SupressChildHttpInstrumentation
is a best name to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. SuppressChildHttpInstrumentation
is ok.
Another thought is to go with something more generic like SuppressDownstreamInstrumentation
. Thought being that we may want to apply this pattern to other instrumentation and this name starts a commonly understood convention for suppressing downstream instrumentation. Essentially "terminate instrumentation beyond this point"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WCF is an example of an RPC framework that supports multiple underlying protocols that may also be instrumented. Would it make sense to have a separate config setting for each of the protocols, or would the more generic option be best?
/// <summary> | ||
/// Gets or sets a value indicating whether instrumentation is suppressed (disabled). | ||
/// </summary> | ||
public bool SuppressInstrumentation { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the intention is to suppress the underlying HTTP instrumentation (for gRPC over HTTP) so we only have gRPC spans? If this is the case, the name needs to reflect the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @reyang. If you are configuring instrumentation, what does "suppress instrumentation" mean? Users are going to be very confused because in their minds, there is just one instrumentation at play. Seems like an on/off switch for the Grpc instrumentation 😄 Maybe something like DisableTransportSpans
? 🤷
using OpenTelemetry.Context; | ||
|
||
namespace OpenTelemetry | ||
{ | ||
public sealed class SuppressInstrumentationScope : IDisposable | ||
{ | ||
private static readonly RuntimeContextSlot<bool> Slot = RuntimeContext.RegisterSlot<bool>("otel.suppress_instrumentation"); | ||
private static readonly RuntimeContextSlot<ActivitySpanId?> ActivitySlot = RuntimeContext.RegisterSlot<ActivitySpanId?>("otel.suppress_instrumentation_activity"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse the same Slot instead of creating new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reiley had the same question, just answered above.
@@ -28,16 +28,21 @@ public static class TracerProviderBuilderExtensions | |||
/// Enables gRPClient Instrumentation. | |||
/// </summary> | |||
/// <param name="builder"><see cref="TracerProviderBuilder"/> being configured.</param> | |||
/// <param name="configureGrpcClientInstrumentationOptions">GrpcClient configuration options.</param> | |||
/// <returns>The instance of <see cref="TracerProviderBuilder"/> to chain the calls.</returns> | |||
public static TracerProviderBuilder AddGrpcClientInstrumentation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cijothomas @CodeBlanch this makes me wonder if we want AddInstrumentation<GrpcClientInstrumentation>(options)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea but I see some challenges.
What we want to do:
AddInstrumentation<GrpcClientInstrumentation>(options => options.Something = true);
Probably has to be more like:
AddInstrumentation<GrpcClientInstrumentation, GrpcClientInstrumentationOptions>(options => options.Something = true);
Right there we lose a lot of value. The user has to know the instrumentation type and the options type.
What the method has to do:
public static TracerProviderBuilder AddInstrumentation<TInstrumentation, TOptions>(
this TracerProviderBuilder builder,
Action<TOptions> configure = null)
where TOptions : new()
{
TOptions options = new TOptions();
configure?.Invoke(options);
builder.AddInstrumentation((activitySource) => new TInstrumentation(activitySource, options));
return builder;
}
The new TInstrumentation(activitySource, options)
won't compile. We'll have to do a factory or use reflection kind of like how ASP.NET Core does middleware. I think that would be worth pursuing but having to specify the two types gives me pause.
src/OpenTelemetry.Instrumentation.GrpcNetClient/TracerProviderBuilderExtensions.cs
Outdated
Show resolved
Hide resolved
this.activitySource = activitySource; | ||
} | ||
|
||
public override void OnStartActivity(Activity activity, object payload) | ||
{ | ||
SuppressInstrumentationScope.Begin(this.options.SuppressInstrumentation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Begin
returns a scope that should be disposed at some point. Is this happening somehow?
Also, can we change this to the following to avoid allocating a scope?
if (this.options.SuppressInstrumentation) SuppressInstrumentationScope.Begin()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dispose
is not dis-allocating any object. It just resets the slot to previous value. I think call to dispose
is not required here. Collected heap snapshot and verified SuppressInstrumentationScope
getting cleared up after stop activity. Please let me know if I miss something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is sound when AsyncLocal/logical call context is the backing store. Though we might run into issues if the the underlying ContextSlotType is changed to ThreadLocal. Not sure this use case is actually possible today, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reyang It may make sense to refactor the RuntimeContext stuff a bit to ensure that this is not possible. That is, I think we'll always want an AsyncLocal
based store for the SuppressInstrumentationScope
. I don't think we'd want people to change the ContextSlotType
via the API to something else and then have SuppressInstrumentationScope
adopt whatever the user set it to. Thoughts?
Maybe something like this for use on the SuppressInstrumentationScope
class
// Uses AsyncLocal for 4.6+ and logical call context for 4.5.
RuntimeContext.RegisterAsyncStorageSlot<bool>("otel.suppress_instrumentation");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just resets the slot to previous value.
@rajkumar-rangaraj Isn't that the point though? Let's say this Grpc call is part of some other process the user is executing. If we don't reset the flag, it stays set and flows onward, no? Could that lead to accidental suppression of subsequent things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I will get this covered.
…rajkumar-rangaraj/opentelemetry-dotnet into rajrang/suppressinstrumentation
src/OpenTelemetry.Instrumentation.GrpcNetClient/TracerProviderBuilderExtensions.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM, too. Though I believe there are a few caveats with this implementation which may just be theoretical and not worth being concerned with unless an actual use case is encountered:
Neither of these caveats are concerns for the gRPC library, but just noting them. |
And the comparison implementation is very inefficient (O(M*N) where M is the number of suffixes and N is the average strlen of the suffixes). |
Fixes part of #893 .
Changes
GrpcClientInstrumentationOptions.SuppressInstrumentation
to suppress child activity creationSuppressInstrumentationScope
to ensure stop activity is calledFor significant contributions please make sure you have completed the following items:
CHANGELOG.md
updated for non-trivial changes