Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SuppressInstrumentation to Grpc client #1184

Merged
merged 14 commits into from
Sep 10, 2020
Merged

Add SuppressInstrumentation to Grpc client #1184

merged 14 commits into from
Sep 10, 2020

Conversation

rajkumar-rangaraj
Copy link
Contributor

@rajkumar-rangaraj rajkumar-rangaraj commented Aug 27, 2020

Fixes part of #893 .

Changes

  • Introduced GrpcClientInstrumentationOptions.SuppressInstrumentation to suppress child activity creation
  • Added an additional check in SuppressInstrumentationScope to ensure stop activity is called
  • Created a test to validate the scenario

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Design discussion issue #
  • Changes in public API reviewed

@rajkumar-rangaraj rajkumar-rangaraj requested a review from a team August 27, 2020 19:29
@rajkumar-rangaraj rajkumar-rangaraj changed the title Added SuppressInstrumentation to Grpc client Add SuppressInstrumentation to Grpc client Aug 27, 2020
@codecov
Copy link

codecov bot commented Aug 27, 2020

Codecov Report

Merging #1184 into master will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1184      +/-   ##
==========================================
+ Coverage   78.92%   79.01%   +0.09%     
==========================================
  Files         215      216       +1     
  Lines        6262     6270       +8     
==========================================
+ Hits         4942     4954      +12     
+ Misses       1320     1316       -4     
Impacted Files Coverage Δ
...ntation.GrpcNetClient/GrpcClientInstrumentation.cs 87.50% <100.00%> (ø)
....GrpcNetClient/GrpcClientInstrumentationOptions.cs 100.00% <100.00%> (ø)
...ent/Implementation/GrpcClientDiagnosticListener.cs 90.32% <100.00%> (+1.03%) ⬆️
...n.GrpcNetClient/TracerProviderBuilderExtensions.cs 100.00% <100.00%> (ø)
...emetry/Instrumentation/DiagnosticSourceListener.cs 79.16% <100.00%> (+1.89%) ⬆️
...us/Implementation/PrometheusExporterEventSource.cs 63.63% <0.00%> (-9.10%) ⬇️
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 61.76% <0.00%> (-2.95%) ⬇️
src/OpenTelemetry/SuppressInstrumentationScope.cs 100.00% <0.00%> (+26.08%) ⬆️

using OpenTelemetry.Context;

namespace OpenTelemetry
{
public sealed class SuppressInstrumentationScope : IDisposable
{
private static readonly RuntimeContextSlot<bool> Slot = RuntimeContext.RegisterSlot<bool>("otel.suppress_instrumentation");
private static readonly RuntimeContextSlot<ActivitySpanId?> ActivitySlot = RuntimeContext.RegisterSlot<ActivitySpanId?>("otel.suppress_instrumentation_activity");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even I wanted to avoid this allocation. Suppress is called in OnStartActivity(Activity activity, object payload) of GrpcClientDiagnosticListener, OnStopActivity(Activity activity, object payload) will not get called if we don't have this special case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh... this is kinda a bugger, but I understand why you need it. Just thinking out loud, what if this were not a sealed class and instead we had a derived class for this use case of suppressing downstream instrumentation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanwest I feel like this is indicative of a bug in the design.

Am I understanding this flow correctly?

  • Grpc activity starts. OnStart is called. Suppression is turned on.
  • Http activity starts. OnStart skipped.
  • Http activity stops. OnStop skipped.
  • Grpc activity stops. OnStop is skipped?

We (somehow) need to make sure that last OnStop fires?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeBlanch , You're correct. This is the current behavior, Grpc activity Onstop is skipped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanwest I feel like this is indicative of a bug in the design.

Either that or maybe this is not quite the right tool for the job. I think the SuppressInstrumentationScope works well from the standpoint of exporters, but from the standpoint of suppressing things within a specific trace I think this is where things get trickier.

I spoke with @reyang a little while back. I was questioning whether this was going to be the right tool for the job of suppressing child activities. I felt that the concept of a "leaf" span or as I believe you called it a "terminal" span may make more sense. Though, Reiley's concerns are valid in that the concept of a terminal span is specific to traces, and that in the future we'll want to be able to suppress downstream metrics and logs as well.

The fact that we're bringing in the notion of an Activity/SpanId into the SuppressInstrumentationScope brings me some pause for similar reasons, but all may not be lost...

Metrics and logs do not have a notion of start/stop like activities do, so I think the standard implementation of the SuppressInstrumentationScope would work for them. This is why I was pondering an derived version of the SuppressInstrumentationScope that was "Activity" aware.

Copy link
Member

@reyang reyang Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we make the otel.suppress_instrumentation slot a ref-count (similar like how a reentrant mutex is implemented)?
For example:

  • The default value of the slot is -1, which means "do not suppress".
  • Each SuppressInstrumentationScope will increment the slot value (no contention since it is async/thread local).
  • For gRPC case, if slot value > 0, just suppress everything.
  • For gPRC case, if the slot value is 0, the OnStart adapter will increment the value if user opt-in to omit underlying operations; the OnStop adapter will decrement the value (if user opt-in to omit underlying operation); adapter will use the value to decide if the processor should be called or not.

In a nutshell, use an integer to count how deep we are in the stack, and based on whether we're stepping into / out the stack, perform suppression accordingly.

Copy link
Member

@alanwest alanwest Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where you're going @reyang, though I think there's a little more to this...

OnStop does not get invoked because of this check here

public void OnNext(KeyValuePair<string, object> value)
{
if (Sdk.SuppressInstrumentation)
{
return;
}

This occurs above the level of the gRPC instrumentation and prevents all instrumentation operations from occurring OnStart, OnStop, OnException, OnCustom. Similarly, if this were a library that was instrumented with ActivitySource then the the check happens in the ActivityListener.

In order to do something like you're suggesting, we'd have to push the check down into the actual instrumentation. This might be ok for the "legacy" instrumentation, but I think things may become less than ideal if this were a library instrumented with ActivitySource... though I guess in this case it would be the concern of the library author. Which bring me back to the thought, it would be ideal if .NET solved this issue for us 😄.

Copy link
Member

@reyang reyang Aug 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if this would work #1202.

I would imagine the usage should be as simple as:

public override void OnStartActivity(Activity activity, object payload)
{
    if (options.StopGoingAnyFurther)
    {
        SuppressInstrumentationScope.Enter();
    }
}

/// <summary>
/// Gets or sets a value indicating whether instrumentation is suppressed (disabled).
/// </summary>
public bool SuppressInstrumentation { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be named something like "SupressChildHttpInstrumentation". And description should explain the impact. @alanwest can probably suggest alternate wordings :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanwest , Do you think SupressChildHttpInstrumentation is a best name to use?

Copy link
Member

@alanwest alanwest Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. SuppressChildHttpInstrumentation is ok.

Another thought is to go with something more generic like SuppressDownstreamInstrumentation. Thought being that we may want to apply this pattern to other instrumentation and this name starts a commonly understood convention for suppressing downstream instrumentation. Essentially "terminate instrumentation beyond this point"

Copy link
Member

@alanwest alanwest Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WCF is an example of an RPC framework that supports multiple underlying protocols that may also be instrumented. Would it make sense to have a separate config setting for each of the protocols, or would the more generic option be best?

/// <summary>
/// Gets or sets a value indicating whether instrumentation is suppressed (disabled).
/// </summary>
public bool SuppressInstrumentation { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the intention is to suppress the underlying HTTP instrumentation (for gRPC over HTTP) so we only have gRPC spans? If this is the case, the name needs to reflect the intention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @reyang. If you are configuring instrumentation, what does "suppress instrumentation" mean? Users are going to be very confused because in their minds, there is just one instrumentation at play. Seems like an on/off switch for the Grpc instrumentation 😄 Maybe something like DisableTransportSpans? 🤷

using OpenTelemetry.Context;

namespace OpenTelemetry
{
public sealed class SuppressInstrumentationScope : IDisposable
{
private static readonly RuntimeContextSlot<bool> Slot = RuntimeContext.RegisterSlot<bool>("otel.suppress_instrumentation");
private static readonly RuntimeContextSlot<ActivitySpanId?> ActivitySlot = RuntimeContext.RegisterSlot<ActivitySpanId?>("otel.suppress_instrumentation_activity");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse the same Slot instead of creating new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reiley had the same question, just answered above.

@@ -28,16 +28,21 @@ public static class TracerProviderBuilderExtensions
/// Enables gRPClient Instrumentation.
/// </summary>
/// <param name="builder"><see cref="TracerProviderBuilder"/> being configured.</param>
/// <param name="configureGrpcClientInstrumentationOptions">GrpcClient configuration options.</param>
/// <returns>The instance of <see cref="TracerProviderBuilder"/> to chain the calls.</returns>
public static TracerProviderBuilder AddGrpcClientInstrumentation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas @CodeBlanch this makes me wonder if we want AddInstrumentation<GrpcClientInstrumentation>(options)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea but I see some challenges.

What we want to do:

AddInstrumentation<GrpcClientInstrumentation>(options => options.Something = true);

Probably has to be more like:

AddInstrumentation<GrpcClientInstrumentation, GrpcClientInstrumentationOptions>(options => options.Something = true);

Right there we lose a lot of value. The user has to know the instrumentation type and the options type.

What the method has to do:

        public static TracerProviderBuilder AddInstrumentation<TInstrumentation, TOptions>(
            this TracerProviderBuilder builder,
            Action<TOptions> configure = null)
            where TOptions : new()
        {
            TOptions options = new TOptions();
            configure?.Invoke(options);

            builder.AddInstrumentation((activitySource) => new TInstrumentation(activitySource, options));

            return builder;
        }

The new TInstrumentation(activitySource, options) won't compile. We'll have to do a factory or use reflection kind of like how ASP.NET Core does middleware. I think that would be worth pursuing but having to specify the two types gives me pause.

this.activitySource = activitySource;
}

public override void OnStartActivity(Activity activity, object payload)
{
SuppressInstrumentationScope.Begin(this.options.SuppressInstrumentation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Begin returns a scope that should be disposed at some point. Is this happening somehow?

Also, can we change this to the following to avoid allocating a scope?

if (this.options.SuppressInstrumentation) SuppressInstrumentationScope.Begin()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dispose is not dis-allocating any object. It just resets the slot to previous value. I think call to dispose is not required here. Collected heap snapshot and verified SuppressInstrumentationScope getting cleared up after stop activity. Please let me know if I miss something here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is sound when AsyncLocal/logical call context is the backing store. Though we might run into issues if the the underlying ContextSlotType is changed to ThreadLocal. Not sure this use case is actually possible today, though.

Copy link
Member

@alanwest alanwest Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reyang It may make sense to refactor the RuntimeContext stuff a bit to ensure that this is not possible. That is, I think we'll always want an AsyncLocal based store for the SuppressInstrumentationScope. I don't think we'd want people to change the ContextSlotType via the API to something else and then have SuppressInstrumentationScope adopt whatever the user set it to. Thoughts?

Maybe something like this for use on the SuppressInstrumentationScope class

// Uses AsyncLocal for 4.6+ and logical call context for 4.5.
RuntimeContext.RegisterAsyncStorageSlot<bool>("otel.suppress_instrumentation");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just resets the slot to previous value.

@rajkumar-rangaraj Isn't that the point though? Let's say this Grpc call is part of some other process the user is executing. If we don't reset the flag, it stays set and flows onward, no? Could that lead to accidental suppression of subsequent things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I will get this covered.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alanwest
Copy link
Member

alanwest commented Sep 9, 2020

LGTM, too.

Though I believe there are a few caveats with this implementation which may just be theoretical and not worth being concerned with unless an actual use case is encountered:

  1. I'm thinking of a scenario where instrumentation for a library does SuppressInstrumentationScope.Enter(); and the library also writes DiagnosticSource events that would normally invoke the OnCustom or OnException - these would be prevented from being invoked.
  2. I also think this implementation relies on the library writing events suffixed with Start/Stop.

Neither of these caveats are concerns for the gRPC library, but just noting them.

@reyang
Copy link
Member

reyang commented Sep 9, 2020

  1. I also think this implementation relies on the library writing events suffixed with Start/Stop.

And the comparison implementation is very inefficient (O(M*N) where M is the number of suffixes and N is the average strlen of the suffixes).

@cijothomas cijothomas merged commit 4ae9104 into open-telemetry:master Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants