Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging sampling #5574

Draft
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

evgenyfedorov2
Copy link
Contributor

@evgenyfedorov2 evgenyfedorov2 commented Oct 25, 2024

Related to the #5123 proposal, this PR is focused on the logging sampling only. The buffering part will follow in a separate PR soon.

Microsoft Reviewers: Open in CodeFlow

@evgenyfedorov2
Copy link
Contributor Author

@noahfalk @geeknoid

@geeknoid
Copy link
Member

In my doc, I had separated the idea of matching from what you do once you match. In this PR, these two things have been coupled as captured in the SamplingParameters struct.

The reason I think we need to separate "matching" from "actions upon matching" is that we have multiple actions possible. Once a record matches, we want to globally filter it, globally buffer it, filter it at the request level, and buffer it at the request level.

/// <param name="logLevel"><see cref="Microsoft.Extensions.Logging.LogLevel"/> of the log record.</param>
/// <param name="category">Category of the log record.</param>
/// <param name="eventId"><see cref="Microsoft.Extensions.Logging.EventId"/> of the log record.</param>
public SamplingParameters(LogLevel? logLevel, string? category, EventId? eventId)
Copy link
Member

@samsp-msft samsp-msft Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also include an Activity/TraceId associated with the active request for the log message?

I see you fetch it from Activity.Current - if we have it already, lets pass it in, as the lookup is using AsyncState which is not great.

In many cases the actual Id does not matter so much as whether the log message is being delivered in the context of a request or not.
If some kind of head sampling is being performed, then using the traceId so that you can sample at the request level rather than the log level may be important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have it already, lets pass it in

I don't think the code will already have it unforetunately. There is the opt-in feature to include TraceIds as part of a logging scope (https://source.dot.net/#Microsoft.Extensions.Logging/LoggerFactoryScopeProvider.cs,35), but that is an API the LoggerProvider calls back to after the LoggerFactory already invoked ILogger.Log() on the provider. I don't see a good way that to share the reference that wouldn't wind up being more expensive than doing two independent queries. AsyncLocal lookups certainly cost more than a field lookup, but thankfully not that much more, probably 5-10ns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it can be handled by creating a new sampler object that can sample on Activity info? I guess in some cases, users may create aggregated sampler encapsulating more than one sample inside.

Also, should we expose samplers like TraceBasedSampler so users can manually create it and wrap inside other custom samplers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it can be handled by creating a new sampler object that can sample on Activity info?

Yeah, anyone could write a sampler that follows a similar approach to the TraceBasedSampler if they want to. They could also use the API that takes a delegate:

logging.AddSampler( p => Activity.Current?.Recorded );

Also, should we expose samplers like TraceBasedSampler so users can manually create it and wrap inside other custom samplers?

My preference would be not to add more API surface given that developers could trivially reproduce the logic with 1 line of code if that's the behavior they want. Its not a big deal to me either way though.

@@ -266,6 +265,12 @@ private void ModernPath(LogLevel logLevel, EventId eventId, LoggerMessageState m
ref readonly MessageLogger loggerInfo = ref loggers[i];
if (loggerInfo.IsNotFilteredOut(logLevel))
{
if (!config.Sampler.ShouldSample(new SamplingParameters(logLevel, loggerInfo.Category!, eventId)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we only want to sample a given record once and then reuse that result for all loggers. It would be quite surprising if a sampling policy of 'Random.Shared.GetNext() > 0.5' meant the message is randomly sent to some providers but not others.

Copy link
Contributor Author

@evgenyfedorov2 evgenyfedorov2 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a for loop through different loggers which can (and usually do) have different categories, and since we pass the Category to the sampler via SamplingParameters, there might be a custom sampler which makes sampling decisions based on the Category value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a for loop through different loggers which can (and usually do) have different categories

Thats not what I'd expect? Unless the R9 implementation is using the same named types to mean something different than they do in the M.E.L implementation, I'd expect you have one ExtendedLogger instance per category. The MessageLoggers array being looped over represents the Logger created by ILoggerProvider.CreateLogger() for that one category.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

/// <param name="logLevel"><see cref="Microsoft.Extensions.Logging.LogLevel"/> of the log record.</param>
/// <param name="category">Category of the log record.</param>
/// <param name="eventId"><see cref="Microsoft.Extensions.Logging.EventId"/> of the log record.</param>
public SamplingParameters(LogLevel? logLevel, string? category, EventId? eventId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have it already, lets pass it in

I don't think the code will already have it unforetunately. There is the opt-in feature to include TraceIds as part of a logging scope (https://source.dot.net/#Microsoft.Extensions.Logging/LoggerFactoryScopeProvider.cs,35), but that is an API the LoggerProvider calls back to after the LoggerFactory already invoked ILogger.Log() on the provider. I don't see a good way that to share the reference that wouldn't wind up being more expensive than doing two independent queries. AsyncLocal lookups certainly cost more than a field lookup, but thankfully not that much more, probably 5-10ns.

@@ -43,6 +46,7 @@ public ExtendedLoggerFactory(
#pragma warning restore S107 // Methods should not have too many parameters
{
_scopeProvider = scopeProvider;
_sampler = sampler ?? new AlwaysOnSampler();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance-wise its probably a little bit faster to execute _sampler == null ? true : _sampler.ShouldSample() instead of invoking _sampler.ShouldSample() when no sampler was provided. You can do a little microbenchmark to confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep this thread open and update later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @noah. Also it will be a way to check if the logger is created with sampler or not too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeing AlwaysOnSampler is internal, this make my previous comment is not accurate.

internal sealed class TraceBasedSampler : LoggerSampler
{
public override bool ShouldSample(SamplingParameters _) =>
Activity.Current?.Recorded ?? false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want no Activity to be true, or maybe to be configurable via the API. This is a spot where some experimental feedback feels very useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be written like

Activity.Current?.Recorded is true. will invoke Current once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking into account the @noahfalk comment that we should sample in if there is no Activity, I think this makes sense
Activity.Current?.Recorded ?? true

@noahfalk
Copy link
Member

@tarekgh - not sure if you have seen this yet?

public SamplingParameters(LogLevel logLevel, string category, EventId eventId)
{
LogLevel = logLevel;
Category = category;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Category = category;

what happen if someone forced null value? should we intentionally not allow that here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you proposing adding Throw.IfNull(category) check? I assume at the moment if you passed null then its possible you get a NullReferenceException inside the call to ShouldSample() depending on its implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you proposing adding Throw.IfNull(category) check?

Yes.

I assume at the moment if you passed null then its possible you get a NullReferenceException inside the call to ShouldSample() depending on its implementation.

Getting NullReferenceException will be not a good experience. Get exception when creating SamplingParameters will be much better and informative.

/// Contains the parameters helping make sampling decisions for logs.
/// </summary>
[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
public readonly struct SamplingParameters : IEquatable<SamplingParameters>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SamplingParameters

should we call it SamplingOptions better? we use Options everywhere and it convey the same meaning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is called SamplingParameters, that's why I have decided to re-use this name. Options kind of names are usually used to represent config with the IOptions<> pattern, so might not be the best choice here. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SamplingParameters is ok. I was only trying to get attention if we thought about it. Let us stick with that name if no-one else has any concern about it. By the way, I tried to look at OTEL specs just in case they suggest something but couldn't find any info there.

@tarekgh
Copy link
Member

tarekgh commented Oct 30, 2024

Reviewed, in general, looks good. I added a minor question comments.

@evgenyfedorov2
Copy link
Contributor Author

In my doc, I had separated the idea of matching from what you do once you match. In this PR, these two things have been coupled as captured in the SamplingParameters struct.

The reason I think we need to separate "matching" from "actions upon matching" is that we have multiple actions possible. Once a record matches, we want to globally filter it, globally buffer it, filter it at the request level, and buffer it at the request level.

Discussed offline. Added configuration support allowing for specifying matching conditions per action. The action is only one for now - Ratio based sampler itself.

/// Represents a rule used for filtering log messages for purposes of log sampling and buffering.
/// </summary>
[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
public interface ILoggerSamplerFilterRule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest making this interface internal

/// <summary>
/// Gets the filter delegate that would be applied to messages that passed the <see cref="LogLevel"/>.
/// </summary>
public Func<string?, LogLevel?, int?, bool>? Filter { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this API needs to be included in the interface because the rule selector doesn't access it.

@@ -39,6 +38,12 @@ public ExtendedLogger(ExtendedLoggerFactory factory, LoggerInformation[] loggers

public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception? exception, Func<TState, Exception?, string> formatter)
{
if (MessageLoggers.Length == 0 || !_factory.Config.Sampler.ShouldSample(new SamplingParameters(logLevel, MessageLoggers[0].Category, eventId)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to run the sampler after we know the log record is enabled in at least one logger. For example if someone wanted to make a rate limiting sampler that logs no more than 1000 messages per second they might write:

class Sampler
{
    int _count; // pretend this gets reset to zero on timer every second
    bool ShouldSample(...) => _count++ < 1000;
}

If the app has lots of logging instrumentation that uses the Trace logging level, but the app config has trace logging disabled the app developer may not get any messages logged at all. The calls to Log() at trace level use up the entire quota of 1000 messages in the sampler only to get filtered out later by the Logger.IsEnabled() checks.

public double Probability { get; set; }

/// <inheritdoc/>
public Func<string?, LogLevel?, int?, bool>? Filter { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API isn't used right? There isn't going to be any filtering other than probability check.

/// Gets or sets the collection of <see cref="RatioBasedSamplerFilterRule"/> used for filtering log messages.
/// </summary>
#pragma warning disable CA1002 // Do not expose generic lists - List is necessary to be able to call .AddRange()
#pragma warning disable CA2227 // Collection properties should be read only - setter is necessary for options pattern
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume its necessary if the implementation calls section.Get<RatioBasedSamplerOptions>(), but it wouldn't be necessary with a more manual implementation of the config parsing. I don't know how much this matters but if the API review folks wanted this not to be settable we could do it.

// 2. If there nothing matched by category take all rules without category
// 3. If there is only one rule use it's level and filter
// 4. If there are multiple rules use last
// 5. If there are no applicable rules use global minimal level
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks like the precedence rules for the Logging.LogLevels, but presumably we'll need slightly different rules here. Probably something like this:

  1. Rules with an EventId take precedence over those without an EventId
  2. Rules with a longer category string take precedence over shorter string or no string
  3. Rules with lower LogLevel take precedence over higher LogLevel
  4. If there are still multiple rules, take the last

/// <param name="configuration">The <see cref="IConfiguration" /> to add.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddRatioBasedSamplerConfiguration(this ILoggingBuilder builder, IConfiguration configuration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? I'm not sure what scenario would use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my thinking was to have this method to be able to call it from HostBuilder, similar to AddConfiguration() for logging here

/// <param name="configuration">The <see cref="IConfiguration" /> to add.</param>
/// <returns>The value of <paramref name="builder"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
public static ILoggingBuilder AddRatioBasedSampler(this ILoggingBuilder builder, IConfiguration configuration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we may want to rename all the places that say 'RatioBasedSampler' to 'ProbabilitySampler'. What do other folks think?

probability = 0.0;

// TO DO: check if we can optimize this. It is a hot path and
// we should be able to minimize number of rule selections on every log record.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we'd want to optimize it but it doesn't impact the design review much. I think its good you left it as a TODO 👍.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants