Add logging sampling #5574

evgenyfedorov2 · 2024-10-25T16:43:40Z

Related to the #5123 proposal, this PR is focused on the logging sampling only. The buffering part will follow in a separate PR soon.

Microsoft Reviewers: Open in CodeFlow

evgenyfedorov2 · 2024-10-29T11:46:36Z

@noahfalk @geeknoid

geeknoid · 2024-10-29T18:02:01Z

In my doc, I had separated the idea of matching from what you do once you match. In this PR, these two things have been coupled as captured in the SamplingParameters struct.

The reason I think we need to separate "matching" from "actions upon matching" is that we have multiple actions possible. Once a record matches, we want to globally filter it, globally buffer it, filter it at the request level, and buffer it at the request level.

samsp-msft · 2024-10-29T18:31:28Z

src/Libraries/Microsoft.Extensions.Telemetry.Abstractions/Sampling/SamplingParameters.cs

+    /// <param name="logLevel"><see cref="Microsoft.Extensions.Logging.LogLevel"/> of the log record.</param>
+    /// <param name="category">Category of the log record.</param>
+    /// <param name="eventId"><see cref="Microsoft.Extensions.Logging.EventId"/> of the log record.</param>
+    public SamplingParameters(LogLevel? logLevel, string? category, EventId? eventId)


Should this also include an Activity/TraceId associated with the active request for the log message?

I see you fetch it from Activity.Current. If we have it already, let's pass it in, as the lookup uses AsyncState, which is not great.

In many cases the actual Id does not matter so much as whether the log message is being delivered in the context of a request or not.
If some kind of head sampling is being performed, then using the traceId so that you can sample at the request level rather than the log level may be important.

> if we have it already, let's pass it in

I don't think the code will already have it, unfortunately. There is the opt-in feature to include TraceIds as part of a logging scope (https://source.dot.net/#Microsoft.Extensions.Logging/LoggerFactoryScopeProvider.cs,35), but that is an API the LoggerProvider calls back to after the LoggerFactory has already invoked ILogger.Log() on the provider. I don't see a good way to share the reference that wouldn't wind up being more expensive than doing two independent queries. AsyncLocal lookups certainly cost more than a field lookup, but thankfully not that much more, probably 5-10ns.

Can it be handled by creating a new sampler object that can sample on Activity info? I guess in some cases, users may create an aggregated sampler encapsulating more than one sampler inside.

Also, should we expose samplers like TraceBasedSampler so users can manually create it and wrap inside other custom samplers?

> Can it be handled by creating a new sampler object that can sample on Activity info?

Yeah, anyone could write a sampler that follows a similar approach to the TraceBasedSampler if they want to. They could also use the API that takes a delegate:

logging.AddSampler(p => Activity.Current?.Recorded ?? false);

> Also, should we expose samplers like TraceBasedSampler so users can manually create it and wrap inside other custom samplers?

My preference would be not to add more API surface, given that developers could trivially reproduce the logic with one line of code if that's the behavior they want. It's not a big deal to me either way though.
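A minimal sketch of the "wrap samplers inside other samplers" idea from this thread. It assumes a simplified stand-in for the PR's LoggerSampler: the real ShouldSample takes SamplingParameters, while here a plain category string stands in. `SketchSampler`, `DelegateSampler`, `AllOfSampler`, and `SketchSamplers` are hypothetical names used only for illustration and are not part of the PR.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the PR's LoggerSampler (assumption: simplified signature).
public abstract class SketchSampler
{
    public abstract bool ShouldSample(string category);
}

// Wraps a delegate, mirroring the delegate-based AddSampler overload mentioned above.
public sealed class DelegateSampler : SketchSampler
{
    private readonly Func<string, bool> _decide;

    public DelegateSampler(Func<string, bool> decide) =>
        _decide = decide ?? throw new ArgumentNullException(nameof(decide));

    public override bool ShouldSample(string category) => _decide(category);
}

// Aggregated sampler: a record is sampled in only if every inner sampler agrees.
public sealed class AllOfSampler : SketchSampler
{
    private readonly SketchSampler[] _inner;

    public AllOfSampler(params SketchSampler[] inner) => _inner = inner;

    public override bool ShouldSample(string category) =>
        _inner.All(s => s.ShouldSample(category));
}

public static class SketchSamplers
{
    // One-line reproduction of the trace-based logic; samples out when there is no Activity.
    public static SketchSampler TraceBased() =>
        new DelegateSampler(_ => Activity.Current?.Recorded ?? false);
}
```

Under these assumptions, `new AllOfSampler(SketchSamplers.TraceBased(), new DelegateSampler(c => c.StartsWith("App", StringComparison.Ordinal)))` would keep only records from "App" categories that are logged inside a recorded Activity.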

noahfalk · 2024-10-29T23:03:59Z

src/Libraries/Microsoft.Extensions.Telemetry/Logging/ExtendedLogger.cs

@@ -266,6 +265,12 @@ private void ModernPath(LogLevel logLevel, EventId eventId, LoggerMessageState m
            ref readonly MessageLogger loggerInfo = ref loggers[i];
            if (loggerInfo.IsNotFilteredOut(logLevel))
            {
+                if (!config.Sampler.ShouldSample(new SamplingParameters(logLevel, loggerInfo.Category!, eventId)))


I assume we only want to sample a given record once and then reuse that result for all loggers. It would be quite surprising if a sampling policy of 'Random.Shared.NextDouble() > 0.5' meant the message is randomly sent to some providers but not others.

But this is a for loop through different loggers which can (and usually do) have different categories, and since we pass the Category to the sampler via SamplingParameters, there might be a custom sampler which makes sampling decisions based on the Category value.

> But this is a for loop through different loggers which can (and usually do) have different categories

That's not what I'd expect. Unless the R9 implementation is using the same named types to mean something different than they do in the M.E.L implementation, I'd expect you to have one ExtendedLogger instance per category. The MessageLoggers array being looped over represents the Loggers created by ILoggerProvider.CreateLogger() for that one category.

noahfalk · 2024-10-29T23:19:28Z

src/Libraries/Microsoft.Extensions.Telemetry.Abstractions/Sampling/SamplingParameters.cs

+    /// <param name="logLevel"><see cref="Microsoft.Extensions.Logging.LogLevel"/> of the log record.</param>
+    /// <param name="category">Category of the log record.</param>
+    /// <param name="eventId"><see cref="Microsoft.Extensions.Logging.EventId"/> of the log record.</param>
+    public SamplingParameters(LogLevel? logLevel, string? category, EventId? eventId)


> if we have it already, let's pass it in

I don't think the code will already have it, unfortunately. There is the opt-in feature to include TraceIds as part of a logging scope (https://source.dot.net/#Microsoft.Extensions.Logging/LoggerFactoryScopeProvider.cs,35), but that is an API the LoggerProvider calls back to after the LoggerFactory has already invoked ILogger.Log() on the provider. I don't see a good way to share the reference that wouldn't wind up being more expensive than doing two independent queries. AsyncLocal lookups certainly cost more than a field lookup, but thankfully not that much more, probably 5-10ns.

noahfalk · 2024-10-29T23:22:18Z

src/Libraries/Microsoft.Extensions.Telemetry/Logging/ExtendedLoggerFactory.cs

@@ -43,6 +46,7 @@ public ExtendedLoggerFactory(
 #pragma warning restore S107 // Methods should not have too many parameters
    {
        _scopeProvider = scopeProvider;
+        _sampler = sampler ?? new AlwaysOnSampler();


Performance-wise, it's probably a little bit faster to execute _sampler == null ? true : _sampler.ShouldSample() instead of invoking _sampler.ShouldSample() when no sampler was provided. You can do a little microbenchmark to confirm.

I will keep this thread open and update later.

Agree with @noahfalk. It would also be a way to check whether the logger was created with a sampler or not.

Seeing that AlwaysOnSampler is internal, my previous comment is not accurate.

noahfalk · 2024-10-29T23:37:49Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/TraceBasedSampler.cs

+internal sealed class TraceBasedSampler : LoggerSampler
+{
+    public override bool ShouldSample(SamplingParameters _) =>
+        Activity.Current?.Recorded ?? false;


We might want the no-Activity case to return true, or maybe make it configurable via the API. This is a spot where some experimental feedback feels very useful.

nit: this can be written as

Activity.Current?.Recorded is true

which will invoke Current once.

Taking into account the @noahfalk comment that we should sample in if there is no Activity, I think this makes sense:
Activity.Current?.Recorded ?? true
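A minimal sketch of the "configurable via the API" option raised above. `ConfigurableTraceSampler` and its constructor parameter are hypothetical, not the PR's API. The only framework members assumed are `Activity.Current` and `Activity.Recorded` from System.Diagnostics.

```csharp
using System.Diagnostics;

// Hypothetical variant of TraceBasedSampler; the constructor parameter is an assumption.
public sealed class ConfigurableTraceSampler
{
    private readonly bool _sampleWhenNoActivity;

    public ConfigurableTraceSampler(bool sampleWhenNoActivity) =>
        _sampleWhenNoActivity = sampleWhenNoActivity;

    // Reads Activity.Current once. Outside any Activity, the configured default decides.
    public bool ShouldSample() =>
        Activity.Current?.Recorded ?? _sampleWhenNoActivity;
}
```

With `sampleWhenNoActivity: true` this matches the `?? true` form suggested in the thread, and with `false` it matches the behavior in the diff, so the choice could be left to the caller while feedback is gathered.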

noahfalk · 2024-10-30T07:47:21Z

@tarekgh - not sure if you have seen this yet?

tarekgh · 2024-10-30T16:37:28Z

src/Libraries/Microsoft.Extensions.Telemetry.Abstractions/Sampling/SamplingParameters.cs

+    public SamplingParameters(LogLevel logLevel, string category, EventId eventId)
+    {
+        LogLevel = logLevel;
+        Category = category;


> Category = category;

What happens if someone forces a null value? Should we intentionally disallow that here?

Are you proposing adding a Throw.IfNull(category) check? I assume at the moment if you passed null then it's possible you get a NullReferenceException inside the call to ShouldSample(), depending on its implementation.

> Are you proposing adding a Throw.IfNull(category) check?

Yes.

> I assume at the moment if you passed null then it's possible you get a NullReferenceException inside the call to ShouldSample(), depending on its implementation.

Getting a NullReferenceException would not be a good experience. Getting an exception when creating SamplingParameters would be much better and more informative.
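A minimal sketch of the constructor-time check discussed above. The thread mentions the repo's Throw.IfNull helper; to stay self-contained this sketch uses a plain `ArgumentNullException` instead. `SketchSamplingParameters` is a hypothetical, trimmed-down stand-in for SamplingParameters with LogLevel omitted and EventId reduced to an int.

```csharp
using System;

// Hypothetical stand-in for SamplingParameters (assumption: LogLevel omitted, EventId as int).
public readonly struct SketchSamplingParameters
{
    public SketchSamplingParameters(string category, int eventId)
    {
        // Fail at construction time with a descriptive exception instead of a later
        // NullReferenceException somewhere inside a sampler's ShouldSample().
        Category = category ?? throw new ArgumentNullException(nameof(category));
        EventId = eventId;
    }

    public string Category { get; }

    public int EventId { get; }
}
```

One caveat of using a struct: `default(SketchSamplingParameters)` bypasses the constructor, so Category can still be null for a defaulted value.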

tarekgh · 2024-10-30T16:43:29Z

src/Libraries/Microsoft.Extensions.Telemetry.Abstractions/Sampling/SamplingParameters.cs

+/// Contains the parameters helping make sampling decisions for logs.
+/// </summary>
+[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
+public readonly struct SamplingParameters : IEquatable<SamplingParameters>


> SamplingParameters

Would SamplingOptions be a better name? We use Options everywhere and it conveys the same meaning.

Here it is called SamplingParameters, which is why I decided to reuse this name. Options-style names are usually used to represent config with the IOptions<> pattern, so it might not be the best choice here. What do you think?

SamplingParameters is OK. I only wanted to check whether we had thought about it. Let us stick with that name if no one else has any concern about it. By the way, I tried to look at the OTEL specs in case they suggest something, but couldn't find any info there.

tarekgh · 2024-10-30T17:04:18Z

Reviewed; in general, it looks good. I added a few minor questions.

evgenyfedorov2 · 2024-11-06T16:07:41Z

> In my doc, I had separated the idea of matching from what you do once you match. In this PR, these two things have been coupled as captured in the SamplingParameters struct.

> The reason I think we need to separate "matching" from "actions upon matching" is that we have multiple actions possible. Once a record matches, we want to globally filter it, globally buffer it, filter it at the request level, and buffer it at the request level.

Discussed offline. Added configuration support that allows specifying matching conditions per action. There is only one action for now: the ratio-based sampler itself.

noahfalk · 2024-11-07T01:25:19Z

src/Libraries/Microsoft.Extensions.Telemetry.Abstractions/Sampling/ILoggerSamplerFilterRule.cs

+/// Represents a rule used for filtering log messages for purposes of log sampling and buffering.
+/// </summary>
+[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
+public interface ILoggerSamplerFilterRule


I'd suggest making this interface internal

noahfalk · 2024-11-07T01:30:59Z

src/Libraries/Microsoft.Extensions.Telemetry.Abstractions/Sampling/ILoggerSamplerFilterRule.cs

+    /// <summary>
+    /// Gets the filter delegate that would be applied to messages that passed the <see cref="LogLevel"/>.
+    /// </summary>
+    public Func<string?, LogLevel?, int?, bool>? Filter { get; }


I don't think this API needs to be included in the interface because the rule selector doesn't access it.

noahfalk · 2024-11-07T01:48:37Z

src/Libraries/Microsoft.Extensions.Telemetry/Logging/ExtendedLogger.cs

@@ -39,6 +38,12 @@ public ExtendedLogger(ExtendedLoggerFactory factory, LoggerInformation[] loggers

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception? exception, Func<TState, Exception?, string> formatter)
    {
+        if (MessageLoggers.Length == 0 || !_factory.Config.Sampler.ShouldSample(new SamplingParameters(logLevel, MessageLoggers[0].Category, eventId)))


I think we want to run the sampler after we know the log record is enabled in at least one logger. For example if someone wanted to make a rate limiting sampler that logs no more than 1000 messages per second they might write:

    class Sampler
    {
        int _count; // pretend this gets reset to zero on timer every second
        bool ShouldSample(...) => _count++ < 1000;
    }

If the app has lots of logging instrumentation that uses the Trace logging level, but the app config has trace logging disabled the app developer may not get any messages logged at all. The calls to Log() at trace level use up the entire quota of 1000 messages in the sampler only to get filtered out later by the Logger.IsEnabled() checks.
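A minimal sketch of the ordering problem described above. `QuotaSampler` and `QuotaDemo` are hypothetical names, the per-second reset is left to the caller, and `isEnabled` stands in for the Logger.IsEnabled() check. It compares how many enabled records get through when the sampler runs before versus after the enabled check.

```csharp
using System.Threading;

// Hypothetical rate-limiting sampler; a timer would call Reset() once per second.
public sealed class QuotaSampler
{
    private readonly int _limit;
    private int _count;

    public QuotaSampler(int limit) => _limit = limit;

    // Interlocked keeps the counter correct under concurrent logging.
    public bool ShouldSample() => Interlocked.Increment(ref _count) <= _limit;

    public void Reset() => Interlocked.Exchange(ref _count, 0);
}

public static class QuotaDemo
{
    // Counts emitted records when `disabledFirst` disabled records (e.g. Trace with
    // trace logging off) arrive before `enabledAfter` enabled records.
    public static int Emitted(QuotaSampler sampler, int disabledFirst, int enabledAfter, bool sampleBeforeEnabledCheck)
    {
        int emitted = 0;
        for (int i = 0; i < disabledFirst + enabledAfter; i++)
        {
            bool isEnabled = i >= disabledFirst;
            bool pass = sampleBeforeEnabledCheck
                ? sampler.ShouldSample() && isEnabled   // sampler consulted on every Log() call
                : isEnabled && sampler.ShouldSample();  // sampler consulted only for enabled records
            if (pass)
            {
                emitted++;
            }
        }

        return emitted;
    }
}
```

Under these assumptions, 1000 disabled records followed by 10 enabled ones yield no output when the sampler runs first, because the disabled records consume the whole quota. Running the enabled check first lets all 10 through.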

noahfalk · 2024-11-07T06:23:27Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/RatioBasedSamplerFilterRule.cs

+    public double Probability { get; set; }
+
+    /// <inheritdoc/>
+    public Func<string?, LogLevel?, int?, bool>? Filter { get; }


This API isn't used, right? There isn't going to be any filtering other than the probability check.

noahfalk · 2024-11-07T06:28:13Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/RatioBasedSamplerOptions.cs

+    /// Gets or sets the collection of <see cref="RatioBasedSamplerFilterRule"/> used for filtering log messages.
+    /// </summary>
+#pragma warning disable CA1002 // Do not expose generic lists - List is necessary to be able to call .AddRange()
+#pragma warning disable CA2227 // Collection properties should be read only - setter is necessary for options pattern


I assume it's necessary if the implementation calls section.Get<RatioBasedSamplerOptions>(), but it wouldn't be necessary with a more manual implementation of the config parsing. I don't know how much this matters, but if the API review folks wanted this not to be settable we could do it.

noahfalk · 2024-11-07T06:30:50Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/SamplerRuleSelector.cs

+            // 2. If there nothing matched by category take all rules without category
+            // 3. If there is only one rule use it's level and filter
+            // 4. If there are multiple rules use last
+            // 5. If there are no applicable rules use global minimal level


This comment looks like the precedence rules for the Logging.LogLevels, but presumably we'll need slightly different rules here. Probably something like this:

1. Rules with an EventId take precedence over those without an EventId
2. Rules with a longer category string take precedence over shorter string or no string
3. Rules with lower LogLevel take precedence over higher LogLevel
4. If there are still multiple rules, take the last
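The precedence order proposed above can be sketched as follows. Everything here is an assumption for illustration rather than the PR's implementation: `SketchRule` and `SketchRuleSelector` are hypothetical, a null field means "matches anything", categories match by prefix, and a rule's Level is treated as an upper bound on the record's level, with a plain int (0 = Trace through 5 = Critical) standing in for LogLevel.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical rule shape; null fields act as wildcards.
public sealed record SketchRule(string? Category, int? Level, int? EventId, double Probability);

public static class SketchRuleSelector
{
    public static SketchRule? Select(IReadOnlyList<SketchRule> rules, string category, int level, int eventId)
    {
        SketchRule? best = null;
        foreach (SketchRule rule in rules)
        {
            if (!Matches(rule, category, level, eventId))
            {
                continue;
            }

            // ">=" lets a later rule win ties, which implements "take the last".
            if (best is null || Compare(rule, best) >= 0)
            {
                best = rule;
            }
        }

        return best;
    }

    // Assumed matching semantics: exact EventId, category prefix, level as upper bound.
    private static bool Matches(SketchRule r, string category, int level, int eventId) =>
        (r.EventId is null || r.EventId == eventId) &&
        (r.Category is null || category.StartsWith(r.Category, StringComparison.OrdinalIgnoreCase)) &&
        (r.Level is null || level <= r.Level);

    // Positive result means `a` takes precedence over `b`.
    private static int Compare(SketchRule a, SketchRule b)
    {
        // 1. A rule with an EventId beats one without.
        int c = (a.EventId is not null).CompareTo(b.EventId is not null);
        if (c != 0)
        {
            return c;
        }

        // 2. A longer category string beats a shorter one or none.
        c = (a.Category?.Length ?? -1).CompareTo(b.Category?.Length ?? -1);
        if (c != 0)
        {
            return c;
        }

        // 3. A lower level beats a higher one; a missing level ranks last.
        return (b.Level ?? int.MaxValue).CompareTo(a.Level ?? int.MaxValue);
    }
}
```

A single pass with a comparison keeps the selection allocation-free, which may matter given that rule selection sits on the logging hot path.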

noahfalk · 2024-11-07T07:07:16Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/SamplingLoggerBuilderExtensions.cs

+    /// <param name="configuration">The <see cref="IConfiguration" /> to add.</param>
+    /// <returns>The value of <paramref name="builder"/>.</returns>
+    /// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
+    public static ILoggingBuilder AddRatioBasedSamplerConfiguration(this ILoggingBuilder builder, IConfiguration configuration)


Does this need to be public? I'm not sure what scenario would use it.

Yes, my thinking was to have this method so that it can be called from HostBuilder, similar to AddConfiguration() for logging.

noahfalk · 2024-11-07T07:10:27Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/SamplingLoggerBuilderExtensions.cs

+    /// <param name="configuration">The <see cref="IConfiguration" /> to add.</param>
+    /// <returns>The value of <paramref name="builder"/>.</returns>
+    /// <exception cref="ArgumentNullException"><paramref name="builder"/> is <see langword="null"/>.</exception>
+    public static ILoggingBuilder AddRatioBasedSampler(this ILoggingBuilder builder, IConfiguration configuration)


I'm thinking we may want to rename all the places that say 'RatioBasedSampler' to 'ProbabilitySampler'. What do other folks think?

noahfalk · 2024-11-07T07:17:10Z

src/Libraries/Microsoft.Extensions.Telemetry/Sampling/RatioBasedSampler.cs

+        probability = 0.0;
+
+        // TO DO: check if we can optimize this. It is a hot path and
+        // we should be able to minimize number of rule selections on every log record.


I agree we'd want to optimize it, but it doesn't impact the design review much. I think it's good you left it as a TODO 👍.

R9 Fundamentals and others added 27 commits June 5, 2024 07:10

initial proposal

1280d1b

update

f8b502c

rebase

ed7e0db

use .net 9

e15139d

bufferin - initial

12fd4b4

Remove junk

9fa8403

buffer - renames

2ff5f78

.

0cf2c50

.

6f8265e

Sampling WIP with Global and HttpRequest samplers

cdb2c82

sampling

cb1de9d

cosmetic

c4596a3

Prepare to update API proposal

5e92d14

.

8f760b1

state at 23_10_2024 after updating Github proposal

57f902b

Return Global and Http Request buffering options

610647c

polish sampling

3be3850

add alwaysOnSampler

8ce40c1

update namespaces

d288300

abstractions test

e120145

add some tests

b443094

remove buffering

b41fcdf

merge

a5e1440

update tests

cbd2a15

remove sample app

5580cff

cosmetic changes

07a2860

update tests

9ac7cab

dotnet-policy-service bot assigned evgenyfedorov2 Oct 25, 2024

evgenyfedorov2 added 2 commits October 25, 2024 18:52

validate supplied probability

0df498f

Fix warnings

7b9479d

evgenyfedorov2 added 2 commits October 26, 2024 10:16

Merge branch 'main' into evgenyfedorov2/log_sampling

4007ce1

more tests

473bfca

samsp-msft reviewed Oct 29, 2024

noahfalk reviewed Oct 29, 2024

Address PR comments

b140e68

tarekgh reviewed Oct 30, 2024

tarekgh reviewed Oct 30, 2024

evgenyfedorov2 added 2 commits October 31, 2024 12:24

Address PR comments

32b5adc

wip

ebd4795

RussKie requested a review from a team November 5, 2024 01:02

RussKie added the area-resourcemonitoring label Nov 5, 2024

evgenyfedorov2 added 3 commits November 6, 2024 15:51

add config support to Ratio based Sampler

901bc22

fix warnings

e33863e

Merge branch 'main' into evgenyfedorov2/log_sampling

d43f891

noahfalk reviewed Nov 7, 2024

PR comments

0adbfc7
