
[Geneva.Logs] Support prefix-based table name mapping #796

Merged: 18 commits merged into open-telemetry:main from the geneva-logs-prefix-support branch on Dec 16, 2022

Conversation

@CodeBlanch (Member) commented Nov 24, 2022

Changes

This PR adds support for prefix-based resolution (similar to logging configuration) to the table mapping configuration options.

Usage

Given this config...

        var options = new GenevaExporterOptions
        {
            TableNameMappings = new Dictionary<string, string>()
            {
                ["*"] = "DefaultLogs",
                ["MyCompany"] = "InternalLogs",
                ["MyCompany.Product1"] = "InternalProduct1Logs",
                ["MyCompany.Product2"] = "InternalProduct2Logs",
                ["MyCompany.Product2.Security"] = "InternalSecurityLogs",
                ["MyPartner"] = "*",
            },
        };
  • ILogger<ThirdParty.Thing>: This would go to "DefaultLogs"
  • ILogger<MyCompany.ProductX.Thing>: This would go to "InternalLogs"
  • ILogger<MyCompany.Product1.Thing>: This would go to "InternalProduct1Logs"
  • ILogger<MyCompany.Product2.Thing>: This would go to "InternalProduct2Logs"
  • ILogger<MyCompany.Product2.Security.Alert>: This would go to "InternalSecurityLogs"
  • ILogger<MyPartner.Product.Thing>: This is marked as pass-through ("*"), so its category name is sanitized into the table name "MyPartnerProductThing" (a resolution sketch follows this list)
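The rules above boil down to: the longest configured prefix that matches the logger category wins, the "*" key supplies the default, and a value of "*" means pass-through (the category name itself is sanitized into the table name). Below is a minimal sketch of that selection logic, assuming an ordinal StartsWith prefix match and a strip-non-alphanumeric sanitization; the class and method names here are illustrative, not the exporter's actual internals.

using System;
using System.Collections.Generic;
using System.Text;

internal static class TableNameResolutionSketch
{
    public static string ResolveTableName(string categoryName, IReadOnlyDictionary<string, string> mappings)
    {
        string best = null;
        int bestLength = -1;

        // Pick the longest key that prefixes the category name ("*" is the default, handled below).
        foreach (var mapping in mappings)
        {
            if (mapping.Key == "*")
            {
                continue;
            }

            if (categoryName.StartsWith(mapping.Key, StringComparison.Ordinal)
                && mapping.Key.Length > bestLength)
            {
                best = mapping.Value;
                bestLength = mapping.Key.Length;
            }
        }

        if (best == null && mappings.TryGetValue("*", out var defaultTableName))
        {
            best = defaultTableName;
        }

        if (best != "*")
        {
            return best;
        }

        // Pass-through: sanitize the category name into the table name,
        // e.g. "MyPartner.Product.Thing" -> "MyPartnerProductThing".
        var sb = new StringBuilder(categoryName.Length);
        foreach (var c in categoryName)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}

With the configuration above, ResolveTableName("MyCompany.Product2.Security.Alert", options.TableNameMappings) would return "InternalSecurityLogs", while "MyPartner.Product.Thing" would come back as "MyPartnerProductThing".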

Implementation details

The implementation is close to what ILogger does for its filter configuration. The idea is that you don't want to apply the rules each time a log message is processed because the perf would be terrible; you want the result of rule selection computed once and available each time a log message is written. What the logging framework does is store the result on the ILogger instance created/injected from the factory. We don't have that luxury here, so I introduced a helper class which maintains a cache of the final category name -> table name resolution. It is backed by a dictionary, so you pay for one hash computation on the lookup, which is the same cost we were paying before (if mappings were used at all).

Some details about the cache...

  • Lookups are lock-free. If a new mapping is needed we do a copy-on-write, so warm-up is expensive but once warm it is cheap.

  • There is no constraint on the number of keys (category names) which will be cached. ILogger itself maintains an unconstrained cache of loggers by category name, so it seemed like that would OOM well before this cache does.

  • If the mapping is pass-through we only cache up to 1024 of the sanitized table names. Once the max is reached we just write them out to the buffer as they are logged (which was always the existing behavior). A sketch of the cache follows this list.
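Putting the description and the bullets above together, here is a rough sketch of what such a lock-free, copy-on-write cache can look like. The class name, the string-valued cache, the resolve callback, and the "*" pass-through marker are all illustrative assumptions; the real exporter caches pre-serialized byte arrays.

using System;
using System.Collections.Generic;

internal sealed class TableNameCacheSketch
{
    private const int MaxCachedSanitizedTableNames = 1024;

    private readonly object lockObject = new();

    private Dictionary<string, string> cache = new();
    private int cachedSanitizedTableNameCount;

    // resolve returns the table name for a category plus whether that name was
    // produced by pass-through sanitization (as opposed to a configured mapping).
    public string GetTableName(string categoryName, Func<string, (string TableName, bool IsSanitized)> resolve)
    {
        // Hot path: lock-free read against the current dictionary snapshot.
        if (this.cache.TryGetValue(categoryName, out var tableName))
        {
            return tableName;
        }

        lock (this.lockObject)
        {
            if (this.cache.TryGetValue(categoryName, out tableName))
            {
                return tableName;
            }

            var (resolved, isSanitized) = resolve(categoryName);

            if (isSanitized)
            {
                if (this.cachedSanitizedTableNameCount >= MaxCachedSanitizedTableNames)
                {
                    // Cap reached: remember only that this category is pass-through;
                    // the name will be sanitized straight into the buffer on each write.
                    resolved = "*";
                }
                else
                {
                    this.cachedSanitizedTableNameCount++;
                }
            }

            // Copy-on-write: build a new dictionary so in-flight lock-free readers
            // never observe a mutation, then publish it with a reference swap.
            var copy = new Dictionary<string, string>(this.cache)
            {
                [categoryName] = resolved,
            };

            this.cache = copy;

            return resolved;
        }
    }
}

The trade-off called out above is visible here: every miss copies the whole dictionary, so warm-up cost grows with the number of distinct categories, but steady-state reads are a single lock-free dictionary lookup.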

Benchmarks

Implementation + optimizations actually speed things up for everyone.

Before:

| Method | IncludeFormattedMessage | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---:|---:|---:|---:|---:|
| LoggerWithMessageTemplate | False | 546.6 ns | 10.43 ns | 9.76 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | False | 509.0 ns | 9.98 ns | 11.09 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | False | 529.5 ns | 5.30 ns | 4.43 ns | 0.0134 | 168 B |
| SerializeLogRecord | False | 359.4 ns | 5.70 ns | 5.34 ns | 0.0081 | 104 B |
| Export | False | 390.7 ns | 5.24 ns | 4.90 ns | 0.0081 | 104 B |
| LoggerWithMessageTemplate | True | 588.1 ns | 10.01 ns | 9.36 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | True | 512.0 ns | 7.44 ns | 6.96 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | True | 537.0 ns | 8.15 ns | 6.80 ns | 0.0134 | 168 B |
| SerializeLogRecord | True | 361.9 ns | 5.39 ns | 5.05 ns | 0.0081 | 104 B |
| Export | True | 391.7 ns | 4.14 ns | 3.87 ns | 0.0081 | 104 B |

After:

| Method | IncludeFormattedMessage | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---:|---:|---:|---:|---:|
| LoggerWithMessageTemplate | False | 530.6 ns | 3.30 ns | 2.92 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | False | 496.9 ns | 8.87 ns | 8.30 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | False | 516.1 ns | 8.12 ns | 7.60 ns | 0.0134 | 168 B |
| SerializeLogRecord | False | 343.4 ns | 6.64 ns | 6.52 ns | 0.0081 | 104 B |
| Export | False | 383.6 ns | 5.77 ns | 5.39 ns | 0.0081 | 104 B |
| LoggerWithMessageTemplate | True | 566.3 ns | 10.16 ns | 9.50 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | True | 504.9 ns | 9.36 ns | 8.29 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | True | 524.2 ns | 5.07 ns | 4.75 ns | 0.0134 | 168 B |
| SerializeLogRecord | True | 349.7 ns | 4.56 ns | 4.26 ns | 0.0081 | 104 B |
| Export | True | 380.9 ns | 3.93 ns | 3.67 ns | 0.0081 | 104 B |

TODOs

  • Appropriate CHANGELOG.md updated for non-trivial changes
  • Unit tests
  • README updates

codecov bot commented Nov 24, 2022

Codecov Report

Merging #796 (6d5536b) into main (97f1882) will increase coverage by 0.20%.
The diff coverage is 93.02%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #796      +/-   ##
==========================================
+ Coverage   77.88%   78.08%   +0.20%     
==========================================
  Files         176      177       +1     
  Lines        5303     5379      +76     
==========================================
+ Hits         4130     4200      +70     
- Misses       1173     1179       +6     
Impacted Files Coverage Δ
...Exporter.Geneva/MsgPackExporter/MsgPackExporter.cs 93.54% <ø> (ø)
...Telemetry.Exporter.Geneva/GenevaExporterOptions.cs 86.36% <66.66%> (-5.53%) ⬇️
...er.Geneva/MsgPackExporter/MessagePackSerializer.cs 90.15% <77.77%> (+0.79%) ⬆️
...ry.Exporter.Geneva/Internal/TableNameSerializer.cs 96.22% <96.22%> (ø)
...OpenTelemetry.Exporter.Geneva/GenevaLogExporter.cs 80.00% <100.00%> (ø)
...enTelemetry.Exporter.Geneva/GenevaTraceExporter.cs 80.00% <100.00%> (ø)
...orter.Geneva/MsgPackExporter/MsgPackLogExporter.cs 92.63% <100.00%> (-1.34%) ⬇️

@CodeBlanch CodeBlanch marked this pull request as ready for review November 28, 2022 21:25
@CodeBlanch CodeBlanch requested a review from a team November 28, 2022 21:25
}
}

// Note: This is using copy-on-write pattern to keep the happy

Member
👍

Member Author

I mentioned above possibly using Hashtable. Hashtable has a unique quirk in that it supports the many-readers-single-writer pattern. That would allow removing copy-on-write, which is great for the warm-up phase:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| CopyOnWrite | 3,659.8 us | 63.39 us | 56.19 us | 1328.1250 | 183.5938 | 16326.84 KB |
| Hashtable | 103.6 us | 1.95 us | 1.82 us | 13.4277 | 6.5918 | 165.3 KB |
Benchmark code
using System.Collections;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

namespace OpenTelemetry.Exporter.Geneva.Benchmark;

[MemoryDiagnoser]
public class CopyOnWriteBenchmarks
{
    private readonly object lockObject = new();

    // Measures the allocation cost of copy-on-write warm-up: for each new key,
    // add it and copy the whole dictionary under a lock.
    [Benchmark]
    public Dictionary<string, byte[]> CopyOnWrite()
    {
        Dictionary<string, byte[]> d = new();

        for (int i = 0; i < 1024; i++)
        {
            string k = $"Category{i}";

            if (!d.TryGetValue(k, out var value))
            {
                lock (this.lockObject)
                {
                    if (!d.TryGetValue(k, out value))
                    {
                        d[k] = new byte[1];

                        var old = d;

                        d = new(old);
                    }
                }
            }
        }

        return d;
    }

    // Measures the same warm-up using Hashtable, which supports one writer with
    // many concurrent readers, so only the new entry is added under the lock and
    // no copy of the collection is needed.
    [Benchmark]
    public Hashtable Hashtable()
    {
        Hashtable ht = new();

        for (int i = 0; i < 1024; i++)
        {
            string k = $"Category{i}";

            var value = (byte[])ht[k];
            if (value == null)
            {
                lock (this.lockObject)
                {
                    value = (byte[])ht[k];
                    if (value == null)
                    {
                        ht[k] = new byte[1];
                    }
                }
            }
        }

        return ht;
    }
}

I chatted with @utpilla and we decided not to use Hashtable for now, because the docs also say it isn't recommended.

It is an implementation detail so we can always change it in the future.

the value of the key to either the desired custom table name or `*` to enable
"pass-through" mapping. For details on "pass-through" mode see below.

For example, given the configuration...

Member

@ThomsonTan please review this part and see if the agent can do the same in theory (in case folks use the OTLP path and want to achieve the same).


string currentKey = null;

foreach (var mapping in this.m_tableMappings)

@utpilla (Contributor) commented Dec 1, 2022

This can be optimized. We can avoid checking every key of the TableNameMappings if we keep a collection of the keys sorted in descending order and use that to decide whether we need to keep checking the remaining keys.

Let's consider the example provided in the description. Let's say the category name was MyCompany.Product2.Component and we have the keys sorted in descending order:

"MyPartner"
"MyCompany.Product2.Security"
"MyCompany.Product2"
"MyCompany.Product1"
"MyCompany"
"*"

Once we find a match we can break out of the loop, as that would be the longest match. In this example the match would be MyCompany.Product2, which is the one we want. A sketch of this approach follows.
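Here is a rough sketch of that suggestion, with illustrative type and method names (per the reply below, this optimization was deferred to a follow-up rather than done in this PR). Because keys that are prefixes of the same category name are nested in one another, the longest of them sorts first under descending ordinal order, so the scan can stop at the first hit.

using System;
using System.Collections.Generic;
using System.Linq;

internal static class SortedPrefixLookupSketch
{
    // Sort once at configuration time; descending ordinal order puts longer,
    // more specific prefixes ahead of shorter ones that share the same start.
    public static string[] SortKeysDescending(IEnumerable<string> keys)
        => keys.OrderByDescending(k => k, StringComparer.Ordinal).ToArray();

    public static string FindLongestMatchingKey(string categoryName, string[] keysSortedDescending)
    {
        foreach (var key in keysSortedDescending)
        {
            if (key == "*")
            {
                return key; // the default entry matches everything
            }

            if (categoryName.StartsWith(key, StringComparison.Ordinal))
            {
                return key; // first hit is the longest matching prefix
            }
        }

        return null;
    }
}

For the example above, FindLongestMatchingKey("MyCompany.Product2.Component", SortKeysDescending(options.TableNameMappings.Keys)) returns "MyCompany.Product2" after checking only three keys instead of scanning the whole set.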

Member Author

I've been working on this, but it is getting kind of big with benchmarks, etc. I'm thinking I'll merge this PR and then open the perf improvements as their own thing.

@utpilla utpilla added the comp:exporter.geneva Things related to OpenTelemetry.Exporter.Geneva label Dec 1, 2022
? this.m_defaultTableName
: s_passthroughTableName;

if (mappedTableName.Length <= 0 && tableNameCache.CachedSanitizedTableNameCount < MaxCachedSanitizedTableNames)

Contributor

Why does this limit of 1024 matter? If I understand correctly, regardless of the CachedSanitizedTableNameCount value, we would always add a new category as a key to the m_tableNameCache. So why not just add the actual sanitized category name as the value instead of s_passthroughTableName when the 1024 limit is reached?

Member Author

Short answer: Today we don't cache any of the sanitized names so this was me being conservative.

Long answer:

If I understand correctly, regardless of the CachedSanitizedTableNameCount value, we would always add a new category as a key to the m_tableNameCache

Correct

So why not just add the actual sanitized category name as the value instead of s_passthroughTableName when the 1024 limit is reached.

Just to make it less greedy. Each category added after the first 1024 will grow the cache by a pointer to s_passthroughTableName instead of a pointer to a freshly allocated buffer (up to 52 bytes, IIRC).

Contributor

Just to make it less greedy. Each category added after 1024 will grow by a pointer to s_passthroughTableName instead of a pointer to some freshly allocated buffer (up to 52 bytes IIRC).

Would 52 bytes be considered high? For every new set of 1024 keys, it's only 52 KB max.

Member Author

Honestly, I didn't think too much about it. It's just nice to prevent unconstrained growth where we can. The check is only processed the first time we see a category name, and then we cache the result, so the cap is pretty inexpensive (a few nanoseconds). Do you want me to remove it?

@utpilla (Contributor) commented Dec 6, 2022

More than the if check, I was thinking we could save another round of parsing the category name when there are more than 1024 category names.

It comes down to the cost of copying a small byte[] to the buffer vs. executing WriteSanitizedCategoryName (parsing + serializing). Since we have to parse the string for the dictionary lookup anyway, I was wondering why not just get hold of the byte[] instead of parsing and serializing the sanitized category name again.

Member Author

Hmm. I think there is some confusion about what is going on. We always cache something. For a given category name, the cached value will be one of these things...

  1. A defined mapping (pointer to something in m_tableMappings).
  2. The default table name (pointer to m_defaultTableName).
  3. A sanitized category name (allocation).
  4. Array.Empty if category name could not be sanitized (pointer to static).
  5. The pass-through indicator if the cache was full (pointer to s_passthroughTableName).

The logic is...

  • Look up the mapping for the category name in the cache.
  • If the mapping is the pass-through indicator, write out the sanitized table name directly to the buffer.

For pass-through today the logic is...

  • Check for a mapping using the category name.
  • If nothing was found and pass-through is enabled, write out the sanitized table name directly to the buffer.

So basically when the cache fills up we don't do double work, we just fall back to the behavior we have today (a sketch of this write path follows).
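Here is a rough sketch of that write-time decision, reusing the illustrative "*" pass-through marker from the earlier cache sketch; the real exporter writes MessagePack-framed byte[] values rather than raw ASCII, so this only shows the branching, not the actual serialization.

using System;
using System.Text;

internal static class TableNameWritePathSketch
{
    // cachedValue stands in for whatever the cache returned for this category:
    // a mapped/default/sanitized table name, or "*" when the sanitized-name cap
    // was hit and the category is pass-through.
    public static int WriteTableName(byte[] buffer, int cursor, string categoryName, string cachedValue)
    {
        if (cachedValue != "*")
        {
            // Mapped, default, or already-sanitized name: copy pre-computed bytes.
            var bytes = Encoding.ASCII.GetBytes(cachedValue);
            bytes.CopyTo(buffer.AsSpan(cursor));
            return cursor + bytes.Length;
        }

        // Pass-through with a full cache: sanitize the category name straight
        // into the buffer on every write (the pre-existing behavior).
        foreach (var c in categoryName)
        {
            if (char.IsLetterOrDigit(c))
            {
                buffer[cursor++] = (byte)c;
            }
        }

        return cursor;
    }
}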

Member Author

Update... I synced up with @utpilla offline. The concern here is that computing the hash code for CategoryName in order to look into the cache also "parses" the string. I have some ideas to improve this, but I don't think we need to do anything here. The perf should be better for most, slightly worse for some.

  • For users using at least 1 mapping, there is already a hash code computation + dictionary lookup in their path today. What these users will see: a speed-up for <= 1024 categories, and the same speed as today for > 1024 categories.

  • For users NOT using any mappings today, there is now a new hash code computation + dictionary lookup in their path. What these users will see: when using the default table name, there should be a speed-up because we now copy a pre-built array using span helpers instead of calling into MessagePackSerializer.SerializeAsciiString each time. When using pass-through: a speed-up for <= 1024 categories, and a slight slow-down for > 1024 categories.

github-actions bot (Contributor) commented

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 14, 2022
@github-actions github-actions bot removed the Stale label Dec 15, 2022
@CodeBlanch CodeBlanch merged commit 3768220 into open-telemetry:main Dec 16, 2022
@CodeBlanch CodeBlanch deleted the geneva-logs-prefix-support branch December 16, 2022 19:22
Labels: comp:exporter.geneva (Things related to OpenTelemetry.Exporter.Geneva)
6 participants