
[Geneva.Logs] Support prefix-based table name mapping #796

Merged: 18 commits merged into open-telemetry:main from the geneva-logs-prefix-support branch on Dec 16, 2022

Conversation

@CodeBlanch (Member) commented Nov 24, 2022

Changes

This PR adds support for prefix-based resolution (similar to logging configuration) to the table mapping configuration options.

Usage

Given this config...

        var options = new GenevaExporterOptions
        {
            TableNameMappings = new Dictionary<string, string>()
            {
                ["*"] = "DefaultLogs",
                ["MyCompany"] = "InternalLogs",
                ["MyCompany.Product1"] = "InternalProduct1Logs",
                ["MyCompany.Product2"] = "InternalProduct2Logs",
                ["MyCompany.Product2.Security"] = "InternalSecurityLogs",
                ["MyPartner"] = "*",
            },
        };
  • ILogger<ThirdParty.Thing>: This would go to "DefaultLogs"
  • ILogger<MyCompany.ProductX.Thing>: This would go to "InternalLogs"
  • ILogger<MyCompany.Product1.Thing>: This would go to "InternalProduct1Logs"
  • ILogger<MyCompany.Product2.Thing>: This would go to "InternalProduct2Logs"
  • ILogger<MyCompany.Product2.Security.Alert>: This would go to "InternalSecurityLogs"
  • ILogger<MyPartner.Product.Thing>: This is marked as pass-through ("*"), so its category name is sanitized into the table name "MyPartnerProductThing" (a resolution sketch follows this list)
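The rules above boil down to: the longest configured prefix that matches the logger category wins, the "*" key supplies the default, and a value of "*" means pass-through (the category name itself is sanitized into the table name). Below is a minimal sketch of that selection logic, assuming an ordinal StartsWith prefix match and a strip-non-alphanumeric sanitization; the class and method names here are illustrative, not the exporter's actual internals.

using System;
using System.Collections.Generic;
using System.Text;

internal static class TableNameResolutionSketch
{
    public static string ResolveTableName(string categoryName, IReadOnlyDictionary<string, string> mappings)
    {
        string best = null;
        int bestLength = -1;

        // Pick the longest key that prefixes the category name ("*" is the default, handled below).
        foreach (var mapping in mappings)
        {
            if (mapping.Key == "*")
            {
                continue;
            }

            if (categoryName.StartsWith(mapping.Key, StringComparison.Ordinal)
                && mapping.Key.Length > bestLength)
            {
                best = mapping.Value;
                bestLength = mapping.Key.Length;
            }
        }

        if (best == null && mappings.TryGetValue("*", out var defaultTableName))
        {
            best = defaultTableName;
        }

        if (best != "*")
        {
            return best;
        }

        // Pass-through: sanitize the category name into the table name,
        // e.g. "MyPartner.Product.Thing" -> "MyPartnerProductThing".
        var sb = new StringBuilder(categoryName.Length);
        foreach (var c in categoryName)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}

With the configuration above, ResolveTableName("MyCompany.Product2.Security.Alert", options.TableNameMappings) would return "InternalSecurityLogs", while "MyPartner.Product.Thing" would come back as "MyPartnerProductThing".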

Implementation details

The implementation is close to what ILogger does for its filter configuration. The idea is that you don't want to apply the rules each time a log message is processed because the perf would be terrible; you want the result of rule selection computed once and available each time a log message is written. What the logging framework does is store the result on the ILogger instance created/injected from the factory. We don't have that luxury here, so I introduced a helper class which maintains a cache of the final category name -> table name resolution. It is backed by a dictionary, so you pay for one hash computation on the lookup, which is the same cost we were paying before (if mappings were used at all).

Some details about the cache...

  • Lookups are lock-free. If a new mapping is needed we do a copy-on-write, so warm-up is expensive but once warm it is cheap.

  • There is no constraint on the number of keys (category names) which will be cached. ILogger itself maintains an unconstrained cache of loggers by category name, so it seemed like that would OOM well before this cache does.

  • If the mapping is pass-through we only cache up to 1024 of the sanitized table names. Once the max is reached we just write them out to the buffer as they are logged (which was always the existing behavior). A sketch of the cache follows this list.
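Putting the description and the bullets above together, here is a rough sketch of what such a lock-free, copy-on-write cache can look like. The class name, the string-valued cache, the resolve callback, and the "*" pass-through marker are all illustrative assumptions; the real exporter caches pre-serialized byte arrays.

using System;
using System.Collections.Generic;

internal sealed class TableNameCacheSketch
{
    private const int MaxCachedSanitizedTableNames = 1024;

    private readonly object lockObject = new();

    private Dictionary<string, string> cache = new();
    private int cachedSanitizedTableNameCount;

    // resolve returns the table name for a category plus whether that name was
    // produced by pass-through sanitization (as opposed to a configured mapping).
    public string GetTableName(string categoryName, Func<string, (string TableName, bool IsSanitized)> resolve)
    {
        // Hot path: lock-free read against the current dictionary snapshot.
        if (this.cache.TryGetValue(categoryName, out var tableName))
        {
            return tableName;
        }

        lock (this.lockObject)
        {
            if (this.cache.TryGetValue(categoryName, out tableName))
            {
                return tableName;
            }

            var (resolved, isSanitized) = resolve(categoryName);

            if (isSanitized)
            {
                if (this.cachedSanitizedTableNameCount >= MaxCachedSanitizedTableNames)
                {
                    // Cap reached: remember only that this category is pass-through;
                    // the name will be sanitized straight into the buffer on each write.
                    resolved = "*";
                }
                else
                {
                    this.cachedSanitizedTableNameCount++;
                }
            }

            // Copy-on-write: build a new dictionary so in-flight lock-free readers
            // never observe a mutation, then publish it with a reference swap.
            var copy = new Dictionary<string, string>(this.cache)
            {
                [categoryName] = resolved,
            };

            this.cache = copy;

            return resolved;
        }
    }
}

The trade-off called out above is visible here: every miss copies the whole dictionary, so warm-up cost grows with the number of distinct categories, but steady-state reads are a single lock-free dictionary lookup.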

Benchmarks

Implementation + optimizations actually speed things up for everyone.

Before:

| Method | IncludeFormattedMessage | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---:|---:|---:|---:|---:|
| LoggerWithMessageTemplate | False | 546.6 ns | 10.43 ns | 9.76 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | False | 509.0 ns | 9.98 ns | 11.09 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | False | 529.5 ns | 5.30 ns | 4.43 ns | 0.0134 | 168 B |
| SerializeLogRecord | False | 359.4 ns | 5.70 ns | 5.34 ns | 0.0081 | 104 B |
| Export | False | 390.7 ns | 5.24 ns | 4.90 ns | 0.0081 | 104 B |
| LoggerWithMessageTemplate | True | 588.1 ns | 10.01 ns | 9.36 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | True | 512.0 ns | 7.44 ns | 6.96 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | True | 537.0 ns | 8.15 ns | 6.80 ns | 0.0134 | 168 B |
| SerializeLogRecord | True | 361.9 ns | 5.39 ns | 5.05 ns | 0.0081 | 104 B |
| Export | True | 391.7 ns | 4.14 ns | 3.87 ns | 0.0081 | 104 B |

After:

| Method | IncludeFormattedMessage | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---:|---:|---:|---:|---:|
| LoggerWithMessageTemplate | False | 530.6 ns | 3.30 ns | 2.92 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | False | 496.9 ns | 8.87 ns | 8.30 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | False | 516.1 ns | 8.12 ns | 7.60 ns | 0.0134 | 168 B |
| SerializeLogRecord | False | 343.4 ns | 6.64 ns | 6.52 ns | 0.0081 | 104 B |
| Export | False | 383.6 ns | 5.77 ns | 5.39 ns | 0.0081 | 104 B |
| LoggerWithMessageTemplate | True | 566.3 ns | 10.16 ns | 9.50 ns | 0.0162 | 208 B |
| LoggerWithDirectLoggerAPI | True | 504.9 ns | 9.36 ns | 8.29 ns | 0.0267 | 344 B |
| LoggerWithSourceGenerator | True | 524.2 ns | 5.07 ns | 4.75 ns | 0.0134 | 168 B |
| SerializeLogRecord | True | 349.7 ns | 4.56 ns | 4.26 ns | 0.0081 | 104 B |
| Export | True | 380.9 ns | 3.93 ns | 3.67 ns | 0.0081 | 104 B |

TODOs

  • Appropriate CHANGELOG.md updated for non-trivial changes
  • Unit tests
  • README updates

codecov bot commented Nov 24, 2022

Codecov Report

Merging #796 (6d5536b) into main (97f1882) will increase coverage by 0.20%.
The diff coverage is 93.02%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #796      +/-   ##
==========================================
+ Coverage   77.88%   78.08%   +0.20%     
==========================================
  Files         176      177       +1     
  Lines        5303     5379      +76     
==========================================
+ Hits         4130     4200      +70     
- Misses       1173     1179       +6     
Impacted Files Coverage Δ
...Exporter.Geneva/MsgPackExporter/MsgPackExporter.cs 93.54% <ø> (ø)
...Telemetry.Exporter.Geneva/GenevaExporterOptions.cs 86.36% <66.66%> (-5.53%) ⬇️
...er.Geneva/MsgPackExporter/MessagePackSerializer.cs 90.15% <77.77%> (+0.79%) ⬆️
...ry.Exporter.Geneva/Internal/TableNameSerializer.cs 96.22% <96.22%> (ø)
...OpenTelemetry.Exporter.Geneva/GenevaLogExporter.cs 80.00% <100.00%> (ø)
...enTelemetry.Exporter.Geneva/GenevaTraceExporter.cs 80.00% <100.00%> (ø)
...orter.Geneva/MsgPackExporter/MsgPackLogExporter.cs 92.63% <100.00%> (-1.34%) ⬇️

@CodeBlanch CodeBlanch marked this pull request as ready for review November 28, 2022 21:25
@CodeBlanch CodeBlanch requested a review from a team November 28, 2022 21:25
}
}

// Note: This is using copy-on-write pattern to keep the happy

Member
👍

Member Author

I mentioned above possibly using Hashtable. Hashtable has a unique quirk in that it supports the many-readers-single-writer pattern. That would allow removing copy-on-write, which is great for the warm-up phase:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| CopyOnWrite | 3,659.8 us | 63.39 us | 56.19 us | 1328.1250 | 183.5938 | 16326.84 KB |
| Hashtable | 103.6 us | 1.95 us | 1.82 us | 13.4277 | 6.5918 | 165.3 KB |
Benchmark code
using System.Collections;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

namespace OpenTelemetry.Exporter.Geneva.Benchmark;

[MemoryDiagnoser]
public class CopyOnWriteBenchmarks
{
    private readonly object lockObject = new();

    // Measures the allocation cost of copy-on-write warm-up: for each new key,
    // add it and copy the whole dictionary under a lock.
    [Benchmark]
    public Dictionary<string, byte[]> CopyOnWrite()
    {
        Dictionary<string, byte[]> d = new();

        for (int i = 0; i < 1024; i++)
        {
            string k = $"Category{i}";

            if (!d.TryGetValue(k, out var value))
            {
                lock (this.lockObject)
                {
                    if (!d.TryGetValue(k, out value))
                    {
                        d[k] = new byte[1];

                        var old = d;

                        d = new(old);
                    }
                }
            }
        }

        return d;
    }

    // Measures the same warm-up using Hashtable, which supports one writer with
    // many concurrent readers, so only the new entry is added under the lock and
    // no copy of the collection is needed.
    [Benchmark]
    public Hashtable Hashtable()
    {
        Hashtable ht = new();

        for (int i = 0; i < 1024; i++)
        {
            string k = $"Category{i}";

            var value = (byte[])ht[k];
            if (value == null)
            {
                lock (this.lockObject)
                {
                    value = (byte[])ht[k];
                    if (value == null)
                    {
                        ht[k] = new byte[1];
                    }
                }
            }
        }

        return ht;
    }
}

I chatted with @utpilla and we decided not to use Hashtable for now, because the docs also say it isn't recommended.

It is an implementation detail so we can always change it in the future.

the value of the key to either the desired custom table name or `*` to enable
"pass-through" mapping. For details on "pass-through" mode see below.

For example, given the configuration...

Member

@ThomsonTan please review this part and see if the agent can do the same in theory (in case folks use the OTLP path and want to achieve the same).


string currentKey = null;

foreach (var mapping in this.m_tableMappings)

@utpilla (Contributor) commented Dec 1, 2022

This can be optimized. We can avoid checking every key of the TableNameMappings if we keep a collection of the keys sorted in descending order and use that to decide whether we need to keep checking the remaining keys.

Let's consider the example provided in the description. Let's say the category name was MyCompany.Product2.Component and we have the keys sorted in descending order:

"MyPartner"
"MyCompany.Product2.Security"
"MyCompany.Product2"
"MyCompany.Product1"
"MyCompany"
"*"

Once we find a match we can break out of the loop, as that would be the longest match. In this example the match would be MyCompany.Product2, which is the one we want. A sketch of this approach follows.
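Here is a rough sketch of that suggestion, with illustrative type and method names (per the reply below, this optimization was deferred to a follow-up rather than done in this PR). Because keys that are prefixes of the same category name are nested in one another, the longest of them sorts first under descending ordinal order, so the scan can stop at the first hit.

using System;
using System.Collections.Generic;
using System.Linq;

internal static class SortedPrefixLookupSketch
{
    // Sort once at configuration time; descending ordinal order puts longer,
    // more specific prefixes ahead of shorter ones that share the same start.
    public static string[] SortKeysDescending(IEnumerable<string> keys)
        => keys.OrderByDescending(k => k, StringComparer.Ordinal).ToArray();

    public static string FindLongestMatchingKey(string categoryName, string[] keysSortedDescending)
    {
        foreach (var key in keysSortedDescending)
        {
            if (key == "*")
            {
                return key; // the default entry matches everything
            }

            if (categoryName.StartsWith(key, StringComparison.Ordinal))
            {
                return key; // first hit is the longest matching prefix
            }
        }

        return null;
    }
}

For the example above, FindLongestMatchingKey("MyCompany.Product2.Component", SortKeysDescending(options.TableNameMappings.Keys)) returns "MyCompany.Product2" after checking only three keys instead of scanning the whole set.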

Member Author

I've been working on this, but it is getting kind of big with benchmarks, etc. I'm thinking I'll merge this PR and then open the perf improvements as their own thing.

@utpilla utpilla added the comp:exporter.geneva Things related to OpenTelemetry.Exporter.Geneva label Dec 1, 2022
? this.m_defaultTableName
: s_passthroughTableName;

if (mappedTableName.Length <= 0 && tableNameCache.CachedSanitizedTableNameCount < MaxCachedSanitizedTableNames)

Contributor

Why does this limit of 1024 matter? If I understand correctly, regardless of the CachedSanitizedTableNameCount value, we would always add a new category as a key to the m_tableNameCache. So why not just add the actual sanitized category name as the value instead of s_passthroughTableName when the 1024 limit is reached?

Member Author

Short answer: Today we don't cache any of the sanitized names so this was me being conservative.

Long answer:

If I understand correctly, regardless of the CachedSanitizedTableNameCount value, we would always add a new category as a key to the m_tableNameCache

Correct

So why not just add the actual sanitized category name as the value instead of s_passthroughTableName when the 1024 limit is reached.

Just to make it less greedy. Each category added after the first 1024 will grow the cache by a pointer to s_passthroughTableName instead of a pointer to a freshly allocated buffer (up to 52 bytes, IIRC).

Contributor

Just to make it less greedy. Each category added after 1024 will grow by a pointer to s_passthroughTableName instead of a pointer to some freshly allocated buffer (up to 52 bytes IIRC).

Would 52 bytes be considered high? For every new set of 1024 keys, it's only 52 KB max.

Member Author

Honestly, I didn't think too much about it. It's just nice to prevent unconstrained growth where we can. The check is only processed the first time we see a category name, and then we cache the result, so the cap is pretty inexpensive (a few nanoseconds). Do you want me to remove it?

@utpilla (Contributor) commented Dec 6, 2022

More than the if check, I was thinking we could save another round of parsing the category name when there are more than 1024 category names.

It comes down to the cost of copying a small byte[] to the buffer vs. executing WriteSanitizedCategoryName (parsing + serializing). Since we have to parse the string for the dictionary lookup anyway, I was wondering why not just get hold of the byte[] instead of parsing and serializing the sanitized category name again.

Member Author

Hmm. I think there is some confusion about what is going on. We always cache something. For a given category name, the cached value will be one of these things...

  1. A defined mapping (pointer to something in m_tableMappings).
  2. The default table name (pointer to m_defaultTableName).
  3. A sanitized category name (allocation).
  4. Array.Empty if category name could not be sanitized (pointer to static).
  5. The pass-through indicator if the cache was full (pointer to s_passthroughTableName).

The logic is...

  • Look up the mapping for the category name in the cache.
  • If the mapping is the pass-through indicator, write out the sanitized table name directly to the buffer.

For pass-through today the logic is...

  • Check for a mapping using the category name.
  • If nothing was found and pass-through is enabled, write out the sanitized table name directly to the buffer.

So basically when the cache fills up we don't do double work, we just fall back to the behavior we have today (a sketch of this write path follows).
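Here is a rough sketch of that write-time decision, reusing the illustrative "*" pass-through marker from the earlier cache sketch; the real exporter writes MessagePack-framed byte[] values rather than raw ASCII, so this only shows the branching, not the actual serialization.

using System;
using System.Text;

internal static class TableNameWritePathSketch
{
    // cachedValue stands in for whatever the cache returned for this category:
    // a mapped/default/sanitized table name, or "*" when the sanitized-name cap
    // was hit and the category is pass-through.
    public static int WriteTableName(byte[] buffer, int cursor, string categoryName, string cachedValue)
    {
        if (cachedValue != "*")
        {
            // Mapped, default, or already-sanitized name: copy pre-computed bytes.
            var bytes = Encoding.ASCII.GetBytes(cachedValue);
            bytes.CopyTo(buffer.AsSpan(cursor));
            return cursor + bytes.Length;
        }

        // Pass-through with a full cache: sanitize the category name straight
        // into the buffer on every write (the pre-existing behavior).
        foreach (var c in categoryName)
        {
            if (char.IsLetterOrDigit(c))
            {
                buffer[cursor++] = (byte)c;
            }
        }

        return cursor;
    }
}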

Member Author

Update... I synced up with @utpilla offline. The concern here is that computing the hash code for CategoryName in order to look into the cache also "parses" the string. I have some ideas to improve this, but I don't think we need to do anything here. The perf should be better for most, slightly worse for some.

  • For users using at least 1 mapping, there is already a hash code computation + dictionary lookup in their path today. What these users will see: a speed-up for <= 1024 categories, and the same speed as today for > 1024 categories.

  • For users NOT using any mappings today, there is now a new hash code computation + dictionary lookup in their path. What these users will see: when using the default table name, there should be a speed-up because we now copy a pre-built array using span helpers instead of calling into MessagePackSerializer.SerializeAsciiString each time. When using pass-through: a speed-up for <= 1024 categories, and a slight slow-down for > 1024 categories.

github-actions bot (Contributor) commented

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 14, 2022
@github-actions github-actions bot removed the Stale label Dec 15, 2022
@CodeBlanch CodeBlanch merged commit 3768220 into open-telemetry:main Dec 16, 2022
@CodeBlanch CodeBlanch deleted the geneva-logs-prefix-support branch December 16, 2022 19:22
Labels: comp:exporter.geneva (Things related to OpenTelemetry.Exporter.Geneva)
6 participants