Optimize log sanitization with SearchValues for improved performance #3341

Copilot · 2025-10-08T18:39:57Z

Summary

Updated the Sanitize method in LogHelper.cs to use SearchValues<char> for improved performance on .NET 8+, while maintaining full backward compatibility with older frameworks.

Changes Made

Added conditional using System.Buffers directive for .NET 8+
Created static SearchValues<char> field with all 108 control and format characters
Implemented efficient SearchValues-based sanitization for .NET 8+
Maintained original implementation as fallback for older frameworks
Preserved identical output format (\r, \n, \t, \uXXXX)
Added 15 comprehensive tests covering all character types and edge cases
Fixed encoding issues on lines 462 and 482 (preserved original file encoding)
Added targeted benchmark for log sanitization performance

Benchmark Added

Created LogSanitizationBenchmarks.cs with the following scenarios:

Baseline: String with no special characters (tests early return optimization)
Common control chars: Strings with \r, \n, \t
Unicode format chars: Zero-width spaces, directional marks
Mixed chars: Combination of control and format characters
Long strings: Both with and without special characters to test performance at scale

Run with: dotnet run -c release -f net9.0 --filter Microsoft.IdentityModel.Benchmarks.LogSanitizationBenchmarks*

Character Set (108 total)

All ASCII control characters (U+0000-U+001F, U+007F-U+009F)
All Unicode format characters (Cf category)
Including: zero-width spaces, directional marks, and other format characters

Testing

All 72 tests pass on .NET 8 (57 original + 15 new)
All 72 tests pass on .NET 9
Output format verified identical to previous implementation
Backward compatibility maintained for all target frameworks
Benchmark builds successfully for net6.0, net8.0, and net9.0
File encoding preserved on lines 462 and 482

Original prompt

This section details on the original issue you should resolve

<issue_title>Log sanitization with use of SearchValues v1</issue_title>
<issue_description>Problem Statement.
We want to update the Sanitize method:

azure-activedirectory-identitymodel-extensions-for-dotnet/src/Microsoft.IdentityModel.Logging/LogHelper.cs

Line 533 in c152004

private static string Sanitize(string input)

to utilize SearchValues. All values which need to be sanitized can be added to a SearchValues collection.

Specific Characters to Sanitize:

'\r' (Carriage Return, U+000D)

'\n' (Line Feed, U+000A)

'\t' (Tab, U+0009)

All other ASCII control characters: U+0000-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F-U+009F

All Unicode characters where char.IsControl(c) is true (Unicode category Cc)

All Unicode characters where CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format (Unicode category Cf), e.g.:

U+200B (Zero Width Space)

U+200C (Zero Width Non-Joiner)

U+200D (Zero Width Joiner)

U+200E (Left-to-Right Mark)

U+200F (Right-to-Left Mark)

U+202A-U+202E (Directional formatting)

U+2060-U+206F (Various format characters)

U+FEFF (Zero Width No-Break Space)

Implementation Clarifications:

The referenced SanitizeEntryFromFilePath method from dotnet/runtime should be used as a performance pattern reference only. The actual character set for sanitization should follow this issue's explicit enumeration.

The new implementation must produce identical output to the current method, including string formatting (e.g., using "\u{(int)c:X4}" for control/format characters).

Steps:

Update the Sanitize method to have a SearchValues collection that includes the above values. The list of control and format characters should be fully enumerated and hardcoded into a static SearchValues collection.

Update the Sanitize method to follow the structure of the method called SanitizeEntryFromFilePath: dotnet/runtime/src/libraries/Common/src/System/IO/Archiving.Utils.Windows.cs#L11

Update tests accordingly. If tests to cover the following edge cases are not present, add them: Strings with no sanitizable characters (performance fast path), Strings with only ASCII control characters, Strings with Unicode format characters. Very long strings, and Null/empty string handling

Only update the Sanitize method as described above; do not update any other methods.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #3338

💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

Co-authored-by: mdchennu <[email protected]>

mdchennu · 2025-10-08T20:41:46Z

@copilot please update the tests in this file: https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/blob/dev/test/Microsoft.IdentityModel.Logging.Tests/LogHelperTests.cs within this PR

Co-authored-by: mdchennu <[email protected]>

Copilot · 2025-10-08T20:47:14Z

@copilot please update the tests in this file: https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/blob/dev/test/Microsoft.IdentityModel.Logging.Tests/LogHelperTests.c...

I've added 15 comprehensive tests to LogHelperTests.cs covering:

Empty strings and strings with no special characters
Control characters (ASCII and Unicode)
Format characters (zero-width spaces, directional marks, soft hyphens)
Arabic format characters
Interlinear annotation characters
Edge cases (consecutive chars, chars at start/end, long strings)
Mix of all character types

All 72 tests pass on both .NET 8 and .NET 9. See commit 4f47a7d.

github-actions · 2025-10-08T20:53:38Z

Summary


Generated on:	10/8/2025 - 8:53:37 PM
Coverage date:	10/8/2025 - 8:44:16 PM - 10/8/2025 - 8:52:54 PM
Parser:	MultiReport (60x Cobertura)
Assemblies:	1
Classes:	7
Files:	2
Line coverage:	80.3% (620 of 772)
Covered lines:	620
Uncovered lines:	152
Coverable lines:	772
Total lines:	483
Branch coverage:	67.8% (228 of 336)
Covered branches:	228
Total branches:	336
Method coverage:	Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%

Name	Line	Branch
Microsoft.IdentityModel.JsonWebTokens	80.3%	67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities	100%
System.Text.RegularExpressions.Generated	80.3%	67.8%
System.Text.RegularExpressions.Generated	80.3%	67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1	79.2%	68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0	81.4%	67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3 CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1	79.2%	68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3 CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0	81.4%	67.6%

github-actions · 2025-10-08T21:02:11Z

Summary


Generated on:	10/8/2025 - 9:02:10 PM
Coverage date:	10/8/2025 - 8:53:32 PM - 10/8/2025 - 9:01:27 PM
Parser:	MultiReport (60x Cobertura)
Assemblies:	1
Classes:	7
Files:	2
Line coverage:	80.3% (620 of 772)
Covered lines:	620
Uncovered lines:	152
Coverable lines:	772
Total lines:	483
Branch coverage:	67.8% (228 of 336)
Covered branches:	228
Total branches:	336
Method coverage:	Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%

Name	Line	Branch
Microsoft.IdentityModel.JsonWebTokens	80.3%	67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities	100%
System.Text.RegularExpressions.Generated	80.3%	67.8%
System.Text.RegularExpressions.Generated	80.3%	67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1	79.2%	68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0	81.4%	67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3 CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1	79.2%	68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3 CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0	81.4%	67.6%

github-actions · 2025-10-08T21:03:11Z

Summary


Generated on:	10/8/2025 - 9:03:01 PM
Coverage date:	10/8/2025 - 8:54:20 PM - 10/8/2025 - 9:02:18 PM
Parser:	MultiReport (60x Cobertura)
Assemblies:	1
Classes:	7
Files:	2
Line coverage:	80.3% (620 of 772)
Covered lines:	620
Uncovered lines:	152
Coverable lines:	772
Total lines:	483
Branch coverage:	67.8% (228 of 336)
Covered branches:	228
Total branches:	336
Method coverage:	Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%

Name	Line	Branch
Microsoft.IdentityModel.JsonWebTokens	80.3%	67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities	100%
System.Text.RegularExpressions.Generated	80.3%	67.8%
System.Text.RegularExpressions.Generated	80.3%	67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1	79.2%	68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0	81.4%	67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3 CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1	79.2%	68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3 CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0	81.4%	67.6%

keegan-caruso · 2025-10-09T17:06:25Z

Can we have a more targeted benchmark for just the logging sanitization?

src/Microsoft.IdentityModel.Logging/LogHelper.cs

Co-authored-by: mdchennu <[email protected]>

Copilot · 2025-10-09T20:01:22Z

Can we have a more targeted benchmark for just the logging sanitization?

I've added a targeted benchmark specifically for log sanitization in LogSanitizationBenchmarks.cs. It tests 6 different scenarios:

Baseline - No special characters (tests early return optimization)
Common control chars (\r, \n, \t)
Unicode format chars (zero-width spaces, directional marks)
Mixed control and format characters
Long strings without special characters
Long strings with sparse special characters

Run with: dotnet run -c release -f net9.0 --filter Microsoft.IdentityModel.Benchmarks.LogSanitizationBenchmarks*

See commit 9a2f8e5.

Co-authored-by: mdchennu <[email protected]>

Initial plan

c64b09c

Copilot AI assigned Copilot and mdchennu Oct 8, 2025

Copilot started work on behalf of mdchennu October 8, 2025 18:40 View session

Implement SearchValues-based sanitization for improved performance

58ab256

Co-authored-by: mdchennu <[email protected]>

Copilot AI changed the title ~~[WIP] Update Sanitize method to use SearchValues for sanitization~~ Optimize log sanitization with SearchValues for improved performance Oct 8, 2025

Copilot AI requested a review from mdchennu October 8, 2025 18:54

Copilot finished work on behalf of mdchennu October 8, 2025 18:54

Copilot started work on behalf of mdchennu October 8, 2025 20:42 View session

Add comprehensive sanitization tests for SearchValues implementation

e39807d

Co-authored-by: mdchennu <[email protected]>

Copilot finished work on behalf of mdchennu October 8, 2025 20:48

mdchennu marked this pull request as ready for review October 8, 2025 20:50

mdchennu requested a review from a team as a code owner October 8, 2025 20:50

mdchennu approved these changes Oct 9, 2025

View reviewed changes

keegan-caruso reviewed Oct 9, 2025

View reviewed changes

src/Microsoft.IdentityModel.Logging/LogHelper.cs Outdated Show resolved Hide resolved

src/Microsoft.IdentityModel.Logging/LogHelper.cs Outdated Show resolved Hide resolved

Merge branch 'dev' into copilot/update-log-sanitization-method

1e0d010

Copilot started work on behalf of mdchennu October 9, 2025 19:49 View session

Add targeted benchmark for log sanitization performance

651e1ae

Co-authored-by: mdchennu <[email protected]>

Copilot AI requested a review from mdchennu October 9, 2025 20:02

Copilot finished work on behalf of mdchennu October 9, 2025 20:02

Copilot started work on behalf of mdchennu October 9, 2025 20:03 View session

Fix encoding issues on lines 462 and 482 in LogHelper.cs

e5b63af

Co-authored-by: mdchennu <[email protected]>

Copilot finished work on behalf of mdchennu October 9, 2025 20:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize log sanitization with SearchValues for improved performance #3341

Optimize log sanitization with SearchValues for improved performance #3341

Uh oh!

Copilot AI commented Oct 8, 2025 •

edited

Loading

Uh oh!

mdchennu commented Oct 8, 2025

Uh oh!

Copilot AI commented Oct 8, 2025

Uh oh!

github-actions bot commented Oct 8, 2025

Uh oh!

github-actions bot commented Oct 8, 2025

Uh oh!

github-actions bot commented Oct 8, 2025

Uh oh!

keegan-caruso commented Oct 9, 2025

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Optimize log sanitization with SearchValues for improved performance #3341

Optimize log sanitization with SearchValues for improved performance #3341

Uh oh!

Conversation

Copilot AI commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes Made

Benchmark Added

Character Set (108 total)

Testing

Comments on the Issue (you are @copilot in this section)

Uh oh!

mdchennu commented Oct 8, 2025

Uh oh!

Copilot AI commented Oct 8, 2025

Uh oh!

github-actions bot commented Oct 8, 2025

Summary

Coverage

Uh oh!

github-actions bot commented Oct 8, 2025

Summary

Coverage

Uh oh!

github-actions bot commented Oct 8, 2025

Summary

Coverage

Uh oh!

keegan-caruso commented Oct 9, 2025

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Copilot AI commented Oct 8, 2025 •

edited

Loading