Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 8, 2025

Summary

Updated the Sanitize method in LogHelper.cs to use SearchValues<char> for improved performance on .NET 8+, while maintaining full backward compatibility with older frameworks.

Changes Made

  • Added conditional using System.Buffers directive for .NET 8+
  • Created static SearchValues<char> field with all 108 control and format characters
  • Implemented efficient SearchValues-based sanitization for .NET 8+
  • Maintained original implementation as fallback for older frameworks
  • Preserved identical output format (\r, \n, \t, \uXXXX)
  • Added 15 comprehensive tests covering all character types and edge cases
  • Fixed encoding issues on lines 462 and 482 (preserved original file encoding)
  • Added targeted benchmark for log sanitization performance

Benchmark Added

Created LogSanitizationBenchmarks.cs with the following scenarios:

  • Baseline: String with no special characters (tests early return optimization)
  • Common control chars: Strings with \r, \n, \t
  • Unicode format chars: Zero-width spaces, directional marks
  • Mixed chars: Combination of control and format characters
  • Long strings: Both with and without special characters to test performance at scale

Run with: dotnet run -c release -f net9.0 --filter Microsoft.IdentityModel.Benchmarks.LogSanitizationBenchmarks*

Character Set (108 total)

  • All ASCII control characters (U+0000-U+001F, U+007F-U+009F)
  • All Unicode format characters (Cf category)
  • Including: zero-width spaces, directional marks, and other format characters

Testing

  • All 72 tests pass on .NET 8 (57 original + 15 new)
  • All 72 tests pass on .NET 9
  • Output format verified identical to previous implementation
  • Backward compatibility maintained for all target frameworks
  • Benchmark builds successfully for net6.0, net8.0, and net9.0
  • File encoding preserved on lines 462 and 482
Original prompt

This section details on the original issue you should resolve

<issue_title>Log sanitization with use of SearchValues v1</issue_title>
<issue_description>Problem Statement.
We want to update the Sanitize method:

to utilize SearchValues. All values which need to be sanitized can be added to a SearchValues collection.

Specific Characters to Sanitize:

  • '\r' (Carriage Return, U+000D)
  • '\n' (Line Feed, U+000A)
  • '\t' (Tab, U+0009)
  • All other ASCII control characters: U+0000-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F-U+009F
  • All Unicode characters where char.IsControl(c) is true (Unicode category Cc)
  • All Unicode characters where CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format (Unicode category Cf), e.g.:
    • U+200B (Zero Width Space)
    • U+200C (Zero Width Non-Joiner)
    • U+200D (Zero Width Joiner)
    • U+200E (Left-to-Right Mark)
    • U+200F (Right-to-Left Mark)
    • U+202A-U+202E (Directional formatting)
    • U+2060-U+206F (Various format characters)
    • U+FEFF (Zero Width No-Break Space)

Implementation Clarifications:

  • The referenced SanitizeEntryFromFilePath method from dotnet/runtime should be used as a performance pattern reference only. The actual character set for sanitization should follow this issue's explicit enumeration.
  • The new implementation must produce identical output to the current method, including string formatting (e.g., using "\u{(int)c:X4}" for control/format characters).

Steps:

  1. Update the Sanitize method to have a SearchValues collection that includes the above values. The list of control and format characters should be fully enumerated and hardcoded into a static SearchValues collection.
  2. Update the Sanitize method to follow the structure of the method called SanitizeEntryFromFilePath: dotnet/runtime/src/libraries/Common/src/System/IO/Archiving.Utils.Windows.cs#L11
  3. Update tests accordingly. If tests to cover the following edge cases are not present, add them: Strings with no sanitizable characters (performance fast path), Strings with only ASCII control characters, Strings with Unicode format characters. Very long strings, and Null/empty string handling

Only update the Sanitize method as described above; do not update any other methods.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #3338

💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

Copilot AI changed the title [WIP] Update Sanitize method to use SearchValues for sanitization Optimize log sanitization with SearchValues for improved performance Oct 8, 2025
Copilot AI requested a review from mdchennu October 8, 2025 18:54
Copilot finished work on behalf of mdchennu October 8, 2025 18:54
@mdchennu
Copy link
Contributor

mdchennu commented Oct 8, 2025

Copy link
Contributor Author

Copilot AI commented Oct 8, 2025

@copilot please update the tests in this file: https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/blob/dev/test/Microsoft.IdentityModel.Logging.Tests/LogHelperTests.c...

I've added 15 comprehensive tests to LogHelperTests.cs covering:

  • Empty strings and strings with no special characters
  • Control characters (ASCII and Unicode)
  • Format characters (zero-width spaces, directional marks, soft hyphens)
  • Arabic format characters
  • Interlinear annotation characters
  • Edge cases (consecutive chars, chars at start/end, long strings)
  • Mix of all character types

All 72 tests pass on both .NET 8 and .NET 9. See commit 4f47a7d.

Copilot finished work on behalf of mdchennu October 8, 2025 20:48
@mdchennu mdchennu marked this pull request as ready for review October 8, 2025 20:50
@mdchennu mdchennu requested a review from a team as a code owner October 8, 2025 20:50
@github-actions
Copy link

github-actions bot commented Oct 8, 2025

Summary

Summary
Generated on: 10/8/2025 - 8:53:37 PM
Coverage date: 10/8/2025 - 8:44:16 PM - 10/8/2025 - 8:52:54 PM
Parser: MultiReport (60x Cobertura)
Assemblies: 1
Classes: 7
Files: 2
Line coverage: 80.3% (620 of 772)
Covered lines: 620
Uncovered lines: 152
Coverable lines: 772
Total lines: 483
Branch coverage: 67.8% (228 of 336)
Covered branches: 228
Total branches: 336
Method coverage: Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%
Name Line Branch
Microsoft.IdentityModel.JsonWebTokens 80.3% 67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities 100%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0
81.4% 67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0
81.4% 67.6%

@github-actions
Copy link

github-actions bot commented Oct 8, 2025

Summary

Summary
Generated on: 10/8/2025 - 9:02:10 PM
Coverage date: 10/8/2025 - 8:53:32 PM - 10/8/2025 - 9:01:27 PM
Parser: MultiReport (60x Cobertura)
Assemblies: 1
Classes: 7
Files: 2
Line coverage: 80.3% (620 of 772)
Covered lines: 620
Uncovered lines: 152
Coverable lines: 772
Total lines: 483
Branch coverage: 67.8% (228 of 336)
Covered branches: 228
Total branches: 336
Method coverage: Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%
Name Line Branch
Microsoft.IdentityModel.JsonWebTokens 80.3% 67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities 100%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0
81.4% 67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0
81.4% 67.6%

@github-actions
Copy link

github-actions bot commented Oct 8, 2025

Summary

Summary
Generated on: 10/8/2025 - 9:03:01 PM
Coverage date: 10/8/2025 - 8:54:20 PM - 10/8/2025 - 9:02:18 PM
Parser: MultiReport (60x Cobertura)
Assemblies: 1
Classes: 7
Files: 2
Line coverage: 80.3% (620 of 772)
Covered lines: 620
Uncovered lines: 152
Coverable lines: 772
Total lines: 483
Branch coverage: 67.8% (228 of 336)
Covered branches: 228
Total branches: 336
Method coverage: Feature is only available for sponsors

Coverage

Microsoft.IdentityModel.JsonWebTokens - 80.3%
Name Line Branch
Microsoft.IdentityModel.JsonWebTokens 80.3% 67.8%
Microsoft.IdentityModel.JsonWebTokens.JwtTokenUtilities 100%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated 80.3% 67.8%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F12A1AEDDDFE32BA
DF4DBFF323AF1BCB48B9F9721B7CD3E05F5E034CF225E3DF8__CreateJwsRegex_0
81.4% 67.6%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJweRegex_1
79.2% 68%
System.Text.RegularExpressions.Generated.<RegexGenerator_g>F334844C618E00D3
CEC5D3FE0D00CF0141BBEE98635313BB2CB8D3921464CE05A__CreateJwsRegex_0
81.4% 67.6%

@keegan-caruso
Copy link
Contributor

Can we have a more targeted benchmark for just the logging sanitization?

Copy link
Contributor Author

Copilot AI commented Oct 9, 2025

Can we have a more targeted benchmark for just the logging sanitization?

I've added a targeted benchmark specifically for log sanitization in LogSanitizationBenchmarks.cs. It tests 6 different scenarios:

  1. Baseline - No special characters (tests early return optimization)
  2. Common control chars (\r, \n, \t)
  3. Unicode format chars (zero-width spaces, directional marks)
  4. Mixed control and format characters
  5. Long strings without special characters
  6. Long strings with sparse special characters

Run with: dotnet run -c release -f net9.0 --filter Microsoft.IdentityModel.Benchmarks.LogSanitizationBenchmarks*

See commit 9a2f8e5.

Copilot AI requested a review from mdchennu October 9, 2025 20:02
Copilot finished work on behalf of mdchennu October 9, 2025 20:02
Copilot finished work on behalf of mdchennu October 9, 2025 20:16
This was referenced Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Log sanitization with use of SearchValues v1

5 participants