@stephentoub
Member

The atomicity checks for a loop followed by a boundary assertion were set up to require that what comes after the boundary also be disjoint from the preceding loop being tested. But that's not necessary; the boundary itself is sufficient to determine atomicity.
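The reasoning can be checked by hand on a loop such as `\d+` followed by `\b`. Every position the loop could backtrack to lies between two digits, so it can never be a word boundary, whatever follows. The sketch below is illustrative only: `IsWordChar` and `IsBoundary` are hypothetical helpers that approximate the engine's `\w` and `\b`, not code from this PR.

```csharp
using System;

// Simplified stand-in for the engine's word-character test (an assumption:
// the real \w and \b cover more of Unicode than this helper does).
static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

// A position is a word boundary when exactly one of its neighbors is a word character.
static bool IsBoundary(string s, int pos)
{
    bool before = pos > 0 && IsWordChar(s[pos - 1]);
    bool after = pos < s.Length && IsWordChar(s[pos]);
    return before != after;
}

foreach (string input in new[] { "2024x", "2024 ", "2024" })
{
    // Greedy \d+ starting at position 0.
    int end = 0;
    while (end < input.Length && char.IsDigit(input[end])) end++;

    Console.Write($"\"{input}\": boundary after full run = {IsBoundary(input, end)}; backtrack positions:");

    // Positions a backtracking loop would retry, keeping at least one digit matched.
    for (int pos = end - 1; pos >= 1; pos--)
    {
        Console.Write($" {pos}={IsBoundary(input, pos)}");
    }
    Console.WriteLine();
}
```

In each case the interior positions report `False`, so giving characters back to the loop cannot turn a failed boundary check into a success.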

@stephentoub requested review from MihaZupan and Copilot on July 30, 2025 01:49
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request improves boundary handling in atomic tests for regular expression optimizations. The change refines the logic for determining when a regex loop can be made atomic by focusing on the boundary assertion itself rather than requiring additional disjoint conditions.

Key changes:

  • Moves boundary assertion checks to be evaluated immediately as atomic conditions
  • Adds test cases to verify boundary assertions work with both disjoint and non-disjoint following characters
  • Reorganizes the atomic test logic to prioritize boundary conditions
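One way to sanity-check the behavior these changes rely on is to compare a loop written normally against the same loop wrapped in an explicit atomic group `(?>...)`, with followers that are and are not disjoint from the loop. This is a minimal sketch; the patterns and inputs are illustrative choices, not the test cases added in this PR.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each pair: the loop as written, and the same loop forced atomic.
(string Greedy, string Atomic)[] pairs =
{
    (@"\d+\b",     @"(?>\d+)\b"),      // nothing after the boundary
    (@"\d+\b\s",   @"(?>\d+)\b\s"),    // follower disjoint from \d
    (@"\d+\b[^/]", @"(?>\d+)\b[^/]"),  // follower overlaps with \d
};

string[] inputs = { "12 34x 56", "7/8", "9-10", "2024", "a1 b22 333/" };

// Describe every match as "index:value" so two result sets can be compared.
static string[] Run(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => $"{m.Index}:{m.Value}").ToArray();

bool allSame = true;
foreach (var (greedy, atomic) in pairs)
{
    foreach (string input in inputs)
    {
        bool same = Run(greedy, input).SequenceEqual(Run(atomic, input));
        allSame &= same;
        Console.WriteLine($"{greedy,-12} on \"{input}\": same results = {same}");
    }
}
Console.WriteLine($"All identical: {allSame}");
```

If the boundary alone determines atomicity, every pair should report identical results, including the pair whose follower `[^/]` can itself match a digit.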

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
RegexReductionTests.cs | Adds test cases for boundary assertions followed by both word and non-word characters
RegexNode.cs | Refactors boundary assertion checks to return true immediately instead of requiring evaluation of subsequent nodes

@stephentoub
Member Author

@MihuBot regexdiff

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@MihuBot

MihuBot commented Jul 30, 2025

27 out of 18857 patterns have generated source code changes.

Examples of GeneratedRegex source diffs
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+\\d+[/]\ ..." (540 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
  /// ○ Match a whitespace character atomically at least once.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ 3rd capture group.<br/>
  ///         ○ Match with 2 alternative expressions, atomically.<br/>
                  int capture_starting_pos = 0;
                  int capture_starting_pos1 = 0;
                  int capture_starting_pos2 = 0;
-                   int charloop_capture_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // 1st capture group.
                      goto CaptureBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration4 = 0;
                      while ((uint)iteration4 < (uint)slice.Length && char.IsDigit(slice[iteration4]))
                      {
                      
                      slice = slice.Slice(iteration4);
                      pos += iteration4;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       UncaptureUntil(charloop_capture_pos);
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto CaptureBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                       charloop_capture_pos = base.Crawlpos();
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                                  // Match if at the end of the string or if before an ending newline.
                                  if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                                  {
-                                       goto CharLoopBacktrack;
+                                       goto CaptureBacktrack;
                                  }
                                  
                              }
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+\\d+[/]\ ..." (529 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.ExplicitCapture | RegexOptions.Singleline)]
  /// ○ Match a whitespace character atomically at least once.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ Match with 2 alternative expressions, atomically.<br/>
  ///         ○ Match a sequence of expressions.<br/>
                  int matchStart = pos;
                  int alternation_branch = 0;
                  int alternation_starting_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // Match with 2 alternative expressions.
                      goto AlternationBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration4 = 0;
                      while ((uint)iteration4 < (uint)slice.Length && char.IsDigit(slice[iteration4]))
                      {
                      
                      slice = slice.Slice(iteration4);
                      pos += iteration4;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto AlternationBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                              // Match if at the end of the string or if before an ending newline.
                              if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                              {
-                                   goto CharLoopBacktrack;
+                                   goto AlternationBacktrack;
                              }
                              
                          }
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+[/]\\d+(?=(\ ..." (488 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
  ///             ○ Match if at a word boundary.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ 3rd capture group.<br/>
  ///         ○ Match with 2 alternative expressions, atomically.<br/>
                  int capture_starting_pos = 0;
                  int capture_starting_pos1 = 0;
                  int capture_starting_pos2 = 0;
-                   int charloop_capture_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // 1st capture group.
                      goto CaptureBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration2 = 0;
                      while ((uint)iteration2 < (uint)slice.Length && char.IsDigit(slice[iteration2]))
                      {
                      
                      slice = slice.Slice(iteration2);
                      pos += iteration2;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       UncaptureUntil(charloop_capture_pos);
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto CaptureBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                       charloop_capture_pos = base.Crawlpos();
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                                  // Match if at the end of the string or if before an ending newline.
                                  if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                                  {
-                                       goto CharLoopBacktrack;
+                                       goto CaptureBacktrack;
                                  }
                                  
                              }
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+[/]\\d+(?=(\ ..." (338 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.ExplicitCapture | RegexOptions.Singleline)]
  ///         ○ Match if at a word boundary.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ Match with 2 alternative expressions, atomically.<br/>
  ///         ○ Match a sequence of expressions.<br/>
                  int matchStart = pos;
                  int alternation_branch = 0;
                  int alternation_starting_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // Match with 2 alternative expressions.
                      goto AlternationBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration2 = 0;
                      while ((uint)iteration2 < (uint)slice.Length && char.IsDigit(slice[iteration2]))
                      {
                      
                      slice = slice.Slice(iteration2);
                      pos += iteration2;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto AlternationBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                              // Match if at the end of the string or if before an ending newline.
                              if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                              {
-                                   goto CharLoopBacktrack;
+                                   goto AlternationBacktrack;
                              }
                              
                          }
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+\\d+[/]\ ..." (196 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.Singleline)]
  /// ○ Match a whitespace character atomically at least once.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ 3rd capture group.<br/>
  ///         ○ Match with 2 alternative expressions, atomically.<br/>
                  int capture_starting_pos = 0;
                  int capture_starting_pos1 = 0;
                  int capture_starting_pos2 = 0;
-                   int charloop_capture_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // 1st capture group.
                      goto CaptureBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration4 = 0;
                      while ((uint)iteration4 < (uint)slice.Length && char.IsDigit(slice[iteration4]))
                      {
                      
                      slice = slice.Slice(iteration4);
                      pos += iteration4;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       UncaptureUntil(charloop_capture_pos);
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto CaptureBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                       charloop_capture_pos = base.Crawlpos();
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                                  // Match if at the end of the string or if before an ending newline.
                                  if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                                  {
-                                       goto CharLoopBacktrack;
+                                       goto CaptureBacktrack;
                                  }
                                  
                              }
"(((?<=\\W|^)-\\s*)|(?<![/-])(?<=\\b))\\d+[/] ..." (176 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<![/-])(?<=\\b))\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.ExplicitCapture | RegexOptions.Singleline)]
  ///             ○ Match if at a word boundary.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ Match with 2 alternative expressions, atomically.<br/>
  ///         ○ Match a sequence of expressions.<br/>
                  int matchStart = pos;
                  int alternation_branch = 0;
                  int alternation_starting_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // Match with 2 alternative expressions.
                      goto AlternationBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration2 = 0;
                      while ((uint)iteration2 < (uint)slice.Length && char.IsDigit(slice[iteration2]))
                      {
                      
                      slice = slice.Slice(iteration2);
                      pos += iteration2;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto AlternationBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                              // Match if at the end of the string or if before an ending newline.
                              if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                              {
-                                   goto CharLoopBacktrack;
+                                   goto AlternationBacktrack;
                              }
                              
                          }
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+[/]\\d+(?=(\ ..." (168 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.Singleline)]
  ///             ○ Match if at a word boundary.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ 3rd capture group.<br/>
  ///         ○ Match with 2 alternative expressions, atomically.<br/>
                  int capture_starting_pos = 0;
                  int capture_starting_pos1 = 0;
                  int capture_starting_pos2 = 0;
-                   int charloop_capture_pos = 0;
-                   int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
                  // 1st capture group.
                      goto CaptureBacktrack;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos = pos;
-                       
                      int iteration2 = 0;
                      while ((uint)iteration2 < (uint)slice.Length && char.IsDigit(slice[iteration2]))
                      {
                      
                      slice = slice.Slice(iteration2);
                      pos += iteration2;
-                       
-                       charloop_ending_pos = pos;
-                       charloop_starting_pos++;
-                       goto CharLoopEnd;
-                       
-                       CharLoopBacktrack:
-                       UncaptureUntil(charloop_capture_pos);
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos >= charloop_ending_pos)
-                       {
-                           goto CaptureBacktrack;
-                       }
-                       pos = --charloop_ending_pos;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd:
-                       charloop_capture_pos = base.Crawlpos();
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                                  // Match if at the end of the string or if before an ending newline.
                                  if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                                  {
-                                       goto CharLoopBacktrack;
+                                       goto CaptureBacktrack;
                                  }
                                  
                              }
"\\A(?:(?:http|https):\\/\\/)?([-a-zA-Z0-9.]{ ..." (117 uses)
[GeneratedRegex("\\A(?:(?:http|https):\\/\\/)?([-a-zA-Z0-9.]{2,256}\\.[a-z]{2,4})\\b(?:\\/[-a-zA-Z0-9@:%_\\+.~#?&//=]*)?")]
  /// ○ 1st capture group.<br/>
  ///     ○ Match a character in the set [\-.0-9A-Za-z] greedily at least 2 and at most 256 times.<br/>
  ///     ○ Match '.'.<br/>
-   ///     ○ Match a character in the set [a-z] greedily at least 2 and at most 4 times.<br/>
+   ///     ○ Match a character in the set [a-z] atomically at least 2 and at most 4 times.<br/>
  /// ○ Match if at a word boundary.<br/>
  /// ○ Optional (greedy).<br/>
  ///     ○ Match '/'.<br/>
                  char ch;
                  int capture_starting_pos = 0;
                  int charloop_capture_pos = 0;
-                   int charloop_capture_pos1 = 0;
                  int charloop_starting_pos = 0, charloop_ending_pos = 0;
-                   int charloop_starting_pos1 = 0, charloop_ending_pos1 = 0;
                  int loop_iteration = 0;
                  int loop_iteration1 = 0;
                  int stackpos = 0;
                          goto CharLoopBacktrack;
                      }
                      
-                       // Match a character in the set [a-z] greedily at least 2 and at most 4 times.
-                       //{
+                       // Match a character in the set [a-z] atomically at least 2 and at most 4 times.
+                       {
                          pos++;
                          slice = inputSpan.Slice(pos);
-                           charloop_starting_pos1 = pos;
-                           
                          int iteration1 = 0;
                          while (iteration1 < 4 && (uint)iteration1 < (uint)slice.Length && char.IsAsciiLetterLower(slice[iteration1]))
                          {
                          
                          slice = slice.Slice(iteration1);
                          pos += iteration1;
-                           
-                           charloop_ending_pos1 = pos;
-                           charloop_starting_pos1 += 2;
-                           goto CharLoopEnd1;
-                           
-                           CharLoopBacktrack1:
-                           UncaptureUntil(charloop_capture_pos1);
-                           
-                           if (Utilities.s_hasTimeout)
-                           {
-                               base.CheckTimeout();
-                           }
-                           
-                           if (charloop_starting_pos1 >= charloop_ending_pos1)
-                           {
-                               goto CharLoopBacktrack;
-                           }
-                           pos = --charloop_ending_pos1;
-                           slice = inputSpan.Slice(pos);
-                           
-                           CharLoopEnd1:
-                           charloop_capture_pos1 = base.Crawlpos();
-                       //}
+                       }
                      
                      base.Capture(1, capture_starting_pos, pos);
                      
                      goto CaptureSkipBacktrack;
                      
                      CaptureBacktrack:
-                       goto CharLoopBacktrack1;
+                       goto CharLoopBacktrack;
                      
                      CaptureSkipBacktrack:;
                  //}
"(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+(e\\s+)? ..." (65 uses)
[GeneratedRegex("(((?<=\\W|^)-\\s*)|(?<=\\b))\\d+\\s+(e\\s+)?\\d+[/]\\d+(?=(\\b[^/]|$))", RegexOptions.ExplicitCapture | RegexOptions.Singleline)]
  ///     ○ Match a whitespace character atomically at least once.<br/>
  /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Match '/'.<br/>
-   /// ○ Match a Unicode digit greedily at least once.<br/>
+   /// ○ Match a Unicode digit atomically at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
  ///     ○ Match with 2 alternative expressions, atomically.<br/>
  ///         ○ Match a sequence of expressions.<br/>
                  int alternation_branch = 0;
                  int alternation_starting_pos = 0;
                  int charloop_starting_pos = 0, charloop_ending_pos = 0;
-                   int charloop_starting_pos1 = 0, charloop_ending_pos1 = 0;
                  int loop_iteration = 0;
                  int stackpos = 0;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                      goto LoopIterationNoMatch;
                  }
                  
-                   // Match a Unicode digit greedily at least once.
-                   //{
+                   // Match a Unicode digit atomically at least once.
+                   {
                      pos++;
                      slice = inputSpan.Slice(pos);
-                       charloop_starting_pos1 = pos;
-                       
                      int iteration5 = 0;
                      while ((uint)iteration5 < (uint)slice.Length && char.IsDigit(slice[iteration5]))
                      {
                      
                      slice = slice.Slice(iteration5);
                      pos += iteration5;
-                       
-                       charloop_ending_pos1 = pos;
-                       charloop_starting_pos1++;
-                       goto CharLoopEnd1;
-                       
-                       CharLoopBacktrack1:
-                       
-                       if (Utilities.s_hasTimeout)
-                       {
-                           base.CheckTimeout();
-                       }
-                       
-                       if (charloop_starting_pos1 >= charloop_ending_pos1)
-                       {
-                           goto LoopIterationNoMatch;
-                       }
-                       pos = --charloop_ending_pos1;
-                       slice = inputSpan.Slice(pos);
-                       
-                       CharLoopEnd1:
-                   //}
+                   }
                  
                  // Zero-width positive lookahead.
                  {
                              // Match if at the end of the string or if before an ending newline.
                              if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))
                              {
-                                   goto CharLoopBacktrack1;
+                                   goto LoopIterationNoMatch;
                              }
                              
                          }

For more diff examples, see https://gist.github.com/MihuBot/f6df666f613e9b662419df86b5cb07e9
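
The diffs above all follow one pattern: the `charloop_starting_pos` / `charloop_ending_pos` bookkeeping and the `CharLoopBacktrack` label disappear, and a later failure jumps straight to the enclosing backtrack target. A simplified, stand-alone sketch of the two shapes for `\d+\b` follows. The helper names are hypothetical, `IsWordChar` only approximates the engine's word-character test, and none of this is the generator's actual output.

```csharp
using System;

// Simplified word-character test; an assumption, since the engine's \b covers more of Unicode.
static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

static bool IsBoundary(ReadOnlySpan<char> s, int pos) =>
    (pos > 0 && IsWordChar(s[pos - 1])) != (pos < s.Length && IsWordChar(s[pos]));

// Counts leading digits, in the style of the generated `while (... char.IsDigit(...))` loop.
static int CountDigits(ReadOnlySpan<char> s)
{
    int i = 0;
    while ((uint)i < (uint)s.Length && char.IsDigit(s[i])) i++;
    return i;
}

// Greedy shape: on failure, give back one character at a time and retry the boundary.
static int GreedyThenBoundary(ReadOnlySpan<char> s)
{
    int startingPos = 1;              // at least one digit must stay matched
    int endingPos = CountDigits(s);
    while (endingPos >= startingPos)
    {
        if (IsBoundary(s, endingPos)) return endingPos;
        endingPos--;                  // backtrack into the loop
    }
    return -1;
}

// Atomic shape: a single attempt at the end of the run, with no state to restore.
static int AtomicThenBoundary(ReadOnlySpan<char> s)
{
    int pos = CountDigits(s);
    return pos >= 1 && IsBoundary(s, pos) ? pos : -1;
}

foreach (string input in new[] { "42", "42 ", "42x", "4_2", "x42", "" })
{
    Console.WriteLine($"\"{input}\": greedy={GreedyThenBoundary(input)}, atomic={AtomicThenBoundary(input)}");
}
```

Both functions return the same end position (or -1) for every input here; the atomic form simply skips retries that cannot succeed.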

JIT assembly changes
Total bytes of base: 53918456
Total bytes of diff: 53914034
Total bytes of delta: -4422 (-0.01 % of base)
Total relative delta: -1.58
    diff is an improvement.
    relative diff is an improvement.
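
The byte delta and its percentage can be reproduced from the two totals. The snippet below is only a quick arithmetic check of the reported figures; the variable names are arbitrary.

```csharp
using System;
using System.Globalization;

long baseBytes = 53_918_456;   // "Total bytes of base" from the report
long diffBytes = 53_914_034;   // "Total bytes of diff" from the report

long delta = diffBytes - baseBytes;
double percent = 100.0 * delta / baseBytes;

// Prints: delta = -4422 bytes (-0.0082 % of base)
Console.WriteLine($"delta = {delta} bytes ({percent.ToString("F4", CultureInfo.InvariantCulture)} % of base)");
```

The unrounded figure is about -0.0082 %, which the report shows as -0.01 %.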

For a list of JIT diff regressions, see Regressions.md
For a list of JIT diff improvements, see Improvements.md

Sample source code for further analysis
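using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Text.RegularExpressions;

const string JsonPath = "RegexResults-1321.json";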
if (!File.Exists(JsonPath))
{
    await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/E22R6vvA");
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
    archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}

using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");



record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
    public string? FullDiff { get; set; }
    public string? ShortDiff { get; set; }
    public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
    public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}

@stephentoub enabled auto-merge (squash) on July 30, 2025 02:36
@stephentoub requested a review from danmoseley on July 30, 2025 13:20
@stephentoub merged commit b5f8e98 into dotnet:main on Jul 30, 2025
88 checks passed
@stephentoub deleted the improveboundary branch on August 4, 2025 16:56
@github-actions bot locked and limited conversation to collaborators on Sep 4, 2025