
Earlier fail fast for fixed-length regex patterns with anchors #118489

@jnyrup

Description

When a regex pattern has both a leading and a trailing anchor and is of fixed length, it seems to me that we might be missing an opportunity to fail fast earlier.

Take, e.g., the regex pattern ^1234$.
Currently, the emitted TryFindNextPossibleStartingPosition only checks that the input has at least 4 characters before continuing:

if (pos <= inputSpan.Length - 4 && pos == 0)

When invoking IsMatch("12345") on the generated regex, the flow will be:

  • pass TryFindNextPossibleStartingPosition, as the input is longer than 4 characters,
  • enter TryMatchAtCurrentPosition and call .StartsWith("1234"),
  • and only then fail on 5 < slice.Length, because the input was too long.
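To make that flow concrete, here is a small, hypothetical C# model of the two phases for ^1234$. The find step reuses the check quoted above; the match step is a simplified stand-in written for this sketch (the local function names mirror the generated methods, and SimulateIsMatch is made up), so it should not be read as the source generator's actual output.

```csharp
using System;

// Hypothetical model of the flow for ^1234$. The find step copies the check quoted
// in the issue; the match step is a simplified stand-in, not the generator's output.
static bool TryFindNextPossibleStartingPosition(ReadOnlySpan<char> inputSpan, int pos) =>
    pos <= inputSpan.Length - 4 && pos == 0; // "at least 4 chars left" and ^ means pos 0

static bool TryMatchAtCurrentPosition(ReadOnlySpan<char> slice)
{
    if (!slice.StartsWith("1234"))
        return false;
    // $ (EndZ): the match must end at the end of input or right before a final '\n'.
    return slice.Length == 4 || (slice.Length == 5 && slice[4] == '\n');
}

static bool SimulateIsMatch(string input, out bool enteredMatchStep)
{
    enteredMatchStep = false;
    if (!TryFindNextPossibleStartingPosition(input, 0))
        return false; // rejected cheaply by the find step
    enteredMatchStep = true;
    return TryMatchAtCurrentPosition(input);
}

foreach (string input in new[] { "1234", "1234\n", "12345", "123" })
{
    bool matched = SimulateIsMatch(input, out bool entered);
    string shown = input.Replace("\n", "\\n");
    Console.WriteLine($"{shown,-8} matched={matched} enteredMatchStep={entered}");
}
```

Running it shows that "12345" gets past the find step (enteredMatchStep=True) and is only rejected in the match step, while "123" never reaches the match step.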

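If we tightened the check in TryFindNextPossibleStartingPosition to "exactly 4 characters" (or 5 when the last one is '\n', since $ also matches before a final newline), we wouldn't have to enter TryMatchAtCurrentPosition for an input like "12345" at all.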
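A minimal sketch of what such a tightened check could look like, assuming a fixed length N, a leading ^ (so pos must be 0), and a flag for whether the trailing anchor is \z (End) or $ / \Z (EndZ). PassesExactLengthCheck is a hypothetical helper, not an existing API:

```csharp
using System;

// Hypothetical tightened find-step check for an anchored fixed-length pattern
// such as ^1234$ (fixedLength = 4). It only filters on length; the match step
// still has to verify the characters themselves.
static bool PassesExactLengthCheck(ReadOnlySpan<char> inputSpan, int pos, int fixedLength, bool endZ) =>
    pos == 0 &&
    (inputSpan.Length == fixedLength ||
     // $ and \Z (EndZ) also match before a final '\n', so allow exactly one extra '\n'.
     (endZ && inputSpan.Length == fixedLength + 1 && inputSpan[fixedLength] == '\n'));

Console.WriteLine(PassesExactLengthCheck("1234", 0, 4, endZ: true));    // True
Console.WriteLine(PassesExactLengthCheck("1234\n", 0, 4, endZ: true));  // True
Console.WriteLine(PassesExactLengthCheck("12345", 0, 4, endZ: true));   // False: rejected before the match step
Console.WriteLine(PassesExactLengthCheck("1234\n", 0, 4, endZ: false)); // False: \z allows no trailing '\n'
```

Like the existing check, this is only a length filter: inputs of the right length still go through TryMatchAtCurrentPosition to compare the actual characters.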

If I understand the code correctly, we can detect this situation in EmitTryFindNextPossibleStartingPosition when:

  • rm.Tree.FindOptimizations.LeadingAnchor is RegexNodeKind.Beginning and
  • rm.Tree.FindOptimizations.FindMode is FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_End or FindNextStartingPositionMode.TrailingAnchor_FixedLength_LeftToRight_EndZ
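Roughly, that detection could be sketched as follows. Because the real RegexNodeKind and FindNextStartingPositionMode enums are internal to System.Text.RegularExpressions, this sketch passes their member names as strings; CanTightenToExactLength and ExactLengthCondition are hypothetical helpers, and the emitted condition text is an assumed shape, not what the generator produces today.

```csharp
using System;

// Hypothetical sketch of the decision described in the issue. The real enums are
// internal, so their member names are passed as strings purely for illustration.
static bool CanTightenToExactLength(string leadingAnchor, string findMode) =>
    leadingAnchor == "Beginning" &&
    findMode is "TrailingAnchor_FixedLength_LeftToRight_End"
             or "TrailingAnchor_FixedLength_LeftToRight_EndZ";

// Builds the condition text an emitter could write instead of the current
// "pos <= inputSpan.Length - N && pos == 0" check (assumed shape, not real output).
static string ExactLengthCondition(string findMode, int fixedLength) =>
    findMode == "TrailingAnchor_FixedLength_LeftToRight_EndZ"
        ? $"pos == 0 && (inputSpan.Length == {fixedLength} || (inputSpan.Length == {fixedLength + 1} && inputSpan[{fixedLength}] == '\\n'))"
        : $"pos == 0 && inputSpan.Length == {fixedLength}";

string mode = "TrailingAnchor_FixedLength_LeftToRight_EndZ";
if (CanTightenToExactLength("Beginning", mode))
{
    Console.WriteLine($"if ({ExactLengthCondition(mode, 4)})");
}
```

For ^1234$ this prints a condition that admits only inputs of length 4, or length 5 ending in '\n'.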

Am I missing something?